Recovery fault recording

The recovery fault recording feature of Z Abend Investigator reduces the number of instances in which an abnormal termination during real-time analysis prevents a normal fault entry from being created. Such a termination might occur, for example, in the following situations:
  • Insufficient storage. (See HFZ0005S.)
  • Z Abend Investigator abended. (See HFZ0047S.)
  • Z Abend Investigator timed out. (See HFZ0092S.)
  • Invalid negative storage length request. (See HFZ0105S.)

When a terminating condition is subject to recovery fault recording processing, a skeleton fault entry is created and an associated SDUMP (SVC dump) or IEATDUMP (transaction dump) is written.

First, a check is made to determine whether security access has been granted to use SDUMP as the recovery fault recording dump type, because SDUMP is the preferred dump type for performance reasons. (For details, see Using the XFACILIT resource class for SDUMP RFR data sets.)

Note: The MVS™ post-dump exit HFZXTSEL is required to support the recovery fault recording feature when using SDUMPs. For details of this exit, see SVC dump registration.

If SDUMP cannot be used, IEATDUMP is used instead as the recovery fault recording dump type.
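The selection order described above can be modeled with a short sketch. This is only an illustration of the documented preference (SDUMP first, IEATDUMP as the fallback), not the product's implementation. The names DumpType and choose_rfr_dump_type are hypothetical, and the single boolean input is an assumption that stands in for the outcome of the security access check.

```python
from enum import Enum


class DumpType(Enum):
    SDUMP = "SDUMP"        # SVC dump, preferred for performance reasons
    IEATDUMP = "IEATDUMP"  # transaction dump, used when SDUMP cannot be used


def choose_rfr_dump_type(sdump_usable: bool) -> DumpType:
    """Hypothetical helper: return the RFR dump type for one event.

    sdump_usable is an assumed input that represents the result of the
    check for security access to use SDUMP.
    """
    if sdump_usable:
        return DumpType.SDUMP
    return DumpType.IEATDUMP
```

For example, choose_rfr_dump_type(False) returns DumpType.IEATDUMP, which mirrors the fallback case.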

The term RFR dump refers to the recovery fault recording dump data set, regardless of which dump type is used.

For an RFR dump, an extra data set is created, into which MVS writes a dump of the address space. This data set takes significantly more DASD space than a minidump, but in these situations Z Abend Investigator has failed to gather the minidump. The RFR dump data set is subsequently used in place of the minidump for reanalysis of the skeleton fault entry.

Note: To enable recovery fault recording processing, the HFZS subsystem must be started and the UPDINDEX parameter must be in effect (this is the default).

The history file in which the fault entry is created is either the current history file for the abending job, as determined at the time of the abnormal analysis termination, or the default history file for the HFZS subsystem. If the current history file for the abending job is a PDSE, an attempt is made to use it first. Otherwise, the HFZS subsystem default history file is used.

Message HFZ0126I is issued to indicate in which history file the fault entry was created.
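The history file choice described above can be sketched as follows. This is a simplified model under stated assumptions: the HistoryFile type, the choose_history_file function, and the data set names in the usage note are all hypothetical, and the sketch does not model what happens if the attempt to use the current history file fails.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoryFile:
    name: str       # data set name (hypothetical values in this sketch)
    is_pdse: bool   # True if the data set is a PDSE


def choose_history_file(current: HistoryFile, subsystem_file: HistoryFile) -> HistoryFile:
    """Hypothetical helper: pick the history file for the skeleton fault entry.

    The current history file of the abending job is tried first, but only
    if it is a PDSE. Otherwise, the HFZS subsystem history file is used.
    """
    if current.is_pdse:
        return current
    return subsystem_file
```

With a hypothetical job history file HistoryFile("MY.APP.HIST", is_pdse=False) and a subsystem file HistoryFile("MY.HFZS.HIST", is_pdse=True), the function returns the subsystem file.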

If the RFR dump is an IEATDUMP, then it is created from the abending region. However, if it is an SDUMP, then it is created by the HFZS subsystem. The skeleton fault entry is always created by the HFZS subsystem.

Once the recovery fault recording process starts, no user exits are driven for the process, except for any Notification user exits specified in the options available to the HFZS subsystem, which are invoked when creating the skeleton recovery fault recording fault entry. To distinguish a recovery fault recording event from other invocations of Notification user exits, the NFY.NFYTYPE field is set to 'R'. For details about the Notification user exit, see Notification user exit.
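The test that a Notification user exit might apply can be illustrated with a small model. This sketch is not an actual exit: it assumes, for illustration only, that the NFY fields are available as a dictionary keyed by field name, and the function name is hypothetical. The only fact taken from this topic is that NFYTYPE is set to 'R' for a recovery fault recording event.

```python
def is_recovery_fault_recording_event(nfy: dict) -> bool:
    """Hypothetical check: True if the exit was driven for an RFR event.

    nfy is an assumed dictionary view of the NFY fields, for example
    {"NFYTYPE": "R"}. Any other value, or a missing field, is treated
    as a non-RFR invocation in this model.
    """
    return nfy.get("NFYTYPE") == "R"
```

An exit modeled this way could branch on the result to handle skeleton fault entries differently from normal fault entries.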

If the RFR dump is an IEATDUMP, the name of the IEATDUMP data set that is created is controlled by the name in the HFZRFRDS CSECT. For details about the use of this name, and information about how to change it, see Changing the default recovery fault recording IEATDUMP data set name (RFRDSN). To permit automatic deletion of IEATDUMP data sets when their associated fault entries are deleted, it might be necessary to change the default high-level qualifier, subject to installation-specific security rules. See Managing recovery fault recording data set access for more information.

Depending on where in the real-time analysis process the problem occurred, reanalysis of the recovery fault recording fault entry can produce a reanalysis report that is effectively identical to the one that would have been produced if the real-time analysis had completed normally. In many of the recovery situations, the fact that a recovery fault recording fault entry was created instead of the normal real-time fault entry is almost transparent to the user.

When a recovery fault recording fault entry is deleted, the associated RFR dump data set is also deleted automatically so that it does not take up disk space unnecessarily. If the RFR dump data set cannot be deleted, message HFZ0187I is issued.
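The deletion behavior described above can be modeled as a minimal sketch. The function name, the delete_data_set callable, and the use of OSError to signal a failed deletion are assumptions made for illustration. The only product detail used is that a failed deletion results in message HFZ0187I, and only the message ID is shown because the message text is not given here.

```python
from typing import Callable, List


def delete_rfr_dump_for_entry(
    dump_data_set: str,
    delete_data_set: Callable[[str], None],
) -> List[str]:
    """Hypothetical model: delete the RFR dump that belongs to a fault entry.

    Returns the IDs of the messages that this model would issue.
    delete_data_set is an assumed callable that raises OSError on failure.
    """
    messages: List[str] = []
    try:
        delete_data_set(dump_data_set)
    except OSError:
        # Failure to delete the RFR dump data set results in HFZ0187I.
        messages.append("HFZ0187I")
    return messages
```

A caller can pass any deletion routine. If that routine raises OSError, the returned list contains "HFZ0187I". Otherwise, the list is empty.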