Automated implementation

Given two MVS™ systems, A and B, that do not share DASD, fault entries created on system A can be copied automatically to system B, where they can be viewed or reanalyzed by using the Z Abend Investigator ISPF interface.

The entries are copied by using a Notification user exit on system A, which submits a TSO batch job to transmit fault entries as PDS members to a dedicated user ID on system B. On system B, a continually running batch TSO job receives fault entries into a staging data set. It then calls the HFZUTIL batch utility to import the fault entry into a local history file.

Figure 1 shows this concept.
Figure 1. Fault entry import by way of TSO XMIT/RECEIVE
TSO XMIT/RECEIVE enables the fault entry import. An abend occurs on system A. As a result, a fault entry is written to a history file. A Notification user exit then submits a batch job, which transmits (by way of TSO XMIT) the fault entry to system B. On system B, a continually running job receives (by way of TSO RECEIVE) the fault entry into a staging history file. The batch job then calls HFZUTIL to import the new fault entry into the normal system B history file from where it can be viewed or reanalyzed.

See Customizing Z Abend Investigator by using user exits for information about user exits in general. See Notification user exit for information about the Notification user exit specifically. See Managing history files (HFZUTIL utility) for information about the HFZUTIL batch utility.

NFYEXIT: sample Notification user exit

NFYEXIT is available in softcopy format as member HFZSXNFY in data set HFZ.SHFZSAM1.
Figure 2. Sample Notification user exit (NFYEXIT) to submit batch TSO XMIT job
nodeid   = 'MVSB'                        /* <--- verify/change */        ❶
userid   = 'HFZROBOT'                    /* <--- verify/change */        ❷
jobcard  = '//NOTIFY   JOB  MSGCLASS=Z'  /* <--- verify/change */        ❸
/*********************************************************************/  ❹
/* Optionally, add checks here for selective transmission of fault   */
/* entries that only match certain criteria.                         */
/* For example:                                                      */
/* If ENV.USER_ID ¬= "FRED" then exit 0                              */
/* If ENV.USER_HFZHIST ¬= "MY.HISTFILE" then exit 0                  */
queue jobcard
queue '//**************************************************************'
queue '//* Export fault entry'
queue '//**************************************************************'
queue "//DD1      DD DISP=(,PASS),"
queue "//            SPACE=(CYL,(10,100,5),RLSE),"
queue "//            DCB=(DSORG=PO,RECFM=VB,LRECL=10000)"
queue "//SYSIN    DD *"
queue "/*"
queue '//**************************************************************'
queue '//* Terse the export data set'
queue '//**************************************************************'
queue "//SYSUT2   DD DISP=(,PASS),"
queue "//            SPACE=(CYL,(10,100),RLSE)"
queue '//**************************************************************'
queue '//* Perform TSO XMIT of the exported and tersed fault entry'
queue '//**************************************************************'
queue "//XMIT     EXEC PGM=IKJEFT01"
queue "//DD1      DD DISP=SHR,DSN=*.TERSE.SYSUT2"
queue "//SYSTSIN  DD *"
call q_rec "  XMIT" nodeid"."userid "DDNAME(DD1) -"
call q_rec "  NONOTIFY"
queue '/*'
/* 'Submit' the stacked TSO batch job */
n = queued()
if rc = 0 then do /* allocation worked so generate output */
  address mvs "EXECIO" n "DISKW DD1 (FINIS"
  say 'Fault entry' ENV.FAULT_ID 'sent to' nodeid'.'userid
end
else do
  "HFZWTO Allocation of INTRDR failed"
  say 'Fault entry' ENV.FAULT_ID 'job submission failure'
end
exit 0

/* Pad record with blanks to 80 bytes.                               */
q_rec: procedure
parse arg rec
if (length(rec) < 80) then rec = rec||copies(' ',80-length(rec))
queue rec
return 0
❶ 'nodeid' specifies the target system to which the fault entry is sent.
❷ 'userid' specifies the user ID for which fault entries are received on the target system. Use this user ID solely to receive fault entries.
❸ Ensure that the job card adheres to local standards.
❹ You can add checks here to see whether a fault is eligible to be sent to another system. The example shows how the user ID or history file name can be used, but any fields in the ENV or NFY data areas can be checked.
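As a minimal sketch of such an eligibility check, the following REXX fragment forwards only faults that belong to a small set of user IDs. The variable send_users and the user IDs in it are hypothetical; ENV.USER_ID is the field already used in the sample comments.

```rexx
/* Sketch only: send_users is a hypothetical allow-list of user IDs  */
/* whose fault entries are transmitted to the target system.         */
send_users = 'FRED WILMA'               /* <--- verify/change */

/* wordpos returns 0 when the user ID is not one of the listed words */
if wordpos(strip(ENV.USER_ID), send_users) = 0 then exit 0
```

Placing the check before the first queue statement means that no job is stacked or submitted for faults that are not eligible.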
The exit in Figure 2 can be called for any faults that occur on system A. To facilitate this exit, add these options to the HFZCNF00 config member, where exec.lib is your REXX EXEC PDS or PDSE data set:
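A sketch of what these options might look like, assuming the DataSets and Exits option forms that are suggested elsewhere in this section (the HFZEXEC DD name in Figure 6 and the EXITS(IMPORT(REXX(...))) form in Figure 5). Treat both lines as assumptions and verify them against the options reference before use:

```text
DataSets(HFZEXEC(exec.lib))
Exits(NOTIFY(REXX(NFYEXIT)))
```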

HFZROBOT: sample REXX exec to receive fault entries

HFZROBOT is available in softcopy format as member HFZSROBT in data set HFZ.SHFZSAM1. This exec performs the following actions:
  1. Receive files for the HFZROBOT user into a staging data set from where they are imported into a local history file by using the HFZUTIL batch utility.
  2. Create the HFZUTIL IMPORT user exit, HFZROBEX (see Figure 1).
Figure 3. Sample TSO receive REXX exec (HFZROBOT) part 1
histfile = 'B.HIST'              /* <--- verify/change */ ❺
temphist = 'B.TEMP'              /* <--- verify/change */ ❻
seconds  = '60'                  /* <--- verify/change */ ❼
use_exit = 'Y'                   /* <--- Y|N. verify/change */  ❽
address tso
x = prompt('on')
x = outtrap('var.',10,'noconcat')
do forever
  /* Obtain information about transmitted data on the JES output queue */
  if queued() = 0 then queue 'end'
  input = 'N'

  /* Examine the output from the 'dummy' receive command.
     The following variables are initialized:
       dsn     - the 'sending' history file name
       fromid  - the user ID performing the TSO XMIT
       node    - the JES node from which the fault entry was sent
       faultid - the fault ID (member name) */
  do i = 1 to var.0
    parse var var.i msgno t1 t2 t3 t4 t5 t6
    if msgno = 'INMR901I' then do
      dsn = t2
      fromid = t4
      node = t6
    end
    else if msgno = 'INMR902I' then do
      faultid = t2
      input = 'Y'
    end
  end
Figure 4. Sample TSO receive REXX exec (HFZROBOT) part 2
  /* Perform actual receive to the staging history file followed by an
     HFZUTIL batch utility import if there is data available */
  if input = 'Y' then do
    if faultid <> "" then do
      /* Receiving a PDS/E.                                          */

      say 'Receiving' dsn'('faultid') from' node'.'fromid
      queue "DSN('"temphist"')"
      queue 'END'
    end
    else do
      /* Receiving a sequential data set - assume AMATERSE PACKed.   */
      say 'Receiving' dsn 'from' node'.'fromid
      queue "DSN('"temprecv"')"
      queue 'END'

      /* Perform AMATERSE UNPACK.                                    */
      "ALLOC DD(SYSUT1) DA('"temprecv"') SHR"
      "ALLOC DD(SYSUT2) DSN('"temphist"')",
            "CYLINDERS SPACE(10,100) DIR(5)"
      address tso "CALL *(AMATERSE) 'UNPACK'"
      say 'UNPACK rc =' RC

      /* Get fault ID (member name).                                 */
      "LISTDS '"temphist"' MEMBERS"
      /* Sample output:                                              */
      /* FRED.$$TEMP$$.HIST                                          */
      /* --RECFM-LRECL-BLKSIZE-DSORG                                 */
      /*   VB    10000 27998   PO                                    */
      /* --VOLUMES--                                                 */
      /*   E$US21                                                    */
      /* --MEMBERS--                                                 */
      /*   F01103                                                    */
      mbr_start = 0
      do i = 1 to var.0
        /*say "var."i"='"var.i"'"*/
        if mbr_start = 0 then do
          if strip(var.i) = "--MEMBERS--" then do
            mbr_start = i + 1
          end
        end
      end
      if mbr_start = var.0 then do
        /* One, and only one, member.                                */
        faultid = strip(var.mbr_start)
      end
      else do
        say 'ERROR: More than one member found in data set' temphist,
            '- terminating'
        exit 12
      end
      'FREE DD(SYSUT2)'
      'FREE DD(SYSUT1)'

      "DELETE '"temprecv"'"
    end
Figure 5. Sample TSO receive REXX exec (HFZROBOT) part 3

    /* The target history file in the 'histfile' variable could be   */
    /* determined here based on any of the initialized variables     */
    /* dsn, fromid, node or faultid.  This sample EXEC uses a single */
    /* history file only.                                            */ ❾

    /* Perform HFZUTIL IMPORT.                                       */
    if use_exit = 'Y' then
      parms.1  = "EXITS(IMPORT(REXX(HFZROBEX)))"
    else
      parms.1  = "* Using HFZOPTLM for dump data set names"
    parms.2  = "IMPORT("histfile","
    parms.3  = "  "temphist"("faultid"),PACKAGE)"
    parms.0  = 3
    address tso "CALL *(HFZUTIL)"
    say 'IMPORT rc =' RC
  end
  else do
    /* Sleep for the configured interval before trying to receive again */
    address tso "call *(hfzsleep) '"seconds"'"
  end
end /* do forever */
❺ This item is the name of the target history file in which the received fault entries are placed. To select the history file based on where the fault originally occurred, see ❾.
❻ This item is a staging data set that is used for the TSO receive command and from which fault entries are imported into the target history file.
Important: Do not use a preallocated data set. Let the exec allocate and delete this staging data set for each fault received, as shown in the sample provided.

For the HFZROBOT exec and HFZUTIL IMPORT processing to work, the staging data set must never be used as a regular history file and must never contain more than a single member. If it is used as a regular history file (for example, if it is shown on the Fault Entry List display, or used as the target of an HFZUTIL FILES or LISTHF control statement), a $$INDEX member is likely to be created, which causes the processing to fail. The data set might also become HFZS subsystem managed, which subsequently results in serialization issues.

Ensuring that the staging data set exists only for the duration of the receive and IMPORT processing eliminates these issues.

❼ The HFZROBOT exec enters a WAIT state between checks for fault entries to conserve resources. Specify the interval between checks, in seconds, here. All fault entries on the JES output queue for the chosen user ID are received, and then the HFZROBOT exec enters the WAIT state.
❽ This item specifies whether the HFZROBEX IMPORT user exit is used:
  • If you set the RFRDSN, XDUMPDSN, and SDUMPDSN options to valid data set name patterns in the HFZOPTLM configuration options load module, there is no need to use the HFZROBEX user exit. (See Customize Z Abend Investigator by using an HFZOPTLM configuration-options module.) In this case, set 'use_exit' to 'N'.
  • If you use the HFZROBEX user exit, any dump data set names provided by the exit override the equivalent option setting in HFZOPTLM.
❾ The sample exec uses a single target history file for all received fault entries. You can instead choose the target history file based on one of these items:
  • The original history file name (in variable 'dsn').
  • The sending user ID (in variable 'fromid').
  • The node ID from which it was sent (in variable 'node').
  • The fault ID itself (in variable 'faultid').
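As an illustration of this kind of selection, the following REXX fragment picks the target history file from the sending node. It is a sketch only: the node names and history file names are hypothetical, and the fragment is assumed to run after the exec has set the 'node' variable and before the HFZUTIL IMPORT step.

```rexx
/* Sketch only: node names and data set names are hypothetical.      */
/* Assumes 'node' was set from the INMR901I message earlier on.      */
select
  when node = 'MVSA' then histfile = 'B.HIST.MVSA'
  when node = 'MVSC' then histfile = 'B.HIST.MVSC'
  otherwise               histfile = 'B.HIST'   /* default target    */
end
```

The otherwise clause keeps a single default history file, so fault entries from an unexpected node are still imported rather than lost.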
Figure 6 shows a sample batch TSO job to execute the HFZROBOT exec. It is available in softcopy format as member HFZSTSOB in data set HFZ.SHFZSAM1.
Figure 6. Sample TSO batch job to execute HFZROBOT exec (HFZSTSOB)
//HFZSTSOB JOB  <job card parameters>                                
// SET EXECDSN=exec.lib                           <--- verify/change 
//TSOBATCH EXEC PGM=IKJEFT01                                         
//SYSEXEC  DD   DISP=SHR,DSN=&EXECDSN.                               
//SYSPRINT DD   SYSOUT=*                                             
//SYSTSPRT DD   SYSOUT=*                                             
//SYSIN    DD   DUMMY                                                
//SYSTSIN  DD   *
%HFZROBOT
/*
//HFZEXEC  DD DISP=SHR,DSN=*.TSOBATCH.SYSEXEC
Important: Ensure that the user ID under which the HFZROBOT exec is running (in this example, the submitter of the HFZSTSOB job) has update access to both the staging data set and all history files used as targets for imports.

Because the HFZROBOT exec never exits, the HFZSTSOB job runs indefinitely. However, the exec causes the job to enter a WAIT state between attempts to receive incoming data, to avoid using unnecessary resources. To end the job, use the MVS CANCEL command during a period of inactivity. Alternatively, the exec could be made to recognize a special file that, when sent to the selected user ID, triggers the exec to terminate.
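One way the special-file approach might look is sketched below. The '.STOP' data set name suffix is a hypothetical convention, not part of the supplied sample, and the fragment is assumed to be placed after the loop that sets 'input', 'dsn', 'fromid', and 'node' from the RECEIVE messages.

```rexx
/* Sketch only: the '.STOP' suffix is a hypothetical convention.     */
/* Assumes input, dsn, fromid, and node were set by the message loop.*/
if input = 'Y' & right(dsn, 5) = '.STOP' then do
  say 'Stop request received from' node'.'fromid '- ending HFZROBOT'
  exit 0
end
```

With this approach, the stop file probably also needs to be removed from the JES output queue (for example, by replying DELETE instead of END to the RECEIVE prompt) so that the exec does not stop again immediately when the job is next started.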

Alternatively, a started task could be defined to run this JCL, which avoids tying up a JES initiator.