A facility has been constructed to re-reconstruct the PASS1 output on VMS. It was cloned from the MC Farm machinery (the VMS .COM and .TASK files) in terms of how the supervisor and reconstruction jobs work. Job submission, however, is quite different and is done with Rexx from UNIX.
The operator selects the run range to be re-reconstructed and then submits jobs for those runs to VMS from UNIX. The input (data or MC) datasets are staged to disk; the re-recon job uses those files as input and writes extended miniDST output to the staging-out areas.
Jobs send e-mail to sldpm@mailbox to record the outcome of the processing in Oracle tables, currently located on UNIX.
The same code is used for the re-recon as for the original PASS2. Hence all the event tagging is the same. Only the initialization and PASS2 code are needed from the PASS2 suite: R_VARINI.IDA and R_PASS2.IDA from $USR:[SLDPM.DP.IDA].
The processing jobs are run out of the SLDPM SLACVX cluster account; files are kept in the [SLDPM.RECON] directory.
As with the MC Farm, a re-recon is described by a .TASK file, which identifies the re-recon by name and specifies a run range, the IDA and SETUP files defining the job environment, the desired code version, and a comment. A typical .TASK file is shown:
  Task:          REC93V11
  Rec_Ida:       disk$sld_usr0:[sldpm]RECV11
  Setup_file:    disk$sld_usr0:[sldpm]REC93V11_SETUP
  Run_Period:    15774 23000
  Code_Version:  12.0
  Comments:      Version 12 re-recon of '93 data
Bookkeeping is done via Oracle tables. SLD.RECON_TAPES keeps track of submission by task and tape, while SLD.RECON_RUNS expands that information to include status by run per tape.
SLD.RECON_TAPES

  Name                            Null?    Type
  ------------------------------- -------- ----
  TASK_NAME                       NOT NULL VARCHAR2(40)
  VOLSER                          NOT NULL VARCHAR2(6)
  TAPE_SUBMIT_STATUS                       VARCHAR2(1)
  TAPE_SUBMIT_DATE                         DATE
  TAPE_COMPLETION_STATUS                   VARCHAR2(20)
SLD.RECON_RUNS

  Name                            Null?    Type
  ------------------------------- -------- ----
  TASK_NAME                       NOT NULL VARCHAR2(40)
  VOLSER                          NOT NULL VARCHAR2(6)
  RUN                             NOT NULL NUMBER(9)
  RUN_SUBMIT_STATUS                        VARCHAR2(1)
  RUN_SUBMIT_DATE                          DATE
  RUN_COMPLETE_STATUS                      VARCHAR2(8)
  RUN_COMPLETE_DATE                        DATE
Here are the elements that make for a well-done task:
The '92 data pre-dates the current bookkeeping scheme in Oracle and has to be handled differently.
The output staging process maintains datacats in the [SLDPM] home directory. These datacats' names are based on the task name.
A task log file for the jobs run for a given task is stored in the [SLDPM] directory. Individual run log files are kept in $scr:[SLDPM], space permitting.
A difference between the MC Farm and the data re-recon operation is that the re-recon input files are densely packed on tape; often 30-40 runs at a time will fit on the 1 GB silo cartridges. Consequently some care must be taken to avoid having the recon jobs fight over the input tapes.
The current solution is to copy the input tape files to disk one tape at a time prior to a batch of runs being processed. When the recon jobs successfully complete, they erase their input file from disk. Currently the input disk pool is a 4 GB disk on SLACAX, disk$sld_rec_stg. When there is sufficient space on the disk, another batch of runs can be submitted.
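The space test amounts to the following check (a minimal sketch; the real test lives in the UNIX submission scripts, and the tape size in VMS 512-byte blocks used here is an assumption for illustration):

```python
# Hypothetical sketch of the staging-pool space check. The ~1 GB tape
# size expressed in 512-byte VMS blocks is an assumption.
TAPE_SIZE_BLOCKS = 2_000_000  # ~1 GB in 512-byte blocks

def room_for_next_tape(free_blocks, tape_size_blocks=TAPE_SIZE_BLOCKS):
    """True when the staging disk can hold another full tape's files."""
    return free_blocks >= tape_size_blocks
```

Only when this test passes is the next tape's worth of runs submitted; the recon jobs themselves free the space by erasing their input files on successful completion.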
Since a good deal of Oracle querying must be done, the job submission is done on a platform with easy Oracle access. UNIX has been chosen for its Rexx Oracle interface and ease of NFS access to VMS disks and vice versa. All the relevant files are kept in /u/ey/sldpm/recon/bin.
The version of Rexx on UNIX is called uni-Rexx. The version that contains the Oracle access is /usr/local/bin/rxx. To connect to the UNIX Oracle DB, the following environment variables must be set:

  setenv TWO_TASK SLAC_TCP
  setenv REXXPATH /afs/slac.stanford.edu/g/sld/lib.shared:/afs/slac.stanford.edu/g/sld/bin
Jobs can be submitted in two modes: by tape (i.e., do a tape's worth of files at a time) or by run range (intended for patchups of failed runs).
Use is made of the cron facility to regularly check the available space in the disk pool and submit another tape's worth of runs if there is room. This is intended to process the entire task without human intervention.
A file, 'task'.cron, tells cron what to run and at what interval. tcron.rex sets up the proper environment and calls tapesub.rex, which looks at the SLD.RECON_TAPES Oracle table to see what the last submitted tape was and submits the next one (if there is space for it). The first time through, tapesub fills SLD.RECON_TAPES with all the tapes containing the runs in the task. As each tapecopy job is submitted, it updates the table to keep track of what has been done. When all tapes are marked as done, the task is complete.
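The tape-selection step of tapesub.rex can be sketched as follows (a hedged illustration only; tapesub.rex itself is Rexx, and the row layout here is an assumption modeled on the SLD.RECON_TAPES schema):

```python
# Hypothetical sketch of tapesub's "find the next tape" logic. Rows are
# assumed to be dicts keyed by the SLD.RECON_TAPES column names, in the
# order the tapes should be processed.
def next_tape(rows):
    """Return the VOLSER of the first tape not yet submitted,
    or None when every tape has been handled (task complete)."""
    for row in rows:
        if row["TAPE_SUBMIT_STATUS"] is None:
            return row["VOLSER"]
    return None
```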
The method for submitting the crontab file is to prepare a file 'task'.cron which, for data, looks like
0,30 * * * * /u/ey/sldpm/recon/tcron.rex "rec95v12 ( data >> /u/ey/sldpm/recon/rec95v12/rec95v12.cronlog
and, for MC, looks like
0,30 * * * * /u/ey/sldpm/recon/tcron.rex "mc74_95r12_win_rc mc74_95r12_winter ( MC ROOT=MDST >> /u/ey/sldpm/recon/mc74_95r12_win_rc/mc74_95r12_win_rc.cronlog
where the task name appears twice. The MC is more complicated since it will take the input tapes from a different task (mc74_95r12_winter, in this case) and may use REC or MDST datasets as input.
To submit the task to crontab,
crontab 'task'.cron
Note that the cron job is specific to the node you submitted the crontab command on. Other nodes will not know about it.
Each time the timed job runs, it echoes into the file 'task'.cronlog. After the task is complete, disable the cron job with
crontab -r
Recsub.rex

Use

  rxx recsub.rex 'task' run_begin run_end [OrTaskName] ( {data, MC} [ROOT=root] [NOCOPY VAX DEBUG]

to submit the individual runs to SLACVX. The NOCOPY option tells recsub that the input dataset is still on disk. The jobs are submitted by the tape copy job as each file gets copied to disk. The tape copy .COM file is placed in SLDPM's scratch area on SLACVX along with another .COM file, release'volser'.com, which has all the job submission commands. Note that if you use any of the options you'll need to enclose the entire argument string to recsub in single quotes (UNIX gets annoyed at the lone open parenthesis). recsub.rex submits the jobs to the VAX cluster via rsh.
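The quoting requirement can be illustrated with a small helper (hypothetical; recsub.rex itself is Rexx, and this only shows how the argument string must be assembled when options follow the open parenthesis):

```python
# Hypothetical helper illustrating the recsub quoting rule: any options
# after '(' require the whole argument string to be single-quoted so
# the UNIX shell does not try to interpret the lone open parenthesis.
def recsub_args(task, run_begin, run_end, options=None):
    """Build the argument string passed to 'rxx recsub.rex'."""
    args = f"{task} {run_begin} {run_end}"
    if options:
        return "'" + args + " ( " + " ".join(options) + "'"
    return args
```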
recsub.rex submits one HADCOPY.COM to batch on VMS for the entire set of runs in the range. The HADCOPY job submits supervisor jobs, each of which submits a recon job, RECJOB.COM, to do the actual reconstruction, and then cleans up after successful completion (it uses [SLDMCM]CHECKLOG.COM to verify that the recon job finished properly).
The failure modes most commonly seen so far have been code crashes (almost entirely restricted to the Alphas) and running out of space on the output disk. The latter problem has been addressed by putting a test in the recon job GO loop requiring 100 blocks of free space before the next write (otherwise it waits 10 minutes and tries again).
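The free-space test in the GO loop amounts to the following retry pattern (a sketch only; the real check is in the VMS recon job, and get_free_blocks is a hypothetical probe of the output disk):

```python
import time

MIN_FREE_BLOCKS = 100   # free-space threshold from the recon GO loop
RETRY_SECONDS = 600     # wait 10 minutes before trying again

def wait_for_space(get_free_blocks, sleep=time.sleep):
    """Block until the output disk has room for the next write."""
    while get_free_blocks() < MIN_FREE_BLOCKS:
        sleep(RETRY_SECONDS)
```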
The solution to the former problem has been to resubmit the run to be analyzed on the VAX (using the VAX option to recsub). When multiple jobs have failed this way, it has proven easy to create a fixup .COM file on UNIX to grab the submission commands, e.g.:
For run 'runnum',
cat /nfs/slacvx/sld_tmp0/scratch/sldpm/rel*.com | grep runnum >> fixup.com
and then resubmitting this file from the VAX (via NFS, e.g. @NFS$JUNO_U18:[EY.SLDPM.RECON]fixup.com), after adding the /CHAR=VAX string to the submission.
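The same fixup can be sketched programmatically (a hypothetical helper; the document's actual method is the cat/grep pipeline above followed by editing in the /CHAR=VAX string):

```python
# Hypothetical sketch of building a fixup file: pick out the submission
# commands mentioning a failed run and retarget them at the VAX.
def build_fixup(submit_lines, failed_runs):
    """Collect submission commands for the failed runs, appending
    /CHAR=VAX so they run on the VAX architecture."""
    picked = []
    for line in submit_lines:
        if any(str(run) in line for run in failed_runs):
            picked.append(line.rstrip() + " /CHAR=VAX")
    return picked
```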
Monitoring the re-recon jobs is much like the MC Farm monitoring: one can use WWW or look at the task log files directly.
Again, just as for the MC Farm, the executable code is run from shareables and a frozen code area.
Experience has shown that there are a few likely failure modes:
Measures have been instituted to try to fend off the latter two modes.
As mentioned before, these are fixed by resubmitting the offending job. If it is a random tape problem, the job can be resubmitted to either architecture; if not, then it should be tried on the opposite one.
When running the farm full out, the log files can fill up the $scr disk quota in a couple of days. For that reason, the log files have been moved to the MC output disk: disk$sld_mc_stg:[sldpm.logs]. This space is managed by the stage supervisor and so should not be a problem. Small files are still written to $scr; these should not be a problem either.
Should the quota get exceeded anyway, jobs will fail like mad! Super and recon jobs will just fail and indicate that fact via mail. If, however, the $scr disk fills up, it is not always possible to create the small temp files needed to send the mail. Usually the final arbiter of success is whether the input dataset has been erased from disk$sld_rec_stg:[sld_data].
If this happened during a HADCOPY job, typically one must use SQLPLUS, e.g.

  SQLPLUS> select run from sld.recon_runs where task_name = 'TASK_NAME'
           and volser = 'tape';
to determine what runs were on that tape and resubmit them via recsub.rex, taking care not to include any runs not on that tape in the process.
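In terms of the SLD.RECON_RUNS rows, the selection is simply (a sketch mirroring the SQLPLUS query; the row layout is an assumption based on the schema):

```python
# Hypothetical sketch of the "which runs were on that tape" selection,
# over rows shaped like the SLD.RECON_RUNS table.
def runs_on_tape(rows, task_name, volser):
    """Return the runs recorded for one tape of one task."""
    return sorted(row["RUN"] for row in rows
                  if row["TASK_NAME"] == task_name
                  and row["VOLSER"] == volser)
```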
Oracle is occasionally taken out of service. During these times, the job submission facility should not try to submit more jobs. The problem area in the past was in the receipt of mail from the completing jobs: if Oracle was down, the DB was never updated. A patch has been put into the mail-receiving script to wait up to 8 hours for Oracle to return before giving up.
Should this still happen, the mail is bounced back to SLDPM on the VMS side. When Oracle is back up, the mail can be forwarded back to UNIX. Or the DB can be updated manually.
Richard Dubois Updated 11/14/01 17:24