A facility has been constructed to re-reconstruct the PASS1 output on VMS. It was cloned from the MC Farm machinery (the VMS .COM and .TASK files) in terms of how the supervisor and reconstruction jobs work. Job submission, however, is quite different and is done with Rexx from UNIX.
The operator selects the run range to be re-reconstructed and then submits jobs for those runs to VMS from UNIX. The input (data or MC) datasets are staged to disk; the re-recon job uses those files as input and writes extended miniDST output to the staging-out areas.
Jobs send e-mail to sldpm@mailbox to record the outcome of the processing in Oracle tables, currently located on UNIX.
The same code is used for the re-recon as for the original PASS2. Hence all the event tagging is the same. Only the initialization and PASS2 code are needed from the PASS2 suite: R_VARINI.IDA and R_PASS2.IDA from $USR:[SLDPM.DP.IDA].
The processing jobs are run out of the SLDPM SLACVX cluster account; files are kept in the [SLDPM.RECON] directory.
As with the MC Farm, a re-recon is described by a .TASK file, which identifies the re-recon by name and specifies a run range, the IDA and SETUP files defining the job environment, the desired code version, and a comment. A typical .TASK file is shown:
  Task:          REC93V11
  Rec_Ida:       disk$sld_usr0:[sldpm]RECV11
  Setup_file:    disk$sld_usr0:[sldpm]REC93V11_SETUP
  Run_Period:    15774 23000
  Code_Version:  12.0
  Comments:      Version 12 re-recon of '93 data
Bookkeeping is done via Oracle tables. SLD.RECON_TAPES keeps track of submission by task and tape, while SLD.RECON_RUNS expands that information to include status by run per tape.
SLD.RECON_TAPES

  Name                            Null?    Type
  ------------------------------- -------- ----
  TASK_NAME                       NOT NULL VARCHAR2(40)
  VOLSER                          NOT NULL VARCHAR2(6)
  TAPE_SUBMIT_STATUS                       VARCHAR2(1)
  TAPE_SUBMIT_DATE                         DATE
  TAPE_COMPLETION_STATUS                   VARCHAR2(20)
SLD.RECON_RUNS

  Name                            Null?    Type
  ------------------------------- -------- ----
  TASK_NAME                       NOT NULL VARCHAR2(40)
  VOLSER                          NOT NULL VARCHAR2(6)
  RUN                             NOT NULL NUMBER(9)
  RUN_SUBMIT_STATUS                        VARCHAR2(1)
  RUN_SUBMIT_DATE                          DATE
  RUN_COMPLETE_STATUS                      VARCHAR2(8)
  RUN_COMPLETE_DATE                        DATE
Here are the elements that make for a well-done task:
The '92 data pre-dates the current bookkeeping scheme in Oracle and has to be handled differently.
The output staging process maintains datacats in the [SLDPM] home directory. These datacats' names are based on the task name.
A task log file for the jobs run for a given task is stored in the [SLDPM] directory. Individual run log files are kept in $scr:[SLDPM], space permitting.
A difference between the MC Farm and the data re-recon operation is that the re-recon input files are densely packed on tape; often 30-40 runs at a time will fit on the 1 GB silo cartridges. Consequently some care must be taken to avoid having the recon jobs fight over the input tapes.
The current solution is to copy the input tape files to disk one tape at a time prior to a batch of runs being processed. When the recon jobs successfully complete, they erase their input file from disk. Currently the input disk pool is a 4 GB disk on SLACAX, disk$sld_rec_stg. When there is sufficient space on the disk, another batch of runs can be submitted.
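The space test amounts to the following check (a minimal sketch; the real test lives in the UNIX submission scripts, and the tape size in VMS 512-byte blocks used here is an assumption for illustration):

```python
# Hypothetical sketch of the staging-pool space check. The ~1 GB tape
# size expressed in 512-byte VMS blocks is an assumption.
TAPE_SIZE_BLOCKS = 2_000_000  # ~1 GB in 512-byte blocks

def room_for_next_tape(free_blocks, tape_size_blocks=TAPE_SIZE_BLOCKS):
    """True when the staging disk can hold another full tape's files."""
    return free_blocks >= tape_size_blocks
```

Only when this test passes is the next tape's worth of runs submitted; the recon jobs themselves free the space by erasing their input files on successful completion.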
Since a good deal of Oracle querying must be done, the job submission is done on a platform with easy Oracle access. UNIX has been chosen for its Rexx Oracle interface and ease of NFS access to VMS disks and vice versa. All the relevant files are kept in /u/ey/sldpm/recon/bin.
The version of Rexx on UNIX is called uni-Rexx. The version that contains the Oracle access is /usr/local/bin/rxx. To connect to the UNIX Oracle DB, the following environment variables must be set:

  setenv TWO_TASK SLAC_TCP
  setenv REXXPATH /afs/slac.stanford.edu/g/sld/lib.shared:/afs/slac.stanford.edu/g/sld/bin
Jobs can be submitted in two modes: by tape (i.e., do a tape's worth of files at a time) or by run range (intended for patchups of failed runs).
Use is made of the cron facility to regularly check the available space in the disk pool and submit another tape's worth of runs if there is room. This is intended to process the entire task without human intervention.
A file, 'task'.cron, tells cron what to run and at what interval. tcron.rex sets up the proper environment and calls tapesub.rex, which looks at the SLD.RECON_TAPES Oracle table to see what the last submitted tape was and submits the next one (if there is space for it). The first time through, tapesub fills SLD.RECON_TAPES with all the tapes containing the runs in the task. As each tapecopy job is submitted, it updates the table to keep track of what has been done. When all tapes are marked as done, the task is complete.
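The tape-selection step of tapesub.rex can be sketched as follows (a hedged illustration only; tapesub.rex itself is Rexx, and the row layout here is an assumption modeled on the SLD.RECON_TAPES schema):

```python
# Hypothetical sketch of tapesub's "find the next tape" logic. Rows are
# assumed to be dicts keyed by the SLD.RECON_TAPES column names, in the
# order the tapes should be processed.
def next_tape(rows):
    """Return the VOLSER of the first tape not yet submitted,
    or None when every tape has been handled (task complete)."""
    for row in rows:
        if row["TAPE_SUBMIT_STATUS"] is None:
            return row["VOLSER"]
    return None
```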
The method for submitting the crontab file is to prepare a file 'task'.cron which, for data, looks like
0,30 * * * * /u/ey/sldpm/recon/tcron.rex "rec95v12 ( data >> /u/ey/sldpm/recon/rec95v12/rec95v12.cronlog
and, for MC, looks like
0,30 * * * * /u/ey/sldpm/recon/tcron.rex "mc74_95r12_win_rc mc74_95r12_winter ( MC ROOT=MDST >> /u/ey/sldpm/recon/mc74_95r12_win_rc/mc74_95r12_win_rc.cronlog
where the task name appears twice. The MC is more complicated since it will take the input tapes from a different task (mc74_95r12_winter, in this case) and may use REC or MDST datasets as input.
To submit the task to crontab,
crontab 'task'.cron
Note that the cron job is specific to the node you submitted the crontab command on. Other nodes will not know about it.
Each time the timed job runs, it echoes into the file 'task'.cronlog. After the task is complete, disable the cron job with
crontab -r
Recsub.rex

Use

  rxx recsub.rex 'task' run_begin run_end [OrTaskName] ( {data, MC} [ROOT=root] [NOCOPY VAX DEBUG]

to submit the individual runs to SLACVX. The NOCOPY option tells recsub that the input dataset is still on disk. The jobs are submitted by the tape copy job as each file gets copied to disk. The tape copy .COM file is placed in SLDPM's scratch area on SLACVX along with another .COM file, release'volser'.com, which has all the job submission commands. Note that if you use any of the options you'll need to enclose the entire argument string to recsub in single quotes (UNIX gets annoyed at the lone open parenthesis). recsub.rex submits the jobs to the VAX cluster via rsh.
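The quoting requirement can be illustrated with a small helper (hypothetical; recsub.rex itself is Rexx, and this only shows how the argument string must be assembled when options follow the open parenthesis):

```python
# Hypothetical helper illustrating the recsub quoting rule: any options
# after '(' require the whole argument string to be single-quoted so
# the UNIX shell does not try to interpret the lone open parenthesis.
def recsub_args(task, run_begin, run_end, options=None):
    """Build the argument string passed to 'rxx recsub.rex'."""
    args = f"{task} {run_begin} {run_end}"
    if options:
        return "'" + args + " ( " + " ".join(options) + "'"
    return args
```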
recsub.rex submits one HADCOPY.COM to batch on VMS for the entire set of runs in the range. The HADCOPY job submits supervisor jobs, each of which submits a recon job, RECJOB.COM, to do the actual reconstruction, and then cleans up after successful completion (it uses [SLDMCM]CHECKLOG.COM to verify that the recon job finished properly).
The failure modes most commonly seen so far have been code crashes (almost entirely restricted to the Alphas) and running out of space on the output disk. The latter problem has been addressed by putting a test in the recon job GO loop requiring 100 blocks of free space before the next write (otherwise it waits 10 minutes and tries again).
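The free-space test in the GO loop amounts to the following retry pattern (a sketch only; the real check is in the VMS recon job, and get_free_blocks is a hypothetical probe of the output disk):

```python
import time

MIN_FREE_BLOCKS = 100   # free-space threshold from the recon GO loop
RETRY_SECONDS = 600     # wait 10 minutes before trying again

def wait_for_space(get_free_blocks, sleep=time.sleep):
    """Block until the output disk has room for the next write."""
    while get_free_blocks() < MIN_FREE_BLOCKS:
        sleep(RETRY_SECONDS)
```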
The solution to the former problem has been to resubmit the run to be analyzed on the VAX (using the VAX option to recsub). When multiple jobs have failed this way, it has proven easy to create a fixup .COM file on UNIX to grab the submission commands, e.g.:
For run 'runnum',
cat /nfs/slacvx/sld_tmp0/scratch/sldpm/rel*.com | grep runnum >> fixup.com
and then resubmitting this file from the VAX (via NFS, e.g. @NFS$JUNO_U18:[EY.SLDPM.RECON]fixup.com), after adding the /CHAR=VAX string to the submission.
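The same fixup can be sketched programmatically (a hypothetical helper; the document's actual method is the cat/grep pipeline above followed by editing in the /CHAR=VAX string):

```python
# Hypothetical sketch of building a fixup file: pick out the submission
# commands mentioning a failed run and retarget them at the VAX.
def build_fixup(submit_lines, failed_runs):
    """Collect submission commands for the failed runs, appending
    /CHAR=VAX so they run on the VAX architecture."""
    picked = []
    for line in submit_lines:
        if any(str(run) in line for run in failed_runs):
            picked.append(line.rstrip() + " /CHAR=VAX")
    return picked
```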
Monitoring the re-recon jobs is much like the MC Farm monitoring: one can use WWW or look at the task log files directly.
Again, just as for the MC Farm, the executable code is run from shareables and a frozen code area.
Experience has shown that there are a few likely failure modes:
Measures have been instituted to try to fend off the latter two modes.
As mentioned before, these are fixed by resubmitting the offending job. If it is a random tape problem, the job can be resubmitted to either architecture; if not, then it should be tried on the opposite one.
When running the farm full out, the log files can fill up the $scr disk quota in a couple of days. For that reason, the log files have been moved to the MC output disk: disk$sld_mc_stg:[sldpm.logs]. This space is managed by the stage supervisor and so should not be a problem. Small files are still written to $scr; these should not be a problem either.
Should the quota get exceeded anyway, jobs will fail like mad! Super and recon jobs will just fail and indicate that fact via mail. If, however, the $scr disk fills up, it is not always possible to create the small temp files needed to send the mail. Usually the final arbiter of success is whether the input dataset has been erased from disk$sld_rec_stg:[sld_data].
If this happened during a HADCOPY job, typically one must use SQLPLUS, e.g.

  SQLPLUS> select run from sld.recon_runs where task_name = 'TASK_NAME'
           and volser = 'tape';
to determine what runs were on that tape and resubmit them via recsub.rex, taking care not to include any runs not on that tape in the process.
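In terms of the SLD.RECON_RUNS rows, the selection is simply (a sketch mirroring the SQLPLUS query; the row layout is an assumption based on the schema):

```python
# Hypothetical sketch of the "which runs were on that tape" selection,
# over rows shaped like the SLD.RECON_RUNS table.
def runs_on_tape(rows, task_name, volser):
    """Return the runs recorded for one tape of one task."""
    return sorted(row["RUN"] for row in rows
                  if row["TASK_NAME"] == task_name
                  and row["VOLSER"] == volser)
```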
Oracle is occasionally taken out of service. During these times, the job submission facility should not try to submit more jobs. The problem area in the past was in the receipt of mail from the completing jobs: if Oracle was down, the DB was never updated. A patch has been put into the mail-receiving script to wait up to 8 hours for Oracle to return before giving up.
Should this still happen, the mail is bounced back to SLDPM on the VMS side. When Oracle is back up, the mail can be forwarded back to UNIX. Or the DB can be updated manually.
Richard Dubois Updated 11/14/01 17:24