diff --git a/CITATION b/CITATION index 95e5df9..3c2fc55 100644 --- a/CITATION +++ b/CITATION @@ -1,3 +1,3 @@ Please cite as: -Dune Collaboration: "DUNE Computing Tutorial" Version 2024.01 +Dune Collaboration: "DUNE Computing Tutorial" Version 2025.01 diff --git a/_episodes/02-submit-jobs-w-justin.md b/_episodes/02-submit-jobs-w-justin.md index a31fe63..5227522 100644 --- a/_episodes/02-submit-jobs-w-justin.md +++ b/_episodes/02-submit-jobs-w-justin.md @@ -1,27 +1,27 @@ --- -title: Submit grid jobs with JustIn +title: New justIN Job Submission System teaching: 20 exercises: 0 questions: -- How to submit realistic grid jobs with JustIn +- How to submit realistic grid jobs with justIN objectives: -- Demonstrate use of [justIn](https://dunejustin.fnal.gov) for job submission with more complicated setups. +- Demonstrate use of [justIN](https://dunejustin.fnal.gov) for job submission with more complicated setups. keypoints: - Always, always, always prestage input datasets. No exceptions. --- -# PLEASE USE THE NEW [justIn](https://dunejustin.fnal.gov) SYSTEM INSTEAD OF POMS +# PLEASE USE THE NEW [justIN](https://dunejustin.fnal.gov) SYSTEM INSTEAD OF POMS -__A simple [justIn](https://dunejustin.fnal.gov) Tutorial is currently in docdb at: [JustIn Tutorial](https://docs.dunescience.org/cgi-bin/sso/RetrieveFile?docid=30145)__ +__A simple [justIN](https://dunejustin.fnal.gov) Tutorial is currently in docdb at: [justIN Tutorial](https://docs.dunescience.org/cgi-bin/sso/RetrieveFile?docid=30145)__ A more detailed tutorial is available at: -[JustIn Docs](https://dunejustin.fnal.gov/docs/) +[justIN Docs](https://dunejustin.fnal.gov/docs/) -The [justIn](https://dunejustin.fnal.gov) system is described in detail at: +The [justIN](https://dunejustin.fnal.gov) system is described in detail at: -__[JustIn Home](https://dunejustin.fnal.gov/dashboard/)__ +__[justIN Home](https://dunejustin.fnal.gov/dashboard/)__ -__[JustIn Docs](https://dunejustin.fnal.gov/docs/)__ +__[justIN 
Docs](https://dunejustin.fnal.gov/docs/)__ > ## Note More documentation coming soon diff --git a/_episodes/07-grid-job-submission.md b/_episodes/07-grid-job-submission.md index 51e4f96..ea93a3b 100644 --- a/_episodes/07-grid-job-submission.md +++ b/_episodes/07-grid-job-submission.md @@ -1,5 +1,5 @@ --- -title: Jobsub Grid Job Submission and Common Errors - still 2024 version +title: Jobsub Grid Job Submission and Common Errors (SPECIAL PURPOSE) teaching: 65 exercises: 0 questions: @@ -68,8 +68,8 @@ The past few months have seen significant changes in how DUNE (as well as other First, log in to a `dunegpvm` machine . Then you will need to set up the job submission tools (`jobsub`). If you set up `dunesw` it will be included, but if not, you need to do ~~~ -mkdir -p /pnfs/dune/scratch/users/${USER}/DUNE_tutorial_sep2025 # if you have not done this before -mkdir -p /pnfs/dune/scratch/users/${USER}/sep2025tutorial +mkdir -p /pnfs/dune/scratch/users/${USER}/DUNE_tutorial_jan2026 # if you have not done this before +mkdir -p /pnfs/dune/scratch/users/${USER}/jan2026tutorial ~~~ {: ..language-bash} @@ -190,8 +190,8 @@ You will have to change the last line with your own submit file instead of the p First, we should make a tarball. Here is what we can do (assuming you are starting from /exp/dune/app/users/username/): ```bash -cp /exp/dune/app/users/kherner/setupsep2025tutorial-grid.sh /exp/dune/app/users/${USER}/ -cp /exp/dune/app/users/kherner/sep2025tutorial/localProducts_larsoft_v09_72_01_e20_prof/setup-grid /exp/dune/app/users/${USER}/sep2025tutorial/localProducts_larsoft_v09_72_01_e20_prof/setup-grid +cp /exp/dune/app/users/kherner/setupjan2026tutorial-grid.sh /exp/dune/app/users/${USER}/ +cp /exp/dune/app/users/kherner/jan2026tutorial/localProducts_larsoft_v09_72_01_e20_prof/setup-grid /exp/dune/app/users/${USER}/jan2026tutorial/localProducts_larsoft_v09_72_01_e20_prof/setup-grid ``` Before we continue, let's examine these files a bit. 
We will source the first one in our job script, and it will set up the environment for us. @@ -199,7 +199,7 @@ Before we continue, let's examine these files a bit. We will source the first on ~~~ #!/bin/bash -DIRECTORY=sep2025tutorial +DIRECTORY=jan2026tutorial # we cannot rely on "whoami" in a grid job. We have no idea what the local username will be. # Use the GRID_USER environment variable instead (set automatically by jobsub). USERNAME=${GRID_USER} @@ -217,40 +217,38 @@ mrbslp Now let's look at the difference between the setup-grid script and the plain setup script. -Assuming you are currently in the /exp/dune/app/users/username directory: +Assuming you are currently in the `/exp/dune/app/users/$USER` directory: ```bash -diff sep2025tutorial/localProducts_larsoft_v09_72_01_e20_prof/setup sep2025tutorial/localProducts_larsoft_v09_72_01_e20_prof/setup-grid +diff jan2026tutorial/localProducts_larsoft_v09_72_01_e20_prof/setup jan2026tutorial/localProducts_larsoft_v09_72_01_e20_prof/setup-grid ``` ~~~ -< setenv MRB_TOP "/exp/dune/app/users//sep2025tutorial" -< setenv MRB_TOP_BUILD "/exp/dune/app/users//sep2025tutorial" -< setenv MRB_SOURCE "/exp/dune/app/users//sep2025tutorial/srcs" -< setenv MRB_INSTALL "/exp/dune/app/users//sep2025tutorial/localProducts_larsoft_v09_72_01_e20_prof" +< setenv MRB_TOP "/exp/dune/app/users//jan2026tutorial" +< setenv MRB_TOP_BUILD "/exp/dune/app/users//jan2026tutorial" +< setenv MRB_SOURCE "/exp/dune/app/users//jan2026tutorial/srcs" +< setenv MRB_INSTALL "/exp/dune/app/users//jan2026tutorial/localProducts_larsoft_v09_72_01_e20_prof" --- -> setenv MRB_TOP "${INPUT_TAR_DIR_LOCAL}/sep2025tutorial" -> setenv MRB_TOP_BUILD "${INPUT_TAR_DIR_LOCAL}/sep2025tutorial" -> setenv MRB_SOURCE "${INPUT_TAR_DIR_LOCAL}/sep2025tutorial/srcs" -> setenv MRB_INSTALL "${INPUT_TAR_DIR_LOCAL}/sep2025tutorial/localProducts_larsoft_v09_72_01_e20_prof" +> setenv MRB_TOP "${INPUT_TAR_DIR_LOCAL}/jan2026tutorial" +> setenv MRB_TOP_BUILD 
"${INPUT_TAR_DIR_LOCAL}/jan2026tutorial" +> setenv MRB_SOURCE "${INPUT_TAR_DIR_LOCAL}/jan2026tutorial/srcs" +> setenv MRB_INSTALL "${INPUT_TAR_DIR_LOCAL}/jan2026tutorial/localProducts_larsoft_v09_72_01_e20_prof" ~~~ As you can see, we have switched from the hard-coded directories to directories defined by environment variables; the `INPUT_TAR_DIR_LOCAL` variable will be set for us (see below). -Now, let's actually create our tar file. Again assuming you are in `/exp/dune/app/users/kherner/sep2025tutorial/`: +Now, let's actually create our tar file. Again assuming you are in `/exp/dune/app/users/kherner/jan2026tutorial/`: ```bash -tar --exclude '.git' -czf sep2025tutorial.tar.gz sep2025tutorial/localProducts_larsoft_v09_72_01_e20_prof sep2025tutorial/work setupsep2025tutorial-grid.sh +tar --exclude '.git' -czf jan2026tutorial.tar.gz jan2026tutorial/localProducts_larsoft_${DUNESW_VERSION}_${DUNESW_QUALIFIER} jan2026tutorial/work setupjan2026tutorial-grid.sh ``` Note how we have excluded the contents of ".git" directories in the various packages, since we don't need any of that in our jobs. It turns out that the .git directory can sometimes account for a substantial fraction of a package's size on disk! 
Then submit another job (in the following we keep the same submit file as above): -```bash -jobsub_submit -G dune --mail_always -N 1 --memory=2500MB --disk=2GB --expected-lifetime=3h --cpu=1 --tar_file_name=dropbox:///exp/dune/app/users//sep2025tutorial.tar.gz --singularity-image /cvmfs/singularity.opensciencegrid.org/fermilab/fnal-wn-sl7:latest --append_condor_requirements='(TARGET.HAS_Singularity==true&&TARGET.HAS_CVMFS_dune_opensciencegrid_org==true&&TARGET.HAS_CVMFS_larsoft_opensciencegrid_org==true&&TARGET.CVMFS_dune_opensciencegrid_org_REVISION>=1105&&TARGET.HAS_CVMFS_fifeuser1_opensciencegrid_org==true&&TARGET.HAS_CVMFS_fifeuser2_opensciencegrid_org==true&&TARGET.HAS_CVMFS_fifeuser3_opensciencegrid_org==true&&TARGET.HAS_CVMFS_fifeuser4_opensciencegrid_org==true)' -e GFAL_PLUGIN_DIR=/usr/lib64/gfal2-plugins -e GFAL_CONFIG_DIR=/etc/gfal2.d file:///exp/dune/app/users/kherner/run_sep2025tutorial.sh -``` + You'll see this is very similar to the previous case, but there are some new options: -* `--tar_file_name=dropbox://` automatically **copies and untars** the given tarball into a directory on the worker node, accessed via the INPUT_TAR_DIR_LOCAL environment variable in the job. The value of INPUT_TAR_DIR_LOCAL is by default $CONDOR_DIR_INPUT/name_of_tar_file_without_extension, so if you have a tar file named e.g. sep2025tutorial.tar.gz, it would be $CONDOR_DIR_INPUT/sep2025tutorial. +* `--tar_file_name=dropbox://` automatically **copies and untars** the given tarball into a directory on the worker node, accessed via the INPUT_TAR_DIR_LOCAL environment variable in the job. The value of INPUT_TAR_DIR_LOCAL is by default $CONDOR_DIR_INPUT/name_of_tar_file_without_extension, so if you have a tar file named e.g. jan2026tutorial.tar.gz, it would be $CONDOR_DIR_INPUT/jan2026tutorial. * Notice that the `--append_condor_requirements` line is longer now, because we also check for the fifeuser[1-4]. opensciencegrid.org CVMFS repositories. 
The submission output will look something like this: @@ -265,7 +263,7 @@ Could not locate uploaded file on RCDS. Will retry in 30 seconds. Could not locate uploaded file on RCDS. Will retry in 30 seconds. Found uploaded file on RCDS. Transferring files to web sandbox... -Copying file:///nashome/k/kherner/.cache/jobsub_lite/js_2023_05_24_224713_9669e535-daf9-496f-8332-c6ec8a4238d9/run_sep2025tutorial.sh [DONE] after 0s +Copying file:///nashome/k/kherner/.cache/jobsub_lite/js_2023_05_24_224713_9669e535-daf9-496f-8332-c6ec8a4238d9/run_jan2026tutorial.sh [DONE] after 0s Copying file:///nashome/k/kherner/.cache/jobsub_lite/js_2023_05_24_224713_9669e535-daf9-496f-8332-c6ec8a4238d9/simple.cmd [DONE] after 0s Copying file:///nashome/k/kherner/.cache/jobsub_lite/js_2023_05_24_224713_9669e535-daf9-496f-8332-c6ec8a4238d9/simple.sh [DONE] after 0s Submitting job(s). @@ -566,8 +564,6 @@ Some more background material on these topics (including some examples of why ce [Wiki page listing differences between jobsub_lite and legacy jobsub](https://fifewiki.fnal.gov/wiki/Differences_between_jobsub_lite_and_legacy_jobsub_client/server) -[DUNE Computing Tutorial:Advanced topics and best practices](DUNE_computing_tutorial_advanced_topics_20210129) - [2021 Intensity Frontier Summer School](https://indico.fnal.gov/event/49414) [The Glidein-based Workflow Management System]( https://glideinwms.fnal.gov/doc.prd/index.html ) diff --git a/_episodes/08-justin-job-submission.md b/_episodes/08-justin-job-submission.md new file mode 100644 index 0000000..9985b5d --- /dev/null +++ b/_episodes/08-justin-job-submission.md @@ -0,0 +1,98 @@ +--- +title: justIN Grid Job Submission (UNDER CONSTRUCTION) +teaching: 65 +exercises: 0 +questions: +- How to submit grid jobs? 
+objectives: +- Submit a basic batch job and understand what's happening behind the scenes +- Monitor the job and look at its outputs +- Review best practices for submitting jobs (including what NOT to do) +keypoints: +- When in doubt, ask! Understand that policies and procedures that seem annoying, overly complicated, or unnecessary (especially when compared to running an interactive test) are there to ensure efficient operation and scalability. They are also often the result of someone breaking something in the past, or of simpler approaches not scaling well. +- Send test jobs after creating new workflows or making changes to existing ones. If things don't work, don't blindly resubmit and expect things to magically work the next time. +- Only copy what you need in input tar files. In particular, avoid copying log files, .git directories, temporary files, etc. from interactive areas. +- Take care to follow best practices when setting up input and output file locations. +- Always, always, always prestage input datasets. No exceptions. +--- + + + +The video from the two-day version of this training in May 2022 is provided [here](https://www.youtube.com/embed/QuDxkhq64Og) as a reference. --> + + + + + + + +Once you have practiced basic justIN commands, please look at the instructions for running your own code below: + + + +## First learn the basics of justIN: submit a job + +Go to [The justIN Tutorial](https://dunejustin.fnal.gov/docs/tutorials.dune.md) + +and work up to ["run some hello world jobs"](https://dunejustin.fnal.gov/docs/tutorials.dune.md#run-some-hello-world-jobs) + +> ## Quiz +> +> 1. What is your workflow ID? 
+> +{: .solution} + +Then work through + +- [View your workflow on the justIN web dashboard](https://dunejustin.fnal.gov/docs/tutorials.dune.md#view-your-workflow-on-the-justin-web-dashboard) +- [Jobs with inputs and outputs](https://dunejustin.fnal.gov/docs/tutorials.dune.md#jobs-with-inputs-and-outputs) +- [Fetching files from Rucio managed storage](https://dunejustin.fnal.gov/docs/tutorials.dune.md#fetching-files-from-rucio-managed-storage) +- (skip for now) Jobs using GPUs +- [Jobs writing to scratch](https://dunejustin.fnal.gov/docs/tutorials.dune.md#jobs-writing-to-scratch) + + + + + +## Submit a job using the tarball containing custom code + + + +First off, a very important point: for running analysis jobs, **you may not actually need to pass an input tarball**, especially if you are just using code from the base release and you don't actually modify any of it. In that case, it is much more efficient to use everything from the release and refrain from using a tarball. +All you need to do is set up any required software from CVMFS (e.g. dunetpc and/or protoduneana), and you are ready to go. +If you're just modifying a fcl file, for example, but no code, it's actually more efficient to copy just the fcl(s) you're changing to the scratch directory within the job, and edit them as part of your job script (copies of a fcl file in the current working directory have priority over others by default). + +Sometimes, though, we need to run some custom code that isn't in a release. +We need a way to efficiently get code into jobs without overwhelming our data transfer systems. +We have to make a few minor changes to the scripts you made in the previous tutorial section, generate a tarball, and invoke the proper jobsub options to get that into your job. +There are many ways of doing this but by far the best is to use the Rapid Code Distribution Service (RCDS), as shown in our example. + + +### Temporary short version of an example for custom code. 
+ +We're working on a long version of this but please look at these [instructions for running a justIN workflow using your own code]({{ site.baseurl }}/short_submission) for now. + +### Cool justIN feature + +justIN has a very useful interactive test command. + +Here is a test from the short submission example. + +~~~ +{% include test_workflow.sh %} +~~~ + +It reads in a tarball from an area `$DUNEDATA` and writes output to a tmp area on your interactive machine. It works very well at emulating a grid job. + +## Did your job work? + +If not, please ask over at #computing-questions in Slack. \ No newline at end of file diff --git a/_extras/short_submission.md b/_extras/short_submission.md new file mode 100644 index 0000000..666adfa --- /dev/null +++ b/_extras/short_submission.md @@ -0,0 +1,163 @@ +--- +title: Short submission with your own code +--- + +## This collects the sequence of steps for a batch submission with local code which produces both an artroot and a root file. + +It splits out different functions, so you need to examine/adapt all of the support scripts, which can be found [here](https://github.com/hschellman/computing-basics-batch-devel/blob/gh-pages/files/usefulcode.tar). + +This sequence assumes you are in your top level mrb directory. + +### in your top level mrb directory + + +For example `/exp/dune/app/users/$USER/myworkarea` + +You need to have a name for it, as you will be making a tarball + +~~~ +export DIRECTORY=myworkarea +~~~ +{: ..language-bash} + +### copy these scripts into that top level directory + +You can access a tarball with them all [here](https://github.com/hschellman/computing-basics-batch-devel/blob/gh-pages/files/usefulcode.tar). + +Download that tarball into the top level directory for your build and + +~~~ +tar xBf usefulcode.tar +~~~ +{: ..language-bash} + + to get the code. + +### here are the scripts.. 
+ +#### utilities you need + +- [setup-grid](https://github.com/hschellman/computing-basics-batch-devel/blob/gh-pages/_includes/setup-grid) (should not need to modify) +This replaces `setup` in your local_build directory. + +- [maketar.sh $DIRECTORY](https://github.com/hschellman/computing-basics-batch-devel/blob/gh-pages/_includes/maketar.sh) +(should not need to modify) +This takes the contents of `$DIRECTORY` and makes a tarball in `/exp/data/users/$USER/` + +- [makerdcs.sh $DIRECTORY](https://github.com/hschellman/computing-basics-batch-devel/blob/gh-pages/_includes/makerdcs.sh) (should not need to modify) +This takes the tarball and copies it to /cvmfs/ where grid jobs can find it. It places the location in the file `$DIRECTORY/cvmfs.location` so you can find it. + +#### setup scripts + +- [setup_before_submit.sh](https://github.com/hschellman/computing-basics-batch-devel/blob/gh-pages/_includes/setup_before_submit.sh) (customize versions for your code) +You need to modify this to reflect the code version you are setting up. Normally only need to run/modify this once/session. + + +- [job_config.sh](https://github.com/hschellman/computing-basics-batch-devel/blob/gh-pages/_includes/job_config.sh) (you modify this to reflect your workflow. Sets things like $FCL_FILE). This sets up essential job parameters. You need to understand and modify these appropriately for your purpose. + +#### Script to test and submit jobs + +- [test_workflow.sh](https://github.com/hschellman/computing-basics-batch-devel/blob/gh-pages/_includes/test_workflow.sh) (script to do interactive tests of your jobscript) + +- [submit_workflow.sh](https://github.com/hschellman/computing-basics-batch-devel/blob/gh-pages/_includes/submit_workflow.sh) (writes output to scratch. Modify running time and memory) + +- [submit_workflow_rucio.sh](https://github.com/hschellman/computing-basics-batch-devel/blob/gh-pages/_includes/submit_workflow_rucio.sh) (writes output to rucio. 
Modify running time and memory) + + +#### scripts that run on the remote machine + +[extractor_new.py](https://github.com/hschellman/computing-basics-batch-devel/blob/gh-pages/_includes/extractor_new.py) (this makes metadata for your files) + +[submit_local_code.jobscript.sh](https://github.com/hschellman/computing-basics-batch-devel/blob/gh-pages/_includes/submit_local_code.jobscript.sh) (may need to modify if expert) + + + +## How to run these scripts + +### modify two-three scripts (should not need to change the others) + +edit + +- `setup_before_submit.sh` if you change code versions and +- `job_config.sh` if you change more temporary things like fcl files. + +- choose your code version (code version has to match your build) +- *make certain the fcl file is either in the fcl path or in `$DIRECTORY`* +- add a string `APP_TAG` that will go in your filename +- add a description in `DESCRIPTION` + +Then run those scripts to set things up + +~~~ +source setup_before_submit.sh # sets up larsoft +~~~ + +### from $DIRECTORY make a tarball and put in rcds + +If you have changed any scripts or code, you must redo this. + +~~~ +./maketar.sh $DIRECTORY +./makercds.sh $DIRECTORY +~~~ +{: ..language-bash} + +This will take a while, produce a tarball on `/exp/dune/data/users/$USER/` and put the cvmfs location in `cvmfs.location` in `$DIRECTORY` + +### Configure your job with job_config.sh + +Then edit `job_config.sh` to reflect the # of events you want and other run-time parameters. + +#### Details of job_config.sh + +Here is what is in job_config.sh +~~~ +{% include job_config.sh %} +~~~ + +- `FCL_FILE`= the top level fcl file - assumes it is in DIRECTORY or in the `FHICL_FILE_PATH` +- `OUTPUT_DATA_TIER1` data_tier for artroot output (full-reconstructed, ...) +- `OUTPUT_DATA_TIER2` data_tier for root output - normally plain root-tuple +- `MQL` Metacat query that you wish to run over +- `APP_TAG` this is a tag that goes into the output filename, like reco2, ana ... 
+- `DESCRIPTION` the jobname that shows up in justIN +- `USERF` make certain the grid knows who you are without overwriting whatever internal `USER` it has +- `NUM_EVENTS` the `-n` argument of larsoft +- `FNALURL` sends output to subdirectories of your area on scratch +- `NAMESPACE` rucio/metacat namespace for your output, normally `usertests` or possibly your username or physics group unless you are doing production. + + +### Test your jobscript interactively + +[test_workflow.sh](https://github.com/hschellman/computing-basics-batch-devel/blob/gh-pages/_includes/test_workflow.sh) + +Results will show up in the 'tmp' area on your local machine. + + +### Submit the job + +[submit_workflow.sh](https://github.com/hschellman/computing-basics-batch-devel/blob/gh-pages/_includes/submit_workflow.sh) + +This one writes to `/pnfs/dune/scratch` + +~~~ +./submit_workflow.sh +~~~ +{: ..language-bash} + +[submit_workflow_rucio.sh](https://github.com/hschellman/computing-basics-batch-devel/blob/gh-pages/_includes/submit_workflow_rucio.sh) + +This one writes to a rucio location specified by `$NAMESPACE` + +~~~ +./submit_workflow_rucio.sh +~~~ +{: ..language-bash} + +### after submission + +You should get a workflow number back + +go to [justIN](https://dunejustin.fnal.gov/dashboard/?method=list-workflows) + +to track your job. 
diff --git a/_includes/DUNEmdSpec.json b/_includes/DUNEmdSpec.json new file mode 100644 index 0000000..bb6ed4f --- /dev/null +++ b/_includes/DUNEmdSpec.json @@ -0,0 +1,123 @@ + +{ "known_fields": { + "core.run_type":[ + "fardet", + "neardet", + "protodune", + "protodune-sp", + "protodune-dp", + "35ton", + "311", + "311_dp_light", + "iceberg", + "fardet-sp", + "fardet-dp", + "fardet-moo", + "np04_vst", + "vd-coldbox-bottom", + "vd-coldbox-top", + "protodune-hd", + "hd-coldbox", + "vd-protodune-arapucas", + "protodune-vst", + "vd-protodune-pds", + "fardet-hd", + "fardet-vd", + "dc4-vd-coldbox-bottom", + "dc4-vd-coldbox-top", + "dc4-hd-protodune", + "hd-protodune", + "neardet-lar", + "neardet-2x2-minerva", + "neardet-2x2-lar-charge", + "neardet-2x2-lar-light", + "neardet-2x2", + "neardet-2x2-lar", + "vd-protodune", + "vd-coldbox" + ], + "core.file_type":[ + "detector", + "mc", + "importedDetector" + ], + "core.data_tier":[ + "simulated", + "raw", + "hit-reconstructed", + "full-reconstructed", + "generated", + "detector-simulated", + "reconstructed-2d", + "reconstructed-3d", + "sliced", + "dc1input", + "dc1output", + "root-tuple", + "root-hist", + "dqm", + "decoded-raw", + "sam-user", + "pandora_info", + "reco-recalibrated", + "storage-testing", + "root-tuple-virtual", + "binary-raw", + "trigprim", + "pandora-info" + ], + "core.data_stream":[ + "out1", + "noise", + "test", + "cosmics", + "calibration", + "physics", + "commissioning", + "out2", + "pedestal", + "study", + "trigprim", + "pdstl", + "linjc", + "numib", + "numip", + "numil" + ] + }, + "basetypes":{ + "name": "STRING", + "namespace": "STRING", + "size":"INT", + "metadata": { + "core.application.family": "STRING", + "core.application.name": "STRING", + "core.application.version": "STRING", + "core.data_stream":"STRING", + "core.data_tier": "STRING", + "core.end_time": "FLOAT", + "core.event_count": "INT", + "core.events": "LIST", + "core.file_content_status": "STRING", + "core.file_format": "STRING", + 
"core.file_type": "STRING", + "core.first_event_number": "INT", + "core.last_event_number": "INT", + "core.run_type": "STRING", + "core.runs": "LIST", + "core.runs_subruns": "LIST", + "core.start_time": "FLOAT", + "dune.daq_test": "STRING", + "dune.config_file": "STRING", + "dune_mc.gen_fcl_filename": "STRING", + "dune_mc.geometry_version":"STRING", + "retention.status": "STRING", + "retention.class": "STRING" + } + }, + "fixDefaults":{ + "core.file_content_status": "good", + "retention.status": "active", + "retention.class": "unknown" + } +} \ No newline at end of file diff --git a/_includes/MDValidator.py b/_includes/MDValidator.py new file mode 100644 index 0000000..96125e8 --- /dev/null +++ b/_includes/MDValidator.py @@ -0,0 +1,158 @@ +"""Check metadata against a template""" +import os,sys,json + +DEBUG=False + +def TypeChecker(filemd=None, errfile=None, verbose=False): + " check for type and missing required fields in metadata" + + # define types + valuetypes = { + "STRING" : type(""), + "FLOAT" : type(1.0), + "INT" : type(1), + "LIST" : type([]), + "DICT" : type({}), + } + + # read in the defaults + + f = open("DUNEmdSpec.json",'r') + config = json.load(f) + f.close() + + # list defaults for metadata fields + + + + # set default values for fields that are often missing but needed + fixDefaults = { + "core.file_content_status":"good", + "retention.status":"active", + "retention.class":"unknown" + } + + # place to put optional fields: all is optional for all, otherwise you need to tell it data_tier + + optional = { + "all":["core.events","dune.daq_test"], + "root-tuple":["core.event_count","core.first_event_number","core.last_event_number"], + "raw":["dune.config_file", "dune_mc.gen_fcl_filename","dune_mc.geometry_version","core.application.family","core.application.name","core.application.version"], + "binary-raw":["dune.config_file", 
"dune_mc.gen_fcl_filename","dune_mc.geometry_version","core.application.family","core.application.name","core.application.version"], + "trigprim":["dune.config_file", "dune_mc.gen_fcl_filename","dune_mc.geometry_version","core.application.family","core.application.name","core.application.version"], + "root-tuple-virtual":["core.event_count","core.first_event_number","core.last_event_number"] + } + + + did = filemd["namespace"]+":"+filemd["name"] + + # do this as file may not have an fid yet, but fid makes shorter error messages. + if "fid" in filemd: + fid = filemd["fid"] + else: + fid = did + + # start out with valid and no fixes needed + valid = True + fixes = {} + + # loop over default md keys + + for x, xtype in config["basetypes"].items(): + if DEBUG: print (x,xtype) + if x in optional["all"]: continue + # check required + if x not in filemd.keys(): + error = x+" is missing from "+ fid + "\n" + print (error) + if errfile is not None: errfile.write(error) + valid *= False + print (filemd.keys()) + + # check type + if x != "metadata" and valuetypes[xtype] != type(filemd[x]) : + #print (x,xtype) + if xtype == valuetypes["FLOAT"] and type(filemd[x]) == valuetypes["INT"]: continue + error = "top level item %s has wrong type in %s \n"%(x,fid) + print (error) + if errfile is not None: errfile.write(error) + valid *= False + + # now do the metadata + if DEBUG: + print ("keys",filemd.keys()) + + if "metadata" not in filemd.keys(): + print ("strange - no metadata for this file") + + md = filemd["metadata"] + + for x, xtype in config["basetypes"]["metadata"].items(): + if DEBUG: print ("checking", x,xtype) + if x in optional["all"]: continue # skip optional items + if "core.run_type" in md and md["core.run_type"] != "mc" and "mc" in x: + if verbose: print ("skipping mc only",x) + continue + + # check required keys + if x not in md.keys(): + if "core.data_tier" in md and md["core.data_tier"] in optional and x in optional[md["core.data_tier"]]: # skip optional items by 
data_tier + + if verbose: print ("skipping optional missing field for data_tier",md["core.data_tier"],x) + continue + error = x+ " is missing from " + fid + "\n" + print (error) + if errfile is not None: errfile.write(error) + valid *= False + if x in fixDefaults: + fixes[x]=fixDefaults[x] + continue + # check for type + if DEBUG: print ("xtype",xtype) + if valuetypes[xtype] != type(md[x]): + if xtype == "FLOAT" and type(md[x]) == valuetypes["INT"]: continue + error = "%s has wrong type in %s\n "%(x,fid) + print (error) + if errfile is not None: errfile.write(error+"\n") + valid *= False + for x,core in config["known_fields"].items(): + if x not in md: + print ("required field",x,"not present") + valid *=False + continue + if md[x] not in core: + print ("unknown required metadata field",x,"=",md[x]) + valid *= False + if not valid: + print (did, " fails basic metadata tests") + if len(fixes) !=0: + print ("you could fix this by applying this fix") + print (json.dumps(fixes,indent=4)) + + # look for upper case in keys + + for x,v in md.items(): + if x != x.lower(): + print ("OOPS upper case",x) + + + + + + return valid, fixes + + +if __name__ == '__main__': + + if len(sys.argv) < 2: + print ("please provide a json file to check") + sys.exit(1) + jsonname = sys.argv[1] + if not os.path.exists(jsonname): + print ("input file does not exist",jsonname) + sys.exit(1) + jsonfile = open(jsonname,'r') + filemd = json.load(jsonfile) + errfile = open(jsonname+".err",'w') + status,fixes = TypeChecker(filemd=filemd,errfile=errfile,verbose=True) + errfile.close() \ No newline at end of file diff --git a/_includes/extractor_new.py b/_includes/extractor_new.py new file mode 100755 index 0000000..d338446 --- /dev/null +++ b/_includes/extractor_new.py @@ -0,0 +1,502 @@ +#!/usr/bin/env python +import sys, getopt +import os +from subprocess import Popen, PIPE +import threading +import queue +import json +import abc +import datetime + +DEBUG=False + +import argparse + +from 
metacat.webapi import MetaCatClient + +mc_client = MetaCatClient(os.getenv("METACAT_SERVER_URL")) + + +# Function to wait for a subprocess to finish and fetch return code, +# standard output, and standard error. +# Call this function like this: +# +# q = Queue.Queue() +# jobinfo = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE) +# wait_for_subprocess(jobinfo, q) +# rc = q.get() # Return code. +# jobout = q.get() # Standard output +# joberr = q.get() # Standard error + +"""extractor_new.py +Purpose: To extract metadata from output file on worker node, generate JSON file +""" + + +class MetaData(object): + """Base class to hold / interpret general metadata""" + __metaclass__ = abc.ABCMeta + + @abc.abstractmethod + def __init__(self, inputfile): + self.inputfile = inputfile + + def extract_metadata_to_pipe(self): + """Extract metadata from inputfile into a pipe for further processing.""" + local = self.inputfile + if len(local) > 0: + proc = Popen(["sam_metadata_dumper", local], stdout=PIPE, + stderr=PIPE) + else: + url = self.inputfile + proc = Popen(["sam_metadata_dumper", url], stdout=PIPE, + stderr=PIPE) + if len(local) > 0 and local != self.inputfile: + os.remove(local) + return proc + + def get_job(self, proc): + """Run the proc in a 60-sec timeout queue, return stdout, stderr""" + q = queue.Queue() + thread = threading.Thread(target=self.wait_for_subprocess, args=[proc, q]) + thread.start() + thread.join(timeout=7200) + if thread.is_alive(): + print('Terminating subprocess because of timeout.') + proc.terminate() + thread.join() + rc = q.get() + jobout = q.get() + joberr = q.get() + if rc != 0: + raise RuntimeError('sam_metadata_dumper returned nonzero exit status {}.'.format(rc)) + return jobout, joberr + + @staticmethod + def wait_for_subprocess(jobinfo, q): + """Run jobinfo, put the return code, stdout, and stderr into a queue""" + jobout, joberr = jobinfo.communicate() + rc = jobinfo.poll() + for item in (rc, jobout, joberr): + 
q.put(item) + return + + @staticmethod + + def mdart_gen(jobtuple): + """Take Jobout and Joberr (in jobtuple) and return mdart object from that""" +### mdtext = ''.join(line.replace(", ,", ",") for line in jobtuple[0].split('\n') if line[-3:-1] != ' ,') + mdtext = ''.join(line.replace(", ,", ",") for line in jobtuple[0].decode().split('\n') if line[-3:-1] != ' ,') + mdtop = json.JSONDecoder().decode(mdtext) + if len(list(mdtop.keys())) == 0: + print('No top-level key in extracted metadata.') + sys.exit(1) + file_name = list(mdtop.keys())[0] + return mdtop[file_name] + + @staticmethod + def md_handle_application(md): + """If there's no application key in md dict, create the key with a blank dictionary. + Then return md['application'], along with mdval""" + if 'application' not in md: + md['application'] = {} + return md['application'] + + + +class MetaDataKey: + + def __init__(self): + self.expname = '' + + def metadataList(self): + return [self.expname + elt for elt in ('lbneMCGenerators','lbneMCName','lbneMCDetectorType','StageName')] + + def translateKey(self, key): + if key == 'lbneMCDetectorType': + return 'lbne_MC.detector_type' + elif key == 'StageName': + return 'lbne_MC.miscellaneous' + else: + prefix = key[:4] + stem = key[4:] + projNoun = stem.split("MC") + return prefix + "_MC." + projNoun[1] + + + +class expMetaData(MetaData): + """Class to hold/interpret experiment-specific metadata""" + def __init__(self, expname, inputfile): + MetaData.__init__(self, inputfile) + self.expname = expname + #self.exp_md_keyfile = expname + '_metadata_key' +# try: +# #translateMetaData = __import__("experiment_utilities", "MetaDataKey") +# from experiment_utilities import MetaDataKey +# except ImportError: +# print("You have not defined an experiment-specific metadata and key-translating module in experiment_utilities. 
Exiting") +# raise +# + metaDataModule = MetaDataKey() + self.metadataList, self.translateKeyf = metaDataModule.metadataList(), metaDataModule.translateKey + + def translateKey(self, key): + """Returns the output of the imported translateKey function (as translateKeyf) called on key""" + return self.translateKeyf(key) + + def md_gen(self, mdart, md0={}): + """Loop through art metdata, generate metadata dictionary""" + # define an empty python dictionary which will hold sam metadata. + # Some fields can be copied directly from art metadata to sam metadata. + # Other fields require conversion. + md = {} + topmd = {} + + # Loop over art metadata. + if DEBUG: print ("EXTRACTOR: art",mdart.keys()) + for mdkey in list(mdart.keys()): + mdval = mdart[mdkey] + # Skip some art-specific fields. + if mdkey == 'file_format_version': + pass + elif mdkey == 'file_format_era': + pass + + # Ignore primary run_type field (if any). + # Instead, get run_type from runs field. + + # #HMS elif mdkey == 'run_type': + # # pass + # elif mdkey == 'application.version': + # pass + # elif mdkey == 'application.family': + # pass + # elif mdkey == 'application.name': + # pass + + # do not Ignore data_stream any longer. + + elif mdkey == 'data_stream': + if 'dunemeta.data_stream' not in list(mdart.keys()): # only use this data_stream value if dunemeta.data_stream is not present + md['core.data_stream'] = mdval + + # Ignore process_name as of 2018-09-22 because it is not in SAM yet + elif mdkey == 'art.process_name': +# md['core.application.name'] = mdval + pass + # Application family/name/version. + + elif mdkey == 'applicationFamily': + md['core. 
application.family'] = mdval + elif mdkey == 'StageName' or mdkey == 'applicationName': + md['core.application.name'] = mdval + elif mdkey == 'applicationVersion': + md['core.application.version'] = mdval + + # patch time format + + elif mdkey in ("start_time", "end_time"): + + newk = "core."+mdkey + t = mdval + if t is not None: + t = datetime.datetime.fromisoformat(t).replace( + tzinfo=datetime.timezone.utc).timestamp() + md[newk] = t + print ("EXTRACTOR: fix time for",mdval,newk,t,md[newk]) + + # Parents. + + elif mdkey == 'parents': + mdparents = [] + if not args.strip_parents: + for parent in mdval: + parent_dict = {'name': parent,'namespace':'unknown'} + mdparents.append(parent_dict) + topmd['parents'] = mdparents + + # Other fields where the key or value requires minor conversion. + elif mdkey == 'runs': + runsSubruns = [] + runs = [] + print (mdart['runs']) + for run, subrun, runtype in mdart.pop("runs", []): + if run not in runs: runs.append(run) + if subrun not in runsSubruns: runsSubruns.append(100000 * run + subrun) + md['core.runs'] = runs + md['core.runs_subruns'] = runsSubruns + + elif mdkey == 'art.first_event': + md[mdkey] = mdval[2] + elif mdkey == 'art.last_event': + md[mdkey] = mdval[2] + elif mdkey == 'first_event': + md['core.'+mdkey+ "_number"] = mdval + elif mdkey == 'last_event': + md['core.'+mdkey+ "_number"] = mdval + elif mdkey == 'detector.hv_status': + md[mdkey] = mdval + elif mdkey == 'detector.hv_value': + md[mdkey] = mdval + elif mdkey == 'detector.tpc_status': + md[mdkey] = mdval + elif mdkey == 'detector.tpc_apa_status': + md[mdkey] = mdval + elif mdkey == 'detector.tpc_apas': + md[mdkey] = mdval + elif mdkey == 'detector.tpc_apa_1': + md[mdkey] = mdval + elif mdkey == 'detector.tpc_apa_2': + md[mdkey] = mdval + elif mdkey == 'detector.tpc_apa_3': + md[mdkey] = mdval + elif mdkey == 'detector.tpc_apa_4': + md[mdkey] = mdval + elif mdkey == 'detector.tpc_apa_5': + md[mdkey] = mdval + elif mdkey == 'detector.tpc_apa_6': + md[mdkey] 
= mdval + elif mdkey == 'detector.pd_status': + md[mdkey] = mdval + elif mdkey == 'detector.crt_status': + md[mdkey] = mdval + elif mdkey == 'daq.readout': + md[mdkey] = mdval + elif mdkey == 'daq.felix_status': + md[mdkey] = mdval + elif mdkey == 'beam.polarity': + md[mdkey] = mdval + elif mdkey == 'beam.momentum': + md[mdkey] = mdval + elif mdkey == 'dunemeta.data_stream': + md['core.data_stream'] = mdval + elif mdkey == 'file_type': + md['core.'+mdkey] = mdval + elif mdkey == 'data_quality.level': + md[mdkey] = mdval + elif mdkey == 'data_quality.is_junk': + md[mdkey] = mdval + elif mdkey == 'data_quality.do_not_process': + md[mdkey] = mdval + elif mdkey == 'data_quality.online_good_run_list': + md[mdkey] = mdval + elif mdkey == 'dunemeta.dune_data.accouple': + md['dune_data.accouple'] = int(mdval) + elif mdkey == 'dunemeta.dune_data.calibpulsemode': + md['dune_data.calibpulsemode'] = int(mdval) + elif mdkey == 'dunemeta.dune_data.daqconfigname': + md['dune_data.DAQConfigName'] = mdval + elif mdkey == 'dunemeta.dune_data.detector_config': + md['dune_data.detector_config'] = mdval + elif mdkey == 'dunemeta.dune_data.febaselinehigh': + md['dune_data.febaselinehigh'] = int(mdval) + elif mdkey == 'dunemeta.dune_data.fegain': + md['dune_data.fegain'] = int(mdval) + elif mdkey == 'dunemeta.dune_data.feleak10x': + md['dune_data.feleak10x'] = int(mdval) + elif mdkey == 'dunemeta.dune_data.feleakhigh': + md['dune_data.feleakhigh'] = int(mdval) + elif mdkey == 'dunemeta.dune_data.feshapingtime': + md['dune_data.feshapingtime'] = int(mdval) + elif mdkey == 'dunemeta.dune_data.inconsistent_hw_config': + md['dune_data.inconsistent_hw_config'] = int(mdval) + elif mdkey == 'dunemeta.dune_data.is_fake_data': + md['dune_data.is_fake_data'] = int(mdval) + elif mdkey == 'dunemeta.dune_data.readout_window': + md['dune_data.readout_window'] = float(mdval) + + # For all other keys, copy art metadata directly to sam metadata. 
+ # This works for run-tuple (run, subrun, runtype) and time stamps. + + else: + if 'art' not in mdkey: + md['core.'+mdkey] = mdart[mdkey] + + # Make the other meta data field parameters + topmd['created_by'] = os.environ['USERF'] + topmd['name'] = self.inputfile.split("/")[-1] + if 'file_size' in md0: + topmd['size'] = md0['file_size'] + else: + topmd['size'] = os.path.getsize(self.inputfile) + # if 'crc' in md0 and not args.no_crc: + # topmd['crc'] = md0['crc'] + # elif not args.no_crc: + # topmdmd['crc'] = root_metadata.fileEnstoreChecksum(self.inputfile) + + # In case we ever want to check out what md is for any instance of MetaData by calling instance.md + topmd['metadata'] = md + self.topmd = topmd + return self.topmd + + def getmetadata(self, md0={}): + """ Get metadata from input file and return as python dictionary. + Calls other methods in class and returns metadata dictionary""" + proc = self.extract_metadata_to_pipe() + jobt = self.get_job(proc) + mdart = self.mdart_gen(jobt) + return self.md_gen(mdart, md0) + +def main(): + + argparser = argparse.ArgumentParser('Parse arguments') + argparser.add_argument('--infile',help='path to input file',required=True,type=str) + argparser.add_argument('--declare',help='validate and declare the metadata for the file specified in --infile to SAM',action='store_true') + argparser.add_argument('--appname',help='application name for metadata',type=str) + argparser.add_argument('--appversion',help='application version for metadata',type=str) + argparser.add_argument('--appfamily',help='application family for metadata',type=str) + argparser.add_argument('--file_type',help='file_type (mc or detector)',type=str) + argparser.add_argument('--file_format',help='file_format (root, artroot ..)',type=str) + argparser.add_argument('--run_type',help='run_type - (fardet-hd, iceberg ...)',type=str) + argparser.add_argument('--campaign',help='Value for dune.campaign for metadata',type=str) + 
argparser.add_argument('--data_stream',help='Value for data_stream for metadata',type=str) + argparser.add_argument('--data_tier',help='Value for data_tier for metadata',type=str) + argparser.add_argument('--fcl_file',type=str,help="fcl file name", default="unknown") + argparser.add_argument('--requestid',help='Value for dune.requestid for metadata',type=str) + #argparser.add_argument('--set_processed',help='Set for parent file as processed in metadata',action="store_true") + argparser.add_argument('--strip_parents',help='Do not include the file\'s parents in metadata for declaration',action="store_true") + argparser.add_argument('--no_crc',help='Leave the crc out of the generated json',action="store_true") + argparser.add_argument('--skip_dumper',help='Skip running sam_metadata_dumper on the input file',action="store_true") + argparser.add_argument('--input_json',help='Input json file containing metadata to be added to output (can contain ANY valid metacat metadata parameters)',type=str) + argparser.add_argument('--inputDidsFile', type=str, default=None, + help='Optional path to a file containing all input DIDs, one per line') + argparser.add_argument('--no_extract',help='use this if not artroot',action="store_true") + argparser.add_argument('--namespace',type=str,help="namespace for output file",required=True) + global args + args = argparser.parse_args() + + try: +# expSpecificMetadata = expMetaData(os.environ['SAM_EXPERIMENT'], str(sys.argv[1])) + expSpecificMetadata = expMetaData('dune', args.infile) + if not args.no_extract: + mddict = expSpecificMetadata.getmetadata() + else: + mddict = {} + mddict['name']=os.path.basename(args.infile) + mddict['size'] = os.path.getsize(args.infile) + mddict['created_by'] = os.environ['USERF'] + mddict['metadata']={} + print ("EXTRACTOR: building metadata from parent and args as no artroot dump available") + # If --input_json is supplied, open that dict now and add it to the output json + if args.input_json != None: + if 
os.path.exists(args.input_json): + try: + arbjson = json.load(open(args.input_json,'r')) + if DEBUG: print ("EXTRACTOR: arbjson",arbjson) + + for key,newval in arbjson["metadata"].items(): + + if key in mddict["metadata"]: + if DEBUG: print ("EXTRACTOR: overriding ",key,mddict["metadata"][key],"with", newval, "from json file" ) + else: + if DEBUG: print ("EXTRACTOR: adding ",key, newval, "from json file" ) + + mddict["metadata"][key] = newval + except: + print('Error loading input json file.',args.input_json) + + else: + print('warning, could not open the input json file', args.input_json) + + if args.appname != None: + mddict['metadata']['core.application.name'] = args.appname + if args.appversion != None: + mddict['metadata']['core.application.version'] = args.appversion + if args.appfamily != None: + mddict['metadata']['core.application.family'] = args.appfamily + if args.campaign != None: + mddict['metadata']['dune.campaign'] = args.campaign + if args.data_stream != None: + mddict['metadata']['core.data_stream'] = args.data_stream + if args.data_tier != None: + mddict['metadata']['core.data_tier'] = args.data_tier + if args.file_type != None: + mddict['metadata']['core.file_type'] = args.file_type + if args.file_format != None: + mddict['metadata']['core.file_format'] = args.file_format + if args.run_type != None: + mddict['metadata']['core.run_type'] = args.run_type + if args.requestid != None: + mddict['metadata']['dune.requestid'] = args.requestid + if args.inputDidsFile is not None: + parentDids = [] + for line in open(args.inputDidsFile, "r").read().splitlines(): + ns = line.split(':')[0] + name = line.split(':')[1] + if DEBUG: print ("EXTRACTOR: found a parent",line) + parentDids.append({ "name" : name, "namespace": ns }) + print ("EXTRACTOR: overriding parents with dids from " + args.inputDidsFile, file=sys.stdout) + mddict["parents"] = parentDids + + if args.fcl_file is not None: + mddict["metadata"]["dune.config_file"]=os.path.basename(args.fcl_file) + 
+ mddict["metadata"]["retention.class"]="user" + mddict["metadata"]["retention.status"]="active" + mddict["metadata"]["core.file_content_status"]="good" + # get info from the parent file if possible + # force some items to be like parents + # and replace those that are missing from parents + + force = ['core.data_stream', 'core.run_type', 'core.file_type'] + inheritable = ['core.runs','core.runs_subruns'] + if 'parents' in mddict and len(mddict['parents']) > 0: + theparent = mddict['parents'][0] + thename = theparent['name'] + thenamespace = theparent['namespace'] + thedid = "%s:%s"%(thenamespace,thename) + try: + parentmd = mc_client.get_file(name=thename,namespace=thenamespace,with_metadata=True) + except: + print('Error retrieving parent metadata for did %s:%s' % (thenamespace,thename)) + parentmd = None + if parentmd is not None: + for key in force: # these need to be the same as parents + if key in parentmd['metadata'] : + mddict['metadata'][key] = parentmd['metadata'][key] + print ("EXTRACTOR: forcing " + key + " from parent file " + thedid ) + for key in inheritable: + if key in parentmd['metadata'] and key not in mddict['metadata']: + mddict['metadata'][key] = parentmd['metadata'][key] + print ("EXTRACTOR: inheriting " + key + " from parent file " + thedid) + print ("EXTRACTOR: setting namespace for output",args.namespace) + mddict['namespace']=args.namespace + + + + + except TypeError: + print('You have not implemented a defineMetaData function by providing an experiment.') + print('No metadata keys will be saved') + raise +# mdtext = json.dumps(expSpecificMetadata.getmetadata(), indent=2, sort_keys=True) + + if DEBUG: + mdtext = json.dumps(mddict, indent=2, sort_keys=True) + print(mdtext) + # if args.declare: + # ih.declareFile(mdtext) + + # if args.set_processed: + # swc = mc_client() + # moddict = {"DUNE.production_status" : "processed" } + # for parent in moddict['parents']: + # fname = moddict['parents'][parent]['file_name'] + # try: + # 
swc.modifyFileMetadata(fname, moddict) + # except: + # print('Error modifying metadata for %s' % fname) + # raise + of = open(mddict["name"]+".json",'w') + json.dump(mddict,of,indent=4, sort_keys=True) + of.close() + #print(mdtext) + sys.exit(0) + + + +if __name__ == "__main__": + main() + + diff --git a/_includes/gitadd.sh b/_includes/gitadd.sh new file mode 100644 index 0000000..cc49595 --- /dev/null +++ b/_includes/gitadd.sh @@ -0,0 +1,15 @@ +git add *.sh *.py setup-grid DUNE*json +tar --no-xattrs -cf ../files/usefulcode.tar \ +extractor_new.py \ +gitadd.sh \ +job_config.sh \ +makercds.sh \ +maketar.sh \ +setup_before_submit.sh \ +submit_local_code.jobscript.sh \ +setup-grid \ +submit_workflow.sh \ +submit_workflow_rucio.sh +#> ../files/usefulcode.tar +git add ../files/usefulcode.tar +git add ../_extras/short_submission.md diff --git a/_includes/job_config.sh b/_includes/job_config.sh new file mode 100755 index 0000000..a1fd43b --- /dev/null +++ b/_includes/job_config.sh @@ -0,0 +1,11 @@ +export FCL_FILE="run_analyseEvents.fcl" # fcl file +export OUTPUT_DATA_TIER1="full-reconstructed" # tier for artroot output +export OUTPUT_DATA_TIER2="root-tuple" # tier for root output +export MQL="files where dune.workflow['workflow_id']=3923 and core.data_tier=full-reconstructed limit 2 ordered " # metacat query for files +export APP_TAG="ana" # application name +export DESCRIPTION="$APP_TAG using $FCL_FILE" # appears as jobname in justin +export USERF=${USER} # make certain the grid knows who you are +export NUM_EVENTS=-1 # process them all +export FNALURL='https://fndcadoor.fnal.gov:2880/dune/scratch/users' # sends output to scratch +export NAMESPACE="usertests" # don't change this unless doing production + diff --git a/_includes/makercds.sh b/_includes/makercds.sh new file mode 100755 index 0000000..257423f --- /dev/null +++ b/_includes/makercds.sh @@ -0,0 +1,22 @@ +# give me the directory name as argument +echo
"----------------------------------------------------------------" +echo "makercds.sh $1" +if [ "$1" == "" ] ; then + echo "need to enter the directory name that was used for your tar file" +else + echo "first ensure you have a justin token" + justin time + justin get-token + export HERE=`pwd` + # put the tar file on a bigger disk + export THERE=/exp/dune/data/users/$USER/ + date + ls -lrt $THERE/$1.tar.gz + echo " upload tar file to cvmfs and store location in cvmfs.location file" + export INPUT_TAR_DIR_LOCAL=`justin-cvmfs-upload $THERE/$1.tar.gz` + echo "file uploaded to $INPUT_TAR_DIR_LOCAL" + echo $INPUT_TAR_DIR_LOCAL > $HERE/cvmfs.location + echo "return to previous directory" + cd $HERE + echo "----------------------------------------------------------------" +fi \ No newline at end of file diff --git a/_includes/maketar.sh b/_includes/maketar.sh new file mode 100755 index 0000000..b4365d2 --- /dev/null +++ b/_includes/maketar.sh @@ -0,0 +1,24 @@ +# give me the directory name as argument +echo "----------------------------------------------------------------" +echo "maketar.sh" +export HERE=`pwd` +# put the tar file on a bigger disk +export THERE=/exp/dune/data/users/$USER/ +cd .. # go up one from current directory +date +if [ "$1" == "" ] ; then + echo "need to enter the directory name and be in that directory" +else + echo " make tar file from $1, excluding build_slf7... " + tarname=$(basename $1) + echo "$tarname" + tar --exclude '.git' --exclude build_slf7.x86_64 -cf ${THERE}/${tarname}.tar $1 + date + echo " gzip step " + gzip -f $THERE/${tarname}.tar + date + echo " tar file is at $THERE/${tarname}.tar.gz" + cd $HERE + echo "----------------------------------------------------------------" +fi + diff --git a/_includes/setup-grid b/_includes/setup-grid new file mode 100644 index 0000000..7b3fef1 --- /dev/null +++ b/_includes/setup-grid @@ -0,0 +1,130 @@ +# No magic #!, this script must be sourced! 
+# this script is part of mrb and gets renamed to setup when copied to the localProducts area + +# NOTICE: this script is not relocatable + +# +# Begin boilerplate. +# + +# Note: All the following special tricks for $_ must continue +# relaying the value to the next rule. Be careful! +# Special trick to nail the value of $_ down in a variety of shells. +echo $_ >& /dev/null +# Special trick for tcsh which is one-off on the command history stack. +: $_ +# Special trick to capture the value of $_ in zsh and bash +test $?shell$_ != 1$_ >& /dev/null && \ + dollar_underscore="$_" && \ + dollar_underscore=`expr "${dollar_underscore}" : ".\(.*\)"` +# Special trick to capture the value of $_ in tcsh +test $?shell = 1 && set dollar_underscore=`echo $_` + +# need to be able to check for mrb +test $?shell = 1 && set ss="csh" || ss="sh" +test "$ss" = "csh" && alias return exit + +test "$ss" = "csh" && \ + alias tnotnull "eval '"'test $?'"\!* -eq 1' && eval '"'test -n "$'"\!*"'"'"'" +test "$ss" = "sh" && \ + eval 'tnotnull() { eval "test -n \"\${$1-}\"" ;}' + +# check for mrb +tnotnull UPS_DIR || ( echo "ERROR:" ; echo "ERROR: you MUST set up UPS!" ; echo "ERROR:" ) +tnotnull UPS_DIR || unset ss +tnotnull UPS_DIR || return 1 +tnotnull MRB_DIR || ( echo "ERROR:"; echo "ERROR: you MUST first setup mrb!"; echo "ERROR:" ) +tnotnull MRB_DIR || unset ss +tnotnull MRB_DIR || return 1 +test -f "$MRB_DIR/libexec/shell_independence" || \ + ( echo "ERROR:" ; echo "ERROR: this mrb area expects mrb >= v5_00_00 (found $MRB_VERSION)!" ; echo "ERROR:" ) +test -f "$MRB_DIR/libexec/shell_independence" || unset ss +test -f "$MRB_DIR/libexec/shell_independence" || return 1 + +# Get the shell independence aliases and functions. +source "$MRB_DIR/libexec/shell_independence" + +# Capture the value of $0 +set_ dollar_zed=`echo "${0}" | sed -e 's/^-//'` + +# Special tricks to figure out if this script has been sourced. +# Works for bash, tcsh, and in some cases for zsh. 
+set_ is_sourced=false +ifcsh_ + # Note: It is unfortunate that we must hard-code the name + # of this script here, but there is no other way + # that works, tcsh is brain-dead. + set base=`basename "${dollar_zed}"` + test "${base}" != "setup" && \ + set is_sourced=true +else + # Special trick for zsh. + test "${ZSH_NAME}" && test "${dollar_underscore}" = "${dollar_zed}" && \ + is_sourced=true + # If there were arguments then there is no safe way to find out + # whether or not the script was sourced in zsh. Pretend it was. + test "${ZSH_NAME}" && test "${#argv}" != "0" && \ + is_sourced=true + # Special trick for bash. + test "${BASH}" && test "${BASH_SOURCE}" != "${dollar_zed}" && \ + is_sourced=true +# Warning, this must be here because the tcsh parser is brain-dead. +endif +endifcsh_ + +# +# End of boilerplate. Begin of real work. +# + +tnotnull UPS_DIR || ( echo "ERROR:" ; echo "ERROR: you MUST set up UPS" ; echo "ERROR:" ) +tnotnull UPS_DIR || source "$MRB_DIR/libexec/unset_shell_independence" +tnotnull UPS_DIR || unset me db dollar_underscore dollar_zed is_sourced base msg1 flav +tnotnull UPS_DIR || return 1 + + +tnotnull MRB_DIR || ( echo "ERROR:"; echo "ERROR: you MUST first set up mrb!"; echo "ERROR:" ) +tnotnull MRB_DIR || unset me db dollar_underscore dollar_zed is_sourced base msg1 flav +tnotnull MRB_DIR || return 1 + +setenv MRB_PROJECT "larsoft" +setenv MRB_PROJECT_VERSION ${DUNE_VERSION} +setenv MRB_QUALS ${DUNE_QUALIFIER} +setenv MRB_QUALSTRING `sed s/:/_/g <<< ${MRB_QUALS}` +setenv MRB_TOP "${INPUT_TAR_DIR_LOCAL}/${DIRECTORY}" +setenv MRB_TOP_BUILD "${INPUT_TAR_DIR_LOCAL}/${DIRECTORY}" +setenv MRB_SOURCE "${INPUT_TAR_DIR_LOCAL}/${DIRECTORY}/srcs" +setenv MRB_INSTALL "${INPUT_TAR_DIR_LOCAL}/${DIRECTORY}/localProducts_larsoft_${MRB_PROJECT_VERSION}_${MRB_QUALSTRING}" +setenv PRODUCTS "${MRB_INSTALL}:${PRODUCTS}" +setenv CETPKG_INSTALL "${INPUT_TAR_DIR_LOCAL}/${DIRECTORY}/localProducts_larsoft_${MRB_PROJECT_VERSION}_${MRB_QUALSTRING}" + +#--- begin 
middle boilerplate + +set_ flav=`get-directory-name subdir` +set_ buildDirName="build_${flav}" + +test "$ss" = sh && test -n "${MRB_BUILDDIR}" && setenv OLD_MRB_BUILDDIR "${MRB_BUILDDIR}" +test "$ss" = csh && tnotnull MRB_BUILDDIR && setenv OLD_MRB_BUILDDIR "${MRB_BUILDDIR}" +setenv MRB_BUILDDIR ${MRB_TOP_BUILD}/${buildDirName} + +unset me dollar_underscore dollar_zed is_sourced base msg1 flav + +#--- end middle boilerplate +# report the environment +echo +echo MRB_PROJECT=$MRB_PROJECT +echo MRB_PROJECT_VERSION=$MRB_PROJECT_VERSION +echo MRB_QUALS=$MRB_QUALS +echo MRB_QUALSTRING=$DUNE_QUALIFIER_STRING +echo MRB_TOP=$MRB_TOP +echo MRB_SOURCE=$MRB_SOURCE +echo MRB_BUILDDIR=$MRB_BUILDDIR +echo MRB_INSTALL=$MRB_INSTALL +echo MRB_DIR=$MRB_DIR +echo +echo PRODUCTS=$PRODUCTS +echo CETPKG_INSTALL=$CETPKG_INSTALL +echo + +source "$MRB_DIR/libexec/unset_shell_independence" +unset db buildDirName + diff --git a/_includes/setup_before_submit.sh b/_includes/setup_before_submit.sh new file mode 100755 index 0000000..f0efb3c --- /dev/null +++ b/_includes/setup_before_submit.sh @@ -0,0 +1,33 @@ +echo "----------------------------------------------------------------" +echo "setup_before_submit.sh" + +export DIRECTORY="$(basename "${PWD}")" +export DUNE_VERSION=v09_91_02d01 +export DUNE_QUALIFIER=e26:prof +export DUNE_QUALIFIER_STRING=`echo ${DUNE_QUALIFIER} | tr : _` + +## set up locally just as a check +source /cvmfs/dune.opensciencegrid.org/products/dune/setup_dune.sh +setup duneana "$DUNE_VERSION" -q "$DUNE_QUALIFIER" +setup dunesw "$DUNE_VERSION" -q "$DUNE_QUALIFIER" +setup metacat +export METACAT_SERVER_URL=https://metacat.fnal.gov:9443/dune_meta_prod/app +export METACAT_AUTH_SERVER_URL=https://metacat.fnal.gov:8143/auth/dune +setup justin +echo " code set up" +justin time +echo " you may need to authorize this computer to run the justin command - check to see if there is a URL above and go there to authenticate" +justin get-token + + +export 
localProductsdir="${PWD}/localProducts_larsoft_${DUNE_VERSION}_${DUNE_QUALIFIER_STRING}" +echo " localProductsdir ${localProductsdir}" + +cp setup-grid $localProductsdir/setup-grid + +echo "Now you should:" +echo "./maketar.sh $DIRECTORY" +echo "./makercds.sh $DIRECTORY" +echo "# edit job_config.sh" +echo "./submit_workflow.sh" +echo "----------------------------------------------------------------" \ No newline at end of file diff --git a/_includes/submit_local_code.jobscript.sh b/_includes/submit_local_code.jobscript.sh new file mode 100755 index 0000000..ed1bba8 --- /dev/null +++ b/_includes/submit_local_code.jobscript.sh @@ -0,0 +1,187 @@ +#!/bin/bash +:<<'EOF' + +To use this jobscript to process 5 files from the dataset fardet-hd__fd_mc_2023a_reco2__full-reconstructed__v09_81_00d02__standard_reco2_dune10kt_nu_1x2x6__prodgenie_nu_dune10kt_1x2x6__out1__validation +data and put the output logs in the `usertests` namespace and save the output in /scratch + +Use these commands to set up ahead of time: + +export DUNE_VERSION= +export DUNE_QUALIFIER= +export FCL_FILE= +export INPUT_TAR_DIR_LOCAL= +export MQL= +export DIRECTORY= + +(see job_config.sh for the full list) + +Use this command to create the workflow: + +justin simple-workflow \ +--mql "$MQL" \ +--jobscript submit_local_code.jobscript.sh --rss-mb 4000 \ + --output-pattern "*.root:${FNALURL}/${USERF}" --output-pattern "*.root.json:${FNALURL}/${USERF}" --env APP_TAG=${APP_TAG} --env DIRECTORY=${DIRECTORY} --scope $NAMESPACE --lifetime 30 --env INPUT_TAR_DIR_LOCAL=${INPUT_TAR_DIR_LOCAL} --env DUNE_VERSION=${DUNE_VERSION} --env DUNE_QUALIFIER=${DUNE_QUALIFIER} --env FCL_FILE=${FCL_FILE} --env NUM_EVENTS=${NUM_EVENTS} --env USERF=${USERF} --env NAMESPACE=${NAMESPACE} --description "${DESCRIPTION}" + +see job_config.sh for explanations + +EOF + +# fcl file and DUNE software version/qualifier to be used +FCL_FILE=${FCL_FILE:-${INPUT_TAR_DIR_LOCAL}/${DIRECTORY}/my_code/fcls/my_reco.fcl}
+APP_TAG=${APP_TAG:-unknown} +#DUNE_VERSION=${DUNE_VERSION:-v09_85_00d00} +#DUNE_QUALIFIER=${DUNE_QUALIFIER:-e26:prof} + +echo "------ set things up -------" +echo "Check environment" +echo "DIRECTORY=$DIRECTORY" +echo "DUNE_VERSION=$DUNE_VERSION" +echo "DUNE_QUALIFIER=$DUNE_QUALIFIER" +echo "FCL_FILE=$FCL_FILE" +echo "MQL=$MQL" +echo "APP_TAG=$APP_TAG" +echo "USERF=$USERF" +echo "NUM_EVENTS=$NUM_EVENTS" +echo "INPUT_TAR_DIR_LOCAL=$INPUT_TAR_DIR_LOCAL" +echo "NAMESPACE=$NAMESPACE" + + + +echo "Current working directory is `pwd`" + + +# number of events to process from the input file +if [ "$NUM_EVENTS" != "" ] ; then + events_option="-n $NUM_EVENTS" +fi + +# First get an unprocessed file from this stage +did_pfn_rse=`$JUSTIN_PATH/justin-get-file` + + +if [ "$did_pfn_rse" = "" ] ; then + echo "Nothing to process - exit jobscript" + exit 0 +fi + +# Keep a record of all input DIDs, for pdjson2meta file -> DID mapping +echo "$did_pfn_rse" | cut -f1 -d' ' >>all-input-dids.txt + +# pfn is also needed when creating justin-processed-pfns.txt +pfn=`echo $did_pfn_rse | cut -f2 -d' '` +did=`echo $did_pfn_rse | cut -f1 -d' '` + +echo "Input PFN = $pfn" + +echo "TARDIR ${INPUT_TAR_DIR_LOCAL}" +echo "CODE DIR ${DIRECTORY}" + +# Setup DUNE environment +localProductsdir=`ls -c1d ${INPUT_TAR_DIR_LOCAL}/${DIRECTORY}/localProducts*` + +echo "localProductsdir ${localProductsdir}" + + +# seems to require the right name for the setup script + +echo " check that there is a setup in ${localProductsdir}" +ls -lrt ${localProductsdir}/setup-grid +ls -lrt ${INPUT_TAR_DIR_LOCAL}/${DIRECTORY}/$FCL_FILE +source /cvmfs/dune.opensciencegrid.org/products/dune/setup_dune.sh +export PRODUCTS="${localProductsdir}/:$PRODUCTS" + +# Then we can set up our local products +setup duneana "$DUNE_VERSION" -q "$DUNE_QUALIFIER" +setup dunesw "$DUNE_VERSION" -q "$DUNE_QUALIFIER" + +setup metacat +export METACAT_SERVER_URL=https://metacat.fnal.gov:9443/dune_meta_prod/app +export 
METACAT_AUTH_SERVER_URL=https://metacat.fnal.gov:8143/auth/dune + +source ${localProductsdir}/setup-grid +mrbslp + +#echo "----- code is set up -----" + +# Construct outFile from input $pfn +now=$(date -u +"%Y%m%d%H%M%SZ") +Ffname=`echo $pfn | awk -F/ '{print $NF}'` +fname=`echo $Ffname | awk -F. '{print $1}'` +# outFile1 is artroot format +# outFile2 is root format for analysis +export outFile1=${fname}_${APP_TAG}_${now}.root +export outFile2=${fname}_${APP_TAG}_tuple_${now}.root + +# echo "make $outFile1" +campaign="justIN.w${JUSTIN_WORKFLOW_ID}s${JUSTIN_STAGE_ID}" + +# Here is where the LArSoft command is called +( +# Do the scary preload stuff in a subshell! +export LD_PRELOAD=${XROOTD_LIB}/libXrdPosixPreload.so +# echo "$LD_PRELOAD" + + + +#sam_metadata_dumper $pfn + +echo "----- now run lar ------" + +echo "lar -c ${INPUT_TAR_DIR_LOCAL}/${DIRECTORY}/$FCL_FILE $events_option -o ${outFile1} -T ${outFile2} "$pfn" > ${fname}_${APP_TAG}_${now}.log 2>&1" + +lar -c ${INPUT_TAR_DIR_LOCAL}/${DIRECTORY}/$FCL_FILE $events_option -o ${outFile1} -T ${outFile2} "$pfn" > ${fname}_${APP_TAG}_${now}.log 2>&1 +) + +larExit=$?
+ + +# Subshell exits with exit code of last command + + +echo "lar exit code $larExit" + +echo '=== Start last 1000 lines of lar log file ===' +tail -1000 ${fname}_${APP_TAG}_${now}.log +echo '=== End last 1000 lines of lar log file ===' + + +echo "$did" > justin-input-dids.txt + +echo "--------make metadata---------" + +#sam_metadata_dumper ${outFile1} + +FCL_FILE_NAME=$(basename $FCL_FILE) + +echo "python ${INPUT_TAR_DIR_LOCAL}/${DIRECTORY}/extractor_new.py --infile=${outFile1} --appversion=$DUNE_VERSION --appname=${APP_TAG} --appfamily=larsoft --no_crc --inputDidsFile=justin-input-dids.txt --data_tier='full-reconstructed' --file_format='artroot' --fcl_file=${FCL_FILE_NAME} --namespace=${NAMESPACE} # > $outFile1.json" + +python ${INPUT_TAR_DIR_LOCAL}/${DIRECTORY}/extractor_new.py --infile=$outFile1 --appversion=$DUNE_VERSION --appname=${APP_TAG} --appfamily=larsoft --no_crc --inputDidsFile=justin-input-dids.txt --data_tier='full-reconstructed' --file_format='artroot' --fcl_file=${FCL_FILE_NAME} --namespace=${NAMESPACE} #> $outFile1.json + +file1Exit=$? + +#cat ${outFile1}.json + +echo "------------ non-artroot metadata -----------" +# here for non-artroot files, salvage what you can from outFile1 + +oldjson=${outFile1}.json + +echo " python ${INPUT_TAR_DIR_LOCAL}/${DIRECTORY}/extractor_new.py --infile=$outFile2 --appversion=$DUNE_VERSION --appname=${APP_TAG} --appfamily=larsoft --no_crc --inputDidsFile=justin-input-dids.txt --data_tier='root-tuple' --file_format='root' --fcl_file=${FCL_FILE_NAME} --no_extract --input_json=${PWD}/${oldjson} --namespace=${NAMESPACE} # > ${outFile2}.json" + +python ${INPUT_TAR_DIR_LOCAL}/${DIRECTORY}/extractor_new.py --infile=$outFile2 --appversion=$DUNE_VERSION --appname=${APP_TAG} --appfamily=larsoft --no_crc --inputDidsFile=justin-input-dids.txt --data_tier='root-tuple' --file_format='root' --fcl_file=${FCL_FILE_NAME} --no_extract --input_json=${PWD}/${oldjson} --namespace=${NAMESPACE} # > ${outFile2}.json + +file2Exit=$? 
+ +echo "------- finish up ------" +if [ $larExit -eq 0 ] ; then + # Success ! + echo "$pfn" > justin-processed-pfns.txt + jobscriptExit=0 +else + # Oh ! + jobscriptExit=1 +fi + +# Create compressed tar file with all log files +tar zcf `echo "$JUSTIN_JOBSUB_ID.logs.tgz" | sed 's/@/_/g'` *.log +exit $jobscriptExit diff --git a/_includes/submit_workflow.sh b/_includes/submit_workflow.sh new file mode 100755 index 0000000..c6571d2 --- /dev/null +++ b/_includes/submit_workflow.sh @@ -0,0 +1,29 @@ +# actual submission +export INPUT_TAR_DIR_LOCAL=`cat cvmfs.location` + +source job_config.sh # pick up the configuration + +# this sends output to scratch + +echo "---- check the configuration ----" +echo "DIRECTORY=$DIRECTORY" +echo "DUNE_VERSION=$DUNE_VERSION" +echo "DUNE_QUALIFIER=$DUNE_QUALIFIER" +echo "FCL_FILE=$FCL_FILE" +echo "MQL=$MQL" +echo "APP_TAG=$APP_TAG" +echo "USERF=$USERF" +echo "NUM_EVENTS=$NUM_EVENTS" +echo "DESCRIPTION=$DESCRIPTION" +echo "INPUT_TAR_DIR_LOCAL=$INPUT_TAR_DIR_LOCAL" +echo "NAMESPACE=${NAMESPACE}" + +if test -e "./${FCL_FILE}"; then + echo "---- do the submission ----" + justin simple-workflow \ + --mql "$MQL" \ + --jobscript submit_local_code.jobscript.sh --rss-mb 4000 \ + --output-pattern "*.root:${FNALURL}/${USERF}" --output-pattern "*.root.json:${FNALURL}/${USERF}" --env APP_TAG=${APP_TAG} --env DIRECTORY=${DIRECTORY} --scope ${NAMESPACE} --lifetime 700 --env INPUT_TAR_DIR_LOCAL=${INPUT_TAR_DIR_LOCAL} --env DUNE_VERSION=${DUNE_VERSION} --env DUNE_QUALIFIER=${DUNE_QUALIFIER} --env FCL_FILE=${FCL_FILE} --env NUM_EVENTS=${NUM_EVENTS} --env USERF=${USERF} --env NAMESPACE=${NAMESPACE} --description "${DESCRIPTION}" +else + echo "FCL_FILE must be in $DIRECTORY for now" +fi \ No newline at end of file diff --git a/_includes/submit_workflow_rucio.sh b/_includes/submit_workflow_rucio.sh new file mode 100755 index 0000000..851d48b --- /dev/null +++ b/_includes/submit_workflow_rucio.sh @@ -0,0 +1,29 @@ +# actual submission +export 
INPUT_TAR_DIR_LOCAL=`cat cvmfs.location` + +# this sends output directly to $NAMESPACE in rucio + +source job_config.sh # pick up the configuration + +echo "---- check the configuration ----" +echo "DIRECTORY=$DIRECTORY" +echo "DUNE_VERSION=$DUNE_VERSION" +echo "DUNE_QUALIFIER=$DUNE_QUALIFIER" +echo "FCL_FILE=$FCL_FILE" +echo "MQL=$MQL" +echo "APP_TAG=$APP_TAG" +echo "USERF=$USERF" +echo "NUM_EVENTS=$NUM_EVENTS" +echo "DESCRIPTION=$DESCRIPTION" +echo "INPUT_TAR_DIR_LOCAL=$INPUT_TAR_DIR_LOCAL" +echo "NAMESPACE=${NAMESPACE}" + +if test -e "./${FCL_FILE}"; then + echo "---- do the submission ----" + justin simple-workflow \ + --mql "$MQL" \ + --jobscript submit_local_code.jobscript.sh --rss-mb 4000 \ + --output-pattern "*.root:${USER}-output" --env APP_TAG=${APP_TAG} --env DIRECTORY=${DIRECTORY} --scope ${NAMESPACE} --lifetime 700 --env INPUT_TAR_DIR_LOCAL=${INPUT_TAR_DIR_LOCAL} --env DUNE_VERSION=${DUNE_VERSION} --env DUNE_QUALIFIER=${DUNE_QUALIFIER} --env FCL_FILE=${FCL_FILE} --env NUM_EVENTS=${NUM_EVENTS} --env USERF=${USERF} --env NAMESPACE=${NAMESPACE} --description "${DESCRIPTION}" +else + echo "FCL_FILE must be in $DIRECTORY for now" +fi \ No newline at end of file diff --git a/_includes/test_workflow.sh b/_includes/test_workflow.sh new file mode 100755 index 0000000..5fde80e --- /dev/null +++ b/_includes/test_workflow.sh @@ -0,0 +1,26 @@ +# actual submission +# tarball is in a local area on my machine (could also be set to a cvmfs location) +export INPUT_TAR_DIR_LOCAL=$DUNEDATA +source ./job_config.sh + +# these are things you need to set ahead of time to run/create metadata - see job_config.sh +export NUM_EVENTS=2 +echo "DIRECTORY=$DIRECTORY" +echo "DUNE_VERSION=$DUNE_VERSION" +echo "DUNE_QUALIFIER=$DUNE_QUALIFIER" +echo "FCL_FILE=$FCL_FILE" +echo "MQL=${MQL}" +echo "APP_TAG=$APP_TAG" +echo "USERF=$USERF" +echo "NUM_EVENTS=$NUM_EVENTS" +echo "DESCRIPTION=$DESCRIPTION" +echo "INPUT_TAR_DIR_LOCAL=$INPUT_TAR_DIR_LOCAL" + + +echo "tardir $INPUT_TAR_DIR_LOCAL" +export 
HERE=$PWD + +justin-test-jobscript \ +--mql "$MQL" \ +--jobscript submit_local_code.jobscript.sh --env PROCESS_TYPE=${PROCESS_TYPE} --env DIRECTORY=${DIRECTORY} --env INPUT_TAR_DIR_LOCAL=${INPUT_TAR_DIR_LOCAL} --env DUNE_VERSION=${DUNE_VERSION} --env DUNE_QUALIFIER=${DUNE_QUALIFIER} --env FCL_FILE=${FCL_FILE} --env NUM_EVENTS=${NUM_EVENTS} --env USERF=${USER} --env APP_TAG=${APP_TAG} --env NAMESPACE=${NAMESPACE} + diff --git a/files/usefulcode.tar b/files/usefulcode.tar new file mode 100644 index 0000000..9f4a9d3 Binary files /dev/null and b/files/usefulcode.tar differ diff --git a/gitadd.sh b/gitadd.sh index 5f235b1..f65fe7f 100644 --- a/gitadd.sh +++ b/gitadd.sh @@ -1,19 +1,8 @@ git add *.md git add _episodes/*.md -#git add _episodes/01-introduction.md -#git add _episodes/02-storage-spaces.md -#git add _episodes/03-data-management.md -#git add _episodes/03.2-UPS.md -#git add _episodes/03.3-cvmfs.md -#git add _episodes/04-intro-art-larsoft.md -#git add _episodes/05.5-mrb.md -#git add _episodes/06-larsoft-modify-module.md -#git add _episodes/07-grid-job-submission_al9.md -#git add _episodes/07-grid-job-submission-al9.md -#git add _episodes/08-submit-jobs-w-justin.md -#git add _episodes/09-grid-batch-debug.md -#git add _episodes/10-closing-remarks.md git add _includes/*.html +git add _includes/*.sh +git add _includes/setup* git add *.yml git add _extras/*.md git add AUTHORS CITATION diff --git a/index.md b/index.md index e00e8bf..8550362 100644 --- a/index.md +++ b/index.md @@ -23,7 +23,7 @@ This tutorial will teach you the basics of DUNE batch computing. Instructors will engage students with hands-on lessons focused in three areas: -1. The [justIn](https://dunejustin.fnal.gov) batch system +1. The [justIN](https://dunejustin.fnal.gov) batch system 2. The jobsub batch system