CRAB @ FNAL Tutorial for SL3

Introduction

This tutorial exercises the two primary CRAB workflows: performing an analysis task on a dataset and retrieving the output either through the sandbox or through a storage element.

Recipe for the tutorial

For this tutorial we will use:

  • CMSSW_1_3_1
  • an already prepared CMSSW analysis code to analyze a Higgs->ZZ->4mu sample, which replicates a real analysis scenario
  • Location: cmsuaf.fnal.gov
  • CRAB_1_5_2, using the central installation available at FNAL

The example is written for the csh shell family. If you want to use sh, replace csh with sh.

Setup local Environment and prepare user analysis code

In order to submit jobs to the Grid, you must have access to an LCG or OSG User Interface (UI). It allows you to access WLCG- and OSG-affiliated resources in a fully transparent way. LXPLUS users can get an LCG UI via AFS with:

source /afs/cern.ch/cms/LCG/LCG-2/UI/cms_ui_env.csh

At FNAL, the UI is pre-initialized at login.

Prepare user analysis code

Install CMSSW project in a directory of your choice:

mkdir Tutorial
cd Tutorial
scramv1 p CMSSW CMSSW_1_3_1
cd CMSSW_1_3_1/src
eval `scramv1 runtime -csh`

Get and compile the example user analysis code:

wget https://twiki.cern.ch/twiki/pub/Main/CRABatFNALTutorialSL3/Demo.tgz
tar zxvf Demo.tgz
scramv1 b

CRAB setup

Most users (particularly those on LXPLUS or at FNAL) do not need to install CRAB. They only need to set it up.

CRAB is intended to be installed in a private area for use by a single person, or in a common area for use by all system users. A public installation is available on CERN's LXPLUS and FNAL.

At CERN on LXPLUS, users may access CRAB at (shown for arbitrary version X_Y_Z):

/afs/cern.ch/cms/ccs/wm/scripts/Crab/CRAB_X_Y_Z

At FNAL, users may access CRAB at (shown for arbitrary version X_Y_Z):

/uscmst1/prod/grid/CRAB_X_Y_Z

To find the latest release, check the CRAB web page or the relevant HyperNews forum.

Setup on lxplus:

To set up and use CRAB from any directory, source the script crab.(c)sh located in /afs/cern.ch/cms/ccs/wm/scripts/Crab/, which always points to the latest version of CRAB. After sourcing the script, you can use CRAB from any directory (typically from your CMSSW working directory).

source /afs/cern.ch/cms/ccs/wm/scripts/Crab/crab.csh

The first time you call the CRAB initialization script, you'll get a message like: User-boss DB not installed: run configureBoss. You then have to initialize BOSS, one of CRAB's sub-components, by executing the following command:

$CRABDIR/configureBoss

BOSS will create two directories in your home directory:

boss
.bossrc

which should not be removed.

NOTE: Sourcing the crab.sh|csh script has to be done after the installation and at the start of every new session, but you need to run configureBoss only the very first time.

Setup on cmsuaf.fnal.gov / cmslpc.fnal.gov:

To set up and use CRAB from any directory, source the script crab.(c)sh located in /uscmst1/prod/grid/CRAB/, which always points to the latest version of CRAB. After sourcing the script, you can use CRAB from any directory (typically from your CMSSW working directory).

source /uscmst1/prod/grid/CRAB/crab.csh

The first time you call the CRAB initialization script, you'll get a message like: User-boss DB not installed: run configureBoss. You then have to initialize BOSS, one of CRAB's sub-components, by executing the following command:

$CRABDIR/configureBoss

BOSS will create two directories in your home directory:

boss
.bossrc

which should not be removed.

NOTE: Sourcing the crab.sh|csh script has to be done after the installation and at the start of every new session, but you need to run configureBoss only the very first time.

Data selection

To select the data you want to access, use the DBS Data Discovery web page, where the available datasets are listed (see the links on the CRAB home page). For this tutorial we'll use:

/RelVal131Higgs-ZZ-4Mu/CMSSW_1_3_1-1176118250/GEN-SIM-DIGI-RECO

Keyword search for:

  • *RelVal131Higgs-ZZ-4Mu*

CRAB configuration

Modify the CRAB configuration file crab.cfg according to your needs: a fully documented template is available at $CRABDIR/python/crab.cfg. For guidance, see the list and description of configuration parameters. For this tutorial, the only relevant sections of the file are [CRAB], [CMSSW], [USER] and [EDG].

The configuration file should be located in the same directory as the CMSSW parameter-set to be used by CRAB. Please change directory:

cd Demo/MyTrackAnalyzer/test/

and save the CRAB configuration file as crab.cfg with the following content:

[CRAB]
jobtype                = cmssw
scheduler              = edg

[CMSSW]
datasetpath            = /RelVal131Higgs-ZZ-4Mu/CMSSW_1_3_1-1176118250/GEN-SIM-DIGI-RECO
pset                   = higgs.cfg
total_number_of_events = 100
number_of_jobs         = 10
output_file            = histograms.root

[USER]
return_data            = 1
use_central_bossDB     = 0
use_boss_rt            = 0

[EDG]
ce_black_list          = cmsosgce.fnal.gov
rb                     = CERN 
proxy_server           = myproxy.cern.ch 
virtual_organization   = cms
retry_count            = 0
lcg_catalog_type       = lfc
lfc_host               = lfc-cms-test.cern.ch
lfc_home               = /grid/cms
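
Before running CRAB it can be handy to check that the file contains the four sections this tutorial relies on. The following is a minimal sketch, assuming that crab.cfg is plain INI syntax readable by Python's standard configparser module. CRAB's own parser may behave differently, and the helper names load_crab_cfg and missing_sections are hypothetical, not part of CRAB.

```python
import configparser

# Content of crab.cfg exactly as given in this tutorial.
CRAB_CFG = """\
[CRAB]
jobtype                = cmssw
scheduler              = edg

[CMSSW]
datasetpath            = /RelVal131Higgs-ZZ-4Mu/CMSSW_1_3_1-1176118250/GEN-SIM-DIGI-RECO
pset                   = higgs.cfg
total_number_of_events = 100
number_of_jobs         = 10
output_file            = histograms.root

[USER]
return_data            = 1
use_central_bossDB     = 0
use_boss_rt            = 0

[EDG]
ce_black_list          = cmsosgce.fnal.gov
rb                     = CERN
proxy_server           = myproxy.cern.ch
virtual_organization   = cms
retry_count            = 0
lcg_catalog_type       = lfc
lfc_host               = lfc-cms-test.cern.ch
lfc_home               = /grid/cms
"""

# Sections the tutorial names as relevant.
REQUIRED_SECTIONS = ("CRAB", "CMSSW", "USER", "EDG")


def load_crab_cfg(text):
    """Parse crab.cfg text with the standard-library INI parser (an assumption)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names such as use_central_bossDB as written
    parser.read_string(text)
    return parser


def missing_sections(parser):
    """Return the required section names that are absent from the file."""
    return [name for name in REQUIRED_SECTIONS if not parser.has_section(name)]


cfg = load_crab_cfg(CRAB_CFG)
print("sections:", cfg.sections())
print("missing :", missing_sections(cfg))
print("pset    :", cfg["CMSSW"]["pset"])
```

To check a file on disk instead of the embedded string, the text can be read with open("crab.cfg").read() and passed to load_crab_cfg.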

Run CRAB

Once your crab.cfg is ready and the whole underlying environment is set up, you can start to run CRAB. CRAB provides command-line help, which is useful when running it for the first time. You can get it via:
crab -h
In particular, there is a HOW TO RUN CRAB FOR THE IMPATIENT USER section which lists the basic commands.

Job Creation

Job creation checks the availability of the selected dataset and prepares all the jobs for submission according to the job splitting specified in crab.cfg.
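
The arithmetic behind the splitting requested in this tutorial (total_number_of_events = 100, number_of_jobs = 10) can be sketched as follows. This is an illustration only, not CRAB's actual algorithm: CRAB itself warns that it "may not create the exact number_of_jobs requested", so the real boundaries may differ. The function name split_events is hypothetical.

```python
def split_events(total_events, number_of_jobs):
    """Divide total_events over number_of_jobs as evenly as possible.

    Returns a list of (skip_events, max_events) pairs, one per job.
    Any remainder is spread over the first jobs, and jobs that would
    receive zero events are dropped.
    """
    if total_events < 0 or number_of_jobs <= 0:
        raise ValueError("need total_events >= 0 and number_of_jobs > 0")
    base, extra = divmod(total_events, number_of_jobs)
    jobs = []
    skip = 0
    for index in range(number_of_jobs):
        n_events = base + (1 if index < extra else 0)
        if n_events == 0:
            break
        jobs.append((skip, n_events))
        skip += n_events
    return jobs


# Values from the crab.cfg of this tutorial.
jobs = split_events(total_events=100, number_of_jobs=10)
for number, (skip, n_events) in enumerate(jobs, start=1):
    print("job %2d: skip %3d events, process %d events" % (number, skip, n_events))
```

With the tutorial values, every job processes 10 events, which matches the "10 job(s) can run on 100 events" line in the sample output.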

The creation process creates a CRAB project directory in the current working directory. In the sample output below, the default name is crab_0_070611_174014, which encodes the creation date and time.

CRAB also allows the user to choose a project name, so that it can be used later to distinguish multiple CRAB projects in the same directory.

crab -create

This should produce screen output similar to:

crab. crab (version 1.5.2) running on Mon Jun 11 17:40:14 2007
 
crab. Working options:
  scheduler           edg
  job type            CMSSW
  working directory   /afs/cern.ch/user/s/spiga/scratch0/Tutorial/CMSSW_1_3_1/src/Demo/MyTrackAnalyzer/test/crab_0_070611_174014/
 
crab. Contacting DBS...
crab. Required data are :/RelVal131Higgs-ZZ-4Mu/CMSSW_1_3_1-1176118250/GEN-SIM-DIGI-RECO
crab. The number of available events is 2200
 
crab. Contacting DLS...
crab. Sites (2) hosting part/all of dataset: ['srm.cern.ch', 'cmssrm.fnal.gov']
crab. May not create the exact number_of_jobs requested.
 
crab. 10 job(s) can run on 100 events.
 
crab. Creating 10 jobs, please wait...
 
crab. Total of 10 jobs created.
 
crab. Log-file is /afs/cern.ch/user/s/spiga/scratch0/Tutorial/CMSSW_1_3_1/src/Demo/MyTrackAnalyzer/test/crab_0_070611_174014/log/crab.log

Job Submission

With the submission command it is possible to specify a combination of jobs and job ranges separated by commas (e.g. 1,2,3-4); the default is all jobs.
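
To make the job-range notation concrete, here is a minimal sketch that expands a string such as 1,2,3-4 into the individual job numbers. It only illustrates the notation. It is not CRAB's own parser, and the function name parse_job_spec is hypothetical.

```python
def parse_job_spec(spec):
    """Expand a comma-separated list of jobs and job ranges into sorted job numbers."""
    jobs = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            first_text, last_text = part.split("-", 1)
            first, last = int(first_text), int(last_text)
            if first > last:
                raise ValueError("invalid range: %r" % part)
            jobs.update(range(first, last + 1))  # ranges are inclusive
        else:
            jobs.add(int(part))
    return sorted(jobs)


print(parse_job_spec("1,2,3-4"))
```

For the example from the text, the expansion is jobs 1, 2, 3 and 4.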

To submit all jobs of the last created project with the default name, it's enough to execute the following command:

crab -submit 

To submit a specific project:

crab -submit -c <dir name>

This should produce screen output similar to:

crab. crab (version 1.5.2) running on Mon Jun 11 17:42:09 2007
 
crab. Working options:
  scheduler           edg
  job type            CMSSW
  working directory   /afs/cern.ch/user/s/spiga/scratch0/Tutorial/CMSSW_1_3_1/src/Demo/MyTrackAnalyzer/test/crab_0_070611_174014/
 
crab. Matched Sites :['cmslcgce.fnal.gov', 'cmslcgce2.fnal.gov']
crab. Found 2 compatible site(s) for job 1
                                                           Submitting 10 jobs
100% [=============================================================================================================================]
                                                              please wait
crab. Total of 10 jobs submitted.
 
crab. Total of 10 jobs submitted.
crab. Log-file is /afs/cern.ch/user/s/spiga/scratch0/Tutorial/CMSSW_1_3_1/src/Demo/MyTrackAnalyzer/test/crab_0_070611_174014/log/crab.log

Job Status Check

Check the status of the jobs in the latest CRAB project with the following command:

crab -status 
To check a specific project:
crab -status -c <dir name>

This should produce screen output similar to:

crab. crab (version 1.5.2) running on Mon Jun 11 17:51:55 2007
                                                                                                                                                  
crab. Working options:
  scheduler           edg
  job type            CMSSW
  working directory   /afs/cern.ch/user/s/spiga/scratch0/Tutorial/CMSSW_1_3_1/src/Demo/MyTrackAnalyzer/test/crab_0_070611_174014/
                                                                                                                                                  
crab. Checking the status of all jobs: please wait
Chain    STATUS             E_HOST                                   EXE_EXIT_CODE JOB_EXIT_STATUS
---------------------------------------------------------------------------------------------------
1        Running            cmslcgce2.fnal.gov
2        Running            cmslcgce.fnal.gov
3        Running            cmslcgce2.fnal.gov
4        Running            cmslcgce.fnal.gov
5        Running            cmslcgce.fnal.gov
6        Running            cmslcgce.fnal.gov
7        Running            cmslcgce2.fnal.gov
8        Running            cmslcgce2.fnal.gov
9        Running            cmslcgce.fnal.gov
10       Running            cmslcgce.fnal.gov
                                                                                                                                                  
>>>>>>>>> 10 Total Jobs
                                                                                                                                                  
>>>>>>>>> 10 Jobs Running
          List of jobs: 1,2,3,4,5,6,7,8,9,10
crab. Log-file is /afs/cern.ch/user/s/spiga/scratch0/Tutorial/CMSSW_1_3_1/src/Demo/MyTrackAnalyzer/test/crab_0_070611_174014/log/crab.log
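
When a project has many jobs, a per-status count is easier to read than the full table. The sketch below counts jobs per status from text in the layout shown above. It assumes that each job line starts with the job number followed by a single-word status, which holds for the sample output but is not guaranteed for every status CRAB can report. The function name summarize_status is hypothetical.

```python
from collections import Counter

# A few lines copied from the sample "crab -status" output above.
SAMPLE_STATUS = """\
Chain    STATUS             E_HOST                                   EXE_EXIT_CODE JOB_EXIT_STATUS
---------------------------------------------------------------------------------------------------
1        Running            cmslcgce2.fnal.gov
2        Running            cmslcgce.fnal.gov
3        Running            cmslcgce2.fnal.gov

>>>>>>>>> 10 Total Jobs
"""


def summarize_status(text):
    """Count jobs per status; only lines beginning with a job number are used."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit():
            counts[fields[1]] += 1
    return counts


summary = summarize_status(SAMPLE_STATUS)
for status, count in sorted(summary.items()):
    print("%-10s %d" % (status, count))
```

The header, separator and summary lines are skipped because their first field is not a number.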

Job Output Retrieval

For jobs in status done, it is possible to retrieve their output back to the UI. The following command retrieves the output of all jobs with status done in the last created CRAB project:

crab -getoutput
To get the output of a specific project:
crab -getoutput -c <dir name>

The command can be repeated as long as there are jobs in status done.

The job results are copied into the res subdirectory of your CRAB project, as reported by a message like:

crab. crab (version 1.5.2) running on Mon Jun 11 18:04:43 2007
                                                                                                                                                  
crab. Working options:
  scheduler           edg
  job type            CMSSW
  working directory   /afs/cern.ch/user/s/spiga/scratch0/Tutorial/CMSSW_1_3_1/src/Demo/MyTrackAnalyzer/test/crab_0_070611_174014/
                                                                                                                                                  
crab. Results of Job # 1 are in /afs/cern.ch/user/s/spiga/scratch0/Tutorial/CMSSW_1_3_1/src/Demo/MyTrackAnalyzer/test/crab_0_070611_174014/res/
crab. Results of Job # 2 are in /afs/cern.ch/user/s/spiga/scratch0/Tutorial/CMSSW_1_3_1/src/Demo/MyTrackAnalyzer/test/crab_0_070611_174014/res/
crab. Results of Job # 3 are in /afs/cern.ch/user/s/spiga/scratch0/Tutorial/CMSSW_1_3_1/src/Demo/MyTrackAnalyzer/test/crab_0_070611_174014/res/
crab. Results of Job # 4 are in /afs/cern.ch/user/s/spiga/scratch0/Tutorial/CMSSW_1_3_1/src/Demo/MyTrackAnalyzer/test/crab_0_070611_174014/res/
crab. Results of Job # 5 are in /afs/cern.ch/user/s/spiga/scratch0/Tutorial/CMSSW_1_3_1/src/Demo/MyTrackAnalyzer/test/crab_0_070611_174014/res/
crab. Results of Job # 6 are in /afs/cern.ch/user/s/spiga/scratch0/Tutorial/CMSSW_1_3_1/src/Demo/MyTrackAnalyzer/test/crab_0_070611_174014/res/
crab. Results of Job # 7 are in /afs/cern.ch/user/s/spiga/scratch0/Tutorial/CMSSW_1_3_1/src/Demo/MyTrackAnalyzer/test/crab_0_070611_174014/res/
crab. Results of Job # 8 are in /afs/cern.ch/user/s/spiga/scratch0/Tutorial/CMSSW_1_3_1/src/Demo/MyTrackAnalyzer/test/crab_0_070611_174014/res/
crab. Results of Job # 9 are in /afs/cern.ch/user/s/spiga/scratch0/Tutorial/CMSSW_1_3_1/src/Demo/MyTrackAnalyzer/test/crab_0_070611_174014/res/
crab. Results of Job # 10 are in /afs/cern.ch/user/s/spiga/scratch0/Tutorial/CMSSW_1_3_1/src/Demo/MyTrackAnalyzer/test/crab_0_070611_174014/res/
                                                                                                                                                  
crab. Log-file is /afs/cern.ch/user/s/spiga/scratch0/Tutorial/CMSSW_1_3_1/src/Demo/MyTrackAnalyzer/test/crab_0_070611_174014/log/crab.log

Final plot

Each of the 10 jobs produces a histogram output file. These files can be combined with ROOT's hadd utility in the res directory:

hadd histograms.root histograms_*.root
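
If the merge is scripted, the input list can be built explicitly instead of relying on shell globbing. The sketch below assembles the same hadd command line from a list of file names. It assumes the per-job files match histograms_*.root as in the command above, and build_hadd_command is a hypothetical helper name. It only builds the argument list and does not run hadd.

```python
import fnmatch


def build_hadd_command(filenames, target="histograms.root", pattern="histograms_*.root"):
    """Return the hadd argument list for all per-job files matching pattern."""
    inputs = sorted(name for name in filenames if fnmatch.fnmatch(name, pattern))
    if not inputs:
        raise ValueError("no input files match %r" % pattern)
    return ["hadd", target] + inputs


# Example listing of a res directory (file names are illustrative).
listing = ["histograms_1.root", "histograms_2.root", "crab.log", "histograms.root"]
print(" ".join(build_hadd_command(listing)))
```

In a real session the listing could come from os.listdir("."), and the resulting list could be passed to subprocess.run, provided that hadd is available in the environment set up for CMSSW.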

The final histograms.root opened in ROOT contains the final plot:

mzz->Draw();

Exercises

While the first 10 jobs are running, you can try the OSG-only Grid submission mode, which uses Condor-G, by changing

scheduler = edg

to

scheduler = condor_g

As this is a dedicated submission to a site without using the resource broker, a single SE has to be selected using se_white_list in the [EDG] section of the crab.cfg. Please remove all white and black list statements and add:

se_white_list = cmssrm.fnal.gov

to the [EDG] section of your crab.cfg.
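
The configuration change for this exercise can be sketched with Python's standard configparser. The assumptions are that crab.cfg is INI-compatible and that every white or black list option has a name ending in _white_list or _black_list. The helper name switch_to_condor_g is hypothetical, and editing the file by hand works just as well.

```python
import configparser

# Shortened crab.cfg with the two sections that this exercise touches.
CFG_TEXT = """\
[CRAB]
jobtype                = cmssw
scheduler              = edg

[EDG]
ce_black_list          = cmsosgce.fnal.gov
rb                     = CERN
virtual_organization   = cms
"""


def switch_to_condor_g(cfg_text, storage_element="cmssrm.fnal.gov"):
    """Return a parser with scheduler = condor_g and a single se_white_list entry."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve option names as written
    parser.read_string(cfg_text)
    parser["CRAB"]["scheduler"] = "condor_g"
    # Remove all existing white and black list statements from [EDG].
    for option in list(parser.options("EDG")):
        if option.endswith("_white_list") or option.endswith("_black_list"):
            parser.remove_option("EDG", option)
    parser["EDG"]["se_white_list"] = storage_element
    return parser


new_cfg = switch_to_condor_g(CFG_TEXT)
for section in new_cfg.sections():
    print("[%s]" % section)
    for option, value in new_cfg[section].items():
        print("%s = %s" % (option, value))
```

The modified configuration can be saved with new_cfg.write(open("crab.cfg", "w")). Note that configparser does not preserve the original column alignment or comments.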

CRAB with writing out CMSSW ROOT files

New CMSSW parameter-set

To write out a CMSSW ROOT file in this example, please create a new CMSSW parameter-set named

higgs2.cfg

with the following content:

process A = {

  source = PoolSource {
    untracked vstring fileNames = {
    }
    untracked int32   maxEvents = 100
    untracked uint32 skipEvents = 0
  }

  service = TFileService {
    string fileName = "histograms.root"
  }

  module higgs = MyTrackAnalyzer {
        untracked InputTag tracks = ctfWithMaterialTracks
  }

  module out = PoolOutputModule {
    untracked string fileName = "output.root"
  }

  path p = {
    higgs
  }

  endpath e = {
    out
  }

}

Prepare dCache area at FNAL for storage element interaction

For CRAB to be able to write into your dCache user directory:

/pnfs/cms/WAX/resilient/<username>

we have to create a destination directory and change the file permissions:

mkdir /pnfs/cms/WAX/resilient/<username>/tutorial 
chmod +775 /pnfs/cms/WAX/resilient/<username>/tutorial

replacing <username> with your username.
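
To see which permissions the chmod command is meant to grant, the mode can be decoded with Python's standard stat module. This sketch assumes the intended mode is octal 775 and only decodes the bits. It does not touch the file system.

```python
import stat

MODE = 0o775  # assumed intended mode of the chmod command above

# Render the mode the way "ls -l" would show it for a directory.
print(stat.filemode(stat.S_IFDIR | MODE))

# Check the individual write bits for owner, group and others.
owner_can_write = bool(MODE & stat.S_IWUSR)
group_can_write = bool(MODE & stat.S_IWGRP)
others_can_write = bool(MODE & stat.S_IWOTH)
print("owner write:", owner_can_write)
print("group write:", group_can_write)
print("others write:", others_can_write)
```

Mode 775 gives the owner and the group full access, while all other users can list and enter the directory but cannot write to it.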

Prepare new crab.cfg

Now the CMSSW parameter-set produces an output file (output.root), which the user can add to the output_file entry in the new crab.cfg and can ask CRAB to copy to the FNAL Storage Element (dCache). Please modify the crab.cfg as in the following example:

[CRAB]
jobtype                = cmssw
scheduler              = edg

[CMSSW]
datasetpath            = /RelVal131Higgs-ZZ-4Mu/CMSSW_1_3_1-1176118250/GEN-SIM-DIGI-RECO
pset                   = higgs2.cfg
total_number_of_events = 100
number_of_jobs         = 10
output_file            = histograms.root,output.root

[USER]
return_data            = 0
copy_data              = 1
storage_element        = cmssrm.fnal.gov
storage_path           = /srm/managerv1?SFN=/resilient/<username>/tutorial
use_central_bossDB     = 0
use_boss_rt            = 0

[EDG]
ce_black_list          = cmsosgce.fnal.gov
rb                     = CERN 
proxy_server           = myproxy.cern.ch 
virtual_organization   = cms
retry_count            = 0
lcg_catalog_type       = lfc
lfc_host               = lfc-cms-test.cern.ch
lfc_home               = /grid/cms

replacing <username> with your username.
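
The dCache directory created earlier and the storage_path value in crab.cfg share the same /resilient/<username>/tutorial suffix. The sketch below builds both strings from a username so the two stay consistent. The relation between the two paths is inferred from the values shown in this tutorial, the helper name fnal_tutorial_paths is hypothetical, and the username jdoe is a placeholder.

```python
def fnal_tutorial_paths(username, subdir="tutorial"):
    """Return the pnfs directory and the matching crab.cfg storage_path."""
    relative = "/resilient/%s/%s" % (username, subdir)
    return {
        "pnfs_dir": "/pnfs/cms/WAX" + relative,
        "storage_path": "/srm/managerv1?SFN=" + relative,
    }


paths = fnal_tutorial_paths("jdoe")  # "jdoe" is a placeholder username
print("mkdir", paths["pnfs_dir"])
print("storage_path =", paths["storage_path"])
```

Printing both values before submission makes it easy to spot a mismatch between the directory that was created and the path CRAB is asked to write to.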

Exercise for the very experienced user

As development continues, CRAB will deploy a server that takes over job submission and resubmission centrally, reducing the submission time significantly. A first test tutorial is available here.

Topic revision: r2 - 2007-06-27 - OliverGutsche
 