5.6 Data Analysis with CRAB

Introduction and Editorial Note

This Workbook chapter reproduces text from the CRAB guide, SWGuideCrab, and from other SWGuide pages linked from it. The SWGuide text is included rather than linked, for easier reading, and is slightly reorganized; any change made there is reflected here, so there should be no need to edit this TWiki page to update the instructions.

CRAB is a utility to submit CMSSW jobs to distributed computing resources. By using CRAB you will be able to:

  • Access CMS data and Monte Carlo samples that are distributed to CMS-aligned centres worldwide.
  • Exploit the CPU and storage resources at CMS-aligned centres.

Prerequisites

To use CRAB to submit your CMSSW job to the Grid you must meet some prerequisites:

Get a Grid certificate and register with the CMS VO

CRAB submits jobs to the Grid (LCG), so you need to run it from a User Interface (UI) with a valid certificate, issued by your appropriate Certification Authority, and a valid proxy. You also need to be registered on the VOMRS server. To get a certificate from the CERN CA and register with the CMS VO, follow the detailed instructions in the SWGuideLcgAccess page. If you get a certificate from another Certification Authority, the procedure to register with the CMS VO using your certificate should be the same.

Set up your certificate for LCG

See instructions in this Offline Workbook page

Test your grid certificate

  1. Is your personal certificate able to generate Grid proxies? To find out, set up your environment and then run this command:
    grid-proxy-init -debug -verify 
    In case of failure, the possible causes are:
    • the certificate/key pair is not installed in $HOME/.globus/usercert.pem and $HOME/.globus/userkey.pem (a.k.a. "pem files")
    • the certificate has expired
    • the certificate and the private key do not match
    In the first case, you either do not have a certificate at all or still have to install it on the UI; in the second case, you should get a new certificate; in the third case, you have probably installed your certificate incorrectly.
  2. Are you a member of the CMS VO? To see if this is the case, you can execute this command:
    voms-proxy-init -voms cms 
    If you get an error, chances are that you did not register with the CMS VO, or that your registration has expired. In this case, please follow the instructions in the SWGuideLcgAccess page.
  3. You can verify the expiration date of your certificate with:
    openssl x509  -subject -dates -noout  -in $HOME/.globus/usercert.pem 
  4. See also: https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideVomsFAQ
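
The expiration check in step 3 can be wrapped in a small shell function that warns ahead of time. This is a minimal sketch, not part of CRAB or the Grid tools: the function name check_cert_expiry and the 30-day default are assumptions chosen for illustration. It relies on the -checkend option of openssl x509.

```shell
# Hypothetical helper (not part of CRAB or the Grid middleware).
# Usage: check_cert_expiry [certificate-file] [days]
check_cert_expiry() {
    cce_cert="${1:-$HOME/.globus/usercert.pem}"
    cce_days="${2:-30}"   # assumed warning threshold, in days

    if [ ! -r "$cce_cert" ]; then
        echo "missing: $cce_cert"
        return 2
    fi

    # -checkend N exits with status 0 only if the certificate
    # is still valid N seconds from now.
    if openssl x509 -checkend "$((cce_days * 86400))" -noout -in "$cce_cert" >/dev/null 2>&1; then
        echo "ok: $cce_cert is valid for at least $cce_days more days"
    else
        echo "warning: $cce_cert expires within $cce_days days, or is not a readable certificate"
        return 1
    fi
}
```

Called with no arguments, the function checks $HOME/.globus/usercert.pem against the 30-day threshold.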

Test the code locally

Before launching a million-event analysis job on the Grid, be sure to test your code locally in a clean area.

  1. Build a new CMSSW area (for example, CMSSW_5_2_5_...; pick as appropriate to your job):
    cmsrel CMSSW_5_2_5
    cd CMSSW_5_2_5/src
    cmsenv
  2. Check out from the CVS repository only the code or configuration files you need to modify, then build your local libraries, including your analysis code.
  3. Make sure that the code you check out is compatible with the CMSSW version you are using.
  4. Make sure that the CMSSW version you are using is compatible with the data you intend to read.
  5. Prepare a test job accessing the data you will access in your Grid job. There are several ways to read the proper data:
    • The easiest way is to use the xrootd service to read data directly from a remote site. How to do this is explained in Using Xrootd Service for remote Data Accessing.
    • You can also use the xrootd service to copy a data file from a suitable dataset to your local machine (for example, to work without network access), as explained in File download with command-line tools.
    • If no suitable files exist, you can generate some events using the configuration file which is available from the DAS service.
  6. Test your CMSSW configuration file locally in order to avoid problems with the ParameterSet parsing.
  7. Run the job interactively (e.g. at CERN on lxplus):
    cmsRun your-pset-config-file.py 
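
When running the interactive test in step 7, it helps to keep the full output and to check the exit status before moving on to the Grid. The wrapper below is a sketch under assumptions: the function name run_and_log and the log file name are hypothetical, and the function simply runs whatever command it is given.

```shell
# Hypothetical wrapper: run a command, save stdout and stderr to a log file,
# and report the exit status.
# Usage: run_and_log <log-file> <command> [args...]
run_and_log() {
    ral_log="$1"
    shift
    if "$@" > "$ral_log" 2>&1; then
        ral_status=0
    else
        ral_status=$?
    fi
    echo "exit status $ral_status, output saved in $ral_log"
    return "$ral_status"
}

# Example (assumes a CMSSW environment where cmsRun is available):
#   run_and_log test-job.log cmsRun your-pset-config-file.py
```

A non-zero exit status from the local run is a sign that the job is not yet ready for submission.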

Validate a CMSSW config file

In CRAB2, a user can validate their CMSSW configuration file by running crab -validateCfg after creating the task with crab -create. The configuration file is then checked and validated by the corresponding Python API. Note that it is not enough to check that the configuration file runs interactively, because in interactive mode CMSSW is too tolerant of Python errors in the configuration file. If you suspect that a problem lies in CRAB or in the CRAB validation rather than in the configuration file, you can use the following test, which does not involve CRAB:

    edmConfigHash your-pset-config-file.py

Note that passing this test is necessary, but not necessarily sufficient, for the CMSSW configuration file to be valid. Other problems can come from hidden characters (^M) in the configuration file, especially if it was downloaded from the web. To reveal them, use the command

    cat -v your-pset-config-file.py

and remove them with the command

    perl -pi -e 'tr/\cM//d;' your-pset-config-file.py

Then validate the configuration file again.
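
The check for hidden carriage returns can also be scripted. The functions below are a sketch: the names has_cr and strip_cr are hypothetical, and tr -d '\r' is used here as a stand-in for the perl command above, with the same effect of deleting every ^M character.

```shell
# Hypothetical helpers for spotting and removing carriage returns (^M).

# Succeeds (exit status 0) if the file contains at least one carriage return.
has_cr() {
    grep -q "$(printf '\r')" "$1"
}

# Rewrites the file without carriage returns, going through a temporary copy.
strip_cr() {
    tr -d '\r' < "$1" > "$1.nocr" && mv "$1.nocr" "$1"
}

# Example:
#   if has_cr your-pset-config-file.py; then strip_cr your-pset-config-file.py; fi
```

Because strip_cr replaces the file with a cleaned copy, keep a backup if the original line endings matter.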

Use CRAB at CERN

Please see SWGuideCrab, in particular: https://twiki.cern.ch/twiki/bin/view/CMSPublic/CRAB3CheatSheet#Environment_setup

Use CRAB outside CERN

Preferred way: use CVMFS

  1. Set up the Grid UI according to your site's directions.
  2. Follow the same instructions as at CERN (CVMFS is globally available).
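
Since the same setup applies wherever CVMFS is mounted, a login script can guard the setup step with a check that the repository is actually visible. This is a sketch under assumptions: the function name setup_cms_env is hypothetical, and /cvmfs/cms.cern.ch/cmsset_default.sh is assumed to be the location of the CMS environment script; confirm it against the environment setup instructions in SWGuideCrab.

```shell
# Hypothetical helper: source the CMS environment script only if CVMFS is mounted.
# Usage: setup_cms_env [cvmfs-base-directory]
setup_cms_env() {
    sce_base="${1:-/cvmfs/cms.cern.ch}"   # assumed default mount point
    if [ -r "$sce_base/cmsset_default.sh" ]; then
        # Source into the current shell so the environment persists.
        . "$sce_base/cmsset_default.sh"
    else
        echo "CMS environment script not found under $sce_base; is CVMFS mounted?" >&2
        return 1
    fi
}
```

The function must be called in the current shell, not in a subshell, for the environment changes to take effect.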

Basic CRAB Commands

Please see SWGuideCrab, in particular CRAB3Commands.

Common operations with CRAB

Please see SWGuideCrab

Return results locally

Please see SWGuideCrab, in particular CRAB3ConfigurationFile.

Copy results to a Storage Element

Please see SWGuideCrab, in particular CRAB3ConfigurationFile.

Publish copied results in a Storage Element to a DBS instance

Please see SWGuideCrab, in particular CRAB3ConfigurationFile.

Analyse published results

Please see SWGuideCrab

Review status

Reviewer/Editor and Date (copy from screen) Comments
StefanoBelforte - 2016-06-01 remove references to CRAB2 documentation, point to CRAB3 guide
StefanoBelforte - 30-Aug-2013 overstrike crab server
JohnStupak - 4-June-2013 Review and minor revisions
NitishDhingra - 02-Mar-2012 See detailed comments below
DaveEvans - 10 March 2010 Update stageout examples
Complete Review, no changes. The information on page is quite clear.

Responsible: StefanoBelforte
Last reviewed by: Review Me

Topic revision: r26 - 2022-10-15 - JhovannyMejia

