Run pilot Event Server Jobs on HPC
Panda Event Server Job: https://twiki.cern.ch/twiki/bin/view/PanDA/EventServer

Here are instructions for running the pilot on an HPC system.

Set up the environment
  1. Using CVMFS
                    export OSG_GRID=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/emi/current/
                    export VO_ATLAS_SW_DIR=/cvmfs/atlas.cern.ch/repo/sw
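If you are unsure whether CVMFS is mounted on your system, a minimal guard around the exports can help; this snippet is an illustration of my own, not part of the wrapper:

            # Guard the exports with a CVMFS availability check (illustrative only)
            if [ -d /cvmfs/atlas.cern.ch/repo/sw ]; then
                export OSG_GRID=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/emi/current/
                export VO_ATLAS_SW_DIR=/cvmfs/atlas.cern.ch/repo/sw
            else
                echo "CVMFS not mounted; use a local grid_env copy instead" >&2
            fi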
           

If CVMFS is available, wrapper-0.9.10.sh will set up the environment. Here is f_setup_osg() from wrapper-0.9.10.sh:

        f_setup_osg(){
        # If OSG setup script exists, run it
        if test ! "$OSG_GRID" = ""; then
                f_print_info_msg "setting up OSG environment"
                if test -f "$OSG_GRID/setup.sh" ; then
                        echo "Running OSG setup from $OSG_GRID/setup.sh"

                        source $OSG_GRID/setup.sh
                        # yampl, for Event Service messaging
                        source /cvmfs/atlas.cern.ch/repo/sw/local/setup-yampl.sh
                        export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
                        source /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/user/atlasLocalSetup.sh --quiet
                        source /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/packageSetups/atlasLocalDQ2ClientSetup.sh --quiet
                        # the next two scripts must also be sourced; run as plain commands they
                        # execute in a subshell and leave the environment unchanged
                        source /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/packageSetups/atlasLocalRucioClientsSetup.sh --quiet
                        source /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/packageSetups/atlasLocalEmiSetup.sh --force
                        # boto, for the S3 object store
                        source /cvmfs/atlas.cern.ch/repo/sw/external/boto/setup.sh
                fi
        fi
        }
        
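After f_setup_osg() has run, a quick sanity check can confirm that the environment really was set up. This is a check of my own, assuming the yampl setup script provides Python bindings; it is not part of the wrapper:

            # Sanity check after the CVMFS-based setup (illustrative only)
            which rucio                      # Rucio client should be on the PATH
            python -c "import yampl"         # yampl bindings should be importable
            echo $ATLAS_LOCAL_ROOT_BASE      # should point at /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase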

  2. Not using CVMFS: here is an example used on NERSC Edison. You can copy the whole grid_env directory from NERSC to your working directory, as shown in the sketch after the exports below.

                    export OSG_GRID=/global/project/projectdirs/atlas/pilot/grid_env
                    export VO_ATLAS_SW_DIR=/project/projectdirs/atlas
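A possible way to make that copy is with rsync; the destination path below is a placeholder you must adapt:

            # Copy the prepared grid_env directory to your own working area
            # (/path/to/your/workdir is a placeholder)
            rsync -av /global/project/projectdirs/atlas/pilot/grid_env/ /path/to/your/workdir/grid_env/
            export OSG_GRID=/path/to/your/workdir/grid_env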
       

wrapper-0.9.10.sh will run $OSG_GRID/setup.sh to set up the environment for the pilot. Here is an example from NERSC Edison:

            #!/bin/bash

            MyDir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

            # emi
            source ${MyDir}/emi/current/setup.sh

            # yampl
            source ${MyDir}/yampl/setup.sh
            export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/alps

            # rucio
            source ${MyDir}/rucio/current/setup.sh

            # dq2
            source ${MyDir}/dq2/current/setup.sh

            # boto, for S3 object store
            source ${MyDir}/boto/setup.sh

            # external: libs missing on Cray
            source ${MyDir}/external/setup.sh

            # xrootd, for xrootd object store
            source ${MyDir}/xrootd/current/setup.sh
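After sourcing this setup.sh, a sanity check similar to the CVMFS case applies; the snippet below is illustrative only:

            # Verify the locally installed clients are usable (illustrative only)
            source $OSG_GRID/setup.sh
            which rucio && which xrdcp       # rucio and xrootd clients on the PATH
            python -c "import boto"          # boto importable, for the S3 object store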
        

Set up the queue in schedconfig
appdir: (for non-CVMFS sites) the pilot will look for the Athena installation under ${appdir}/athena_version and ${appdir}/cmttag/athena_version, as sketched below.
             Here is an example on Edison, where appdir is /scratch1/scratchdirs/tsulaia/sw/software/:
                   edison07 wguan/Edison> ls /scratch1/scratchdirs/tsulaia/sw/software/
                   19.2.0  x86_64-slc6-gcc47-opt
                   edison07 wguan/Edison> ls /scratch1/scratchdirs/tsulaia/sw/software/x86_64-slc6-gcc47-opt/
                   19.2.0
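For illustration, the lookup amounts to something like the following sketch; the variable names are mine, not the pilot's:

            # Sketch of the release lookup (variable names are illustrative)
            appdir=/scratch1/scratchdirs/tsulaia/sw/software
            release=19.2.0
            cmttag=x86_64-slc6-gcc47-opt
            if [ -d $appdir/$release ] || [ -d $appdir/$cmttag/$release ]; then
                echo "Athena $release found under $appdir"
            else
                echo "Athena $release not installed under $appdir" >&2
            fi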
        

objectstore: defines where output event service files and logs are stored.

                   objectstore:   "s3://cephgw.usatlas.bnl.gov:8443/|eventservice^/atlas_pilot_bucket/eventservice|logs^/atlas_pilot_bucket/logs"
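The value is a '|'-separated string: a storage endpoint followed by section^path pairs (here 'eventservice' and 'logs'). A minimal parsing sketch, illustrative rather than the pilot's actual code:

            # Illustrative parsing of the objectstore string
            os="s3://cephgw.usatlas.bnl.gov:8443/|eventservice^/atlas_pilot_bucket/eventservice|logs^/atlas_pilot_bucket/logs"
            endpoint=${os%%|*}                       # s3://cephgw.usatlas.bnl.gov:8443/
            for field in $(echo "${os#*|}" | tr '|' ' '); do
                echo "section=${field%%^*} path=${field#*^}"
            done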
        

catchall: used to control whether RunJobHpcEvent will be executed.

                    Here are two examples of catchall values:

                   catchall:  "HPC_HPC,mode=normal,queue=debug,backfill_queue=regular,max_events=2000,initialtime_m=8, time_per_event_m=13,repo=m2015,nodes=2,min_nodes=2,max_nodes=3,partition=edison,min_walltime_m=28,walltime_m=30,max_walltime_m=30,cpu_per_node=24,mppnppn=1,ATHENA_PROC_NUMBER=23"

                   catchall:  "HPC_HPC,mode=backfill,queue=debug,backfill_queue=regular,max_events=2000,initialtime_m=8, time_per_event_m=10,nodes=2,min_nodes=2,max_nodes=3,partition=edison,min_walltime_m=28,walltime_m=30,max_walltime_m=30,cpu_per_node=24,mppnppn=1,ATHENA_PROC_NUMBER=23"

                   For catchall, the only required item is HPC_HPC. The default queue is 'regular'. To run in backfill mode, add 'mode=backfill,backfill_queue=regular' and also set 'partition'; 'partition' is used to find free resources. If 'partition' is not set, 'mode=backfill' is ignored and the pilot runs in normal mode. A sketch of parsing this key=value list follows.
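The catchall value is a comma-separated mix of bare flags (HPC_HPC) and key=value options. A minimal parsing sketch, illustrative rather than the pilot's actual code:

            # Illustrative parsing of the catchall string
            catchall="HPC_HPC,mode=backfill,queue=debug,backfill_queue=regular,partition=edison"
            IFS=',' read -ra items <<< "$catchall"
            for item in "${items[@]}"; do
                case "$item" in
                    *=*) echo "option: ${item%%=*} = ${item#*=}" ;;
                    *)   echo "flag:   $item" ;;
                esac
            done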
 
        

Run pilot
Here is an example of starting the pilot on NERSC Edison:
            #cat /project/projectdirs/atlas/pilot/RunPilotEdison.sh

            # if cvmfs
            # export OSG_GRID=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/emi/current/
            # export VO_ATLAS_SW_DIR=/cvmfs/atlas.cern.ch/repo/sw

            # NERSC
            export OSG_GRID=/global/project/projectdirs/atlas/pilot/grid_env
            export VO_ATLAS_SW_DIR=/project/projectdirs/atlas

            rm -f wrapper-0.9.10.sh
            wget http://wguan-wisc.web.cern.ch/wguan-wisc/wrapper-0.9.10_hpc.sh -O wrapper-0.9.10.sh

            chmod +x wrapper-0.9.10.sh

            export COPYTOOL=gfal-copy
            export COPYTOOLIN=gfal-copy

            export PATH=$PATH:/opt/torque/4.2.7/bin:/opt/torque/4.2.7/sbin
            export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/torque/4.2.7/lib


            #./wrapper-0.9.10.sh  --wrapperloglevel=debug --wrappergrid=OSG --wrapperwmsqueue=BNL_PROD_MCORE --wrapperbatchqueue=BNL_PROD_MCORE-condor --wrappervo=ATLAS  --wrappertarballurl=http://dev.racf.bnl.gov/dist/wrapper/wrapperplugins-0.9.10.tar.gz --wrapperpilotcodeurl=http://wguan-wisc.web.cern.ch/wguan-wisc/ --wrapperpilotcode=wguan-pilot-dev-HPC -s BNL_PROD_MCORE -q BNL_PROD_MCORE-condor -u ptest -w https://aipanda007.cern.ch -p 25443 -d /global/homes/w/wguan/testpilot/test/tmp
            ./wrapper-0.9.10.sh  --wrapperloglevel=debug --wrappergrid=OSG --wrapperwmsqueue=NERSC_Edison --wrapperbatchqueue=NERSC_Edison --wrappervo=ATLAS  --wrappertarballurl=http://dev.racf.bnl.gov/dist/wrapper/wrapperplugins-0.9.10.tar.gz --wrapperpilotcodeurl=http://wguan-wisc.web.cern.ch/wguan-wisc/ --wrapperpilotcode=wguan-pilot-dev-HPC -s NERSC_Edison -q NERSC_Edison -u ptest -w https://aipanda007.cern.ch -p 25443 -d /scratch2/scratchdirs/wguan/Edison
        

In this example, setting the COPYTOOL and COPYTOOLIN environment variables configures staging of input files from the SE and staging out of output log files to the SE. Only the gfal site mover has been tested on an HPC Cray system. "-d /scratch2/scratchdirs/wguan/Edison" is the pilot working directory; change it to a directory you can write to.
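One common way to launch the script from a login node so that the pilot survives logout; this is an example of my own, not a required procedure:

            # Launch the pilot wrapper in the background and follow its log
            nohup /project/projectdirs/atlas/pilot/RunPilotEdison.sh > pilot.log 2>&1 &
            tail -f pilot.log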

-- WenGuan - 30 Jul 2014
-- WenGuan - 2014-10-29
