NikhefCPUResources

This is the LHCb Nikhef group page describing locally available CPU resources.



Local, batch, grid

Pre-requisites and outcomes

You are assumed to know what SLC6 is, how to use a unix platform, and to have done the LHCb software tutorials.

This Twiki will discuss the computing resources available to you so that you can continue with your analysis.

Introduction

  • You have a lot of CPU at your fingertips... do you know about it all?
  • Do you want to know what system is best for a given activity?
  • LHCb defines certain jargon... what does it all mean?

Read On...

LHCb computing model

LHCb has a well-defined computing model, which accounts for user activity in a few forms:

  • to provide adequate support, LHCb restricts usage to patterns foreseen within the computing model
  • to facilitate book-keeping, projections, procurement and resource management, LHCb restricts users to those same patterns

CPU resources are divided into "tiers" where certain activities are expected:

  • tier-0: the CERN analysis centre/farm.
    • Used for central reconstruction, storage of data, data processing, and a small amount of user analysis.
  • tier-1: a short list of key sites with a lot of dedicated CPU resources to be shared by the grid community including LHCb.
    • Used for central reconstruction, storage of data, data processing, and a larger amount of user analysis.
  • tier-2: Smaller ancillary sites with significant dedicated CPU and other resources,
    • with some guaranteed access to LHCb data,
    • associated with a given tier-1 or having their own storage systems.
    • Used for MC production, and available for user jobs.
  • tier-3: Small clusters which may form part of a larger tier-2,
    • are usually not directly grid accessible and may be shared with other activities.
    • Not normally used for central production directly,
    • used heavily by user jobs requiring no input data such as toy studies,
    • available for non-grid analysis.
  • tier-4: A personal resource, laptop/desktop, maintained by the user, not accounted in the computing model.

Apart from tier-3/4 resources, it is assumed the entire LHCb community has guaranteed access to all the resources.

LHCb primarily supports grid submission through DIRAC using the Ganga front end; Ganga itself supports a multitude of different backends. All other approaches are either not really allowed, or supported only on a best-effort basis.
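
As an illustration, the supported route in its simplest form is sketched below. This is only a rough sketch, assuming an interactive Ganga session; the DaVinci version and options file are placeholders rather than real settings:

    # inside a Ganga session (started with the ganga command)
    j = Job(name='grid-example')
    j.application = DaVinci(version='vXrY', optsfile='myOptions.py')  # placeholder version and options
    j.backend = Dirac()   # Ganga submits through DIRAC to the grid
    j.submit()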

A few activities which are specifically not supported are:

  • Priority access to grid resources for given users or groups based on location. This is very hard to book-keep; instead we define "roles".
  • Using LCG/grid tools directly, without Ganga/DIRAC, to submit to grid resources. This is very hard to maintain and impossible to manage.

When to use what and why!

  • Desktop: Code development, small-scale analysis and whatever else you want; remember that you lose out on the power of the Grid and StoomBoot if you stick to your local desktop.
  • Local interactive nodes: Code development, small-scale analysis requiring a lot of I/O and/or a lot of CPU, intensive fit procedures, and whatever else you want.
  • lxplus: When you don't have SLC6 on your desktop, or don't have quick access to a different institute; when you want to share your code and results with collaborators through AFS; when you need to use CERN Castor for data storage.
  • lxbatch: Only if you really, really have to. It's slow, annoying, and you have a lot of competition.
  • Local batch system (StoomBoot): Mid-scale analysis benefitting from parallelization, but not necessarily needing LHCb software and not needing data files that exist only on the grid. Really only use it for software which will not run on the Grid; usually you can still use the grid directly if you can work within the LHCb software environment.
  • The Grid: Large-scale analysis, any jobs requiring grid data, and any jobs whose output needs to be stored centrally. The grid is good for practically everything which can be done inside the LHCb software environment.

Test, test, test

With Ganga you are encouraged to (a minimal sketch follows this list):

  • test your scripts on a small test sample on a local machine,
  • then test for other scaling problems on a local batch system,
  • and ONLY SUBMIT TO THE GRID JOBS WHICH YOU KNOW WILL WORK.
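
A minimal sketch of this scale-up, assuming a Ganga session in which a job j has already been configured with its application, options and a small test dataset:

    # 1) quick functional test on the machine you are logged into
    j.backend = Local()
    j.submit()

    # 2) the same job, copied and sent to the local batch system to check scaling
    j2 = j.copy()
    j2.backend = PBS()
    j2.submit()

    # 3) only once both of the above behave as expected: the grid
    j3 = j.copy()
    j3.backend = Dirac()
    j3.submit()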

Before using the Grid consider the hints and tips from the:

Computing resources for you

As a CERN user you have access to:

  • lxplus: lxplus.cern.ch, a set of interactive nodes for CERN users
  • lxbatch: also known as LSF, the CERN batch system
  • Exploiting the CERN resources is a topic for LHCb tutorials
  • Login to lxplus and use the LSF backend to submit with Ganga

You probably already know about lxplus and lxbatch, which count as tier-3 resources in LHCb jargon.

As a member of LHCb you have access to:

As a member of Nikhef you also have lots of resources on site for your exploitation:

  • Your desktop
  • Gateway machines
  • StoomBoot
  • StoomBoot interactive nodes

- Your Desktop (tier-4)

  • Self-explanatory. For problems and questions with your local desktop, contact helpdesk AT nikhef.nl
  • If the problem is LHCb-software-specific, though, you should contact GerhardRaven or RobLambert who will sort it for you.

- Local Gateway Machines (tier-4)

  • login.nikhef.nl, running an SLC5 platform with AFS support. For problems and questions with the login system contact helpdesk AT nikhef.nl
  • You are discouraged from performing any intensive tasks on the gateway machines themselves, since that can screw things up for everybody trying to reach the network.
  • parret.nikhef.nl, a shared SLC5 machine for the Nikhef LHCb group. Consider this your go-to machine once you're on the network.

- Access and submission (Ganga)

  • ssh into the machine
  • j.backend=Local()
  • or j.backend=Interactive()
  • j.submit()
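
Put together, a local test job might look like the sketch below, assuming a Ganga session on parret or your desktop; the analysis script is a hypothetical placeholder:

    j = Job(name='local-test')
    j.application = Executable(exe=File('myAnalysis.sh'))  # hypothetical local script
    j.backend = Local()          # run in the background on this machine
    # j.backend = Interactive()  # or run in the foreground and watch the output live
    j.submit()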

- What local running is good for

Running locally is useful for:

  • Testing: you must test before submitting large numbers of grid jobs
  • Small and quick analyses of small files with fast turn-around
  • Visualization and graphical processes; really you need to run these as close to you as possible, ideally on the machine you are sitting in front of


StoomBoot (tier-3)


The StoomBoot cluster:

  • runs PBS (the Portable Batch System)
  • covers around 200 cores
  • has the same NFS-mounted directories as you would get on login.nikhef.nl and/or your local desktop.
  • has a CVMFS mount with the LHCb software, which can lead to much faster configuration of your jobs. See NikhefLocalSoftware
  • does not have AFS support (see below).

StoomBoot is good for running:

  • Jobs with no data at all (MC production, toy studies)
  • Jobs where the data cannot be/is not on the grid (Fitting procedures)
  • CPU-intensive tasks benefiting from parallelization

StoomBoot is not a replacement for the grid, and should not be used as such.

Please subscribe to the stbc-users mailing list for announcements and support.

-> Interactive nodes

There are five dedicated interactive nodes on StoomBoot.

  • These can be used to configure your jobs in exactly the same environment that they will see on StoomBoot
  • These can be used in place of your desktop to run a session locally
  • stbc-i1 (SLC5)
  • stbc-i2 (SLC6)
  • stbc-i3 (SLC6)
  • stbc-i4 (SLC6)
  • stbc-32
  • From a machine already on the Nikhef network: ssh stbc-... to get to your favourite interactive StoomBoot node.

-> Interactive job submission

As well as using the interactive nodes, you can obtain an interactive session on a regular node.

  • qsub -I
  • run this from a machine already on the Nikhef network, e.g. parret, or from your favourite interactive StoomBoot node.

-> Command-line submission

From a directory somewhere under your home directory:

  • qsub <myscript> to submit to StoomBoot, if the environment does not need to be passed to the worker nodes
  • qsub -V <myscript> to submit to StoomBoot, passing the local environment. Useful if you're running LHCb jobs.
  • qstat to watch the status of your jobs
  • stdout and stderr returned to the local directory as <scriptname>.o<jobID> and <scriptname>.e<jobID> respectively
  • you can submit from any of the nikhef computing nodes, e.g. parret, or the StoomBoot interactive node.
  • there are different queues, selected with qsub -q <queuename>; you can check queue properties with qstat -Qf. Currently these are:
    • express: jobs < 10 minutes
    • short: jobs < 4 hours
    • generic: jobs < 24 hours (this is what you get by default)
    • long: jobs < 2 days
    • legacy: jobs < 8 hours, on SLC5, with access to the gluster file system
    • stbcq and iolimited: jobs < 8 hours, on SLC6, with access to the gluster file system
  • jobs requiring multiple cores should only be submitted to the special multicore queue (jobs < 3 days). This requires you to be added to the list of users allowed to submit to it (contact Jeff Templon). The job script should be a PBS script stating explicitly how many nodes and cores the job will use (add e.g. #PBS -l nodes=1:ppn=8 at the top for 8 threads)

-> Access and submission (Ganga)

As a user you don't need to configure anything.

  • The local ganga configuration is managed by a central ganga ini file.
  • j.backend=PBS(queue="<queuename>")
  • j.submit()

Simple as that.
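
For instance, a StoomBoot submission from within Ganga might look like the sketch below; the fit script is a hypothetical placeholder, the queue names are those listed in the command-line section above, and the extraopts attribute is an assumption about Ganga's PBS backend rather than something guaranteed by this page:

    j = Job(name='stoomboot-fit')
    j.application = Executable(exe=File('runFit.sh'))  # hypothetical fit script
    j.backend = PBS(queue='long')                      # any of the queues listed by qstat -Qf
    # for the multicore queue (once added to the allowed users), something like:
    # j.backend = PBS(queue='multicore', extraopts='-l nodes=1:ppn=8')
    j.submit()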

If you come across any problems with the environment, note the following:

  • by default we pass the local environment to StoomBoot using Ganga.
  • this vastly simplifies user configuration, since you need nothing in your .bashrc at all
  • If you do not need the environment, or if there is some problem with passing the environment, change your submit_str in your .gangarc to
    [PBS]
    submit_str = "cd %s; qsub %s %s %s %s"
    
  • But don't just set this blindly; try to work out what the problem is first.

-> StoomBoot and AFS

StoomBoot does not have AFS installed or mounted. This can cause some problems when you are using the interactive nodes, for example:

  • if your gangadir and/or cmtuser are softlinks to AFS, the softlinks will be overwritten with blank directories
  • if your ganga.py is a softlink, it cannot be loaded at run time
  • if you have AFS directories on your PYTHONPATH (for example for ganga utils), they will slow down configuration

So, if you use AFS for anything, it is better not to use the interactive nodes directly. Instead, use a desktop machine where AFS is installed; you can still submit to the StoomBoot cluster from any of the desktop nodes at nikhef.nl.

-> Monitoring and links


NIKHEF/SARA (tier-1&2)


The Netherlands tier-1 grid site is located here.

  • The tier-1 site is subdivided into two parts, NIKHEF and SARA (each effectively a tier-2), which together form the Netherlands tier-1.
  • The chosen mass storage technology is DPM.
  • There are thousands of cores and tens of petabytes of storage shared between the grid community.

-> Access and submission (Ganga)

Apart from being physically closer to you, there is no difference between Nikhef and the other grid sites, so in general the policy is to submit to "the grid" so that you have access to even more machines.

It is possible to request a given grid site using the Dirac() backend in Ganga, and it is possible to replicate files to the Nikhef DPM, through the following (see the sketch after this list):

  • dataset.replicate("NIKHEF-USER")
  • dataset.replicate("SARA-USER")
  • j.backend.settings['Destination'] = 'LCG.SARA.nl'
  • j.backend.settings['Destination'] = 'LCG.NIKHEF.nl'
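
Putting these together, a combined sketch might look like the following, assuming data is an LHCbDataset from the bookkeeping and j is an otherwise fully configured grid job:

    data.replicate('NIKHEF-USER')    # copy the dataset to the Nikhef storage element
    j.inputdata = data
    j.backend = Dirac()
    j.backend.settings['Destination'] = 'LCG.NIKHEF.nl'   # force the job to run at Nikhef
    j.submit()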

Before using the Grid consider the hints and tips from the:

Storing things?


-- RobLambert - 24-Oct-2011
