Data Replica Tool

Introduction

This tool is intended as a substitute for more capable tools (PhEDEx) when moving:

  • A small quantity of files
  • Files not registered in Global DBS

The tool is written in Python and can work in various ways: it can query PhEDEx to find the replicas of a file or, more simply, move files from a specified site. It outputs:

  • Detailed user messages to stdout
  • PhEDEx-like statistics to a logfile, which can be parsed by PhEDEx Tools
  • Lists of successful and failed transfers
  • List of files with no replica in the given site list

Main features:

  • Able to use PhEDEx dataservice to discover replicas
  • Supports SE white/black lists
  • Can delete files from incomplete transfers
  • LFN to PFN conversion performed automatically using PhEDEx dataservice
  • Complete logging
  • Support for CASTOR at CERN (CAF, users' home directories)

The tool is available here:

https://raw.github.com/daniel-meister/data-replica/01-01-07/data_replica.py

Latest non-development version: V01-01-07 (updated on: 2013-02-04)

Before using it, please try it with the --dryrun option (see below).

Supported cases

The tool can do copies:

  • SE to SE
  • local disk to SE
  • SE to local disk
  • Castor user dir @CERN to SE/local disk

It supports data discovery (Site, PFN) for a list of LFNs registered in the PhEDEx data service; otherwise it only supports LFN2PFN translation.

Usage

As given by the -h option:


usage: data_replica.py [options] filelist.txt [dest_dir]

    This program will replicate a list of files from a site to another one using SRM

    filelist.txt is a text file containing a list of LFN you want to replicate to a site, one LFN per line

    dest_dir must be a complete PFN, eg file:///home/user/, if none it will be retrieved from the lfn
    information and the destination site

    you must at least declare [dest_dir] or the --to-site option

    Sites must have a standard name, e.g. T2_CH_CSCS

    Five log-files can be produced by this script:
       * <logfile>.log: contains a PhEDEx-style log
       * <logfile>_existingList.log: contains LFNs and PFNs of files existing on destination (SRM
         endpoints only)
       * <logfile>_failedList.log: a list of all the LFNs which failed
       * <logfile>_successList.log: a list of all the LFNs successfully copied
       * <logfile>_noReplica.log: a list of files with no replicas found (check DENIED_SITES list in the
         script header)


[USE CASES]

* Replicate a file list without specifying a source node (discovery). In this case, a source nodes list is retrieved from PhEDEx data service:
      data_replica.py --discovery --to-site YOUR_SITE filelist.txt

  * Replicate a file list using discovery and giving a destination folder:
      data_replica.py --discovery --to-site YOUR_SITE filelist.txt /store/user/leo

  * Replicate a file list NOT registered in PhEDEx. In this case, you should specify --from-site.
      data_replica.py --from-site FROM_SITE --to-site YOUR_SITE filelist.txt

  * Replicate a file list NOT registered in PhEDEx, giving a destination folder.Also in this case, you should specify --from-site.
      data_replica.py --from-site FROM_SITE --to-site YOUR_SITE filelist.txt /store/user/leo

  * Copying data locally: in this case you don't have to give the --to-site option but you need to give
  a dest_dir in PFN format. Warning: if you intend to use the --recreate-subdirs option, you need to create yourself the local directory structure:
      data_replica.py --from-site FROM_SITE filelist.txt  file:///`pwd`/

  * Copying data from a local area: the list of files should contain only full paths:
      data_replica.py --from-site LOCAL --to-site T3_CH_PSI filelist.txt /store/user/leo/test1

  * Copying files from CAF:
      data_replica.py --from-site T2_CH_CAF --to-site T3_CH_PSI filelist.txt /store/user/leo/testCastor4

  * Copying files from user area under CASTOR@CERN (files not registered to DBS). In this case, PFN are not retrievable from PhEDEx data service,
  so the file list must contain Castor full path (/castor/cern.ch/....) and the source site is CERN_CASTOR_USER:
      data_replica.py --from-site CERN_CASTOR_USER --to-site T3_CH_PSI filelist.txt /store/user/leo/testCastor3

  * When copying from a Castor area from lxplus and you want to pre-stage files to a local /tmp directory through rfcp
  (useful when copying files not accessed since long, avoiding srm timeouts), use --castor-stage.

  * Copying data from EOS@CERN, you have to specify --from-site CERN_EOS (this is automatically done for T2_CH_CAF, at the time of writing). Filenames in filelist.txt are still LFN (/store/...):
      data_replica.py --from-site CERN_EOS --to-site T3_CH_PSI filelist.txt /store/user/leo/testEos

  Use the -h option for more information



options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  --logfile=LOGFILE     file for the phedex-like log, default is
                        data_replica.log
  --discovery           Retrieve data distribution from PhEDEx Data Service
  --from-site=FROM_SITE
                        Source site, eg: T2_CH_CSCS. If LOCAL is indicated,
                        the file list must be a list of global paths
  --to-site=TO_SITE     Destination file, eg: T2_CH_CSCS
  --recreate-subdirs    Recreate the full subdir tree
  --dryrun              Don not actually copy anything
  --debug               Verbose mode
  --copy-tool=TOOL      Selects the copy tool to be used (lcg-cp or srmcp). By
                        default lcg-cp is used
  --castor-stage        Enables staging of Castor files in a local tmp dir.
                        Works only on lxplus, and uses $TMPDIR as tmp dir.
  --delete              If file exists at destination and its size is
                        _smaller_ than the source one, delete it. WARNING:
                        destination files are checked only for SRM endpoints.
  --whitelist=WHITELIST
                        Sets up a comma-separated White-list (preferred
                        sites). Transfers will start from thse sites, then
                        data_replica will use the other sites found with the
                        --discovery option (without --discovery this option
                        makes no sense). Sites not included in the whitelist
                        will be not excluded: use --blacklist for this.
  --blacklist=BLACKLIST
                        Sets up a comma-separated Black-list (excluded sites).
                        Data_replica won't use these sites (without
                        --discovery this option makes no sense).
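
The help text above warns that, when copying to a local destination with --recreate-subdirs, the local directory structure has to be created by hand. A minimal sketch of how that could be done is given below. It assumes that the tool mirrors the directory part of each LFN below dest_dir; this is not confirmed by the help text, so check the layout with --dryrun first. The function name make_local_tree is hypothetical and is not part of data_replica.

```python
import os


def make_local_tree(lfns, dest_root):
    """Create, under dest_root, the directory part of every LFN.

    Assumption: --recreate-subdirs mirrors the LFN directory path below
    the destination directory. Returns the list of target directories.
    """
    targets = []
    for lfn in lfns:
        lfn = lfn.strip()
        if not lfn:
            continue  # skip blank lines in the file list
        # "/store/user/leo/a/f.root" -> "store/user/leo/a"
        subdir = os.path.dirname(lfn).lstrip("/")
        target = os.path.join(dest_root, subdir)
        if not os.path.isdir(target):
            os.makedirs(target)
        targets.append(target)
    return targets
```

It could be called with the lines of filelist.txt and the local path that is then passed to the tool as file:///... dest_dir.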

As a Python module

From V01-01-00, data_replica can be called as a Python module. For example:

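import data_replica

### passing options to data_replica as attributes of a plain object
class drOptions:
    usePDS = False
    Replicate = False
    RECREATE_SUBDIRS = False  # cf. --recreate-subdirs
    CASTORSTAGE = False       # cf. --castor-stage
    DEBUG = False             # cf. --debug
    TOOL = 'lcg-cp'           # cf. --copy-tool
    DRYRUN = False            # cf. --dryrun

myOptions = drOptions()
myOptions.TO_SITE = "TO_SITE"      # destination site name
myOptions.logfile = "logfile"
myOptions.FROM_SITE = "FROM_SITE"  # source site name

# Placeholder value: assumed to be the path of the file list, as on the command line
fileName = "filelist.txt"

drExit = data_replica.data_replica([fileName], myOptions)
if drExit != 0:
    # print() with a single argument works under both Python 2 and Python 3
    print("Some errors in copying")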

Configuration

lcg-cp and srmcp options are specified in the python script. Defaults are:

lcg-cp < 1.7 :

 --timeout=6000  -n 1 

lcg-cp >= 1.7 :

  --srm-timeout=6000 --connect-timeout=6000  -n1

srmcp >= 1.7 :

-streams_num=1 -retry_num=1 -request_lifetime=6000 

The script automatically selects the correct options for the installed lcg-cp version.
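
The version-dependent choice can be sketched as follows. This is only an illustration: how the script actually detects the lcg-cp version is not described here, and the names parse_version and select_lcg_cp_options are hypothetical. Only the two option strings are taken from the defaults listed above.

```python
# Default option strings as listed above
LCG_CP_OLD = "--timeout=6000 -n 1"
LCG_CP_NEW = "--srm-timeout=6000 --connect-timeout=6000 -n1"


def parse_version(text):
    """Turn a version string such as '1.7.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def select_lcg_cp_options(version_text):
    """Return the default options matching the given lcg-cp version."""
    # Tuples compare numerically, so '1.10' is correctly newer than '1.7'
    if parse_version(version_text) >= (1, 7):
        return LCG_CP_NEW
    return LCG_CP_OLD
```

Comparing tuples of integers rather than raw strings avoids ranking a version like 1.10 below 1.7.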

By default, complete replication (reproducing the whole LFN directory structure, useful for retrieving sparse official files) is disabled through the parameter:

ENABLE_REPLICATION=False

No further parameters should be modified.

Program structure

CAVEAT: this section needs to be updated

Main

The main routine is structured as:

  • Check file list
  • Select options for lcg-cp and srmcp
  • If Castor prestaging is enabled, prestage all files through stager_get
  • Run over file list:
    • if discovery is enabled, retrieve site and pfn list through retrieve_siteAndPfn()
    • create the destination pfn through retrieve_pfn()
    • Copy files through copyFile(). Three cases are provided:
      • If copy is from Castor user directory: it performs a staging to local disk through castorStage() then the copy
      • If discovery is enabled, run copy trials on the site, pfn list
      • If discovery is disabled, simply tries the copy
    • Write the logfile through writeLog()
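
The "copy trials" step for the discovery case can be sketched as a loop that tries each source in turn until one copy succeeds. This is a simplified illustration, not the script's code: copy_with_fallback is a hypothetical name, and copy_func is a stand-in for copyFile() reduced to two arguments.

```python
def copy_with_fallback(sources, dest_pfn, copy_func):
    """Try each source in order until one copy succeeds.

    sources:   list of {"node": ..., "pfn": ...} dictionaries
    copy_func: hypothetical stand-in for copyFile(); called as
               copy_func(source_pfn, dest_pfn), returns (success, error_log)
    Returns (node_used_or_None, list of (node, error_log) for failed trials).
    """
    errors = []
    for source in sources:
        ok, error_log = copy_func(source["pfn"], dest_pfn)
        if ok:
            return source["node"], errors
        errors.append((source["node"], error_log))
    # every source failed (or the list was empty)
    return None, errors
```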

retrieve_siteAndPfn( lfn)

Given a string (lfn), it uses retrieve_siteList() and retrieve_pfn() to fill a list of dictionaries (the returned value):

 
[{"node":node_value,"pfn":pfn_name}, {"node":node_value,"pfn":pfn_name}, ...]
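
How such a list could be assembled is sketched below. The two lookups are passed in as plain functions so that the example runs without the PhEDEx dataservice. They are simplified, hypothetical stand-ins for retrieve_siteList() and retrieve_pfn(): here they return a list of node names and a single PFN string, which differs from the real functions' return values.

```python
def build_site_and_pfn(lfn, site_lookup, pfn_lookup):
    """Return [{"node": ..., "pfn": ...}, ...] for one LFN.

    site_lookup(lfn)      -> list of node names holding a replica
    pfn_lookup(lfn, node) -> PFN string, or None if it cannot be resolved
    Both callables are hypothetical stand-ins for the dataservice queries.
    """
    entries = []
    for node in site_lookup(lfn):
        pfn = pfn_lookup(lfn, node)
        if pfn:  # skip nodes for which no PFN could be resolved
            entries.append({"node": node, "pfn": pfn})
    return entries
```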

retrieve_siteList( lfn, entry )

Given a string (lfn) and an empty list, it queries the PhEDEx dataservice and fills the list with entries like:

{"node": node_name}

It also arranges sources according to the user preferences through arrange_sources().

arrange_sources(sitelist,PREFERRED_SITES )

A very simple ordering function: it puts the preferred sites at the top of the list and returns the new list.
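
One way to write such an ordering is sketched below. It is an illustration under stated assumptions, not the script's implementation: the name arrange_sources_sketch is hypothetical, and keeping the original relative order inside each group is a choice made here that the description above does not specify.

```python
def arrange_sources_sketch(sitelist, preferred_sites):
    """Return a new list with entries from preferred sites first.

    sitelist:        list of dictionaries with a "node" key
    preferred_sites: collection of preferred node names
    The relative order within each group is preserved (an assumption).
    """
    preferred = [e for e in sitelist if e["node"] in preferred_sites]
    others = [e for e in sitelist if e["node"] not in preferred_sites]
    return preferred + others
```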

retrieve_pfn( lfn, site )

Given an lfn and a site, it queries the PhEDEx dataservice to retrieve the pfn. It outputs a list of dictionaries with entries:

 {"pfn": value_of_pfn} 

castorStage(castor_pfn, myLog, logfile, tabLevel=2)

This function copies files to a local dir ($TMPDIR if set, otherwise /tmp) with rfcp. It takes as arguments:

  • A Castor pfn
  • The Log dictionary for the copy process (myLog), where all the information about the current transfer is stored
  • The log file
  • The number of \t characters to be put in the output (tabLevel)

It outputs the local pfn of the copied file and the exit status.
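
The staging step can be pictured as building an rfcp command and the resulting local PFN. The sketch below only constructs these values and does not run anything. The name build_stage_command and the file:// form of the local PFN are assumptions made for illustration; the real castorStage() also handles logging and the exit status.

```python
import os
import posixpath


def build_stage_command(castor_pfn, environ=None):
    """Return (rfcp command as a list, local PFN) for one Castor file.

    The staging directory is $TMPDIR if set and non-empty, otherwise /tmp.
    environ defaults to os.environ; it is a parameter to ease testing.
    """
    environ = os.environ if environ is None else environ
    tmp_dir = environ.get("TMPDIR") or "/tmp"
    local_path = posixpath.join(tmp_dir, posixpath.basename(castor_pfn))
    command = ["rfcp", castor_pfn, local_path]
    # Assumption: the staged copy is addressed as a file:// PFN afterwards
    return command, "file://" + local_path
```

The command list could then be handed to subprocess on lxplus, where rfcp is available.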

copyFile(tool, copyOptions, source, dest, srm_prot, myLog, logfile, isStage)

This is the true core of the script. It fills the myLog dictionary with transfer information (start time, end time, size, etc.), stages files through castorStage() if requested, does the actual copy and prints the result to stdout in a user-friendly format. It also writes a PhEDEx-like log file through writePhedexLog(). After the copy is completed, it checks the size of the copied file against the expected one.

It takes as arguments:

  • the copy tool
  • the copy options
  • the source pfn
  • the destination pfn
  • the srm protocol version
  • the log dictionary
  • the name of the log file
  • a bool value to perform local Castor staging

It returns a boolean indicating whether the copy succeeded, together with the error log.
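
The size check after the copy, and the related --delete rule from the help text (remove a destination file that is smaller than the source), can be sketched as a small classification function. The name classify_destination and the returned labels are hypothetical. How the script obtains the two sizes is not shown here.

```python
def classify_destination(source_size, dest_size):
    """Compare the destination file size with the source size.

    dest_size is None when no file exists at the destination.
    Returns one of: "missing", "complete", "incomplete", "unexpected".
    """
    if dest_size is None:
        return "missing"
    if dest_size == source_size:
        return "complete"
    if dest_size < source_size:
        # smaller than the source: the case targeted by --delete
        return "incomplete"
    # larger than the source: not covered by --delete, needs a manual check
    return "unexpected"
```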

writePhedexLog(myLog, logfile)

This function simply takes the information in the myLog dictionary and writes a PhEDEx-compliant log file, so that it can be read by standard PhEDEx monitoring scripts.

-- LeonardoSala - 17-Nov-2009

-- LeonardoSala - 08-Jan-2010

Topic revision: r14 - 2014-02-04 - DanielMeister