CMS I/O Work, the Second

This round of I/O work has the following goals:

  1. Remove uncached I/O usage.
  2. Make CMSSW I/O latency-friendly (the goal is for analysis jobs to run above 70% CPU efficiency on a link with 100 ms latency).

It should take about 15 minutes of compiling to get these patches working.

CMSSW Changes

Cache non-Event trees

At the end of the last round of work, we still had 1-2 uncached reads per event (from the EventMetaData tree and/or the EventHistory tree). Even though the large majority of reads have gone away, we can do better by caching these trees as well.
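
A rough estimate shows why even one or two remaining reads per event matter. The sketch below assumes each read is a synchronous round trip that is not overlapped with computation, so that CPU efficiency is cpu / (cpu + reads * latency). The function name and the 100 ms of CPU time per event are made up for illustration; only the 100 ms latency and the 70% target come from the goals above.

```shell
# Rough CPU-efficiency estimate: cpu / (cpu + reads * latency).
# Assumes every read blocks for one full round trip (no overlap with CPU work).
# Arguments: CPU ms per event, reads per event, link latency in ms.
cpu_efficiency() {
    awk -v cpu="$1" -v reads="$2" -v lat="$3" \
        'BEGIN { printf "%.2f\n", cpu / (cpu + reads * lat) }'
}

# Hypothetical event needing 100 ms of CPU, on a 100 ms latency link:
cpu_efficiency 100 2 100   # two uncached reads per event
cpu_efficiency 100 1 100   # one uncached read per event
cpu_efficiency 100 0 100   # all reads served from cache
```

Under these assumptions the three calls print 0.33, 0.50 and 1.00, so a job of this kind cannot reach the 70% target while any per-event round trip remains.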

Two Cache Scheme

After we cache the majority of the reads from the Event tree, a significant overhead is the initial 20-event training period when ROOT attempts to determine the list of active branches. During this training period, ROOT I/O behaves in its normal uncached manner. We have found that this training period can take up a large percentage of the I/O time, especially when all other reads are cached.

To fix this, we introduce a second cache; we call it the "raw cache" and the original cache the "refined cache". The raw cache is activated at the first event and caches all baskets for all branches for the first 20 events. During these 20 events we continue to train the refined cache, but no I/O requests go through it - all requests are served from the raw cache. The raw cache causes a slight over-read, but the over-read is confined to 20 events and greatly reduces the number of I/O interactions between CMSSW and the storage element. The net effect is a decrease in total I/O time.
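
The hand-off between the two caches can be pictured with a toy function. This is only an illustration of the rule described above, not code from the patch: the 20-event training length comes from the text, while the 1-based event numbering and the function name are assumptions.

```shell
# Illustrative only: report which cache would serve reads for a given event.
# TRAINING_EVENTS=20 comes from the text; the 1-based event numbering and
# the function name are assumptions made for this sketch.
TRAINING_EVENTS=20

cache_for_event() {
    if [ "$1" -le "$TRAINING_EVENTS" ]; then
        echo "raw"      # all branches cached; refined cache is still training
    else
        echo "refined"  # only the branches learned during training
    fi
}

for event in 1 20 21; do
    echo "event $event: $(cache_for_event "$event") cache"
done
```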

Software Patches

Before you apply any patches, make sure you have your CMS environment set up correctly:

cmsenv
export CVSROOT=:pserver:anonymous@cmscvs.cern.ch:/cvs_server/repositories/CMSSW
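
The later commands rely on CMSSW_BASE, SCRAM_ARCH and CVSROOT being set, so it may be worth checking them before going further. The helper below is a minimal sketch; check_cms_env is a hypothetical name and is not part of CMSSW or SCRAM.

```shell
# Check that the variables the later commands rely on are set.
# check_cms_env is a hypothetical helper, not part of CMSSW or SCRAM.
check_cms_env() {
    missing=0
    for name in CMSSW_BASE SCRAM_ARCH CVSROOT; do
        eval "value=\${$name:-}"          # look up the variable by name
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

if check_cms_env; then
    echo "environment looks ready"
else
    echo "run cmsenv and export CVSROOT first"
fi
```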

Fixing TTreeCache in ROOT

There is an off-by-one bug in TTreeCache in ROOT 5.22 that causes some of the required blocks to not be properly prefetched. This causes a higher level of cache misses than there should be.

NOTE: Do not apply this to ROOT 5.26.
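
One way to guard against patching the wrong ROOT is to look at the version string first. The sketch below assumes the string has the form printed by root-config --version (for example 5.22/00h); needs_ttc_patch is a hypothetical helper, and it conservatively says "no" for every series other than 5.22, since the text only confirms the bug there.

```shell
# Decide whether the TTreeCache patch applies, given a ROOT version string
# such as the one printed by `root-config --version` (for example 5.22/00h).
# needs_ttc_patch is a hypothetical helper name.
needs_ttc_patch() {
    case "$1" in
        5.22*) return 0 ;;  # affected series: patch applies
        *)     return 1 ;;  # anything else, including 5.26: leave alone
    esac
}

if command -v root-config >/dev/null 2>&1; then
    version=$(root-config --version)
    if needs_ttc_patch "$version"; then
        echo "ROOT $version: patch applies"
    else
        echo "ROOT $version: do not apply the patch"
    fi
fi
```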

The linked file fixes this issue. However, it requires you to rebuild ROOT's libTree.so and add it to your environment. This is not covered in detail here, and may be out of reach for a typical user. The following may get you started:

svn co http://root.cern.ch/svn/root/tags/v5-22-00h root-v5-22-00h
cd root-v5-22-00h
svn switch http://root.cern.ch/svn/root/branches/v5-22-00-patches/io/io/src/TFile.cxx io/io/src/TFile.cxx
svn switch http://root.cern.ch/svn/root/branches/v5-22-00-patches/tree/tree/src/TTreeCloner.cxx tree/tree/src/TTreeCloner.cxx
# Only needed if /usr/lib/libXpm.so does not exist; you may need to get your sysadmin to do this
[ -e /usr/lib/libXpm.so ] || sudo ln -s /usr/lib/libXpm.so.4 /usr/lib/libXpm.so
# On the 32-bit architecture, setarch starts a new i386 shell; the build runs inside it
if [ "$SCRAM_ARCH" = slc5_ia32_gcc434 ]; then setarch i386; fi
./configure
/usr/bin/curl -k https://twiki.cern.ch/twiki/pub/Main/CmsIOWork2/root_522_ttc2.patch | patch -p0
make lib/libTree.so lib/libRIO.so lib/libNetx.so
# Leave the i386 shell started above
if [ "$SCRAM_ARCH" = slc5_ia32_gcc434 ]; then exit; fi

Copy (or link) the resulting libraries to the library directory of your release area (for example, CMSSW_3_5_4/lib/slc5_ia32_gcc434). NOTE: You may need to change the links below to match your release and SCRAM_ARCH.

ln -s $PWD/lib/libTree.so $CMSSW_BASE/lib/$SCRAM_ARCH/
ln -s $PWD/lib/libRIO.so $CMSSW_BASE/lib/$SCRAM_ARCH/
ln -s $PWD/lib/libXrdClient.so $CMSSW_BASE/lib/$SCRAM_ARCH/
ln -s $PWD/lib/libNetx.so $CMSSW_BASE/lib/$SCRAM_ARCH/
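
A dangling or missing link is easy to overlook, so a quick check after linking may save some confusion later. The helper below is a sketch; check_lib_links is a hypothetical name, and the directory to inspect is passed as an argument rather than assumed.

```shell
# Report any expected library link that is missing or dangling.
# check_lib_links is a hypothetical helper; pass it the directory to inspect.
check_lib_links() {
    dir="$1"
    status=0
    for lib in libTree.so libRIO.so libXrdClient.so libNetx.so; do
        if [ -e "$dir/$lib" ]; then   # -e follows symlinks, so dangling links fail
            echo "ok       $lib"
        else
            echo "MISSING  $lib"
            status=1
        fi
    done
    return "$status"
}

# Example: check_lib_links "$CMSSW_BASE/lib/$SCRAM_ARCH"
```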

CMSSW Patches

The patch below does two things:

  1. Implements the two-cache scheme described above.
  2. Caches the EventMetaData and EventHistory trees.

The linked file is a patch against the IOPool/Input module.

Prior to CMSSW_3_7_0_pre4:

addpkg IOPool/Input
addpkg IOPool/TFileAdaptor
addpkg Utilities/RFIOAdaptor
addpkg Utilities/StorageFactory
pushd src/IOPool/Input
cvs up -r V10-11-00
popd
pushd src/IOPool/TFileAdaptor
cvs up -r V04-00-11
popd
pushd src/Utilities/RFIOAdaptor
cvs up -r V04-00-06
popd
pushd src/Utilities/StorageFactory
cvs up -r V04-00-13
popd
/usr/bin/curl -k https://twiki.cern.ch/twiki/pub/Main/CmsIOWork2/cmssw_2cache.patch | patch -p0
scram b -j4 USER_CXXFLAGS="-g"

For CMSSW_3_7_0_pre4 and afterward:

addpkg IOPool/Input
/usr/bin/curl -k https://twiki.cern.ch/twiki/pub/Main/CmsIOWork2/cmssw_2cache.patch | patch -p0
scram b -j4 USER_CXXFLAGS="-g"

Note that I enable debugging symbols; this way, if you encounter issues, I get line numbers and file names when you send me an angry email.

Nothing needs to be done to the CMSSW config file if you already have the file adaptor enabled.
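
Both recipes above pipe the download straight into patch, so a failed download or a mismatched release can leave a partly patched tree. A more cautious variant, sketched below, saves the patch to a file and applies it only if a dry run succeeds. apply_patch_checked is a hypothetical helper; --dry-run is a GNU patch option.

```shell
# Apply a -p0 patch only if a dry run succeeds first.
# apply_patch_checked is a hypothetical helper; --dry-run is a GNU patch option.
apply_patch_checked() {
    patch_file="$1"
    if [ ! -s "$patch_file" ]; then
        echo "patch file '$patch_file' is missing or empty" >&2
        return 1
    fi
    if ! patch -p0 --dry-run < "$patch_file" >/dev/null; then
        echo "patch does not apply cleanly; nothing was changed" >&2
        return 1
    fi
    patch -p0 < "$patch_file"
}

# Example (run from the same directory as the original patch command):
#   /usr/bin/curl -k -o cmssw_2cache.patch https://twiki.cern.ch/twiki/pub/Main/CmsIOWork2/cmssw_2cache.patch
#   apply_patch_checked cmssw_2cache.patch
```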

Seeing what's happening

Unfortunately, using xrootd turns off all the interesting file system statistics that CMSSW collects. If you're interested in seeing every last thing the Xrootd client does, do this:

echo "XNet.Debug 2" >> ~/.rootrc

To see just the summary statistics instead of everything, do this:

echo "XNet.Debug 1" >> ~/.rootrc
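
Because both commands append to ~/.rootrc, switching between the two levels can leave conflicting XNet.Debug lines in the file. The sketch below replaces any earlier setting instead of appending. set_xnet_debug is a hypothetical helper, and it writes the setting in the same form as the commands above.

```shell
# Set XNet.Debug in a rootrc file, replacing any earlier setting so that
# repeated runs do not leave conflicting lines behind.
# set_xnet_debug is a hypothetical helper; the rc file defaults to ~/.rootrc.
set_xnet_debug() {
    level="$1"
    rcfile="${2:-$HOME/.rootrc}"
    touch "$rcfile"
    # Keep every line except previous XNet.Debug settings.
    grep -v '^XNet\.Debug[[:space:]:]' "$rcfile" > "$rcfile.tmp" || true
    echo "XNet.Debug $level" >> "$rcfile.tmp"
    mv "$rcfile.tmp" "$rcfile"
}

# Example: set_xnet_debug 1    # summary statistics only
```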

Results

Results are kept in the instrumenting pages, here.

Topic attachments
Attachment           History  Size    Date              Who
cmssw_2cache.patch   r1       18.2 K  2010-03-30 22:08  BrianBockelman
root_522_ttc.patch   r1       1.7 K   2010-03-30 21:04  BrianBockelman
root_522_ttc2.patch  r1       2.9 K   2010-05-13 21:59  BrianBockelman
Topic revision: r11 - 2010-05-18 - BrianBockelman
 