4 June 2008, Revision 7

Brief Status of Atlas Production Cache 14.1.0.Y (14.1.0.3 candidate)

Andreu Pacheco-Pages / IFAE-CERN














Current situation: (23 May 2008 10h37)


Cache 14.1.0.2 has been public since 21 May 2008. No date has been set for the next cache.

Proposed strategy: (23 May 2008 10h37)

Create and follow up bugs in Savannah for RTT, FCT and grid validation jobs.


Open Issues (9 June 2008)


ISSUE 080516-1025: Many bugs are related to memory allocation problems. There is a high incidence of bugs opened due to crashes after a failed memory allocation. Batch systems usually limit the virtual memory of a job to 2.2-2.4 GB. When a job reaches this limit, malloc() fails when asked for more memory, and Athena then crashes trying to use the returned pointer, which is invalid (NULL). This will always happen, so it would be worthwhile to handle malloc failures in a way that makes the error easy to identify, which would improve bug reporting.
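
The idea of turning an allocation failure into an easily identified error can be sketched at the Python level of a job wrapper. This is only an illustration under assumptions: the function run_step, the tag ALLOC_FAILURE and the message format are hypothetical and are not part of Athena or of the job transforms. In the C++ code itself the analogous step would be to check the pointer returned by malloc() before using it.

```python
# Hypothetical sketch: the names and the tag below are illustrative only,
# they are not Athena or job transform APIs.
ALLOC_FAILURE_TAG = "ALLOC_FAILURE"  # hypothetical tag to quote in bug reports


def run_step(step, *args):
    """Run one job step and report allocation failures with a clear tag.

    Returns a (status, payload) tuple instead of letting the failure
    surface later as an unrelated-looking crash.
    """
    try:
        result = step(*args)
    except MemoryError as exc:
        # Python raises MemoryError when an allocation cannot be satisfied,
        # for example when a virtual memory limit has been reached.
        message = f"{ALLOC_FAILURE_TAG}: allocation failed in {step.__name__}: {exc}"
        return ALLOC_FAILURE_TAG, message
    return "OK", result


def greedy_step():
    # Stand-in for a step that runs out of memory. The error is raised
    # explicitly so the example does not really exhaust memory.
    raise MemoryError("simulated allocation failure")


status, message = run_step(greedy_step)
print(status)
print(message)
```

With a tag of this kind in the log, a failed job can be classified as a memory problem without inspecting the stack trace.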

Validation Bugs 14.1.0.2 (2 June 2008)


BUG#37835 (37414) Atlas Simulation (Atlas Validation) TRF_SEGVIO in valid task 22596.

  • Bug opened 4 June 2008
  • Partial failure of valid1.018101.PythiaB_Bd_Jpsie3e3K0s.digit.e339_s435
  • Bug verification script: /afs/cern.ch/user/p/pacheco/public/bvp-080613-2039.sh
BUG#37276: ATLAS Flavour Tagging.Btagging ERROR 14.1.0.2 TRF_UNKNOWN (Element [0][1] is not zero. Diagonal matrix was expected)
  • Bug opened 1 June 2008
  • 5% (1/20) failures in task 22460 valid1.005403.SU3_jimmy_susy.atlfast.e322_a50
  • 7 June 2008: Could not access log file
BUG#37275 Atlas Validation 14.1.0.2 valid1 simu task 22504 failures: TRF_OUTFILE | Output file HITS.022504._00095.pool.root.3 not created. Argument
  • Bug opened 1 June 2008
  • 97% failures in task 22504 valid1.000700.Cosmic_test.simul.v14010002
BUG#37263 Atlas Validation EXE_JOBDEF in task 22503
  • 31 May 2008. Bug opened.
  • 100% failures in task 22503 valid1.005200.T1_McAtNlo_Jimmy.recon.e322_s429_r443
  • Last comment on 3 June 2008 by Borut.
BUG#37257 (37232) Atlas Trigger TAPM (Atlas Validation) MinBias_EFID: InDetTrigTrackFitter measurement error is zero!
  • 30 May 2008: Bug opened
  • Task 22498 valid1.005568.ttbar_Pythia.bstream.e325_s404_d117_b36
  • Updated 2 June 2008. Bug tracked in #35769.
BUG#37237 Atlas Validation. TRF_UNKNOWN (Unable to commit output) in valid task 22468
  • Bug opened 30 May 2008
  • 100% failures in task 22468 valid1.005640.CharybdisJimmy.atlfast.e322_a50
  • No person assigned.
BUG#37195 Atlas Validation:14.1.0.2 valid1 genfast task 22500 failures: TRF_EXC | ImportError: ('cannot import name SingleTopLeptonFilter',)
  • 29 May 2008: Bug opened
  • Task 22500 valid1.005504.tchan_McAtNlo_Jimmy.genfast.a52_tid022500
  • 2 June 2008: Status requested
BUG#36970 (36963) Atlas Reconstruction (Atlas Validation) 14.1.0.2 MuGirl segfault in reco from 13.0.40.5 BS
  • 26 May 2008. Bug opened by Iacopo
  • Bug assigned to Sofia Vallecorsa
  • 90% failures in task 22362 valid1.005200.T1_McAtNlo_Jimmy.recon.e322_s413_b30_r437
  • 2 June 2008: Status requested from Sofia.
BUG#36938 (36912): Atlas Muon Spectrometer (Atlas Validation) crash in RpcROD_Decoder::fillCollection in ByteStream reading
  • 23 May 2008: Bug opened
  • 95% failures on task 22362 valid1.005200.T1_McAtNlo_Jimmy.recon.e322_s413_b30_r437
  • In job 11717343 the crash is in the Muon Spectrometer code:
    RpcPadByteStreamTool::ROBData_TI
    CollectionByteStreamCnv::RpcPadByteStreamTool
  • 24 May 2008: Bug moved to Atlas Muon Spectrometer.
  • 26 May 2008: Iacopo raises an issue on this bug: reconstruction of bytestream created with 13.0.40.5 has 95% failures.
  • 26 May 2008: David Rousseau: the origin of the problem is a crash in muon BS reading.
  • 2 June 2008: Believed to be fixed by tag TrigT1RPChardware-00-02-07-01 by Alessandro Di Mattia
MITIGATED BUG#36863 (36861): Atlas Trigger TAPM (Atlas Validation): 14.1.0.2 reconstruction: ~10 segfaults out of 200 jobs (250 evt/job)
  • 5% failures in task valid1.005200.T1_McAtNlo_Jimmy.recon.e322_s412_r431
  • Crash in RegSelSubDetector::getModules
  • 23 May 2008: Submitted RegionSelector-03-02-31-01 to protect this method against crashes, but the problem was not there.
  • 2 June 2008: Asked Mark to close the bug.

Validation Bugs 14.1.0.1 (16 May 2008)

BUG#36729 (36663) Atlas Muon Spectrometer (Atlas Validation): 14.1.0.1 reco from 13.0.40.5 BS - Wrong reconstruction for staco muons

  • Bug opened by Iacopo on 16 May 2008. Assigned to Alessandro Di Mattia
  • 100% impact on task 22305 valid1.005200.T1_McAtNlo_Jimmy.recon.e322_s413_b30_r423
  • 23 May 2008: Stefania Spagnolo reports she cannot get the file. Feedback given.
  • The problem does not seem to occur when reconstructing from BS produced with version 14.
  • Last update 23 May 2008
  • 2 June 2008: Update requested from Alessandro.

BUG#36648 (36633) - Atlas Trigger (Atlas Validation) - csc_recoESD failure TRF_UNKNOWN, who=EFMissingET_Fex, message=Failed to attach feature!

  • Submitted by Andreu Pacheco on 16 May 2008.
  • 13% failures Task 22311 valid2.018101.PythiaB_Bd_Jpsie3e3K0s.recon.e315_s412_r421. Updated 16 May 2008
  • Moved to Atlas Trigger by David Quarrie and assigned to Diego Casadei on 16 May 2008.
  • 23 May 2008: Status requested.
  • 2 June 2008. Status requested.
BUG#36645: Atlas Validation: Task 22309 - a number of jobs fail with "Output file not created", but this is a G4Navigator error in disguise
  • 100% failures in task 22309 valid1.008801.Hijing_PbPb_5p5TeV_MinBias.digit.e113_s432
  • TRFERROR TRF_OUTFILE Output file HITS.022309._00217.pool.root.1 not created. Argument outputHitsFile
  • 17 May 2008: Last update by Zachary Marshall
  • 2 June 2008: Update requested from Zachary Marshall

BUG#36635 (36568, 36567) – Atlas Simulation (Atlas Generators, Atlas Validation) – genAtlfast task – RTT Atlfast.5870.ttH_poslepnu_jj_bb failure. TRF_UNKNOWN, producer=csc_genAtlfast, message=POOL commit failed 0x9a769b0.

  • Bug opened by Andreu Pacheco on 14 May 2008.
  • Moved to Atlas Simulation and assigned to Simon Dean by David Quarrie on 16 May 2008.
  • 23 May 2008: Status requested.
  • 2 June 2008: Status requested.
  • 5 June 2008: The problem comes from the job trying to write the ntuple ParamOldCal.root in /afs/cern.ch/atlas/software/builds/AtlasAnalysis/14.1.0/InstallArea/share
BUG#36621 Atlas Validation: 14.1.0.1 - Configuration problem - No luck in reconstruction from 13.0.40.5 BS
  • 15 May 2008: Bug opened, no person assigned.
  • 100% failures on task 22307 valid1.005200.T1_McAtNlo_Jimmy.recon.e322_s413_b30_r428
  • The failing jobs use two fragments: ForceFullReco.py and MuGirlOff.py
  • 2 June 2008: Status requested from Iacopo.

BUG#36631 (36618) – Atlas Simulation (Atlas Validation) – TRF_SEGVIO, producer=csc_atlasG4, message=*** Break * segmentation violation
  • Bug submitted by Alessandra Doria on 15 May 2008.
  • Bug moved to Atlas Simulation on 15 May 2008.
  • 100% failures in task 22204 valid1.018101.PythiaB_Bd_Jpsie3e3K0s.digit.e337_s429. Updated 16 May 2008.
  • Assigned to Zachary Marshall by David Quarrie on 16 May 2008.
  • 16 May 2008: Zachary reports this may be a duplicate of bug #35909 but keeps this bug open.
  • 23 May 2008: Status requested.

BUG#36574 (36480) – Atlas Simulation (Atlas Validation) – Simulation task - TRF_SEGVIO | * Break * segmentation violation. IOVSvc. WARNING setRange(CLID,key,range) for unregistered proxies is deprecated - you need to specify a store! This will be an ERROR soon! SystemError: problem in C++; program state has been reset
  • Updated 15 May 2008. Assigned to Andrea di Simone.
  • 6% failures in task 22204 valid1.018101.PythiaB_Bd_Jpsie3e3K0s.digit.e337_s429
  • 23 May 2008: Update requested.
  • 2 June 2008: Status requested.

BUG#36284 – Atlas Simulation – Bug opened by Manuel Gallas based on FCT failures - Geant4 got stuck in event in the 14.1.0.1 cache. Now filtered. Error details: ATH_G4_STUCK (15010) Geant4 got stuck in event. According to the report, the error disappears if the fiber radius is changed from 0.5557 to 0.556 mm.

  • Last updated 15 May 2008. Assigned to John Apostolakis
  • 19 May 2008: Brigitte Epp copied all input files to lxplus.
  • 22 May 2008: Zachary Marshall reports how to reproduce the problem on lxplus.
  • 23 May 2008: Status requested.
  • 2 June 2008: Status requested.

BUG#34830 (36204) – Atlas Muon Spectrometer (Atlas Reconstruction) – Bug opened by D. Rousseau: when running ID+Muon, MuidExtrCombinedMuonContainer cannot be written. To be followed with Stephane Willocq.

  • Last update 14 May 2008. Assigned to Edward Moyse
  • 16 May 2008: David Quarrie requested confirmation that the bug was fixed.
  • 23 May 2008: Closing requested.
  • 2 June 2008: Closing requested.

BUG#35289 (36143) – Atlas Trigger TAPM (Atlas Physics Analysis) - 14.1.0 AOD->TAG trigger config is failing. GlobalTriggerTagBuilder errors when using csc_buildTAG_trf.py in R14.0.0: the trigger configuration fails, so the trigger info is not filled in the tag. To be followed with Ignacio Aracena.

  • Assigned to Joerg Stelzer. Last update 13 May 2008. Priority was given to another bug, #35211.
  • 23 May 2008: Status requested.
  • 2 June 2008: Tag jobs work fine in rel_1. Asked to verify if the bug can be closed.
  • 5 June 2008: Problem fixed with tag EventTagUtils-00-02-05

Bugs from previous releases


BUG#35769: Atlas Tracking. Uncaught error in R14.0.0.1 reconstruction task 21398 (InDetTrackFitter)

  • Bug opened 18 April 2008 by Wouter
  • Same bug in release 14.1.0.2, see bug #37257.
  • Assigned to Tommaso Lari
  • 4 June 2008: Status requested

PROCEDURES TO BE FOLLOWED (14 May 2008, 14:51)

  1. Look at the RTT tests called Event, Simu or Reco JobTransforms; open and follow up tickets.

  2. Look at FCT tests daily

  3. It is recommended that tags go into AtlasPoint1 first and then into AtlasProduction, with the exception of Simulation tags.

  4. Follow up bugs and check their status daily.

  5. On Tuesdays before 12am, fill in the Atlas Software Validation report for Job Transforms.

  6. Since 14 May 2008, Ignacio Aracena advises on the acceptance of Trigger tags into Atlas Production.

  7. In bug reports, specify the failure rate whenever possible.

  8. Expert in Atlfast I - Simon Dean

  9. Expert in Atlfast II - Michael Duehrssen

  10. Technical changes to FCT must be sent to atlas-project-fullchaintest-technical@cern.ch
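
For procedure 7, the failure rate can be quoted in the same style as the entries above, for example "5% (1/20)". A minimal sketch of such a helper follows; the function name format_failure_rate is hypothetical and is not part of any ATLAS tool.

```python
def format_failure_rate(failed, total):
    """Format a failure rate as 'P% (failed/total)' for a bug report.

    Hypothetical helper for illustration; not part of any ATLAS tool.
    """
    if total <= 0:
        raise ValueError("total number of jobs must be positive")
    if not 0 <= failed <= total:
        raise ValueError("failed must be between 0 and total")
    # Percentage rounded to the nearest integer, as quoted in the entries above.
    percent = round(100 * failed / total)
    return f"{percent}% ({failed}/{total})"


print(format_failure_rate(1, 20))  # 5% (1/20)
```

Quoting the raw counts next to the percentage keeps the report meaningful for small tasks, where a single failed job already changes the rate noticeably.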


-- AndresPacheco - 21 May 2008

Topic revision: r31 - 2008-06-14 - unknown
 