Progress Report, 2 February 2011

Infrastructure Status

  • Integration sites: UCSD, FNAL, Purdue, Wisconsin
    • UCSD passes all monitoring.
    • Purdue passes heartbeat, but not redirector tests (needs TFC change).
    • Wisconsin has problematic dCache servers. It mostly passes heartbeat, but fails redirector tests (needs TFC change).
    • FNAL passes neither heartbeat nor redirector tests.
  • Production sites: Caltech, Nebraska
    • Nebraska passes all monitoring.
    • Caltech passes heartbeat, but not redirector tests (needs TFC change).
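
The site status above can be kept as a small table for quick queries. This is only a sketch: the dictionary layout, the "partial" value (standing in for "mostly passes"), and the helper name sites_failing are illustrative assumptions, and "passes all monitoring" is read here as passing both the heartbeat and the redirector tests.

```python
# Status values transcribed from the list above. The layout, the "partial"
# label and the helper name are hypothetical, not part of any project tool.
SITE_STATUS = {
    "UCSD":      {"kind": "integration", "heartbeat": "pass",    "redirector": "pass"},
    "Purdue":    {"kind": "integration", "heartbeat": "pass",    "redirector": "fail"},
    "Wisconsin": {"kind": "integration", "heartbeat": "partial", "redirector": "fail"},
    "FNAL":      {"kind": "integration", "heartbeat": "fail",    "redirector": "fail"},
    "Nebraska":  {"kind": "production",  "heartbeat": "pass",    "redirector": "pass"},
    "Caltech":   {"kind": "production",  "heartbeat": "pass",    "redirector": "fail"},
}


def sites_failing(test):
    """Return the sorted names of sites that do not cleanly pass `test`."""
    return sorted(
        site for site, result in SITE_STATUS.items() if result[test] != "pass"
    )


if __name__ == "__main__":
    for test in ("heartbeat", "redirector"):
        print(test, "not passing:", ", ".join(sites_failing(test)))
```

Counting "partial" as not passing is a deliberate choice, so that Wisconsin stays visible until its dCache servers are sorted out.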

Action Item Status

Action items and progress from last week:

  1. Use MonALISA as a monitoring system: the repository / web interface will run at UCSD; we got a machine for that on Feb 9. A working session with one of the developers (Ramiro) is set for Feb 10.
  2. Write up deliverables and milestones: XrootdUscmsTimeline.
  3. Start maintenance of UCSD xrootd/hdfs system: Done, but there seem to be issues with Hadoop at UCSD, with very long delays in file access.
  4. Start work on converting a physicist's analysis to use Xrootd: Started, but we need to break this down into 2-week-sized chunks. [MT comment: well, I just got code from Ben.]
  5. Service monitoring: Using Nebraska instance for now. Have heartbeat and redirector-based monitoring, as described here. We still need random file monitoring, JobRobot monitoring, and alerts.

Other items not from last meeting:

  1. Writeup of CMSSW I/O needs: CmsRootIoIssues.
    • Brian (maybe Matevz?) will likely spend some serious time investigating the first 5 issues. Issue 1 is a concern for this project.
  2. Checklist for sites: XrootdProductionChecklist.
  3. Writeup of development items needed from Xrootd team: CmsRootIoIssues.

Action Items for Next Two Weeks

  1. UCSD UAF cluster. [MT comment: I thought Alja would be free after this week to help me with that, but we just loaded a bunch of Fireworks stuff on her, and partially on me, too.]
  2. Improved service monitoring (missing tests, alerts).
  3. Clarify plans for JobRobot with Andrea Sciaba.
  4. Fix dcap deadlock issues (Brian).
  5. CMSSW TTreeCache management for 4_2_0 (Brian).
  6. Upgrade release to 3.0.2; test cmsd throttling from Andy.
  7. Update project webpages: remove references to the demonstrator, add information about the architecture we're working on deploying.
  8. Continue MonALISA monitoring investigation.
Topic revision: r3 - 2011-02-09 - BrianBockelman