ATLAS Software Workshop Minutes

CERN, March 15 - 19, 1999

Mon 09:00 Topic D. Düllmann: Object data bases as data stores for HEP
References Slides: HTML, gzipped PS, PDF, PPT
Summary
  • Overview of the data handling requirements of LHC experiments
  • Total data volume of LHC experiments: ~ 100 PB
  • Very heterogeneous, distributed environment for analysis
  • Existing solutions won't scale. RD45 proposes to use a commercial ODBMS coupled to a mass storage system. Current choices: Objectivity/DB and HPSS. Mass storage system supposed to be transparent to the user
  • HEP data models are very complex, many relations between quantities, different access patterns depending on phase of data handling (reconstruction, analysis, ...)
  • Object data base matches best with object oriented production programs
  • Two options: loose binding between in-memory and on-disk (explicit copying); tight binding (use persistent objects directly in code - no explicit store and retrieve commands). The latter is the more natural choice with OO programs and data bases. The same objects can be accessed from different programming languages (C++, Java, Smalltalk) if common subset of features is used
  • State of objects preserved from one access to the data base to the next
  • Navigational access: via a unique Object Identifier (OID), which is a natural extension of the pointer concept. OIDs reflect "physical" object properties in the case of Objectivity. They can be used for uni- or bidirectional associations
  • Access to objects via 'smart pointers', which look very much like normal pointers, but take care of the necessary I/O automatically (see the sketch after this list)
  • Iteration over all events is easily possible with little code. The user does not need to know about physical locations within a federation; this is taken care of by OIDs
  • Object clustering: Goal is to transfer only 'useful' data. For example, one could keep all tracking data together on pages. But access patterns might change over time. Caching would be one way of addressing this problem
  • Physical model: one federation, several data bases with containers. Logical view: one entry point
  • Physical arrangement can be changed without impact on the applications (except perhaps on performance)
  • Concurrent access: support for many concurrent clients; in the case of Objy/DB, multiple readers, but only one writer per container. Locks refer to objects, but act on containers
  • Data changes are part of a transaction - assures integrity of the data base
  • Objectivity/DB architecture: OID size is 8 bytes, 64 k data bases, 32 k containers per data base, 64 k logical pages per container (4 GB containers for 64 kB page size, 0.5 GB for 8 kB). 64 k object slots per page. Theoretical limit: 10'000 PB, assuming 128 TB files for the data bases. Assuming database files of 100 GB, total capacity would be 6.5 PB. Extensions to Objy/DB have been requested
  • Scalability tests: 1 TB demonstrated by Atlas, multiple federations of 20...80 GB are used in production. 145 MB/s (80% of disk speed) seen from 240 parallel clients on the Caltech Exemplar machine. Many more scalability tests (overflow conditions etc) went fine, hence federations of a few 100 TB seem feasible right now
  • Distributed federations: Application, Objectivity page server, lock server, HPSS client, HPSS server can all be on different physical machines
  • Data replication: at the data base level; objects in a replicated DB exist in all replicas, and the copies are kept in sync by the data base. Provides for enhanced performance and availability. In case some replicas are not reachable, a quorum mechanism is applied for writing
  • Schema evolution: Evolve object model over the experiment lifetime. Support comprises change of inheritance hierarchy. Migration of existing objects can be done immediately (whole database at once) or in 'lazy' mode (migration as objects are accessed for writing)
  • Object versioning: multiple versions of a single logical object, supporting branching and re-merging of branches
  • Other commercial ODBMS products: Versant (for Unix and Windows, scalable and distributed, schemas kept locally, very difficult economic situation); O2 (for Unix and Windows, incomplete heterogeneity support, bought by Unidata and merged with Vmark); ObjectStore (for Unix and Windows, scalability problems, proprietary compiler and kernel extensions, now focused on Web applications); POET (mainly for Windows, low end, scalability problems). Need to observe what the big object relational vendors will offer
  • ODBMS: ship data to the client; ORB: ship request to the server, very different performance implications; the two are largely complementary
  • ODMG standard: standardised interface (data definition language, data interchange format, language bindings for C++, Java, Smalltalk); still, any large scale migration from one ODBMS product to another would require a significant effort
  • Current projects using Objy/DB: BaBar (to start data taking in May, 200 TB/y), Zeus (for event selection in the analysis phase), AMS, NA45, Chorus, Compass
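
To illustrate the smart-pointer and iteration points above, a minimal self-contained C++ sketch of the technique; this is not the actual Objectivity/DB API, and all names are invented for illustration. The reference behaves like a normal pointer, but triggers the necessary I/O on first dereference:

    #include <cstdio>

    // Invented persistent class; a real ODBMS would generate this
    // from the schema.
    struct Event { int run, number; };

    // Minimal smart pointer: looks like a normal pointer, but performs
    // the I/O on first dereference. The OID stands in for Objectivity's
    // database/container/page/slot quadruple.
    template <class T>
    class PersistentRef {
    public:
        explicit PersistentRef(long oid) : fOid(oid), fObj(0) {}
        T* operator->() { if (!fObj) fObj = Load(); return fObj; }
    private:
        T* Load() {
            // A real implementation would map the OID to a page, fetch
            // it from the page server and lock the container.
            std::printf("loading object with OID %ld\n", fOid);
            return new T();
        }
        long fOid;
        T*   fObj;
    };

    int main() {
        PersistentRef<Event> evt(42);
        evt->number = 1;   // I/O happens here, transparently
        return 0;
    }

Iterating over all events then reduces to a loop over such references handed out by an iterator, with the OIDs hiding the physical location of each object.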
Discussion
  • Q: Is there a third way? What if one wants the objects on disk to look different from the ones in memory? A: This is more difficult; it basically means introducing another layer between the in-memory view as seen by users and the on-disk view as seen by the data base. Also note that updates to an in-memory object are directly propagated to the on-disk version
  • Q: What about changing and enlarging objects? A: That is easily possible within the current application, as the page identifier is a logical one
  • Q: Can multiple federations be accessed at the same time? A: This can currently only be done sequentially - one federation must have been closed before connecting to another one
  • Q: What happens if some data (some databases) are not currently accessible? A: Program will stop at this point, because the file server cannot be reached. This can be caught by a signal handler at user level
  • Q: What if people want to do some event selection on the general store, and take a file with selected data home? A: A private tag data base would be a solution. This would be written within the global federation, and then be imported into the private federation. LHC++ will provide the necessary infrastructure
  • Q: Why don't we go for our own implementation of an ODBMS? A: The effort would be very significant, but given the unhealthy situation on the commercial market, this option may need to be investigated further
Tue 09:00 Topic J. Knobloch: Introduction, workshop agenda
References Workshop agenda
List of participants
Summary
  • Highlights and general structure of agenda
Tue 09:10 Topic T. Akesson: Computing review: The Atlas management view
References Schedule of actions: PDF
Summary (kindly provided by the speaker)

T Akesson presented the current view of the ATLAS management. It is in general foreseen to accommodate the review conclusions, but with some variations and with stronger emphasis on some issues.

Manpower is a central issue, and ATLAS needs a structure which is prepared to receive efforts as serious commitments. This requires work to be assigned in a rather formal way, so that local support can be sought with solid justification. Emphasis is therefore put on commitments to s/w effort.

The detector specific s/w is suggested to be organised in the detector systems while ATLAS general s/w will be followed by a s/w project leader reporting to the computing coordinator.

The call for detector system s/w coordinators and for database, reconstruction and simulation task leaders has taken place. The call for nominations of the next physics and computing coordinators has also been launched. Contacts have also started to set up an ATLAS-wide network of national training contacts.

Next steps will be discussed in the April EB. They will probably include the launch of an architecture task force and a quality control group. Results from both will be needed to allow the system and general s/w to be partitioned in pieces that groups can take responsibility for. These groups have to have a balanced composition spanning many activities, including both s/w and detector expertise, to ensure an anchored outcome.

Another foreseen ingredient is a National Board divided according to funding agencies, and with a regional center working group. This board will be the main body for issues like platforms, networking, specifying needs for collaborative tools etc. The formation of such a group will require CB endorsement.

The foreseen action plan contains a lot of parallelism, in order to get a re-organisation in place as soon as possible.

Finally it was emphasized that the ATLAS baseline is C++/OO for programming and Objectivity as database. This does not mean that it is decided that ATLAS will have Objectivity as database year 2005, but that the developments and tests that are made now should be in this baseline.

Discussion
  • Q: Why are we pretending that issues such as quality assurance and architecture have to start from scratch? There have been significant efforts made already. A: Yes, they need to be taken into account, but the review has asked for significant, or even radical, changes
  • Q: How is the discussion supposed to go on? Is the proposal already cast in stone? A: Comments (to Peter Jenni, Torsten Akesson) are always welcome, but the proposal is probably close to the final implementation, and decisions will need to be taken soon
Tue 10:20 Topic J. Knobloch: Computing review: Report from ACOS
References Slides: HTML, gzipped PS, PDF, PPT
Summary
  • ACOS met on 19 February; main topic: review report
  • Reports from domains: ID, LAr, Tiles, Muon, Trigger, Database, Detector description, Calibration and alignment, Analysis tools
  • Computing review: Endorses general strategy, suggests improvements on organisation, software process, and regional centres
  • P. Jenni and T. Akesson: Atlas computing is not on the right track, important changes needed, action plan for EB on 09/04/99. Much more involvement of system experts needed
  • Organisational structure: Discontinue DIG and ACOS, establish a Computing Steering Group instead, taking most of the DIG tasks, consisting of Computing, Physics, Database, Simulation, Reconstruction, Event filter coordinators, the management and three more members
  • Round table on organisation: Training is important; another meeting with formal links from countries; formal commitment of institutes required; assemble tasks in work packages; address also the maintenance of the Fortran software; core group is important; schism overemphasised in report, mostly psychological, prolonged by physics TDR; should be overcome from both sides; ID does not see this schism
  • Organigram: Too many links; interfaces ill defined; needs explanation in a separate document; proposed organisation could even widen the gap; need two bodies, one for policy, one for technical issues; precise mandate needs to be defined for the physics and the computing coordinator
  • ASP: Astonishment about proposal to dismiss software process; quality control is important; rules are not questioned, but are not sufficient; work going on to make ASP more gentle; design and code reviews could be part of the learning process; process needs to be communicated better; have to reduce the threshold; StP considered too difficult; documentation to be simplified
  • Regional centres and Monarc: Avoid duplication of work between Monarc and ATLAS
Discussion
  • In some cases in the past, split between reconstruction and simulation at high level has been beneficial
  • A member of the review committee remarked that the statements concerning the endorsement of the main strategic choices in the committee report are, in his opinion, not correct.
  • Main shortcoming so far was that the management sent incoherent signals to the community. Also in terms of the system architecture, strong leadership either by one chief architect, or by a group of competent people, is required
  • A high-level split between simulation and reconstruction can work provided the interfaces are properly defined
  • Of the deliverables foreseen in the ASP, the requirements collection has worked least well
  • For detector description, some necessary communication with the systems has not taken place yet
  • For the nominations of the physics and computing coordinators, care will be taken on the ability of the two to collaborate
  • The management proposal is not yet in a state where it can be widely published
  • The functions and the responsibilities of the physics and the computing coordinator need to be clearly defined
  • The reasons leading to the choices of the current organisation must be carefully considered as well
  • T. Akesson: All effort should be made to ensure that all Atlas is pulling in the same direction
Tue 11:05 Topic J. Allison: Geant4 software process
References Slides: HTML
Paper for CERN School of Computing 1997: PS
Summary
  • Geant4 started as RD44 in 1995. Easy to write a requirements document
  • Aim is to provide code which (together with the documentation) is readable by end-users
  • Now migrating from research and development project to a production phase, organisation similar to a large experiment
  • Next production release in May, mostly consolidation, not many new features
  • Requirements, design, implementation, evaluation has been an iterative process
  • Design tool: Rational Rose, not used for coding, drawings depict high-level design
  • Language choice has had some impact on the design. Minimum of coding guidelines used (all classes start with G4, methods start with a capital letter, ...) - see the sketch after this list
  • Important to design categories with loose coupling and well defined interfaces. No circular dependencies in the category diagram, leading to very small number of initial circular code dependencies
  • Used abstract interfaces, avoided 'casting'
  • Category coordinators given much independence
  • Code review considered essential, requirements were presented to the community, designs first reviewed within category, then brought to global G4 workshops, without any formal procedure
  • CVS considered very important, but no software release tools used
  • Data kept in static arrays, CVS-maintained files, or external (tar) files
  • Lessons learned: Be prepared for iterative re-design; modularise, couple loosely; plan for environments; Early test procedures, bug tracking, code review procedures; versioning of binaries; exception handling; limit size of executables
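
As an illustration of the naming guidelines and the abstract interfaces mentioned above, a hedged C++ sketch; the classes are invented for illustration (in real Geant4, abstract interfaces conventionally carry a G4V prefix, e.g. G4VProcess):

    // Invented classes following the guidelines quoted above: all class
    // names start with G4 (abstract interfaces conventionally with G4V),
    // all method names start with a capital letter.
    class G4VMagneticField {                    // abstract interface
    public:
        virtual ~G4VMagneticField() {}
        virtual void GetFieldValue(const double point[3],
                                   double field[3]) const = 0;
    };

    class G4UniformField : public G4VMagneticField {
    public:
        explicit G4UniformField(double bz) : fBz(bz) {}
        void GetFieldValue(const double[3], double field[3]) const {
            field[0] = 0.0; field[1] = 0.0; field[2] = fBz;
        }
    private:
        double fBz;
    };

Client code programs against the abstract interface only, which supports the loose coupling between categories and avoids the need for casting.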
Discussion
  • Q: Did the code review point out problems of coding or rather problems of design? A: Both, with some emphasis on coding
  • Q: What is the experience with the different software processes in G4 and in BaBar by people working on both? A: There is communication about the software process. Geant4 has not been integrated into the BaBar software, but is considered external
  • Q: Were there problems with missed milestones? A: Yes, in that case the targets have been rescheduled. In no case has Fortran software been wrapped
  • Q: Would you still choose C++ if the choice had to be taken today? A: Not sure, but clearly C++ is still perfectly adequate. In particular, the excellent tool support needs to be considered
  • Q: Did it happen that responsibility for a major piece had to be transferred to another person? A: Yes, frequently, and it went quite smoothly. This benefitted from the good design documentation and the readable code
  • Q: Was there ever code so bad that it needed to be rejected? A: Yes
Tue 11:35 Topic B. Jones: Back-end DAQ software process
References Software Process in the Atlas Back-end DAQ (Oct 97): PS, PDF
Back-end DAQ software process (Feb 99): PS, PDF
Summary
  • Only refers to a specific subsystem of DAQ
  • Goal was to provide a prototype "-1" of the DAQ; subsystems: detector interface, data flow, event filter, back-end software
  • Strong interfacing with all other parts of the DAQ
  • Software process: put in place by concerned developers, adapted from textbooks with timescales, applicability, scope, experience of developers in mind
  • Many similarities with the ASP, even increasing with time; components roughly map to domains
  • Differences: Smaller, much better integrated community; software process described in a couple of "how-to" Web pages
  • Inspections are more human, more detail on testing procedures; so far, all back-end software has followed the process, which has become a little more formal with time
  • Continuous communication with ASP authors
  • Phases: requirements collection; document was reviewed and used as a guideline to decompose the project into components
  • Pre-design investigation phase: decision on the implementation language etc: C++; written down in technical notes, reviewed in back-end DAQ meetings
  • High-level design: far less detailed than what is foreseen in the ASP, mostly textual, few diagrams picked from StP. There is no single complete module in StP. High-level design done by 5 small groups. Reviewed in meetings; several designs went through a few iterations
  • Detailed design and implementation: many deliverables such as code, implementation note, users' guide, test plan (originally foreseen at high-level design, but didn't work there)
  • Testing and integration
  • Work organised much around components, max. 4 developers per component, one coordinator per component. Prefer one institute per component. Additional efforts spent on identifying commonalities between components
  • Components developed according to agreed priority
  • Important external packages
  • Inspections: Introduced only during the process, managed by one identified, trained person, based on Tom Gilb's software inspection method. Different check-lists depending on the deliverable being reviewed. Focus is on identifying problems, not solving them. "Real" meetings are much preferred over "virtual" ones. Excellent way of training newcomers. Best to have people start as reviewers before they act as authors. But: Inspections are a lot of work! Code inspections require full documentation, compliance with coding rules, integration into SRT, positive outcome of testing tools. Important faults found this way, hence it is worth the effort
  • Division into phases helped organise and schedule the work; organisation chosen to be as small as possible; adoption of OMT was very important for communication and documentation
  • Now 11 components, 180 k LOC, partly ready, concentrating now on regular incremental releases
  • "Do's": start gently; keep it simple; inspect all deliverables; provide templates, checklists, examples; get a non-author for component testing. "Don'ts": burden developers unnecessarily; ask developers to do something which has not been tried before; underestimate time and effort required for software management, integration and testing; do distributed development if avoidable
Discussion
  • Q: Have the metrics in Logiscope been used? A: Yes, they were found useful. Strong support from IT/IPT and Lassi Tuura
  • We should try and learn more about the DAQ testing
  • Q: Were there problems with developers writing code with dirty C tricks? A: Yes, it has been dealt with on a case-by-case basis. In some cases, it was well justified because of hardware interfaces
  • Q: What about inspections which come up a second time? A: Did not yet happen during the more formal procedures. Informally, it happened, and it was considered one and the same inspection process
  • Q: How much are the developers available? A: Only 1 person full-time, all others part-time, but if it's not 50% at least, it is not worth it
  • Decomposition is difficult if the technology is still not clear to all persons involved
  • Q: How important is it that the managers are themselves contributing to the development? A: That depends on the size of the project, but some feedback is very important. On the other hand, managerial issues must not be underestimated
  • Q: Has any code been thrown away? A: Yes, all the evaluation software. Other parts have been changed very substantially
Wed 09:00 Topic H. Meinhard: Report from DIG meeting
References Slides: HTML, PS, PDF, PPT
Summary
  • Computing review: short discussion; surprised to learn about implementation steps without consultation with the people affected
  • Reviews: Wired, XML parser: partial positive feedback, awaiting more evaluation reports. Graphics code, muon code: awaiting more reviewers' reports; many of the comments received so far are on design, not on code. SRT documentation: being completed. Magnetic field design (rather, tracking in magnetic field): a new design report will be written because of the numerous substantial comments. IPatRec design: completed; review resulted in smaller and better separated modules. Walkthrough of muon code on Thursday this week
  • Global architecture document: DIG working group to write up all our architectural design choices made so far. Will start with scenarios and the document about the 'how' (what technology will we use to implement the system), and then (with strong involvement of systems and physics groups) the 'what' (what simulation, reconstruction, physics... are we going to do). Document should be understandable by all developers. Detailed outline of 'how' document being prepared now
  • Round of domains: Graphics, data base, reconstruction: see reports later this week. Muons: concentrating on physics TDR, Muonbox in cvs. Magnetic field: ported to more platforms. Analysis tools: discussion of requirements in AWWC meeting. TileCal: setting up for test beam: Objy, analysis tools; meeting later this week. ID: Common clustering progressing. Reconstruction: Track class being prepared, first draft next week, discussions with the detector communities foreseen
  • Tools: Bonsai (sophisticated Web cvs browser) working fine, need to revise requirements for Light; UML conversion of StP systems ongoing, new systems should use UML; review of design tools to be pushed, likely that more than one tool will be recommended
  • Platforms: Only few requests for Windows NT, no offer to help; hence, recommend to suspend support
  • AOB: DIG recommends holding the May workshop at CERN; the August workshop is to be decided still this week
Discussion
  • Architecture must define the interfaces between domains as well
  • Not clear that people prefer an outside workshop; anyway, a final decision about the August workshop must be taken by the end of this week
Wed 09:30 Topic G. Poulard: Status of TDR software and productions
References Slides: HTML, PS, PDF, PPT
Summary
  • Geometry description: ID and calos stable, new tile digitisation implemented; muon: recently changed, still based on CMZ version
  • Reconstruction: iPatRec: much work for combined reconstruction; muons: Muonbox being actively developed
  • Combined reconstruction: e/gamma, conversions available; muon identification: combination of muonbox and iPatRec (good at high pt, but poor efficiency at low pt); new statistical approach combining covariance matrices results in improved low pt efficiency
  • Other work (private basis): e/gamma id, conversions, soft e, muon id, soft muons, tilecal cells, primary vertex, vertex b-tag, overall b-tag
  • Combined n-tuple provided in atrecon
  • All code in CVS; Dice not fully tested yet, hence there is no official release yet. Main programs now in 'Applications' domain. Not yet on all platforms
  • Production: For simulation, see Web page; reconstruction not centrally organised, mostly done on private basis, list is probably incomplete
  • Next: evolution of the geometry (is it needed, time scale, reference version for G4); evolution of the reconstruction, comparison in 2000, analysis in new framework of produced data; maintenance of Fortran software is a major enterprise, requiring significant resources
Discussion
  • Care must be taken to always have a running system, in particular because of the test beam activities
  • Q: Have numbers been obtained already for the CPU requirements of the reconstruction so far? A: No final numbers yet, but they will be provided with the physics TDR
  • Q: Which version of AIX has been used? A: AIX 4.1. Version 4.3 gives problems in the linking step
  • Q: What is the status of Linux? A: The simulation works fine, there are minor problems with the reconstruction which are hoped to be solved within the next few weeks
Wed 09:50 Topic M. Stavrianakou: Repository and releases
References Slides: HTML, PS, PDF, PPT
Operating systems and compilers for Cernlib 98, Cernlib 99, LHC++ 98a, LHC++ 99a
Summary
  • TDR software: 25 top level packages, domain software: 16 top level packages. 480 kLOC in F77, 98 kLOC in Age, 269 kLOC in C++ (30% increase over December 98). In release: 40 top level packages
  • Platforms: HP, DEC, IBM, Linux, Sun; not (fully) supported: SGI, WNT
  • Fortnightly developer releases; aim at weekly releases?
  • Nightly builds: from the head versions of the packages. Feedback to developers still to be improved
  • Release frequency and procedure: simplification and speed up needed: pragmatic decisions on supported platforms; improve package structures and dependencies; use nightly builds for early debugging; ease developer's job; release some packages independently or using binaries from previous releases; local disks or dedicated machines; building on fastest platforms first; work incrementally at preparation stage; share partial or full support for some platforms with other institutes
  • Tools and QA: cvsweb, Bonsai available; dependency grapher; CodeCheck, Insure++, Purify, Logiscope. Clear policy needed on who runs what when
  • Spider SRT, Collaboratory tools
Discussion
  • The list of supported compilers ought to be on the Web
  • Q: Are we in sync with Cernlib and LHC++? A: Yes, after the next LHC++ release
  • Q: Is it possible to have both optimised and debug releases? A: That would mean a lot of work and additional resource consumption
Wed 10:40 Topic Discussion about Windows NT support
Summary
  • There is not much demand for NT support. SRT etc. work, but NT would require commitment of many developers
  • The last questionnaire resulted in 5 institutes asking for NT support, and no one offering help (even after personal contacts)
  • One aspect to consider is the different graphics interface of NT
  • Obviously, NT support would take valuable resources; our prime goal should be to have software developed
  • NT is widely used in industry, we shouldn't write it off too early
  • NT support must be understood to comprise support for native tools such as Visual Studio, which means more work than for an additional Unix platform
  • The Event Filter requirements need to be considered. However, the EF is not committed to NT either. They don't request NT versions of the reconstruction code
  • When discussing NT support, extreme positions (such as one person cleaning up the mess of all developers, or all developers having to make sure their code runs under NT) are not helpful. Some reasonable compromise would need to be found
  • Important that we do not preclude now a later move to NT. This implies that the developers should be given guidelines what to do and what to avoid to remain compatible with NT. Of course, this would not help for graphical applications
  • Visual Studio is a nice development environment that many people like. An option would be to support it without building releases regularly on NT
  • Events external to Atlas (eg. Spider/SRT providing good NT support) could trigger us to reconsider the situation
  • NT has proven to be excellent for finding bugs in code; are we prepared to give this up?
  • As long as we access Fortran code from C++, there is a problem on NT with character arguments (see the sketch after this list)
  • Part of the lousy reputation of NT may be due to the NICE NT installation at CERN; why not set up our own NT cluster in building 40?
  • NT is popular due to - among other things - Visual Studio; however, similar tools exist under Unix and should be evaluated
  • There is now an emulator of PC hardware available which allows running NT (and 98) under Linux on an x86 PC
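
To make the character-argument point concrete, a hedged sketch of the usual glue code; the routine name SETNAM is invented, and the calling conventions described are those of typical Unix f77 compilers versus the DEC/Compaq Visual Fortran defaults on NT:

    // Calling SUBROUTINE SETNAM(NAME,IVAL), with NAME CHARACTER*(*) and
    // IVAL INTEGER, from C++. The hidden string length is passed
    // differently: most Unix f77 compilers append it after all other
    // arguments, while the default NT convention (DEC/Compaq Visual
    // Fortran) inserts it directly after the character address and uses
    // an upper-case __stdcall symbol.
    #include <cstring>

    #ifdef _WIN32
    extern "C" void __stdcall SETNAM(const char* name, int len, int* ival);
    inline void setnam(const char* name, int* ival)
        { SETNAM(name, (int)std::strlen(name), ival); }
    #else
    extern "C" void setnam_(const char* name, int* ival, int len);
    inline void setnam(const char* name, int* ival)
        { setnam_(name, ival, (int)std::strlen(name)); }
    #endif

    // Usage (linked against the Fortran object):
    //     int value = 1;
    //     setnam("ATLAS", &value);   // single portable call site

Wrapping each such routine once keeps the NT-specific convention out of the physics code, which is the kind of guideline alluded to in the decision below.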
Decision
  • For the time being, no releases will be provided on NT, and the project files necessary for Visual Studio will not be provided
  • Developers are not required to ensure their packages run on NT
  • A set of guidelines will be developed in order to allow for a later move to NT
  • The decision will be revised if a volunteer offers to take care of NT support
Wed 11:30 Topic J. Knobloch: End 99 computing status report
References Slides: HTML, PS, PDF, PPT
Summary
  • Requested by LHCC referees
  • Scope not yet defined
  • Subject to consequences from computing review
  • Major topics: Project management (organisation, manpower, management tools, revised software process, milestones, critical items, risk analysis); software (architecture, framework, data base, reconstruction, simulation, graphics, analysis tools); computing model (technology tracking, Monarc, analysis strategy, regional centres, central installation, cost update); remote communication and collaboration; software development environment; training
Discussion
  • Work could be simplified by asking the referees what they really want to be discussed
  • It is planned to split the document into pieces, and to assign various editors to them
Wed 11:40 Topic J. de Jonghe, M. Angberg: Project management and supporting tools
References Slides: HTML, gzipped PS, PDF, PPT
Summary
  • Originally developed for hardware projects, not too different from software execution plan proposed two years ago
  • Advantages: automatic request for reporting; uniform reports available in central location
  • Input required: work packages (not trivial to define)
  • Concepts: work package, progress report, comment to progress report
  • Workflow: the manager gets alerted until he has produced a report, of which the project leader is informed and on which he can comment
  • Integration with MS Excel, MS Project, being improved. Excel and Project not required, though
  • Demonstration of Web interface: Various views on progress reports and work packages with powerful navigation tools. Easy modification and creation of work packages
  • Work packages assigned to PBS
  • Next version: graphical display of cost profiles, GANTT charts
  • Various ways of customisation for software work feasible
Discussion
  • Q: Is it possible to put default values (from the previous report) into the fields of a new progress report? A: Yes
  • What is the real benefit for the user? How can we motivate them to use the package?
  • The tool helps avoid duplication of effort by making visible what is going on
  • Q: Is it possible to track somebody's time spent on certain projects? A: The problem is being discussed, but this tool does not appear to be the correct context
Thu 09:00 Topic S. Loken: US Atlas Computing: Overview and management
References Slides: HTML, PS, PDF, PPT
Summary
  • Task force has been active for a year, seeking funds and organising US Atlas Computing; proposal submitted to funding agencies in 11/98; organisation required before funding
  • LBNL selected as lead lab for US Atlas Computing. Project manager: Ian Hinchliffe, Project Engineer: David Malon, Management Support and Leveraged Projects Coordination: Stu Loken, Deputy Project Manager Facilities: Craig Tull, Deputy Project Manager Software: Tom LeCompte. Aim now is to provide a proposal for review by DOE and NSF in May
  • For proposal, areas of expertise and interest of US groups will be identified. Also view on what would make largest impact on Atlas. BaBar professionals to join Atlas as soon as they are released from BaBar
  • Proposal will be for clearly identified software deliverables matched by persons and funds
  • Support: PDSF (mostly PC based now, run by NERSC). Plan to receive 250 TB/y from CERN, stored in HPSS, 50 kSPECint95 for analysis, 50 TB of user disks, 10% system to be available in 2003
  • Software contributions: Concentrate on areas with particular expertise in US, eg. data base. Short term deliverables, training - hence start a pilot project now
  • Leveraged projects: exploit outside projects to test critical aspects of US Atlas computing; funding from non-HEP sources. New proposal for Particle Physics Data Grid (total MICS funding of 17 MUSD)
Thu 09:20 Topic T. LeCompte: Software development for US Atlas Computing
References Slides: HTML, PS, PDF, PPT
Summary
  • Basic goals: Proportional contribution to Atlas software development, development and delivery of computing infrastructure for US Atlas, support of US participation in physics-specific software, and overall Atlas computing
  • Regional centre required for coordination, and for centralised resources
  • Software development efforts: pilot project (testbeam analysis), core software (control and/or database domain), system specific reconstruction. Testbeam analysis software is an ideal pilot project to test new ideas and to bring many physicist developers on board. Plans: provide access to testbeam data (from Objy) and Atlas candidate analysis tools, develop G4 simulation for testbeam; efforts primarily targeted at tile cal, later extended to other testbeam efforts
  • Core software development: critical and understaffed right now. US in good position to significantly contribute
  • Detector specific reconstruction: effort from universities; people want to help, but don't know where to start
  • Aim for US-Atlas: 200 FTE in total for Atlas software, i.e. 33 FTE per year, assuming a 60/40 split between physicists and computer scientists; steep ramp-up, as the need is there now
Thu 09:30 Topic C. Tull: Regional Centre
References Slides: HTML, gzipped PS, PPT
Web site: arc.nersc.gov
Summary
  • Role of regional centre: access to data, support to users. Facilities to be provided: hardware and data access, software access, service (system operation, user support and training, code management, testing, distribution)
  • NERSC (National Energy Research Scientific Computing): 2000 active users, serving (among others) many large HENP experiments. Profiting from LBL as ESnet site and hub
  • PDSF: cluster running Linux and Solaris dedicated to HENP experiments, run by 2 FTEs. HPSS installation: 10 robots, 60 drives, 70 TB of data, 2.5 TB disk, 97.5% availability. NERSC is HPSS developer site. HPSS imports have successfully been done
  • Support for US Atlas: software repository, reference environment, training, support for integration and coordination, tool support, documentation and tutorials
  • Data analysis phase (2005 and beyond): planning based on 20% rule, 120 analysis users (40 concurrent), 200 TB ESD/y from CERN, 50 kSPECint95 for ESD analysis, 250 TB of tape storage/y, 25 TB of disk storage/y. 8.5 (for operations) + 5 (for US Atlas support) FTEs needed
  • Future plans: funding proposal
Discussion
  • Q: What is the acceptance of C++ in the US? A: There is no problem, CDF, D0, BaBar don't have any Fortran in their reconstruction
  • Q: Why is the plan to provide an ESD rather than an AOD copy? A: Probably we have to consider various levels of regional centres. Not all physics and particularly not all detector studies will be possible with the AOD
  • The detailed planning much depends on the development of wide-area networking. Priority schemes can help much
Thu 10:15 Topic RD Schaffer: 1 TB milestone, event, detector description
References Slides from database meeting:
RD Schaffer: 1 TB milestone HTML, PS, PPT
H. Renshall: 1 TB milestone PS, PDF
C. Arnault: Status report on detector description PS, PDF
M. Schaller: Objectivity benchmarks HTML, PS, PDF, PPT
D. Malon: Plans for Objectivity in Tilecal test beam HTML, PS, PDF, PPT
RD Schaffer: Event model HTML, PS, PDF, PPT
Summary
  • Report from Wednesday's database meeting
  • Detector description: General (and persistent) model with various specific models linked to the application domain. Examples of applications: Age files, Objy, common blocks, G4, textual, ... Classes in the generic model: DetectorDescriptor, GenericElement, DetectorElement, DetectorPosition (see the sketch after this list). Sample implementations exist for SCT, TRT; lots of interaction with detector communities required
  • 1 TB milestone: basic goal: demonstrate feasibility. 1 TB of production data stored into Objy data bases by 1 January 99. A number of performance bottlenecks have been identified and will be worked on. Digits organised following the basic event model; digits were copied 10 times. Typical event size: 3 MB; 6% of objects of 100 Bytes, 66% of 1 kB, the rest of the objects were larger. Page size used was 8 kB (too small). Several hardware improvements already applied since; however, some more understanding of the remaining bottlenecks is needed. Plan to redo the 1 TB writing at nominal performance (will take 3 days)
  • Objectivity/DB benchmarks: Access patterns studied: sequential, selected, random for uniformly sized objects. Similar patterns for Solaris and NT
  • Event model: concentrated so far on access for Geant3 digits, now needs to be extended to all event objects. Major characteristics: loading of events independently from where data are coming from; organisation and access using an identification scheme following the logical decomposition of the detector. More people are welcome to work on the event model; a couple of important questions to be looked into have been identified
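
One possible arrangement of the generic-model classes named above, as a hedged C++ sketch; the members and relations are assumptions for illustration, not the actual Atlas design:

    #include <string>
    #include <vector>

    // Assumed shape of the generic detector description classes named
    // in the summary; members and relations are illustrative guesses.
    class DetectorPosition {            // placement of an element
    public:
        double x, y, z;                 // translation (rotation omitted)
    };

    class GenericElement {              // behaviour common to all elements
    public:
        virtual ~GenericElement() {}
        virtual std::string Name() const = 0;
    };

    class DetectorElement : public GenericElement {
    public:
        DetectorElement(const std::string& name) : fName(name) {}
        std::string Name() const { return fName; }
        void Add(DetectorElement* child) { fChildren.push_back(child); }
        DetectorPosition fPosition;
    private:
        std::string fName;
        std::vector<DetectorElement*> fChildren;  // logical decomposition
    };

    class DetectorDescriptor {          // entry point; the specific models
    public:                             // (Age, Objy, G4, ...) would be
        DetectorElement* fTop;          // fed from this generic tree
    };

A tree like this would also fit naturally with the event-model identification scheme that follows the logical decomposition of the detector.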
Discussion
  • Q: For the detector description, where would misalignment fit in? A: This has still to be decided, but various possibilities exist
  • Q: To which extent have DAQ and Event Filter been participating in the discussions about the event architecture? A: Some discussions have been held already, but more are to come
Thu 11:00 Topic P. Hendriks: AMBER
References Slides: HTML, gzipped PS, PDF, PPT
Summary
  • Amber stands for Atlas Muon Barrel and Endcap Reconstruction
  • Idea is to try out the more informal walk-through
  • End cap not implemented yet, stand-alone muon system program
  • Input to program: AMDB, Arve simulation or Geant3, Saclay B field map
  • Program: detector hierarchy, GDL (dataflow paradigm), detector reconstruction toolkit, reconstruction
  • Detector Hierarchy: buffer between different input formats and reconstruction algorithms, provides access to digits, uses official muon names - will be replaced with upcoming detector description
  • Detector reconstruction toolkit: general classes used by reconstruction; contains tracking in magnetic field, geometrical entities (region of activity, error cone and error point), tracks and vertices, track fits (straight line LSQ)
  • Draft design of track: contents still to be discussed. Based on a Traits parameter which consists of identifier type, module type, quality type, parameter type
  • Track inherits from Trajectory
  • Track parameters: abstract base class, PerigeeParameters and LocalParameters inheriting from it
  • Vertices and track modules: track module is everything that can be used to define a track (hits, vertices, other points, ...)
  • Dataflow architecture: most natural choice, implemented as iterators and iterator adaptors. Data flow from creator (eg. MDT layer) to user (reconstruction). Control flow is the other way around. Basic modules developed to support this
  • Data flow code example shown and explained (a similar sketch follows this list)
  • Trigger reconstruction architecture: Trigger instantiates RPC which is a composite data view
  • Reconstruction of RPC chamber: first hits in identical layers combined, then clusters formed, sorted in phi. Two innermost layers of each type used to form (wide) trigger roads. More clusters used to narrow down road if they fit
  • MDT ladder: consists of chambers in same layer, sector and side. Tube layers in ladder are merged. Hits in layers filtered based on ROA and merged into list. Straight track segments found (same algorithm as in Datcha)
  • Building tracks: track segments from MDT combined, hits are passed to filter, ...
  • Next steps: Read RPC digits from Geant3; variable step size for magnetic field tracking; port to Unix. Subsequent versions: endcaps, material
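
A hedged sketch of the dataflow idea described above, with all names invented for illustration: data flows from the creator (eg. an MDT layer) to the user (reconstruction), while control flows the other way, the user pulling from an iterator adaptor, which in turn pulls from its source:

    #include <iostream>
    #include <vector>

    // Invented digit type and road predicate, for illustration only.
    struct Digit { int layer; double drift; };
    bool InRoad(const Digit& d) { return d.drift < 1.0; }

    // Filter adaptor: looks like an iterator to the user, pulls from the
    // underlying source on demand and skips digits failing the predicate.
    template <class Iter, class Pred>
    class FilterIterator {
    public:
        FilterIterator(Iter cur, Iter end, Pred p)
            : fCur(cur), fEnd(end), fPred(p) { Advance(); }
        bool Done() const { return fCur == fEnd; }
        const Digit& operator*() const { return *fCur; }
        FilterIterator& operator++() { ++fCur; Advance(); return *this; }
    private:
        void Advance() { while (fCur != fEnd && !fPred(*fCur)) ++fCur; }
        Iter fCur, fEnd;
        Pred fPred;
    };

    int main() {
        std::vector<Digit> layer(3);        // the creator, eg. an MDT layer
        typedef std::vector<Digit>::const_iterator It;
        FilterIterator<It, bool (*)(const Digit&)>
            road(layer.begin(), layer.end(), &InRoad);
        for (; !road.Done(); ++road)        // the user pulls digit by digit
            std::cout << (*road).layer << '\n';
        return 0;
    }

The Traits-parametrised Track mentioned above would, along the same lines, bundle the identifier, module, quality and parameter types into a single template parameter.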
Discussion
  • Not happy that Track is a Vector
  • Should a general track not contain the parameters plus pointers to the algorithms used to form it?
  • Track class should not be unnecessarily abstract
  • What is the relationship of Track with the requirements? Are there any?
  • OIG or scenario for Track is missing
  • Why has the charge been separated out from LocalTrackParameters?
  • Do all tracks need to have a perigee parameter? No
  • TrackModuleVisitor should not exist without justification
  • Namespaces are not yet permitted - to be revised
  • Are segments created only in one or in two layers? A: The algorithm is flexible, right now it needs multilayers
  • Also, muons which have not triggered must be reconstructed
  • No dead material considered in presented version
  • There is a chance to meet at 14.30 h on Thursday afternoon to discuss further
  • Approach not radically different from what was done so far
  • AMDB reading code outdated, requires extra step to make geometry available on NT
  • How could the package be possibly decomposed into components? A: Trigger, pattern recognition, fit. Probably, smaller units would be inappropriate because of overhead
  • Caution about separation between pattern recognition and fit - the fit may need to access hits which are not part of track segments, and a way to tag hits is required
  • Is there a way to refine the requirements in interactions with the relevant communities? A: Yes, will be discussed in the next muon week
  • In building the MDT ladder, is it assumed that the tubes are perfectly aligned? A: No, misalignment is indeed foreseen
  • What pieces would one envisage to make available outside the muon detector domain? A: the Detector reconstruction toolkit, and the Dataflow package
  • Have you reconstructed any events? A: Yes, ones simulated by Arve, including magnetic field and RPC data, with random (uncorrelated) noise included
  • What was the experience like? Should we do it again? What should we change?
  • Walkthrough should be done in common between detector and software communities
  • As a walkthrough, we weren't prepared; it happened too late; group should be smaller; subject should be much smaller
  • Real meeting does have advantages over e-mail communication
  • Some participants would have liked to contribute more, but felt inhibited by large audience
  • Both detailed walk-throughs and overview presentations are required
  • Possible scenario: Walk-through very early on, later formal review of the code with all formal documents. Additional walk-throughs could be requested
Fri 09:05 Topic S. Fisher: Spider project
References Slides: HTML, PS, PDF, PPT
Summary
  • Objective: define, implement and deploy...
  • Coding standards: based on existing standards; the experiments worked together on the first document. CodeCheck with selectable rules. A new document has been produced with some extra input, but important information on the source of the information was dropped. The plan now is to provide a single input from all experiments as a response to this document
  • SRT requirements: collected from the experiments; a group was formed to sort them out. Glossary produced (requirement-free, 33 items, 5 roles, 11 procedures - was hard to agree on). Work model covers all scenarios from all experiments, expressed in terms of the glossary. Differences between experiments are mainly due to different preferences. Package independence considered feasible. Next steps: three people to consolidate the requirements, deadline: end April. Then authors and other experts of existing packages can comment on which requirements are met, and what would be involved to implement the requirements not yet met
  • New SRT looks more interesting as deficiencies of existing systems become more apparent. What if manpower for the implementation is requested? Still a lot of unhappiness around; hard to get the experiments to work together; too much bureaucracy around, but still SRT has a slim chance
Discussion
  • Q: Is building releases at outside regional centres in the requirements? A: It is in the work model
  • LCB party line: will choose one existing product as the baseline; IPT will take over maintenance and implementation of important missing functionality
  • Q: What else does Spider take care of? A: Nothing for the moment
  • Q: Are the requirements on the Web? A: http://spider.cern.ch/
  • Q: What is the decision making process? How can we make sure that the evaluation is as fair and independent as possible? A: Any decision will require an Atlas-wide (in particular including DAQ) discussion
  • Q: How is the communication flow from Spider supposed to work? It has not always worked well
Fri 09:30 Topic C. Onions: Training
References Slides: HTML, PS, PDF, PPT
Summary
  • Computing review: Training is very important, Chris agreed to look into this. Supposed to take no more than 30% of his working time
  • Basic issues: Who needs training (core software producers, systems software producers, end users); what training is needed (requirements, design, coding, including Atlas specific stuff, tools,...); where should the training be done (home country, CERN, desktop using recorded tutorials, CDs, books, videos)
  • General courses: CERN courses adjacent to Atlas or software weeks. First shot: a condensed version of John Deacon's course, scheduled for 14/6/99 to 18/6/99, covering analysis, design and hands-on C++; 16 participants on 8 workstations. Full course ought to be mandatory for system software coordinators, DIG, main software providers; for end-users: UML basics, OO C++
  • Steps to be taken: systems appoint coordinators by May 15th; they should follow the full training course. National training contact people to be nominated. Information on training in different countries to be communicated to training coordinator
  • Identify suitable courses, propose missing ones, compile a list of recommended material, create a Web page
  • Demonstration of draft Web page
  • All proposals welcome
Discussion
  • Would be very nice to have the mapping of Zebra banks to objects shown
  • What after the courses? A help desk would be fine, so far Lassi is partly doing the job, but he is heavily overloaded. Also, little projects would be very good
  • Small G4 applications would be excellent for post-training exercises; something similar is required for Objectivity and persistency
  • Very important to get professional trainers to use examples from our field
  • Q: Should we not make people learn Java in the first place? A: That would be dangerous
Fri 09:50 Topic G. Poulard: Reconstruction meeting
References Slides: HTML, PS, PDF, PPT
Summary
  • Not good timing due to the physics TDR
  • Muon reconstruction and combined reconstruction: COBRA making progress
  • Statistical combinations (reject muons from K decays): work ongoing
  • Calorimetry: bug in digitisation fixed, some improvements being worked on
  • Development, status and plans: Reminder of milestones. For calorimetry, involvement of developers for new software still not clear
  • Clear plans for ID and muon software, but the manpower situation is still very unsatisfactory. Unclear whether the completion of the physics TDR will help
  • Astra looking for maintainer
  • OO reconstruction by end 99? ID seems feasible, calo to be seen by next workshop, Amber no problem, muonbox unclear. Important to keep a full chain running
Fri 09:30 Topic K. Sliwa: WWCG meeting
References N/A
Summary
  • Monarc resulted from discussions in Atlas world wide computing group
  • Monarc working groups: simulation and modelling, architecture, analysis, test beds. Check http://www.cern.ch/MONARC
  • Monarc simulation: new Java based tool accessible from the Web, all important elements simulated, easy parametrisation. Disk access modelling to be improved. Some doubts as to how easy it will be to have a realistic network parametrisation. Complete simulation to be expected by end April
  • Collaborative tools: Video conferences, small group meetings, document preparation, model and software development, data management and analysis, experiment and model validation. Electronic notebook created much interest. Video conferencing quite advanced, but maintenance and support difficult because of budgetary constraints
  • Analysis tools: requirements still not documented, but the Fermilab Run II requirements could serve as a starting point. Also, the requirements for graphics should be looked at. Draft to be written by 26/03/99
Discussion
  • AWWC agenda on the Web was protected
  • Small workshop proposed on analysis tools
Fri 10:55 Topic J. Hrivnac: Graphics meeting
References N/A
Summary
  • Status reports: Muon event display written at Saclay, interactive reconstruction; communicates via XML files. Data available: ID, TruthEvent, Trigger (TRT tracks, hits, silicon space points and tracks). New Atlantis version has been released, can read standard XML files. Same for Wired; installed on AFS, can be tried. Aravis (former Arve graphics, now independent) available, being improved. Orca (integrated Arve histogramming) works on NT, not on Unix yet. OpenScientist (histograms independent from visualisation, persistency etc) looks very promising. Core design: improving. FAQ on Atlas Graphics now available on the Web
  • Design: Lassi proposed new design of Graphics core part, walk-through done. Some way needed to create trees, containers etc. of objects; first design available. Democracy of scenes, self-similarity. Requirements re-organised, more requirements added, now on the graphics Web page
Discussion
  • Q: What can be displayed with the new Atlantis version? A: ID and muon - those parts for which the event structure has been defined and implemented
Fri 11:20 Topic H. Meinhard: Summary
References Slides: HTML, PS, PDF, PPT
Summary
  • (See slides, I'm not going to summarize the summary)
Discussion
  • No offer for August workshop outside CERN, hence workshop will take place at CERN
  • Next outside workshop in 2000, what about the week adjacent to CHEP? It was suggested that a decision about an outside workshop should be taken in December of the preceding year at the latest
  • The production for reconstruction was well, though not centrally, organised
  • In order to make walk-throughs more efficient, they need to be better organised
  • Q: What is the status of Arve? What is the status of the control domain? How far are we with the implementation of the object network? A: Arve is being used for simulation, and being modified to serve their requirements. Clearly progress is not as fast as one would desire


Helge Meinhard / March 1999
Last update: $Id: minutes.html,v 1.12 1999/03/26 16:46:42 helge Exp $