ATLAS Software Workshop Minutes

CERN, March 15 - 19, 1999

Mon 09:00 Topic D. Düllmann: Object data bases as data stores for HEP
References Slides: HTML, gzipped PS, PDF, PPT
Summary
  • Overview of the data handling requirements of LHC experiments
  • Total data volume of LHC experiments: ~ 100 PB
  • Very heterogeneous, distributed environment for analysis
  • Existing solutions won't scale. RD45 proposes to use a commercial ODBMS coupled to a mass storage system. Current choices: Objectivity/DB and HPSS. Mass storage system supposed to be transparent to the user
  • HEP data models are very complex, many relations between quantities, different access patterns depending on phase of data handling (reconstruction, analysis, ...)
  • Object data base matches best with object oriented production programs
  • Two options: loose binding between in-memory and on-disk (explicit copying); tight binding (use persistent objects directly in code - no explicit store and retrieve commands). The latter is the more natural choice with OO programs and data bases. The same objects can be accessed from different programming languages (C++, Java, Smalltalk) if common subset of features is used
  • State of objects preserved from one access to the data base to the next
  • Navigational access: via a unique Object Identifier (OID), which is a natural extension of the pointer concept. OIDs reflect "physical" object properties in the case of Objectivity. They can be used for uni- or bidirectional associations
  • Access to objects via 'smart pointers', which look very much like normal pointers, but take care of the necessary I/O automatically (see the sketch after this list)
  • Iteration over all events is easily possible with little code. The user does not need to know about physical locations within a federation; this is taken care of by OIDs
  • Object clustering: Goal is to transfer only 'useful' data. For example, one could keep all tracking data together on pages. But access patterns might change over time. Caching would be one way of addressing this problem
  • Physical model: one federation, several data bases with containers. Logical view: one entry point
  • Physical arrangement can be changed without impact on the applications (except perhaps on performance)
  • Concurrent access: support for many concurrent clients; in the case of Objy/DB, multiple readers, but only one writer per container. Locks refer to objects, but act on containers
  • Data changes are part of a transaction - assures integrity of the data base
  • Objectivity/DB architecture: OID size is 8 bytes, 64 k data bases, 32 k containers per data base, 64 k logical pages per container (4 GB containers for 64 kB page size, 0.5 GB for 8 kB). 64 k object slots per page. Theoretical limit: 10'000 PB, assuming 128 TB files for the data bases. Assuming database files of 100 GB, total capacity would be 6.5 PB. Extensions to Objy/DB have been requested
  • Scalability tests: 1 TB demonstrated by Atlas, multiple federations of 20...80 GB are used in production. 145 MB/s (80% of disk speed) seen from 240 parallel clients on the Caltech Exemplar machine. Many more scalability tests (overflow conditions etc) went fine, hence federations of a few 100 TB seem feasible right now
  • Distributed federations: Application, Objectivity page server, lock server, HPSS client, HPSS server can all be on different physical machines
  • Data replication: at the data base level; objects in a replicated DB exist in all replicas, and the copies are kept in sync by the data base. Provides for enhanced performance and availability. In case some replicas are not reachable, a quorum mechanism is applied for writing
  • Schema evolution: Evolve object model over the experiment lifetime. Support comprises change of inheritance hierarchy. Migration of existing objects can be done immediately (whole database at once) or in 'lazy' mode (migration as objects are accessed for writing)
  • Object versioning: multiple versions of a single logical object, supporting branching and re-merging of branches
  • Other commercial ODBMS products: Versant (for Unix and Windows, scalable and distributed, schemas kept locally, very difficult economic situation); O2 (for Unix and Windows, incomplete heterogeneity support, bought by Unidata and merged with Vmark); ObjectStore (for Unix and Windows, scalability problems, proprietary compiler and kernel extensions, now focused on Web applications); POET (mainly for Windows, low end, scalability problems). Need to observe what the big object relational vendors will offer
  • ODBMS: ship data to the client; ORB: ship request to the server, very different performance implications; the two are largely complementary
  • ODMG standard: standardised interface (data definition language, data interchange format, language bindings for C++, Java, Smalltalk); still, any large scale migration from one ODBMS product to another would require a significant effort
  • Current projects using Objy/DB: BaBar (to start data taking in May, 200 TB/y), Zeus (for event selection in the analysis phase), AMS, NA45, Chorus, Compass
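
To illustrate the smart-pointer and iteration points above, a minimal self-contained C++ sketch of the technique; this is not the actual Objectivity/DB API, and all names are invented for illustration. The reference behaves like a normal pointer, but triggers the necessary I/O on first dereference:

    #include <cstdio>

    // Invented persistent class; a real ODBMS would generate this
    // from the schema.
    struct Event { int run, number; };

    // Minimal smart pointer: looks like a normal pointer, but performs
    // the I/O on first dereference. The OID stands in for Objectivity's
    // database/container/page/slot quadruple.
    template <class T>
    class PersistentRef {
    public:
        explicit PersistentRef(long oid) : fOid(oid), fObj(0) {}
        T* operator->() { if (!fObj) fObj = Load(); return fObj; }
    private:
        T* Load() {
            // A real implementation would map the OID to a page, fetch
            // it from the page server and lock the container.
            std::printf("loading object with OID %ld\n", fOid);
            return new T();
        }
        long fOid;
        T*   fObj;
    };

    int main() {
        PersistentRef<Event> evt(42);
        evt->number = 1;   // I/O happens here, transparently
        return 0;
    }

Iterating over all events then reduces to a loop over such references handed out by an iterator, with the OIDs hiding the physical location of each object.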
Discussion
  • Q: Is there a third way? What if one wants the objects on disk to look different from the ones in memory? A: This is more difficult; it basically means introducing another layer between the in-memory view as seen by users and the on-disk view as seen by the data base. Also note that updates to an in-memory object are directly propagated to the on-disk version
  • Q: What about changing and enlarging objects? A: That is easily possible within the current application, as the page identifier is a logical one
  • Q: Can multiple federations be accessed at the same time? A: This can currently only be done sequentially - one federation must have been closed before connecting to another one
  • Q: What happens if some data (some databases) are not currently accessible? A: Program will stop at this point, because the file server cannot be reached. This can be caught by a signal handler at user level
  • Q: What if people want to do some event selection on the general store, and take a file with selected data home? A: A private tag data base would be a solution. This would be written within the global federation, and then be imported into the private federation. LHC++ will provide the necessary infrastructure
  • Q: Why don't we go for our own implementation of an ODBMS? A: The effort would be very significant, but given the unhealthy situation on the commercial market, this option may need to be investigated further
Tue 09:00 Topic J. Knobloch: Introduction, workshop agenda
References Workshop agenda
List of participants
Summary
  • Highlights and general structure of agenda
Tue 09:10 Topic T. Akesson: Computing review: The Atlas management view
References Schedule of actions: PDF
Summary (kindly provided by the speaker)

T Akesson presented the current view of the ATLAS management. It is in general foreseen to accommodate the review conclusions, but with some variations and with stronger emphasis on some issues.

Manpower is a central issue, and ATLAS needs a structure which is prepared to receive efforts as serious commitments. This requires work to be assigned in a rather formal way, so that local support can be sought with solid justification. Emphasis is therefore put on commitments to s/w effort.

The detector specific s/w is suggested to be organised in the detector systems while ATLAS general s/w will be followed by a s/w project leader reporting to the computing coordinator.

The call for detector system s/w coordinators and for database, reconstruction and simulation task leaders has taken place. The call for nominations of the next physics and computing coordinators has also been launched. Contacts have also started to set up an ATLAS-wide network of national training contacts.

Next steps will be discussed in the April EB. They will probably include the launch of an architecture task force and a quality control group. Results from both will be needed to allow the system and general s/w to be partitioned in pieces that groups can take responsibility for. These groups have to have a balanced composition spanning many activities, including both s/w and detector expertise, to ensure an anchored outcome.

Another foreseen ingredient is a National Board divided according to funding agencies, and with a regional center working group. This board will be the main body for issues like platforms, networking, specifying needs for collaborative tools etc. The formation of such a group will require CB endorsement.

The foreseen action plan contains a lot of parallelism, in order to get a re-organisation in place as soon as possible.

Finally it was emphasized that the ATLAS baseline is C++/OO for programming and Objectivity as database. This does not mean that it is decided that ATLAS will have Objectivity as database year 2005, but that the developments and tests that are made now should be in this baseline.

Discussion
  • Q: Why are we pretending that issues such as quality assurance and architecture have to start from scratch? There have been significant efforts made already. A: Yes, they need to be taken into account, but the review has asked for significant, or even radical, changes
  • Q: How is the discussion supposed to go on? Is the proposal already cast in stone? A: Comments (to Peter Jenni, Torsten Akesson) are always welcome, but the proposal is probably close to the final implementation, and decisions will need to be taken soon
Tue 10:20 Topic J. Knobloch: Computing review: Report from ACOS
References Slides: HTML, gzipped PS, PDF, PPT
Summary
  • ACOS met on 19 February; main topic: review report
  • Reports from domains: ID, LAr, Tiles, Muon, Trigger, Database, Detector description, Calibration and alignment, Analysis tools
  • Computing review: Endorses general strategy, suggests improvements on organisation, software process, and regional centres
  • P. Jenni and T. Akesson: Atlas computing is not on the right track, important changes needed, action plan for EB on 09/04/99. Much more involvement of system experts needed
  • Organisational structure: Discontinue DIG and ACOS, establish a Computing Steering Group instead, taking most of the DIG tasks, consisting of Computing, Physics, Database, Simulation, Reconstruction, Event filter coordinators, the management and three more members
  • Round table on organisation: Training is important; another meeting with formal links from countries; formal commitment of institutes required; assemble tasks in work packages; address also the maintenance of the Fortran software; core group is important; schism overemphasised in report, mostly psychological, prolonged by physics TDR; should be overcome from both sides; ID does not see this schism
  • Organigram: Too many links; interfaces ill defined; needs explanation in a separate document; proposed organisation could even widen the gap; need two bodies, one for policy, one for technical issues; precise mandate needs to be defined for the physics and the computing coordinator
  • ASP: Astonishment about proposal to dismiss software process; quality control is important; rules are not questioned, but are not sufficient; work going on to make ASP more gentle; design and code reviews could be part of the learning process; process needs to be communicated better; have to reduce the threshold; StP considered too difficult; documentation to be simplified
  • Regional centres and Monarc: Avoid duplication of work between Monarc and ATLAS
Discussion
  • In some cases in the past, split between reconstruction and simulation at high level has been beneficial
  • A member of the review committee remarked that the statements concerning the endorsement of the main strategic choices in the committee report are, in his opinion, not correct.
  • Main shortcoming so far was that the management sent incoherent signals to the community. Also in terms of the system architecture, strong leadership either by one chief architect, or by a group of competent people, is required
  • A high-level split between simulation and reconstruction can work provided the interfaces are properly defined
  • Of the deliverables foreseen in the ASP, the requirements collection has worked least well
  • For detector description, some necessary communication with the systems has not taken place yet
  • For the nominations of the physics and computing coordinators, care will be taken on the ability of the two to collaborate
  • The management proposal is not yet in a state where it can be widely published
  • The functions and the responsibilities of the physics and the computing coordinator need to be clearly defined
  • The reasons leading to the choices of the current organisation must be carefully considered as well
  • T. Akesson: All effort should be made to ensure that all Atlas is pulling in the same direction
Tue 11:05 Topic J. Allison: Geant4 software process
References Slides: HTML
Paper for CERN School of Computing 1997: PS
Summary
  • Geant4 started as RD44 in 1995. Easy to write a requirements document
  • Aim is to provide code which (together with the documentation) is readable by end-users
  • Now migrating from research and development project to a production phase, organisation similar to a large experiment
  • Next production release in May, mostly consolidation, not many new features
  • Requirements, design, implementation, evaluation has been an iterative process
  • Design tool: Rational Rose, not used for coding, drawings depict high-level design
  • Language choice has had some impact on the design. Minimum of coding guidelines used (all classes start with G4, methods start with a capital letter, ...) - see the sketch after this list
  • Important to design categories with loose coupling and well defined interfaces. No circular dependencies in the category diagram, leading to very small number of initial circular code dependencies
  • Used abstract interfaces, avoided 'casting'
  • Category coordinators given much independence
  • Code review considered essential, requirements were presented to the community, designs first reviewed within category, then brought to global G4 workshops, without any formal procedure
  • CVS considered very important, but no software release tools used
  • Data kept in static arrays, CVS-maintained files, or external (tar) files
  • Lessons learned: Be prepared for iterative re-design; modularise, couple loosely; plan for environments; Early test procedures, bug tracking, code review procedures; versioning of binaries; exception handling; limit size of executables
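
As an illustration of the naming guidelines and the abstract interfaces mentioned above, a hedged C++ sketch; the classes are invented for illustration (in real Geant4, abstract interfaces conventionally carry a G4V prefix, e.g. G4VProcess):

    // Invented classes following the guidelines quoted above: all class
    // names start with G4 (abstract interfaces conventionally with G4V),
    // all method names start with a capital letter.
    class G4VMagneticField {                    // abstract interface
    public:
        virtual ~G4VMagneticField() {}
        virtual void GetFieldValue(const double point[3],
                                   double field[3]) const = 0;
    };

    class G4UniformField : public G4VMagneticField {
    public:
        explicit G4UniformField(double bz) : fBz(bz) {}
        void GetFieldValue(const double[3], double field[3]) const {
            field[0] = 0.0; field[1] = 0.0; field[2] = fBz;
        }
    private:
        double fBz;
    };

Client code programs against the abstract interface only, which supports the loose coupling between categories and avoids the need for casting.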
Discussion
  • Q: Did the code review point out problems of coding or rather problems of design? A: Both, with some emphasis on coding
  • Q: What is the experience with the different software processes in G4 and in BaBar by people working on both? A: There is communication about the software process. Geant4 has not been integrated into the BaBar software, but is considered external
  • Q: Were there problems with missed milestones? A: Yes, in that case the targets have been rescheduled. In no case has Fortran software been wrapped
  • Q: Would you still choose C++ if the choice had to be taken today? A: Not sure, but clearly C++ is still perfectly adequate. In particular, the excellent tool support needs to be considered
  • Q: Did it happen that responsibility for a major piece had to be transferred to another person? A: Yes, frequently, and it went quite smoothly. This benefitted from the good design documentation and the readable code
  • Q: Was there ever code so bad that it needed to be rejected? A: Yes
Tue 11:35 Topic B. Jones: Back-end DAQ software process
References Software Process in the Atlas Back-end DAQ (Oct 97): PS, PDF
Back-end DAQ software process (Feb 99): PS, PDF
Summary
  • Only refers to a specific subsystem of DAQ
  • Goal was to provide a prototype "-1" of the DAQ; subsystems: detector interface, data flow, event filter, back-end software
  • Strong interfacing with all other parts of the DAQ
  • Software process: put in place by concerned developers, adapted from textbooks with timescales, applicability, scope, experience of developers in mind
  • Many similarities with the ASP, even increasing with time; components roughly map to domains
  • Differences: Smaller, much better integrated community; software process described in a couple of "how-to" Web pages
  • Inspections are more human, more detail on testing procedures; so far, all back-end software has followed the process, which has become a little more formal with time
  • Continuous communication with ASP authors
  • Phases: requirements collection; document was reviewed and used as a guideline to decompose the project into components
  • Pre-design investigation phase: decision on the implementation language etc: C++; written down in technical notes, reviewed in back-end DAQ meetings
  • High-level design: far less detailed than what is foreseen in the ASP, mostly textual, few diagrams picked from StP. There is no single complete module in StP. High-level design done by 5 small groups. Reviewed in meetings; several designs went through a few iterations
  • Detailed design and implementation: many deliverables such as code, implementation note, users' guide, test plan (originally foreseen at high-level design, but didn't work there)
  • Testing and integration
  • Work organised much around components, max. 4 developers per component, one coordinator per component. Prefer one institute per component. Additional efforts spent on identifying commonalities between components
  • Components developed according to agreed priority
  • Important external packages
  • Inspections: Introduced only during the process, managed by one identified, trained person, based on Tom Gilb's software inspection method. Different check-lists depending on the deliverable being reviewed. Focus is on identifying problems, not solving them. "Real" meetings are much preferred over "virtual" ones. Excellent way of training newcomers. Best to have people start as reviewers before they act as authors. But: Inspections are a lot of work! Code inspections require full documentation, compliance with coding rules, integration into SRT, positive outcome of testing tools. Important faults found this way, hence it is worth the effort
  • Division into phases helped organise and schedule the work; organisation chosen to be as small as possible; adoption of OMT was very important for communication and documentation
  • Now 11 components, 180 k LOC, partly ready, concentrating now on regular incremental releases
  • "Do's": start gently; keep it simple; inspect all deliverables; provide templates, checklists, examples; get a non-author for component testing. "Don'ts": burden developers unnecessarily; ask developers to do something which has not been tried before; underestimate time and effort required for software management, integration and testing; do distributed development if avoidable
Discussion
  • Q: Have the metrics in Logiscope been used? A: Yes, they were found useful. Strong support from IT/IPT and Lassi Tuura
  • We should try and learn more about the DAQ testing
  • Q: Were there problems with developers writing code with dirty C tricks? A: Yes, it has been dealt with on a case-by-case basis. In some cases, it was well justified because of hardware interfaces
  • Q: What about inspections which come up a second time? A: Did not yet happen during the more formal procedures. Informally, it happened, and it was considered one and the same inspection process
  • Q: How much are the developers available? A: Only 1 person full-time, all others part-time, but if it's not 50% at least, it is not worth it
  • Decomposition is difficult if the technology is still not clear to all persons involved
  • Q: How important is it that the managers are themselves contributing to the development? A: That depends on the size of the project, but some feedback is very important. On the other hand, managerial issues must not be underestimated
  • Q: Has any code been thrown away? A: Yes, all the evaluation software. Other parts have been changed very substantially
Wed 09:00 Topic H. Meinhard: Report from DIG meeting
References Slides: HTML, PS, PDF, PPT
Summary
  • Computing review: short discussion; surprised to learn about implementation steps without consultation with the people affected
  • Reviews: Wired, XML parser: partial positive feedback, awaiting more evaluation reports. Graphics code, muon code: awaiting more reviewers' reports; many of the comments received so far are on design, not on code. SRT documentation: being completed. Magnetic field design (rather, tracking in magnetic field): a new design report will be written because of the numerous substantial comments. IPatRec design: completed; review resulted in smaller and better separated modules. Walkthrough of muon code on Thursday this week
  • Global architecture document: DIG working group to write up all our architectural design choices made so far. Will start with scenarios and the document about the 'how' (what technology will we use to implement the system), and then (with strong involvement of systems and physics groups) the 'what' (what simulation, reconstruction, physics... are we going to do). Document should be understandable by all developers. Detailed outline of 'how' document being prepared now
  • Round of domains: Graphics, data base, reconstruction: see reports later this week. Muons: concentrating on physics TDR, Muonbox in cvs. Magnetic field: ported to more platforms. Analysis tools: discussion of requirements in AWWC meeting. TileCal: setting up for test beam: Objy, analysis tools; meeting later this week. ID: Common clustering progressing. Reconstruction: Track class being prepared, first draft next week, discussions with the detector communities foreseen
  • Tools: Bonsai (sophisticated Web cvs browser) working fine, need to revise requirements for Light; UML conversion of StP systems ongoing, new systems should use UML; review of design tools to be pushed, likely that more than one tool will be recommended
  • Platforms: Only few requests for Windows NT, no offer to help; hence, recommend to suspend support
  • AOB: DIG recommends holding the May workshop at CERN; the August workshop is to be decided still this week
Discussion
  • Architecture must define the interfaces between domains as well
  • Not clear that people prefer an outside workshop; anyway, a final decision about the August workshop must be taken by the end of this week
Wed 09:30 Topic G. Poulard: Status of TDR software and productions
References Slides: HTML, PS, PDF, PPT
Summary
  • Geometry description: ID and calos stable, new tile digitisation implemented; muon: recently changed, still based on CMZ version
  • Reconstruction: iPatRec: much work for combined reconstruction; muons: Muonbox being actively developed
  • Combined reconstruction: e/gamma, conversions available; muon identification: combination of muonbox and iPatRec (good at high pt, but poor efficiency at low pt); new statistical approach combining covariance matrices results in improved low pt efficiency
  • Other work (private basis): e/gamma id, conversions, soft e, muon id, soft muons, tilecal cells, primary vertex, vertex b-tag, overall b-tag
  • Combined n-tuple provided in atrecon
  • All code in CVS; Dice not fully tested yet, hence there is no official release yet. Main programs now in 'Applications' domain. Not yet on all platforms
  • Production: For simulation, see Web page; reconstruction not centrally organised, mostly done on private basis, list is probably incomplete
  • Next: evolution of the geometry (is it needed, time scale, reference version for G4); evolution of the reconstruction, comparison in 2000, analysis in new framework of produced data; maintenance of Fortran software is a major enterprise, requiring significant resources
Discussion
  • Care must be taken to always have a running system, in particular because of the test beam activities
  • Q: Have numbers been obtained already for the CPU requirements of the reconstruction so far? A: No final numbers yet, but they will be provided with the physics TDR
  • Q: Which version of AIX has been used? A: AIX 4.1. Version 4.3 gives problems in the linking step
  • Q: What is the status of Linux? A: The simulation works fine, there are minor problems with the reconstruction which are hoped to be solved within the next few weeks
Wed 09:50 Topic M. Stavrianakou: Repository and releases
References Slides: HTML, PS, PDF, PPT
Operating systems and compilers for Cernlib 98, Cernlib 99, LHC++ 98a, LHC++ 99a
Summary
  • TDR software: 25 top level packages, domain software: 16 top level packages. 480 kLOC in F77, 98 kLOC in Age, 269 kLOC in C++ (30% increase over December 98). In release: 40 top level packages
  • Platforms: HP, DEC, IBM, Linux, Sun; not (fully) supported: SGI, WNT
  • Fortnightly developer releases; aim at weekly releases?
  • Nightly builds: from the head versions of the packages. Feedback to developers still to be improved
  • Release frequency and procedure: simplification and speed up needed: pragmatic decisions on supported platforms; improve package structures and dependencies; use nightly builds for early debugging; ease developer's job; release some packages independently or using binaries from previous releases; local disks or dedicated machines; building on fastest platforms first; work incrementally at preparation stage; share partial or full support for some platforms with other institutes
  • Tools and QA: cvsweb, Bonsai available; dependency grapher; CodeCheck, Insure++, Purify, Logiscope. Clear policy needed on who runs what when
  • Spider SRT, Collaboratory tools
Discussion
  • The list of supported compilers ought to be on the Web
  • Q: Are we in sync with Cernlib and LHC++? A: Yes, after the next LHC++ release
  • Q: Is it possible to have both optimised and debug releases? A: That would mean a lot of work and additional resource consumption
Wed 10:40 Topic Discussion about Windows NT support
Summary
  • There is not much demand for NT support. SRT etc. work, but NT would require commitment of many developers
  • The last questionnaire resulted in 5 institutes asking for NT support, and no one offering help (even after personal contacts)
  • One aspect to consider is the different graphics interface of NT
  • Obviously, NT support would take valuable resources; our prime goal should be to have software developed
  • NT is widely used in industry, we shouldn't write it off too early
  • NT support must be understood to comprise support for native tools such as Visual Studio, which means more work than for an additional Unix platform
  • The Event Filter requirements need to be considered. However, the EF is not committed to NT either. They don't request NT versions of the reconstruction code
  • When discussing NT support, extreme positions (such as one person cleaning up the mess of all developers, or all developers having to make sure their code runs under NT) are not helpful. Some reasonable compromise would need to be found
  • Important that we do not preclude now a later move to NT. This implies that the developers should be given guidelines what to do and what to avoid to remain compatible with NT. Of course, this would not help for graphical applications
  • Visual Studio is a nice development environment that many people like. An option would be to support it without building releases regularly on NT
  • Events external to Atlas (eg. Spider/SRT providing good NT support) could trigger us to reconsider the situation
  • NT has proven to be excellent for finding bugs in code; are we prepared to give this up?
  • As long as we access Fortran code from C++, there is a problem on NT with character arguments (see the sketch after this list)
  • Part of the lousy reputation of NT may be due to the NICE NT installation at CERN; why not set up our own NT cluster in building 40?
  • NT is popular due to - among other things - Visual Studio; however, similar tools exist under Unix and should be evaluated
  • There is now an emulator of PC hardware available which allows running NT (and 98) under Linux on an x86 PC
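
To make the character-argument point concrete, a hedged sketch of the usual glue code; the routine name SETNAM is invented, and the calling conventions described are those of typical Unix f77 compilers versus the DEC/Compaq Visual Fortran defaults on NT:

    // Calling SUBROUTINE SETNAM(NAME,IVAL), with NAME CHARACTER*(*) and
    // IVAL INTEGER, from C++. The hidden string length is passed
    // differently: most Unix f77 compilers append it after all other
    // arguments, while the default NT convention (DEC/Compaq Visual
    // Fortran) inserts it directly after the character address and uses
    // an upper-case __stdcall symbol.
    #include <cstring>

    #ifdef _WIN32
    extern "C" void __stdcall SETNAM(const char* name, int len, int* ival);
    inline void setnam(const char* name, int* ival)
        { SETNAM(name, (int)std::strlen(name), ival); }
    #else
    extern "C" void setnam_(const char* name, int* ival, int len);
    inline void setnam(const char* name, int* ival)
        { setnam_(name, ival, (int)std::strlen(name)); }
    #endif

    // Usage (linked against the Fortran object):
    //     int value = 1;
    //     setnam("ATLAS", &value);   // single portable call site

Wrapping each such routine once keeps the NT-specific convention out of the physics code, which is the kind of guideline alluded to in the decision below.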
Decision
  • For the time being, no releases will be provided on NT, and the project files necessary for Visual Studio will not be provided
  • Developers are not required to ensure their packages run on NT
  • A set of guidelines will be developed in order to allow for a later move to NT
  • The decision will be revised if a volunteer offers to take care of NT support
Wed 11:30 Topic J. Knobloch: End 99 computing status report
References Slides: HTML, PS, PDF, PPT
Summary
  • Requested by LHCC referees
  • Scope not yet defined
  • Subject to consequences from computing review
  • Major topics: Project management (organisation, manpower, management tools, revised software process, milestones, critical items, risk analysis); software (architecture, framework, data base, reconstruction, simulation, graphics, analysis tools); computing model (technology tracking, Monarc, analysis strategy, regional centres, central installation, cost update); remote communication and collaboration; software development environment; training
Discussion
  • Work could be simplified by asking the referees what they really want to be discussed
  • It is planned to split the document into pieces, and to assign various editors to them
Wed 11:40 Topic J. de Jonghe, M. Angberg: Project management and supporting tools
References Slides: HTML, gzipped PS, PDF, PPT
Summary
  • Originally developed for hardware projects, not too different from software execution plan proposed two years ago
  • Advantages: automatic request for reporting; uniform reports available in central location
  • Input required: work packages (not trivial to define)
  • Concepts: work package, progress report, comment to progress report
  • Workflow: the manager gets alerted until he has produced a report, of which the project leader is informed and on which he can comment
  • Integration with MS Excel, MS Project, being improved. Excel and Project not required, though
  • Demonstration of Web interface: Various views on progress reports and work packages with powerful navigation tools. Easy modification and creation of work packages
  • Work packages assigned to PBS
  • Next version: graphical display of cost profiles, GANTT charts
  • Various ways of customisation for software work feasible
Discussion
  • Q: Is it possible to put default values (from the previous report) into the fields of a new progress report? A: Yes
  • What is the real benefit for the user? How can we motivate them to use the package?
  • The tool helps avoid duplication of effort by making visible what is going on
  • Q: Is it possible to track somebody's time spent on certain projects? A: The problem is being discussed, but this tool does not appear to be the correct context
Thu 09:00 Topic S. Loken: US Atlas Computing: Overview and management
References Slides: HTML, PS, PDF, PPT
Summary
  • Task force has been active for a year, seeking funds and organising US Atlas Computing; proposal submitted to funding agencies in 11/98; organisation required before funding
  • LBNL selected as lead lab for US Atlas Computing. Project manager: Ian Hinchliffe, Project Engineer: David Malon, Management Support and Leveraged Projects Coordination: Stu Loken, Deputy Project Manager Facilities: Craig Tull, Deputy Project Manager Software: Tom LeCompte. Aim now is to provide a proposal for review by DOE and NSF in May
  • For proposal, areas of expertise and interest of US groups will be identified. Also view on what would make largest impact on Atlas. BaBar professionals to join Atlas as soon as they are released from BaBar
  • Proposal will be for clearly identified software deliverables matched by persons and funds
  • Support: PDSF (mostly PC based now, run by NERSC). Plan to receive 250 TB/y from CERN, stored in HPSS, 50 kSPECint95 for analysis, 50 TB of user disks, 10% system to be available in 2003
  • Software contributions: Concentrate on areas with particular expertise in US, eg. data base. Short term deliverables, training - hence start a pilot project now
  • Leveraged projects: exploit outside projects to test critical aspects of US Atlas computing; funding from non-HEP sources. New proposal for Particle Physics Data Grid (total MICS funding of 17 MUSD)
Thu 09:20 Topic T. LeCompte: Software development for US Atlas Computing
References Slides: HTML, PS, PDF, PPT
Summary
  • Basic goals: Proportional contribution to Atlas software development, development and delivery of computing infrastructure for US Atlas, support of US participation in physics-specific software, and overall Atlas computing
  • Regional centre required for coordination, and for centralised resources
  • Software development efforts: pilot project (testbeam analysis), core software (control and/or database domain), system specific reconstruction. Testbeam analysis software is an ideal pilot project to test new ideas and to bring many physicist developers on board. Plans: provide access to testbeam data (from Objy) and Atlas candidate analysis tools, develop G4 simulation for testbeam; efforts primarily targeted at tile cal, later extended to other testbeam efforts
  • Core software development: critical and understaffed right now. US in good position to significantly contribute
  • Detector specific reconstruction: effort from universities; people want to help, but don't know where to start
  • Aim for US-Atlas: 200 FTE in total for Atlas software, i.e. 33 FTE per year, assuming a 60/40 split between physicists and computer scientists; steep ramp-up, as the need is there now
Thu 09:30 Topic C. Tull: Regional Centre
References Slides: HTML, gzipped PS, PPT
Web site: arc.nersc.gov
Summary
  • Role of regional centre: access to data, support to users. Facilities to be provided: hardware and data access, software access, service (system operation, user support and training, code management, testing, distribution)
  • NERSC (National Energy Research Scientific Computing): 2000 active users, serving (among others) many large HENP experiments. Profiting from LBL as ESnet site and hub
  • PDSF: cluster running Linux and Solaris dedicated to HENP experiments, run by 2 FTEs. HPSS installation: 10 robots, 60 drives, 70 TB of data, 2.5 TB disk, 97.5% availability. NERSC is HPSS developer site. HPSS imports have successfully been done
  • Support for US Atlas: software repository, reference environment, training, support for integration and coordination, tool support, documentation and tutorials
  • Data analysis phase (2005 and beyond): planning based on 20% rule, 120 analysis users (40 concurrent), 200 TB ESD/y from CERN, 50 kSPECint95 for ESD analysis, 250 TB of tape storage/y, 25 TB of disk storage/y. 8.5 (for operations) + 5 (for US Atlas support) FTEs needed
  • Future plans: funding proposal
Discussion
  • Q: What is the acceptance of C++ in the US? A: There is no problem, CDF, D0, BaBar don't have any Fortran in their reconstruction
  • Q: Why is the plan to provide an ESD rather than an AOD copy? A: Probably we have to consider various levels of regional centres. Not all physics and particularly not all detector studies will be possible with the AOD
  • The detailed planning much depends on the development of wide-area networking. Priority schemes can help much
Thu 10:15 Topic RD Schaffer: 1 TB milestone, event, detector description
References Slides from database meeting:
RD Schaffer: 1 TB milestone HTML, PS, PPT
H. Renshall: 1 TB milestone PS, PDF
C. Arnault: Status report on detector description PS, PDF
M. Schaller: Objectivity benchmarks HTML, PS, PDF, PPT
D. Malon: Plans for Objectivity in Tilecal test beam HTML, PS, PDF, PPT
RD Schaffer: Event model HTML, PS, PDF, PPT
Summary
  • Report from Wednesday's database meeting
  • Detector description: General (and persistent) model with various specific models linked to the application domain. Examples of applications: Age files, Objy, common blocks, G4, textual, ... Classes in the generic model: DetectorDescriptor, GenericElement, DetectorElement, DetectorPosition (see the sketch after this list). Sample implementations exist for SCT, TRT; lots of interaction with detector communities required
  • 1 TB milestone: basic goal: demonstrate feasibility. 1 TB of production data stored into Objy data bases by 1 January 99. A number of performance bottlenecks have been identified and will be worked on. Digits organised following the basic event model; digits were copied 10 times. Typical event size: 3 MB; 6% of objects of 100 Bytes, 66% of 1 kB, the rest of the objects were larger. Page size used was 8 kB (too small). Several hardware improvements already applied since; however, some more understanding of the remaining bottlenecks is needed. Plan to redo the 1 TB writing at nominal performance (will take 3 days)
  • Objectivity/DB benchmarks: Access patterns studied: sequential, selected, random for uniformly sized objects. Similar patterns for Solaris and NT
  • Event model: concentrated so far on access for Geant3 digits, now needs to be extended to all event objects. Major characteristics: loading of events independently from where data are coming from; organisation and access using an identification scheme following the logical decomposition of the detector. More people are welcome to work on the event model; a couple of important questions to be looked into have been identified
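
One possible arrangement of the generic-model classes named above, as a hedged C++ sketch; the members and relations are assumptions for illustration, not the actual Atlas design:

    #include <string>
    #include <vector>

    // Assumed shape of the generic detector description classes named
    // in the summary; members and relations are illustrative guesses.
    class DetectorPosition {            // placement of an element
    public:
        double x, y, z;                 // translation (rotation omitted)
    };

    class GenericElement {              // behaviour common to all elements
    public:
        virtual ~GenericElement() {}
        virtual std::string Name() const = 0;
    };

    class DetectorElement : public GenericElement {
    public:
        DetectorElement(const std::string& name) : fName(name) {}
        std::string Name() const { return fName; }
        void Add(DetectorElement* child) { fChildren.push_back(child); }
        DetectorPosition fPosition;
    private:
        std::string fName;
        std::vector<DetectorElement*> fChildren;  // logical decomposition
    };

    class DetectorDescriptor {          // entry point; the specific models
    public:                             // (Age, Objy, G4, ...) would be
        DetectorElement* fTop;          // fed from this generic tree
    };

A tree like this would also fit naturally with the event-model identification scheme that follows the logical decomposition of the detector.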
Discussion
  • Q: For the detector description, where would misalignment fit in? A: This has still to be decided, but various possibilities exist
  • Q: To which extent have DAQ and Event Filter been participating in the discussions about the event architecture? A: Some discussions have been held already, but more are to come
Thu 11:00 Topic P. Hendriks: AMBER
References Slides: HTML, gzipped PS, PDF, PPT
Summary
  • Amber stands for Atlas Muon Barrel and Endcap Reconstruction
  • Idea is to try out the more informal walk-through
  • End cap not implemented yet, stand-alone muon system program
  • Input to program: AMDB, Arve simulation or Geant3, Saclay B field map
  • Program: detector hierarchy, GDL (dataflow paradigm), detector reconstruction toolkit, reconstruction
  • Detector Hierarchy: buffer between different input formats and reconstruction algorithms, provides access to digits, uses official muon names - will be replaced with upcoming detector description
  • Detector reconstruction toolkit: general classes used by reconstruction; contains tracking in magnetic field, geometrical entities (region of activity, error cone and error point), tracks and vertices, track fits (straight line LSQ)
  • Draft design of track: contents still to be discussed. Based on a Traits parameter which consists of identifier type, module type, quality type, parameter type
  • Track inherits from Trajectory
  • Track parameters: abstract base class, PerigeeParameters and LocalParameters inheriting from it
  • Vertices and track modules: track module is everything that can be used to define a track (hits, vertices, other points, ...)
  • Dataflow architecture: most natural choice, implemented as iterators and iterator adaptors. Data flow from creator (eg. MDT layer) to user (reconstruction). Control flow is the other way around. Basic modules developed to support this
  • Data flow code example shown and explained (a similar sketch follows this list)
  • Trigger reconstruction architecture: Trigger instantiates RPC which is a composite data view
  • Reconstruction of RPC chamber: first hits in identical layers combined, then clusters formed, sorted in phi. Two innermost layers of each type used to form (wide) trigger roads. More clusters used to narrow down road if they fit
  • MDT ladder: consists of chambers in same layer, sector and side. Tube layers in ladder are merged. Hits in layers filtered based on ROA and merged into list. Straight track segments found (same algorithm as in Datcha)
  • Building tracks: track segments from MDT combined, hits are passed to filter, ...
  • Next steps: Read RPC digits from Geant3; variable step size for magnetic field tracking; port to Unix. Subsequent versions: endcaps, material
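
A hedged sketch of the dataflow idea described above, with all names invented for illustration: data flows from the creator (eg. an MDT layer) to the user (reconstruction), while control flows the other way, the user pulling from an iterator adaptor, which in turn pulls from its source:

    #include <iostream>
    #include <vector>

    // Invented digit type and road predicate, for illustration only.
    struct Digit { int layer; double drift; };
    bool InRoad(const Digit& d) { return d.drift < 1.0; }

    // Filter adaptor: looks like an iterator to the user, pulls from the
    // underlying source on demand and skips digits failing the predicate.
    template <class Iter, class Pred>
    class FilterIterator {
    public:
        FilterIterator(Iter cur, Iter end, Pred p)
            : fCur(cur), fEnd(end), fPred(p) { Advance(); }
        bool Done() const { return fCur == fEnd; }
        const Digit& operator*() const { return *fCur; }
        FilterIterator& operator++() { ++fCur; Advance(); return *this; }
    private:
        void Advance() { while (fCur != fEnd && !fPred(*fCur)) ++fCur; }
        Iter fCur, fEnd;
        Pred fPred;
    };

    int main() {
        std::vector<Digit> layer(3);        // the creator, eg. an MDT layer
        typedef std::vector<Digit>::const_iterator It;
        FilterIterator<It, bool (*)(const Digit&)>
            road(layer.begin(), layer.end(), &InRoad);
        for (; !road.Done(); ++road)        // the user pulls digit by digit
            std::cout << (*road).layer << '\n';
        return 0;
    }

The Traits-parametrised Track mentioned above would, along the same lines, bundle the identifier, module, quality and parameter types into a single template parameter.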
Discussion
  • Not happy that Track is a Vector
  • Should a general track not contain the parameters plus pointers to the algorithms used to form it?
  • Track class should not be unnecessarily abstract
  • What is the relationship of Track with the requirements? Are there any?
  • OIG or scenario for Track is missing
  • Why has the charge been separated out from LocalTrackParameters?
  • Do all tracks need to have a perigee parameter? No
  • TrackModuleVisitor should not exist without justification
  • Namespaces are not yet permitted - to be revised
  • Are segments created only in one or in two layers? A: The algorithm is flexible, right now it needs multilayers
  • Also, muons which have not triggered must be reconstructed
  • No dead material considered in presented version
  • There is a chance to meet at 14.30 h on Thursday afternoon to discuss further
  • Approach not radically different from what was done so far
  • AMDB reading code outdated, requires extra step to make geometry available on NT
  • How could the package be possibly decomposed into components? A: Trigger, pattern recognition, fit. Probably, smaller units would be inappropriate because of overhead
  • Caution about separation between pattern recognition and fit - the fit may need to access hits which are not part of track segments, and a way to tag hits is required
  • Is there a way to refine the requirements in interactions with the relevant communities? A: Yes, will be discussed in the next muon week
  • In building the MDT ladder, is it assumed that the tubes are perfectly aligned? A: No, misalignment is indeed foreseen
  • What pieces would one envisage to make available outside the muon detector domain? A: the Detector reconstruction toolkit, and the Dataflow package
  • Have you reconstructed any events? A: Yes, ones simulated by Arve, including magnetic field and RPC data, with random (uncorrelated) noise included
  • What was the experience like? Should we do it again? What should we change?
  • Walkthrough should be done in common between detector and software communities
  • As a walkthrough, we weren't prepared; it happened too late; group should be smaller; subject should be much smaller
  • Real meeting does have advantages over e-mail communication
  • Some participants would have liked to contribute more, but felt inhibited by large audience
  • Both detailed walk-throughs and overview presentations are required
  • Possible scenario: Walk-through very early on, later formal review of the code with all formal documents. Additional walk-throughs could be requested
Fri 09:05 Topic S. Fisher: Spider project
References Slides: HTML, PS, PDF, PPT
Summary
  • Objective: define, implement and deploy...
  • Coding standards: based on existing standards; the experiments worked together on the first document. CodeCheck with selectable rules. A new document has been produced with some extra input, but important information on the source of the information was dropped. The plan now is to provide a single input from all experiments as a response to this document
  • SRT requirements: collected from the experiments; a group was formed to sort them out. Glossary produced (requirement-free, 33 items, 5 roles, 11 procedures - was hard to agree on). Work model covers all scenarios from all experiments, expressed in terms of the glossary. Differences between experiments are mainly due to different preferences. Package independence considered feasible. Next steps: three people to consolidate the requirements, deadline: end April. Then authors and other experts of existing packages can comment on which requirements are met, and what would be involved to implement the requirements not yet met
  • New SRT looks more interesting as deficiencies of existing systems become more apparent. What if manpower for the implementation is requested? Still a lot of unhappiness around; hard to get the experiments to work together; too much bureaucracy around, but still SRT has a slim chance
Discussion
  • Q: Is building releases at outside regional centres in the requirements? A: It is in the work model
  • LCB party line: will choose one existing product as the baseline; IPT will take over maintenance and implementation of important missing functionality
  • Q: What else does Spider take care of? A: Nothing for the moment
  • Q: Are the requirements on the Web? A: http://spider.cern.ch/
  • Q: What is the decision making process? How can we make sure that the evaluation is as fair and independent as possible? A: Any decision will require an Atlas-wide (in particular including DAQ) discussion
  • Q: How is the communication flow from Spider supposed to work? It has not always worked well
Fri 09:30 Topic C. Onions: Training
References Slides: HTML, PS, PDF, PPT
Summary
  • Computing review: Training is very important, Chris agreed to look into this. Supposed to take no more than 30% of his working time
  • Basic issues: Who needs training (core software producers, systems software producers, end users); what training is needed (requirements, design, coding, including Atlas specific stuff, tools,...); where should the training be done (home country, CERN, desktop using recorded tutorials, CDs, books, videos)
  • General courses: CERN courses adjacent to Atlas or software weeks. First shot: a condensed version of John Deacon's course, scheduled for 14/6/99 to 18/6/99, covering analysis, design and hands-on C++; 16 participants on 8 workstations. Full course ought to be mandatory for system software coordinators, DIG, main software providers; for end-users: UML basics, OO C++
  • Steps to be taken: systems appoint coordinators by May 15th; they should follow the full training course. National training contact people to be nominated. Information on training in different countries to be communicated to training coordinator
  • Identify suitable courses, propose missing ones, compile a list of recommended material, create a Web page
  • Demonstration of draft Web page
  • All proposals welcome
Discussion
  • Would be very nice to have the mapping of Zebra banks to objects shown
  • What after the courses? A help desk would be fine, so far Lassi is partly doing the job, but he is heavily overloaded. Also, little projects would be very good
  • Small G4 applications would be excellent for post-training exercises; something similar is required for Objectivity and persistency
  • Very important to get professional trainers to use examples from our field
  • Q: Should we not make people learn Java in the first place? A: That would be dangerous
Fri 09:50 Topic G. Poulard: Reconstruction meeting
References Slides: HTML, PS, PDF, PPT
Summary
  • Not good timing due to the physics TDR
  • Muon reconstruction and combined reconstruction: COBRA making progress
  • Statistical combinations (reject muons from K decays): work ongoing
  • Calorimetry: bug in digitisation fixed, some improvements being worked on
  • Development, status and plans: Reminder of milestones. For calorimetry, involvement of developers for new software still not clear
  • Clear plans for ID and muon software, but the manpower situation is still very unsatisfactory. Unclear whether the completion of the physics TDR will help
  • Astra looking for maintainer
  • OO reconstruction by end 99? ID seems feasible, calo to be seen by next workshop, Amber no problem, muonbox unclear. Important to keep a full chain running
Fri 09:30 Topic K. Sliwa: WWCG meeting
References N/A
Summary
  • Monarc resulted from discussions in Atlas world wide computing group
  • Monarc working groups: simulation and modelling, architecture, analysis, test beds. Check http://www.cern.ch/MONARC
  • Monarc simulation: new Java based tool accessible from the Web, all important elements simulated, easy parametrisation. Disk access modelling to be improved. Some doubts as to how easy it will be to have a realistic network parametrisation. Complete simulation to be expected by end April
  • Collaborative tools: Video conferences, small group meetings, document preparation, model and software development, data management and analysis, experiment and model validation. Electronic notebook created much interest. Video conferencing quite advanced, but maintenance and support difficult because of budgetary constraints
  • Analysis tools: requirements still not documented, but the Fermilab Run II requirements could serve as a starting point. Also, the requirements for graphics should be looked at. Draft to be written by 26/03/99
Discussion
  • AWWC agenda on the Web was protected
  • Small workshop proposed on analysis tools
Fri 10:55 Topic J. Hrivnac: Graphics meeting
References N/A
Summary
  • Status reports: Muon event display written at Saclay, interactive reconstruction; communicates via XML files. Data available: ID, TruthEvent, Trigger (TRT tracks, hits, silicon space points and tracks). New Atlantis version has been released, can read standard XML files. Same for Wired; installed on AFS, can be tried. Aravis (former Arve graphics, now independent) available, being improved. Orca (integrated Arve histogramming) works on NT, not on Unix yet. OpenScientist (histograms independent from visualisation, persistency etc) looks very promising. Core design: improving. FAQ on Atlas Graphics now available on the Web
  • Design: Lassi proposed new design of Graphics core part, walk-through done. Some way needed to create trees, containers etc. of objects; first design available. Democracy of scenes, self-similarity. Requirements re-organised, more requirements added, now on the graphics Web page
Discussion
  • Q: What can be displayed with the new Atlantis version? A: ID and muon - those parts for which the event structure has been defined and implemented
Fri 11:20 Topic H. Meinhard: Summary
References Slides: HTML, PS, PDF, PPT
Summary
  • (See slides, I'm not going to summarize the summary)
Discussion
  • No offer for August workshop outside CERN, hence workshop will take place at CERN
  • Next outside workshop in 2000, what about the week adjacent to CHEP? It was suggested that a decision about an outside workshop should be taken in December of the preceding year at the latest
  • The production for reconstruction was well, though not centrally, organised
  • In order to make walk-throughs more efficient, they need to be better organised
  • Q: What is the status of Arve? What is the status of the control domain? How far are we with the implementation of the object network? A: Arve is being used for simulation, and being modified to serve their requirements. Clearly progress is not as fast as one would desire


Helge Meinhard / March 1999
Last update: $Id: minutes.html,v 1.12 1999/03/26 16:46:42 helge Exp $