Milano, july 1998;

MONARC note n. 1/98 - version 1.1

The analysis model and the optimization of geografical distribution of computing resources:

a strong connection

M. Campanella, L. Perini

INFN and University of Milano

ABSTRACT

This document contains a preliminary investigation of possible computing analysys models for ATLAS (and LHC) experiments. Evaluation of CPU, storage and I/O requirements is sketched in a top-down approach, starting from the CTP model. As a result, some models show themselves as not realistic and items are underlined where more data and investigation is needed to proceed.

1. Introduction.

The work documented here has been done as a first preliminary step for the common project with other LHC experiments about distributed computing and Regional Centers. Starting from the analysis model sketched in the CTP, the purpose was to better define the input parameters and their range of variability for a detailed simulation.

In the process of quantifying the range of possible analysis models based on the CTP ideas, we realized that different, yet allowed and reasonable models, could lead to very different conclusions in respect to the convenience and efficiency of decentralized models (Regional Centers and local "ntuple" analysis in the institutes) and the analyses system as a whole. We realized also the need for a detailed simulation and the central role of the scheduling requirements.

While these facts are not really new or unexpected, they are relevant in the definition of a project about Computing Model, as they underline the point of a joint tuning for the analysis model and the level of geographical distribution.

In the first section we list our assumptions together with a brief description of terminology. The second one is devoted to different analysis models. The third draws some conclusion and tracks the main directions to follow.

The results reported are meant only to be a starting point for broader discussion and further work.

2. Overall assumptions

Most of the input derives from the CTP model; we assumed a simple and naive scheduling model, based on the requirement that the relevant analysis phases be completed within some fixed amount of time. This is a natural consequence for our "back-of-envelop" computational approach and, while it creates an unrealistic amounts of idle time and clearly signals the need for a real simulation, it illustrates well the strong link between analysis model and optimization of the geographical distribution.

The same sketchy approach to storage and I/O requirements is enough to pinpoint various interesting areas and weaknesses in some models.

Note that no assumption is made in the following about the (distributed) object oriented database or its performance; in some way, it is assumed to behave with no overhead. This is a non-trivial assumption and, as we will see, the behavior of the database is of outmost importance in evaluating the whole system.

Event rate = 100 Hz, raw event size = 1 MB, 1 'run' year = 10^7 seconds
Format sizes: ESD=100kB, AOD=10kB, "ntuple"=1kB.
1 day=10^5 seconds (15% overestimate...)
CPU of high end machine 100 SpecInt95*sec
Reconstruction passes are performed at CERN reducing event size by a factor of ten. The CPU for reconstruction is evaluated 70 kSpecInt95*sec
20 analysis groups of 25 physicists each = 500 analyzers
30% of the above are simultaneously active = 150 active analyzers The 30% factor can be obtained by supposing that in any given site an analyzer keeps the system busy with jobs for 40% of the time. However 25% of the analyzers are far apart (US and Japan) so that their busy times occur in completely different periods of the day respect to the others. Thus 40% is reduced by 10% (=25% x 40%) to 30%.

RC model assumed is the simplest, where each regional community uses only its own center. We assume 8 centers, on each of them 5 analyses are performed, which means that every analysis is undertaken in 2 centers. Thus there are 62.5 analyzers/center, 40% active = 25; the number of physicists/analysis is 12.5 in every RC and active/analysis are 5.

In the following a uniform distribution of busy times is assumed; however it is clear that such an assumption is not very realistic.

Last, but not least, this investigation examines only the steady state of the analysis process. The period of time elapsing between the first event and the start of the steady state is not (yet) considered here. Nevertheless the transient period is of outmost importance and deserves a careful investigation.

2.1 Terminology

Few comments about the terminology used in the report:

CPU is the raw power expressed in SpecInt95 * second. No assumption is made on the type of CPU, nor on the operating system or hardware surrounding it.

STORAGE is the total amount of disk space needed to store the relevant data of each phase and can be different from the amount to be actually read. It is assumed to be on disk. Note that, in addition to the mere amount, the optimal placement of such a large number of disks to optimize the input/output bandwidth is not obvious.

I/O is the bandwidth needed to read or write the data. The main flavors are local bus bandwidth (inside a computer) expressed in Megabytes per second and network bandwidth in bits per second. It is assumed that reasonable bandwidth may be 20 MB/s sustained on local bus, 100 Mb/s (12 MB/s) on a LAN. On wide area network, bandwidth is a matter of cost.

3. Analysis Models.

Three main phases with a simple but tight fixed scheduling are supposed:

All the 10^9 events are analyzed 1 time per month. The events are selected using 0.25 SpecInt95*sec/ev. The sample is reduced by a factor 100, to 10^7 events. Requested response time is 3 days. This phase is done 1 time for each analysis.
The specific computation needed for each different analysis is done and the results saved. The CPU time is evaluated to 2.5 SpecInt95*sec/ev. Response time 1 day. Each analysis group accomplishes this phase immediately after the previous one is completed. Thus, at any given time, only 1 analysis group is active in this phase.
Each physicist analyses "interactively" his sample with a response time of 4 hours; CPU needed 2.5 SpecInt95*sec/ev.

Note that the scheduling model assumes that the response times for phase 1 and 2 are 3 and 1 day respectively, only when the phases are started according to a predetermined sequential order: there are analyses that start first and end first and other ones that start and end later; the "waiting time" DT between the first and last analysis depends from the level of distribution of a model and will be given for the various models considered.

Such a strong scheduling policy is probably not what the collaborations want, and in any case it would be difficult to implement; it is however a good comparison point as far as required resources are concerned. In the initial stage it would be utterly impossible.

Table 1 summarizes the relevant assumptions.

Phase	Ev. size [KB]	Event Number reduction	Event Size reduction	Events/ year	Requested Response time	SpecInt95 *sec /event	Total ev. size [GB]
RAW	1000			10^9			10^6
ESD	100		10%	10^9	ON LINE	250	10^5
1	100	1 %		10^7	3 DAYS (1)	0.25	10^3
2	10		10%	10^7	1 DAY (1)	25	10^2
O	10	10 %		10^6	< 1 DAY	nihil	10
3	10			10^7 - 10^6	4 HOURS	2.5	100 - 10

1 - Only one analysis active at the same time

Table 1 - Main model parameters

3.1 Different versions of the analysis phases

The analysis model is not completely specified by the phases sketched above, and some additional assumptions may be made:

Phase 2 can have AOD as input or ESD in an earlier time of the experiment.
Phase 2 can have as output AOD (enriched and analysis specific) or "ntuples" reduced in size by a factor 10.
Phase 3 can be preceded by a size reduction to "ntuples" if phase 2 gives AOD as output.
Phase 2 can be executed by each analyst separately giving private output (model A) or executed in a concerted way by all the physicists in the same group on the same center, giving the same agreed output for all of them (model A-R, A Revised). A-R looks rather realistic especially if AOD output is assumed.
Phase 3 can be preceded by a further selection of a factor 10 in the number of events reducing the number of events to 10^6 (model O).
Phase 3 can be done with data and CPU located in a center or with data and CPU located in each institute.

For the CPU aspects only the variations considered in the last 3 points will be taken into account, giving rise to models A, A-R, O.
As for as the volume of data is concerned we consider AOD as input and output format of phase-2 and "ntuple" as input format for phase-3.
The "ntuple" is simply a subsample of the analysis specific AOD produced in phase-2, so no extra CPU is required for getting them from the DB.

3.2 Levels of distribution

We compare 4 scenarios

CENTRAL: the 3 phases are all performed in a unique center (the same where reconstruction is done).
Note that in this scenario, to complete the 20 phases-1 of the 20 groups in 1 month, 3 phases-1 have to run concurrently (allowing for 33% inefficiency).
RC: the 3 phases are performed in the 8 RC's.
LOCAL-3: phases 1 and 2 are central, while phase 3 is local.
RC-LOCAL-3: phases 1 and 2 are done in RC, while phase 3 is local.

See fig. 1 for a flow diagram of the various possibilities.

3.3 Storage and network considerations

In all the scenarios considered a copy of the ESD (100TB/year) is kept in each of the available centers. Each analysis is done on 100GB of AOD (=10^7 ev x 10^4 Bytes) in models A, A-R; in model O only 10GB are used.

Table 2 reports the ratio between time spent in reading and in processing an event (sum of reading, processing and the usually negligible writing time). It is obvious that, to optimize the resources, the ratio has to be small. The rather crude approximation shows nevertheless that a problem exists in phase 1.

Maximum care has to be taken to have a parallel reading and processing and that the I/O time to read and write an event is not greater than the time to process it.

Note also that, although the initial event pool corresponds to one year, the storage to keep the output from the various phases is computed per one month only. It is unrealistically assumed that after one month data is scratched and the same storage may be reused. Multiplying the result a factor of three could be more realistic.

Throughout this exercise, data is supposed to be distributed evenly on the servers, so that access timing and bandwidth can be computed easily. This is also an optimistic hypothesis that has to be modified in a more detailed simulation.

Phase	Ev. Size [KB]	Read time [s] at 20 MB/s R	SpecInt95*sec /event	Time to process one event on a 100 SpecInt95*sec CPU P	% of time spent in reading R/(R+P)
	100	0.005	250	2.5	0.2
1	100	0.005	0.25	0.0025	66
2	10	0.0005	25	0.25	0.2
O	10		nihil
3	10	0.0005	2.5	2.5	0.02

Table 2: comparison between the time needed to read
one event and the time to process it

As for as the network is concerned the only moving around of data on WAN is foreseen during the analysis is the transfer of "ntuples" before the local phase-3. Assuming 30 Mbit/sec lines, the transfer of 10GB at 1/3 occupation rate would take (10x10^9x8x)/(3x 10x10^6)=8x10^3 seconds i.e. a couple of hours; quite an acceptable time. However the number of concurrent transfers is much bigger than one. Also a large number of concurrent remote X session may require a significant bandwidth.

In the LOCAL-3 scenario each month the 25 physicists of an analysis group transfer in 1 day (maximum allowed time for avoiding superposition with other analysis groups and for keeping in step with the phase-2) 10GB each. Allowing for a maximum transfer time of 1 day at 1/3 line occupancy we need an out-capacity of (25x10x10^9x8x3)/10^5=60 Mbit/sec. Here transferring the AOD instead would require 600 Mbit/sec, an amount already somewhat critical.

In the RC-LOCAL-3 scenario for each center the 12.5 physicists of an analysis group transfer 10GB each: an out-capacity of 30 Mbit/sec is required (300 for AOD transfer). The line leading to a single institute should allow for the concurrent transfers by e.g. 4 physicists requiring a capacity of 10 Mbit/sec for "ntuple" and 100 for AOD transfer.

Thus the local scenarios requiring the transfer of 10^7 events in "ntuple" format are well within the expected network performances; the performances required for AOD transfer are less obvious, albeit still reasonable.

3.4 Results

The detailed computation in each case is reported in appendix A. The results of the models are summarized in the following and shown in the figures 2, 3 and 4.

At this stage of the study only preliminary results may be reached about the I/O and networking aspects of phase 1 and 2. Note that RC models have greater flexibility; the variation (DT) in waiting time between the first and last scheduled analysis is 30 days both in central and RC models; but the addition of small amounts of the total CPU in phase 1 bring easily the Central DT to 20 days and the RC DT to only 5 days. This kind of RC advantages has a cost; however the results summarized below show that models exist for which the extra cost can be kept at the level of a low fraction of the total one.

3.6 Model summary

MODEL A. For the analysis model A, the CENTRAL scenario is convenient, the RC too, provided the use of same machines for phase-2 and 3 can be scheduled. LOCAL-3 is disfavored; RC-LOCAL-3 can be secluded as too costly.

Model A-R For the analysis model A-R, the CENTRAL and RC scenario are both convenient.
Note that the R version of phase-2 may be somewhat unrealistic in CENTRAL, as it requires general analysis format agreement between far institutes. It is however could also be considered more realistic in RC where it accommodates naturally 2 different analysis formats for each group. Intermediate models with 3-5 analysis format (institute formats?)

The only disadvantage of both LOCAL scenarios is the unbalance of the CPU distribution toward the institutes; to such an extent it does not seem realistic and lacks of flexibility. This kind of objection would not be valid if virtual RC would be considered possible and convenient: more information on distributed DB performance is needed for addressing this point.

Model O. For the analysis model O, the CENTRAL scenario is convenient. RC is very disfavored because of the excess CPU of phase-2, which in this model is by far dominant, and so cannot be taken by the machines of another phase (as it was done by phase-3 machines in model A).

4. Conclusions and future work.

The computation shown in this report has to be taken as a first step to probe the complexity of the engineering of the final computing model.

The model detailed is nevertheless enough to show the unaivoidability of the presence of a scheduling policy and the need for a detailed discussion on the architecture. A discussion on tagging and analysis definitions would shade light on important starting options.

As an example, the unbalanced ratio between I/O and CPU power in phase 1 could point to avoiding this kind of phase-1, and rather change it in scope or merge with the previous (or following) phase.

The model A-Revised seems viable for both the Central and RC scenarios, with or without LOCAL option.

It should be noted that the assumption of a strong sequential scheduling is too constraining and hardly feasible already in a RC with only 5 analyses: in a unique center with 20 analyses seems utterly unrealistic. On the other hand it is obvious that the time-uniform query distribution assumed here cannot be obtained without a kind of "militaristic" scheduling policy. The worsening in response times due to a random time distribution of the queries will be the subject for some of the forthcoming simulation work. It may however be expected that the impact will be more severe in the centralized scenario where the probability of clashes between the different analyses is higher.

It should be also noted that even a random query distribution is not realistic: a serious scheduling policy has to be designed and simulated for coping with the foreseeable peaking times in query requests.

As far as the storage is concerned, it is clear that the disk space has to be connected to computers either of phase 1 or of phase 2. Each approach has consequences of moving the I/O bandwidth from local bus to the network during the other phase.

This study has to be pursued simulating specific architectures and taking into account the database performances.

The behavior of the database affects many aspects of the model. Information need to be acquired on:

performance (mean response time and variance as a function of load, CPU load, maximum number of simultaneous users, and so on)
possible levels of distribution and replication
the amount of intelligence in the database itself (to move data around at night according to usage profiles

The work of detailing the analysis models has to continue: in a next phase the aspects of MC simulation of the physics channels of interest and of the relevant backgrounds have to be added, and the effects of a "continuum" of random queries on a limited amount of data (for analysis setting up and debugging) has to be taken into account. A model simulation program is needed for this purpose; one might however start with a fairly simple one (this is true also for studying some of the randomness and scheduling effects mentioned above).

A detailed model has to be elaborated also for the initial stage of the LHC analysis, when the main concurrent activities will probably be the understanding of the data, with the inherent detector behavior, and the setting up of the analysis chains.

APPENDIX A

Detailed computations for

The different analysis models

In this naive model with fixed response times, the results on the basis of a scenario can be evaluated are two:

The total amount of CPU needed.
The time during which the CPU is idle.

It is evident that for a more realistic approach we would need a simulation program. The inputs to the simulation would be:

The boundary conditions set for the analysis model (number of users and analysis groups, structure of the analysis phases).
The physical architecture of the model (computing resources and their distribution.
The set of rules governing the scheduling of analysis jobs.

In this more realistic simulation the result would be the distribution of the response times and the different scenarios would be evaluated on the basis of:

The amount of resources needed.
The distribution of the response times.

Note that the mean response time here is not the only aspect to be taken into account: tails of long response times should be especially avoided.

Model A

CENTRAL

PHASE 1

CPU

0.25 SpecInt95*sec * 10^9 ev / (3x10^5 sec) = 800 SpecInt95*sec
800 x 3 (N.concurrent phases 1) = 2400 SpecInt95*sec
- 24 machines
If reading and processing are performed in parallel, then only 12 CPU's are needed)
- idle time 33%

STORAGE

INPUT: storage is 10^5 GB ( - 1000 disks)

OUTPUT: as the process just tags the events, adding few bytes, no additional storage is needed

I/O

READ: 10^5 GB/(3*10^5 sec) = 333 MB -> 333/24 MB/machine = 14 MB/machine if a safety factor of 3 is assumed the request is 42 MB/s/machine which is reasonable if the disks are local, in case they are remote it translates to 333 Mbit/sec which is worrying
(Similar conclusions if one takes into account that the time devoted to reading is only half of the analysis time)

WRITE: just tagging, negligible if the database is smart enough

PHASE 2

CPU

25 physicists x 25 SpecInt95*sec x 10^7 ev / (10^5 sec) = 6.25x10^4 SpecInt95*sec
- 625 machines
- idle time 33%

STORAGE

INPUT: 10^5 GB ( - 1000 disks) (kept in phase 1 computers?)

OUTPUT: 10^7 ev.* 10^4 B * 500 physicist = 50.000 GB (500 disks)
(If data from last three month has to be kept = 1500 disks)

I/O

READ: reads all events tag header and all tagged events = 25 physicist * 10^4GB / (10^5 sec *20 groups as only one group is allowed to run) = 250 MB/s = 400 KB/machine (but 10 MB/s on each server)

WRITE: 25 physicists * 10^2 GB / 10^5 sec = 25MB/s

PHASE 3

CPU

150 physicists x 2.5 SpecInt95*sec x 10^7 ev / (1.5x10^4 sec) = 2.5x10^5 SpecInt95*sec
- 2500 machines
- this phase is accomplished in 1/6 of the day by 30% of all the physicists. The full 500 physicists ( 10/3 of the active ones ) will occupy the machines for (1/6 x 10/3) = 55%: idle time = 45%

STORAGE

INPUT: storage is 5*10^4 GB ( - 500 disks) kept in PHASE 2 machines

OUTPUT: 25 MB / month * 500 physicist = 13 GB

I/O

READ: 150 simultaneous transfer * 10 or 100 GB / 2 10^4 sec = 75/750 MB/s = 600/6000 Mb/s. Even if only one tenth does the transfer at the same time it is a very high bandwidth, the time constraint has to be relaxed by at least a factor 5 (1 days)

WRITE: negligible

SumCPU (1+2+3) = 315 kSpecInt95*sec (SumCPU is CPU summed over the 3 phases)
TOTAL CPU required could be only 250 kSpecInt95*sec if phase-2 can be done during phase-3 idle time. However it looks difficult in this scenario.

SumSTORAGE(1+2+3) = 1600 GB (2200 if 3 month of data retention is requested).

RC

PHASE 1

CPU

0.25 SpecInt95*sec x 10^9 ev / (3x10^5) = 800 SpecInt95*sec

- 8 machines
- idle time = 50% (3 busy days x 5 groups = 50% of month)
- CPU8 = 6.4 kSpecInt95*sec (CPU8 is the sum over the 8 RC's )

STORAGE

INPUT: storage is local = 10^5 GB ( - 1000 disks)

OUTPUT: as the process just tags the events, adding few bytes, no additional storage is needed

I/O

WRITE: just tagging, negligible if the database is smart enough

PHASE 2

CPU

12.5 physicists x 25 SpecInt95*sec x 10^7 ev / (10^5 sec) = 3.1 x 10^4 SpecInt95*sec

- 310 machines
- idle time = 83% (busy for 1 day x 5 groups = 5/30)
- CPU8 = 248 kSpecInt95*sec

STORAGE

INPUT: 10^5 GB ( - 1000 disks) kept in ???????????

OUTPUT: 10^7 ev.* 10^4 B * 125 physicist = 12.500 GB (125 disks)

I/O

READ: reads all events tag header and all tagged events = 25 physicist * 10^4GB / (10^5 sec *20 groups as only one group is allowed to run) = 250 MB/s = 400 KB/machine (but 10 MB/s on each server)

WRITE: 25 physicists * 10^2 GB / 10^5 sec = 25MB/s

PHASE 3

CPU

5 physicists x 5 groups x 2.5 SpecInt95*sec x 10^7 ev / (1.5x10^4 sec) = 4x10^4 SpecInt95*sec

- 400 machines
- idle time = 58% . This phase is accomplished in 1/6 of the day by 40% of all the physicists using the RC: the busy time is ( 1/6 x 10/4 ) = 10/24 = 42%
- CPU8 = 320 kSpecInt95*sec

STORAGE

INPUT: storage is 12500 GB ( - 125 disks) kept in PHASE 2 machines

OUTPUT: 25 MB / month * 125 physicist = 2 GB

I/O

READ: 25 physicist * 10 or 100 GB / 2 10^4 sec = 125 MB/s

WRITE: negligible

SumCPU8 = 576 kSpecInt95*sec
- TOTAL CPU required could be only 326 kSpecInt95*sec if the machines of phase-3 are used for phases-2 in their idle time. In this scenario it seems rather logical to perform phase-2 during week-ends.
SumStorage8 = 9000 disks

LOCAL-3

CPU

The CPU of phase-3 is the same as in the RC case ( the activity ratio is the same ). The 3200 machines of phase-3 go to the institutes.
- 6.4 local machines/physicist
- idle time for phase-3 = 50% (about)
The big idle time of the local machines ( 58% ) can be used for MC. If the estimation 25 KSpecInt95*sec for MC is confirmed, it uses however less than 10% of this phase-3 CPU power

STORAGE

INPUT: storage is 12500 GB ( - 125 disks) kept in PHASE 2 machines

OUTPUT: is local (5000 GB, ntuples only = 50 disks)

I/O

READ: needed bandwidth to read the ntuples (about 400 Mb/s) at the source

WRITE: negligible

TOTAL CPU required = 375 kSpecInt95*sec
TOTAL STORAGE required = 1650 disks

RC-LOCAL-3

The local situation is as already sketched for LOCAL-3.
However no trade-off is possible here between the CPU of phases 2 and 3.
- 6.4 local machines/physicist
- idle time for phase-3 = 50% (about)

TOTAL CPU required = 576 kSpecInt95*sec
TOTAL STORAGE required = 9100 disks

Model A summary

Thus for this analysis model A, the CENTRAL scenario is convenient, the RC too, provided the use of same machines for phase-2 and 3 can be scheduled. LOCAL-3 is disfavored, RC-LOCAL-3 can be secluded as too costly.

Model A-R

CENTRAL

PHASE 1 (as central)

CPU

0.25 SpecInt95*sec * 10^9 ev / (3x10^5 sec) = 800 SpecInt95*sec
800 x 3 (N.concurrent phases 1) = 2400 SpecInt95*sec
- 24 machines
- idle time 33%

STORAGE

INPUT: storage is 10^5 GB ( - 1000 disks)

OUTPUT: as the process just tags the events, adding few bytes, no additional storage is needed

I/O

READ: 10^5 GB/(3*10^5 sec) = 333 MB -> 333/24 MB/machine = 14 MB/machine if a safety factor of 3 is as

WRITE: just tagging, negligible if the database is smart enough

PHASE 2: Here comes the big difference with model A : instead of all the 25 physicists of the
group we have just 1 on behalf of the full group:

CPU

1 group x 25 SpecInt95*sec x 10^7 ev / (10^5 sec) = 2.5x10^3 SpecInt95*sec
- 25 machines
- idle time 33%

STORAGE

INPUT: 10^5 GB ( - 1000 disks) kept in phase 1 computers;

OUTPUT: 10^7 ev.* 10^4 B * 20 groups = 2.000 GB (20 disks)

I/O

READ: reads all events tag header and all tagged events = 10^5GB / (10^5 sec *10) (10 is a reduction factor due to reading of just tags and not whole events) = 100 MB/s = 4 Mb/s /machine (but >10 MB/s on each server)

WRITE: 1 group * 10^2 GB / 10^5 sec = 1MB/s

PHASE 3

CPU

STORAGE

INPUT: storage is 5*10^4 GB ( - 500 disks) kept in PHASE 2 machines

OUTPUT: 25 MB / month * 500 physicist = 13 GB

I/O

READ: 125 physicist * 10 or 100 GB / 2 10^4 sec = 625 MB/s

WRITE: negligible

SumCPU(1+2+3) = 255 kSpecInt95*sec (SumCPU is CPU summed over the 3 phases)
TOTAL CPU required 255 kSpecInt95*sec fully dominated by phase-3.

RC

PHASE 1

CPU

0.25 SpecInt95*sec x 10^9 ev / (3x10^5) = 800 SpecInt95*sec

- 8 machines
- idle time = 50% (3 busy days x 5 groups = 50% of month)
- CPU8 = 6.4 kSpecInt95*sec (CPU8 is the sum over the 8 RC's )

STORAGE

INPUT: storage is local = 10^5 GB ( - 1000 disks)

OUTPUT: as the process just tags the events, adding few bytes, no additional storage is needed

I/O

READ: 10^5 GB/(3*10^5 sec) = 333 MB -> 333/24 MB/machine = 14 MB/machine if a safety factor

of 3 is assumed the request is 42 MB/s/machine which is reasonable if the disks are local, in case they are remote it translates to 333 Mbit/sec which is worrying

WRITE: just tagging, negligible if the database is smart enough

PHASE 2 Here comes the big difference with model A : instead of all the 12.5 physicists of the group we have just 1 on behalf of the full group:

CPU

1 group x 25 SpecInt95*sec x 10^7 ev / (10^5 sec) = 2.5 x 10^3 SpecInt95*sec

- 25 machines
- idle time = 83% ( busy for 1 day x 5 groups = 5/30 )
- CPU8 = 20 kSpecInt95*sec

STORAGE

INPUT: 10^5 GB ( - 1000 disks) kept in ???????????

OUTPUT: 10^7 ev.* 10^4 B * 125 physicist = 12.500 GB (125 disks)

I/O

READ: reads all events tag header and all tagged events = 25 physicist * 10^4GB / (10^5 sec *20 groups as only one group is allowed to run) = 250 MB/s = 400 KB/machine (but 10 MB/s on each server)

WRITE: 25 physicists * 10^2 GB / 10^5 sec = 25MB/s

PHASE 3

CPU

5 physicists x 5 groups x 2.5 SpecInt95*sec x 10^7 ev / (1.5x10^4 sec) = 4x10^4 SpecInt95*sec

- 400 machines
- idle time = 58% . This phase is accomplished in 1/6 of the day by 40% of all the physicists using the RC: the busy time is ( 1/6 x 10/4 ) = 10/24 = 42%
- CPU8 = 320 kSpecInt95*sec

STORAGE

INPUT: storage is 12500 GB ( - 125 disks) kept in PHASE 2 machines

OUTPUT: 25 MB / month * 125 physicist = 2 GB

I/O

READ: 25 physicist * 10 or 100 GB / 2 10^4 sec = 125 MB/s

WRITE: negligible

- SumCPU8 = 348 kSpecInt95*sec
- TOTAL CPU required could be only 326 kSpecInt95*sec if the machines of phase-3 are used for the other phases in their idle time; the difference however is not so big as the CPU is dominated by phase-3.

LOCAL-3

No change n phase-3 respect to model A.

- TOTAL CPU required = 325 kSpecInt95*sec

RC-LOCAL-3

No change in phase-3 respect to model A.

The local situation is as already sketched for LOCAL-3.
However no trade-off is possible here between the CPU of phases 2 and 3.
- 6.4 local machines/physicist
- idle time for phase-3 = 50% (about)

- TOTAL CPU required = 348 kSpecInt95*sec