MONARC note n. 2/98 - version 1.1
Some successful 100 MB/s writing tests with Objectivity have already been done [3], though this was on a supercomputer with a naive data model. The CERES/NA45 experiment achieved 20 MB/s in a CDR test using two 2-processor Sun 450 systems, with the bottleneck in the PCs sending the data [8]. Several experiments are gearing up to do data taking into Objectivity, so we expect more reports to appear in the near future.
Though we discuss CPU requirements in SpecInt95 units in this note, it should be stressed that we do not know whether integer calculation performance is the major constraining factor when Objectivity uses the CPU in writing. It is possible that other things like bus speed or CPU cache size are more important. There is of course some correlation across systems between these latter performance factors and the SpecInt95 rating.
When Objectivity has to allocate a new object on a database page, it makes a lot of sense for it to spend some CPU power in searching for database pages which are not completely full, and on which the new object would fit. The more pages are searched for holes, the more compact the database can be. There is thus a natural tradeoff between the CPU power spent looking for space and the space overhead in the database. It is possible that there are undocumented Objectivity parameters which would allow the user to tune this tradeoff. In that case, tests of writing on a CPU-constrained testbed system could trade away storage efficiency in order to gain speed. We do not know what fraction of CPU power is currently spent in looking for free space while writing. Other tasks like keeping track of write locks and dirty pages could also consume significant CPU power per object.
Tests have found that the CPU power needed for writing, say, one MB depends strongly on the size of the objects being written.
Tests [3] have shown that the number of parallel writers does not significantly change the CPU requirements of a single writer. In other words, if one writer needs X MIPS to write at Y MB/s, then N writers need N*X MIPS to write at N*Y MB/s. This assumes of course that the system's data bus and I/O bus can handle N*Y MB/s of traffic. We do not currently know whether this is the case for COTS systems with many CPUs.
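The linear scaling result above can be turned into a simple sizing calculation. The sketch below is our own illustration of that rule; the function name and the figures plugged in are ours, not from [3]:

```python
# Sketch of the linear scaling rule reported in [3]: N writers need
# N times the CPU power of one writer, so the total CPU power grows
# linearly with the target bandwidth.

def required_si95(target_mb_s, mb_s_per_cpu, si95_per_cpu):
    """Total SpecInt95 needed for target_mb_s, assuming one writer per
    CPU and perfectly linear scaling (ignoring data bus and I/O bus
    limits, which the note flags as an open question)."""
    n_writers = target_mb_s / mb_s_per_cpu
    return n_writers * si95_per_cpu

# Example with the suncmsb figures for 100-byte objects (2.2 MB/s and
# 6 SI95 per CPU): about 273 SI95 for 100 MB/s, which the summary
# table rounds to 270.
print(round(required_si95(100, 2.2, 6)))
```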
Below we show a typical bandwidth vs. object size graph, for a single Objectivity client running as the only process on a system. This graph was adapted from [1]; see [1] for more details on the test setup. The 'write object' curve shows the effective write rate of 'real data'; the 'write page' curve shows the rate at which 'real data + database space overhead' is written.
The graph also shows the difficulty in determining the CPU requirements for the writing of big objects. In this particular test, writing is disk-bound for all object sizes larger than 8 KB: the 'write page' curve hits the bandwidth ceiling of the disk hardware.
Typical object size vs. performance graph, adapted from [1].
suncmsb  = suncmsb.cern.ch, SunOS 5.5.1, Ultra Enterprise, 6 * 167 MHz UltraSPARC
wnt      = Windows NT 4.0, 1 * 233 MHz Pentium II
sunatl   = SunOS 5.5.1, Ultra-5_10, 270 MHz UltraSPARC
exemplar = neptune.caltech.cacr.edu, HP Exemplar, 256 * PA8000 180? MHz
sunna45  = 2 * Sun 450, each 2 * 300 MHz ???SPARC
+------+----------+-+-----+-----+-------+-------+------+-----+----+-------+---+
|Object| Platform |V|SI95/|MB/s/|CPUs/  |SI95/  |Max   |Max  |Page|ooRefs/|Ref|
|Size  |          | |CPU  |CPU  |100MB/s|100MB/s|#CPUs |MB/s |Size|object |   |
+------+----------+-+-----+-----+-------+-------+------+-----+----+-------+---+
|100b  | suncmsb  |4| 6   | 2.2 | 45    | 270   | 1    | -   | 8K | 1     |[4]|
|100b  | suncmsb? |4| 6   | 1.5 | 66    | 400   | 1    | -   | 8K?| ?     |[2]|
|100b  | wnt      |5| 9   | 2.2 | 45    | 410   | 1    | -   | 8K | 0     |[1]|
|100b  | sunatl   |5| 8.5 | 2.0 | 50    | 430   | 1    | -   | 8K | 0     |[1]|
|      |          | |     |     |       |       |      |     |    |       |   |
| 1K   | suncmsb  |4| 6   | 4.5 | 22    | 130   | 1    | -   | 8K | 1     |[4]|
| 1K   | suncmsb? |4| 6   | 3.2 | 31    | 190   | 1    | -   | 8K?| ?     |[2]|
| 1K   | wnt      |5| 9   | 3.2 | 31    | 280   | 1    | -   | 8K | 0     |[1]|
| 1K   | sunatl   |5| 8.5 | 4.0 | 25    | 210   | 1    | -   | 8K | 0     |[1]|
|      |          | |     |     |       |       |      |     |    |       |   |
| 4K   | suncmsb  |4| 6   | 5.2 | 19    | 120   | 1    | -   | 8K | 1     |[4]|
| 4K   | suncmsb? |4| 6   | 3.7 | 27    | 160   | 1    | -   | 8K?| ?     |[2]|
| 4K   | sunatl   |5| 8.5 | 5.0 | 20    | 170   | 1    | -   | 8K | 0     |[1]|
|      |          | |     |     |       |       |      |     |    |       |   |
| 8K   | sunatl   |5| 8.5 | 5.5 | 18    | 150   | 1    | -   | 8K | 0     |[1]|
| 8K   | exemplar |4| 11  | 4.5*| 22*   | 240*  | 30*  | 140 |32K | 0     |[3]|
| 8K   | exemplar |4| 11  | 4.3 | 23    | 260   | 25   | 107 |32K | 0     |[5]|
|      |          | |     |     |       |       |      |     |    |       |   |
|3.8M+ | sunna45  |5| 10  | 5 # |<=20 # |<=200 #| 4 #  | 20 #| 8K | 1+1   |[8]|
| 1K # |          | |     |     |       |       |      |     |    |       |   |
+------+----------+-+-----+-----+-------+-------+------+-----+----+-------+---+
* This test actually had 100 processes which each spent about 70% of their CPU time in other calculations; the figures are scaled to account for this.
# Two objects were written per event: one 1 K object and one 3.8 MB object. The bottleneck in the CERES/NA45 setup was in the 4 PCs supplying the data to the 2 Suns; the Sun CPUs used for object formatting were not saturated during the test. The figures therefore only represent upper bounds on the CPU requirements for these object sizes.
Summary of results from various tests
Note that we only have a few tests for 8KB objects, because on most test platforms writing of 8 KB objects is disk-bound rather than CPU-bound.
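The derived columns in the summary table can be recomputed from the measured ones. The sketch below reconstructs that arithmetic; the function is ours, and we assume the straightforward relations CPUs/100MB/s = 100 / (MB/s per CPU) and SI95/100MB/s = SI95 per CPU * CPUs/100MB/s, which match the rounded table entries:

```python
# Reconstruction of the summary table's derived columns (our assumption
# about the arithmetic behind them, consistent with the rounded entries).

def derived_columns(si95_per_cpu, mb_s_per_cpu, target_mb_s=100.0):
    """Return (CPUs per 100 MB/s, SI95 per 100 MB/s) from the measured
    per-CPU SpecInt95 rating and per-CPU write bandwidth."""
    cpus_per_target = target_mb_s / mb_s_per_cpu
    si95_per_target = si95_per_cpu * cpus_per_target
    return cpus_per_target, si95_per_target

# sunatl row for 1 K objects: 8.5 SI95/CPU and 4.0 MB/s/CPU give
# 25 CPUs and 212.5 SI95 per 100 MB/s; the table rounds the latter to 210.
print(derived_columns(8.5, 4.0))
```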
Data models with a substantial number of objects smaller than 100 bytes are inadvisable, both because of CPU requirements and because of storage overheads.
For models with object sizes in the range of 100 bytes - 1 K, hardware parameters should be >=60 CPUs and >=400 SpecInt95 if there is to be a good chance of achieving a 100 MB/s writing speed.
For models with object sizes in the range 4 K - 8 K, hardware parameters should be >=30 CPUs and >=200 SpecInt95 if there is to be a good chance of achieving a 100 MB/s writing speed.
For models in which almost all objects are >8 K, the CPU requirements will be lower, but we do not have enough data to estimate how much lower. A model which is very rich in object references will have CPU requirements higher than those listed above, but we do not have enough data to estimate how much higher.
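The sizing guidelines above can be collected into a small lookup. This is a hypothetical helper of our own devising; in particular, the note gives no figure for the 1 K - 4 K gap, and the helper simply applies the 4 K - 8 K figures there as an assumption:

```python
# Hypothetical helper encoding the sizing guidelines in this note;
# the treatment of the 1 K - 4 K gap is our own assumption.

def min_hardware_for_100mb_s(object_size_bytes):
    """Return (min CPUs, min total SpecInt95) for a good chance at a
    100 MB/s writing speed, or None where the note has too little data."""
    if object_size_bytes < 100:
        raise ValueError("data models dominated by objects smaller than "
                         "100 bytes are inadvisable")
    if object_size_bytes <= 1024:      # 100 bytes - 1 K range
        return 60, 400
    if object_size_bytes <= 8192:      # 4 K - 8 K range (assumed for 1-4 K)
        return 30, 200
    return None                        # > 8 K: lower, but unquantified

print(min_hardware_for_100mb_s(512))
print(min_hardware_for_100mb_s(8192))
```

Note that a model rich in object references would push these minima upwards by an unknown amount, which the helper does not attempt to capture.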
It should be stressed that the currently available test data does not include test loads with, say, 6 writers on a 6-CPU COTS server. It is unclear whether the internal memory or I/O busses in such a server would prove to be a bottleneck which prevents a 6-fold speedup over using one CPU. There is some anecdotal evidence of slowdowns related to paging if the swap disk on a system is slower than its filesystem disks.
Because of the CPU requirements, a realistic 100 MB/s testbed built out of COTS components will inevitably consist of a number of loosely coupled systems, each taking care of a 'slice' of the workload. Even with high-end COTS systems, there will be at least 3-4 slices. It seems attractive to build testbeds consisting of a single slice only, as a way to save hardware resources. In any case it should be considered that, according to recent projections, in 2005 CPUs will be about 20-70 times more powerful per $ [10]-[11] and about 5 times as powerful per unit [10].
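These projections invite a rough extrapolation. The sketch below is our own back-of-the-envelope arithmetic, not a figure taken from [10] or [11], and assumes Objectivity's scaling behaviour is otherwise unchanged:

```python
# Back-of-the-envelope extrapolation, assuming the cited projection that
# CPUs will be about 5 times as powerful per unit in 2005 [10].
per_unit_gain = 5    # projected per-CPU speedup by 2005 (assumption from [10])
cpus_1998 = 60       # CPUs needed now for 100 MB/s with 100 byte - 1 K objects

cpus_2005 = cpus_1998 / per_unit_gain
print(cpus_2005)     # about 12 CPUs, all else being equal
```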
[1] Martin Schaller. Objectivity/DB Read/WritePerf, Sep. 98. http://wwwinfo.cern.ch/~schaller/Notes/WriteReadPerf.ps
[2] Vincenzo Innocente. CMS analysis chain prototype. (Slides 5 and 6) RD45 workshop 7/97. http://home.cern.ch/~innocent/rd45/tex/ooprot_rd45797.ps.gz
[3] K. Holtman, J. Bunn. Scalability to Hundreds of Clients in HEP Object Databases. Proc. of CHEP '98, Chicago, USA. http://home.cern.ch/~kholtman/chep2/art2web.html
[4] Koen Holtman. Clustering strategies, July 1997, RD45 workshop (Slide 5). http://home.cern.ch/~kholtman/cluster_jul1997.ps
[5] Koen Holtman. Exemplar performance test 'fill.ex6'. Unpublished.
[6] Bernd Panzer-Steindel. Offline Computing and Central Data Recording Models for the Experiment COMPASS, section on Processor Benchmarks. http://wwwinfo.cern.ch/pdp/pa/compass/compass_evaluation/node11.html
[7] Koen Holtman. Analysis of 'batch reclustering' tests. Unpublished.
[8] Andreas Pfeiffer. CERES/NA45 CDR. Talk at RD45 October 1998 workshop. http://wwwinfo.cern.ch/asd/cernlib/rd45/workshops/oct98/presentations/na45/index.htm
[9] The RD45 collaboration. Using an object database and mass storage system for physics analysis. CERN/LHCC 97-9, 15 April 1997. Section 11.2. http://wwwinfo.cern.ch/asd/rd45/reports/m3_96/milestone_3.htm
[10] Pasta - The LHC Technology Tracking Team for Processors, Memory, Architectures, Storage and Tapes. Status Report - August 1996. http://wwwinfo.cern.ch/di/pasta.html
[11] CMS collaboration. CMS Computing Technical Proposal. CERN/LHCC 96-45, 19 December 1996. Section 3.2.1.