MONARC note n. 2/98 - version 1.1

CPU requirements for 100 MB/s writing with Objectivity

Koen Holtman, CERN/CMS , 26 Nov 1998

This note discusses the probable requirements on CPU power for a 100 MB/s writing test using Objectivity/DB. It collects and summarises the results of several tests performed in the context of the RD45 collaboration. This note was created at the request of MONARC.

Some successful 100 MB/s writing tests with Objectivity have already been done [3], though this was on a supercomputer with a naive data model. The CERES/NA45 experiment achieved 20 MB/s in a CDR test using two 2-processor Sun 450 systems, with the bottleneck in the PCs sending the data [8]. Several experiments are gearing up to do data taking into Objectivity, so we expect more reports to appear in the near future.

Though we discuss CPU requirements in SpecInt95 units in this note, it should be stressed that we do not know whether integer calculation performance is the major constraining factor when Objectivity uses the CPU in writing. It is possible that other things like bus speed or CPU cache size are more important. There is of course some correlation across systems between these latter performance factors and the SpecInt95 rating.

Factors which determine the CPU usage of Objectivity

Tests [1] [2] [4] have shown that Objectivity/DB uses a significant amount of CPU power when writing objects. This is true for writing when creating new objects, but also for writing when copying existing objects [4]. When reading objects, much less CPU power is used.

When Objectivity has to allocate a new object on a database page, it makes sense for it to spend some CPU power searching for database pages which are not completely full and on which the new object would fit. The more pages are searched for holes, the more compact the database can be. There is thus a natural tradeoff between the CPU power spent looking for space and the space overhead in the database. It is possible that there are undocumented Objectivity parameters which would allow the user to tune this tradeoff. In that case, tests of writing on a CPU-constrained testbed system could trade away storage efficiency in order to gain speed. We do not know what fraction of CPU power is currently spent looking for free space while writing. Other tasks, like keeping track of write locks and dirty pages, could also consume significant CPU power per object.

Tests found that the CPU power needed for writing, say, one MB depends strongly on the object model, in particular on the size of the objects and on the number of object references created per object.

Overall, this means that the object model chosen for the writing test will have a significant impact on CPU requirements. The database page size may also be an important factor, but less is known about the dependency on page size. If containers are reasonably large and transactions are reasonably long, the CPU time spent on container and transaction creation will be small compared to that spent on object creation.

Tests [3] have shown that the number of parallel writers does not significantly change the CPU requirements in a single writer. In other words, if one writer needs X MIPS to write at Y MB/s, then N writers need N*X MIPS to write at N*Y MB/s. This assumes of course that the system databus and I/O bus can handle N*Y MB/s of traffic. We do not currently know whether this is the case for COTS systems with many CPUs.
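Under this linear scaling assumption, the writer count needed for a given aggregate rate follows directly from one writer's measured rate. A minimal sketch; the 4.5 MB/s figure is taken from the results table later in this note, and the ceiling rounds up where the table rounds to the nearest integer:

```python
import math

# Linear scaling assumption from [3]: N writers cost N times the CPU of
# one writer, so the writer (CPU) count for a target aggregate rate is a
# ceiling division.  Assumes the data and I/O busses are not saturated.

def writers_needed(target_mb_s, mb_s_per_writer):
    """CPUs/writers needed to sustain target_mb_s, assuming perfect scaling."""
    return math.ceil(target_mb_s / mb_s_per_writer)

# Example: 4.5 MB/s per writer (cf. the 1 KB suncmsb row in the table)
# gives 23 writers for 100 MB/s.
print(writers_needed(100, 4.5))
```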

Dependence on object size

In the tests we know of [1] [2] [4] [9] the write performance for some object size S was always measured by running a transaction filling a single container with a large number of objects of size S. There is some evidence [7] that filling multiple containers in parallel does not seriously change the CPU requirements per object.

Below we show a typical bandwidth vs. object size graph, for a single Objectivity client running as the only process on a system. This graph was adapted from [1]; see [1] for more details on the test setup. The 'write object' curve shows the effective write rate of 'real data'; the 'write page' curve shows the rate at which 'real data + database space overhead' is written.

The graph also shows the difficulty in determining the CPU requirements for the writing of big objects. In this particular test, writing is disk-bound for all object sizes larger than 8 KB: the 'write page' curve hits the bandwidth ceiling of the disk hardware.

Typical object size vs. performance graph, adapted from [1].
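The gap between the two curves can be understood with a simple packing model. A sketch, under the assumption that objects do not span page boundaries; it ignores any per-page header, which is a simplification rather than a documented Objectivity figure:

```python
# Toy model of database space overhead for fixed-size objects: if objects
# cannot span pages, each page holds floor(page_size / object_size)
# objects and the remainder of the page is overhead.  The efficiency is
# the ratio between the 'write object' (real data) and 'write page'
# (real data + overhead) curves.

PAGE_SIZE = 8192  # bytes; the 8 KB page size used in most tests here

def objects_per_page(object_size, page_size=PAGE_SIZE):
    return page_size // object_size

def packing_efficiency(object_size, page_size=PAGE_SIZE):
    return objects_per_page(object_size, page_size) * object_size / page_size

# 1000-byte objects pack 8 per 8 KB page, so ~98% of each page is real data.
print(objects_per_page(1000), packing_efficiency(1000))
```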

Results from different CPU-bound performance tests

The table below summarises the results from several write performance tests. For all results, the writing was still CPU bound and not yet disk bound. The columns in the table are as follows:

Object size:
Size of the objects written. In most tests the objects are slightly smaller than the indicated size; for example, 1 KB objects are usually only 1000 bytes, to make sure that 8 of them fit onto an 8192-byte page.
Platform:
Test platform:
  suncmsb  = suncmsb.cern.ch, 
             SunOs 5.5.1, Ultra Enterprise, 6 * 167 MHz UltraSPARC
  wnt      = Windows NT 4.0, 1 * 233 MHz Pentium II
  sunatl   = SunOS 5.5.1, Ultra-5_10, 270 MHz UltraSPARC
  exemplar = neptune.caltech.cacr.edu, HP Exemplar, 256 * PA8000 180? MHz
  sunna45  = 2 * Sun 450, each 2 * 300 MHz ???SPARC.
V:
Objectivity/DB version.
SI95/CPU:
SpecInt95 rating of a single CPU in the platform. These figures were guessed with the aid of [6].
MB/s/CPU:
Throughput for a single CPU in the test. This is the rate of 'real data' written, not 'real data + database space overhead'.
CPUs/100MB/s:
Number of CPUs that would be needed to write at 100 MB/s. Calculated by dividing 100 MB/s by the MB/s/CPU value, so we assume no additional overhead as the number of CPUs increases [3].
SI95/100MB/s:
Number of SpecInt95s that would be needed to write at 100 MB/s. Calculated from SI95/CPU and CPUs/100MB/s.
Max #CPUs:
Maximum number of CPUs used concurrently in the test.
Max MB/s:
Maximum MB/s writing rate achieved in the test, if more than one CPU was used.
Page Size:
Objectivity/DB page size used in the test.
ooRefs/object:
Object references created (in other objects) for each object written.
Ref:
Reference to source of test result.
+------+----------+-+-----+-----+-------+-------+------+-----+----+-------+---+
|Object| Platform |V|SI95/|MB/s/|CPUs/  |SI95/  |Max   |Max  |Page|ooRefs/|Ref|
|Size  |          | |CPU  |CPU  |100MB/s|100MB/s|#CPUs |MB/s |Size|object |   |
+------+----------+-+-----+-----+-------+-------+------+-----+----+-------+---+
|100b  | suncmsb  |4| 6   | 2.2 | 45    | 270   |  1   | -   | 8K |  1    |[4]|
|100b  | suncmsb? |4| 6   | 1.5 | 66    | 400   |  1   | -   | 8K?|  ?    |[2]|
|100b  | wnt      |5| 9   | 2.2 | 45    | 410   |  1   | -   | 8K |  0    |[1]|
|100b  | sunatl   |5| 8.5 | 2.0 | 50    | 430   |  1   | -   | 8K |  0    |[1]|
|      |          | |     |     |       |       |      |     |    |       |   |
| 1K   | suncmsb  |4| 6   | 4.5 | 22    | 130   |  1   | -   | 8K |  1    |[4]|
| 1K   | suncmsb? |4| 6   | 3.2 | 31    | 190   |  1   | -   | 8K?|  ?    |[2]|
| 1K   | wnt      |5| 9   | 3.2 | 31    | 280   |  1   | -   | 8K |  0    |[1]|
| 1K   | sunatl   |5| 8.5 | 4.0 | 25    | 210   |  1   | -   | 8K |  0    |[1]|
|      |          | |     |     |       |       |      |     |    |       |   |
| 4K   | suncmsb  |4| 6   | 5.2 | 19    | 120   |  1   | -   | 8K |  1    |[4]|
| 4K   | suncmsb? |4| 6   | 3.7 | 27    | 160   |  1   | -   | 8K?|  ?    |[2]|
| 4K   | sunatl   |5| 8.5 | 5.0 | 20    | 170   |  1   | -   | 8K |  0    |[1]|
|      |          | |     |     |       |       |      |     |    |       |   |
| 8K   | sunatl   |5| 8.5 | 5.5 | 18    | 150   |  1   | -   | 8K |  0    |[1]|
| 8K   | exemplar |4| 11  | 4.5*| 22*   | 240*  |  30* | 140 |32K |  0    |[3]|
| 8K   | exemplar |4| 11  | 4.3 | 23    | 260   |  25  | 107 |32K |  0    |[5]|
|      |          | |     |     |       |       |      |     |    |       |   |
|3.8M+ | sunna45  |5| 10  | 5 # |<=20 # |<=200 #|  4 # | 20 #| 8K |  1+1  |[8]|
| 1K # |          | |     |     |       |       |      |     |    |       |   |
+------+----------+-+-----+-----+-------+-------+------+-----+----+-------+---+

  * This test actually had 100 processes which each spent about 70% of their
    CPU time in other calculations.  Figures scaled to account for this.

  # Two objects were written per event, one 1K object and one 3.8 MB object.
    The bottleneck in the CERES/NA45 setup was in the 4 PCs supplying the
    data to the 2 Suns. The Sun CPUs used for object formatting were not
    saturated during the test.  The figures therefore only represent
    upper bounds for the CPU requirements with these object sizes.

Summary of results from various tests

Note that we only have a few tests for 8 KB objects, because on most test platforms the writing of 8 KB objects is disk-bound rather than CPU-bound.
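The derived columns of the table can be recomputed from the measured ones. A sketch following the column definitions above, with no extra overhead assumed as the CPU count grows [3]; the rounding mimics the precision used in the table:

```python
# Recompute the derived table columns (CPUs/100MB/s and SI95/100MB/s)
# from the measured MB/s/CPU and the estimated SI95/CPU.

def derived_columns(si95_per_cpu, mb_s_per_cpu, target_mb_s=100):
    """Return (CPUs, SI95) needed for target_mb_s, per the column definitions."""
    cpus = target_mb_s / mb_s_per_cpu
    # CPUs rounded to the nearest integer, SI95 to the nearest ten,
    # matching the precision of the table entries.
    return round(cpus), round(cpus * si95_per_cpu, -1)

# Example: the 100-byte suncmsb row (SI95/CPU = 6, MB/s/CPU = 2.2)
# reproduces 45 CPUs and 270 SI95 for 100 MB/s.
print(derived_columns(6, 2.2))
```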

Conclusions

The CPU requirements for writing with Objectivity at 100 MB/s depend on the data model used, but they are always substantial.

Data models with a substantial number of objects smaller than 100 bytes are inadvisable, both because of CPU requirements and because of storage overheads.

For models with object sizes in the range 100 bytes - 1 KB, hardware parameters should be >=60 CPUs and >=400 SpecInt95 if there is to be a good chance of achieving a 100 MB/s writing speed.

For models with object sizes in the range 4 KB - 8 KB, hardware parameters should be >=30 CPUs and >=200 SpecInt95 if there is to be a good chance of achieving a 100 MB/s writing speed.

For models in which almost all objects are >8 KB, the CPU requirements will be lower, but we do not have enough data to estimate how much lower. A model which is very rich in object references will have CPU requirements higher than those listed above, but we do not have enough data to estimate how much higher.

It should be stressed that the currently available test data does not include test loads with, say, 6 writers on a 6-CPU COTS server. It is unclear whether the internal memory or I/O busses in such a server would prove to be a bottleneck which prevents a 6-fold speedup over using one CPU. There is some anecdotal evidence of slowdowns related to paging if the swap disk on a system is slower than its filesystem disks.

Because of the CPU requirements, a realistic 100 MB/s testbed built out of COTS components will inevitably consist of a number of loosely coupled systems, each taking care of a 'slice' of the workload. Even with high-end COTS systems, there will be at least 3-4 slices. It seems attractive to build testbeds consisting of a single slice only, as a way to save hardware resources. In any case it should be considered that, according to recent projections, in 2005 CPUs will be about 20-70 times more powerful per $ [10][11] and about 5 times as powerful per unit [10].

Acknowledgements

We thank Martin Schaller and Andreas Pfeiffer for providing details about the tests in [1] and [8].

References

 [1] Martin Schaller. Objectivity/DB Read/WritePerf, Sep. 98. 
     http://wwwinfo.cern.ch/~schaller/Notes/WriteReadPerf.ps
    
 [2] Vincenzo Innocente. CMS analysis chain prototype. (Slides 5 and 6)
     RD45 workshop 7/97
     http://home.cern.ch/~innocent/rd45/tex/ooprot_rd45797.ps.gz
 
 [3] K. Holtman, J. Bunn, Scalability to Hundreds of Clients in HEP
     Object Databases, Proc. of CHEP '98, Chicago, USA.
     http://home.cern.ch/~kholtman/chep2/art2web.html
 
 [4] Koen Holtman, Clustering strategies, July 1997, RD45 workshop (Slide 5)
     http://home.cern.ch/~kholtman/cluster_jul1997.ps
 
 [5] Koen Holtman. Exemplar performance test 'fill.ex6'. Unpublished.
 
 [6] Bernd Panzer-Steindel, Offline Computing and Central Data
     Recording Models for the Experiment COMPASS, section on Processor
     Benchmarks.
     http://wwwinfo.cern.ch/pdp/pa/compass/compass_evaluation/node11.html
 
 [7] Koen Holtman. Analysis of 'batch reclustering' tests. Unpublished.
 
 [8] Andreas Pfeiffer. CERES/NA45 CDR.  Talk at RD45 October 1998 workshop.
     http://wwwinfo.cern.ch/asd/cernlib/rd45/workshops/oct98/presentations/na45/index.htm
 
 [9] Using an object database and mass storage system for
     physics analysis. CERN/LHCC 97-9, The RD45 collaboration, 15 April
     1997.  Section 11.2.
     http://wwwinfo.cern.ch/asd/rd45/reports/m3_96/milestone_3.htm
 
[10] Pasta - The LHC Technology Tracking Team for Processors, Memory,
     Architectures, Storage and Tapes, Status Report - August 1996.
     http://wwwinfo.cern.ch/di/pasta.html

[11] CMS Computing Technical Proposal. CERN/LHCC 96-45, CMS
     collaboration, 19 December 1996.  Section 3.2.1.