1/29
Status of the BaBar Databases
Jacek Becla
BaBar Database Group
2/29
BaBar Is in Production
Run 1: May 1999 – Oct 2000
– ~24.2 fb⁻¹ (~1.3 fb⁻¹ per month)
Run 2: Feb 2001 – July 2002
– up to 12.6 fb⁻¹ now (~2.5 fb⁻¹ per month)
Expected ~100 fb⁻¹ by July 2002
– already well over design luminosity
3/29
Prognosis
                                      FY00  FY01  FY02  FY03  FY04  FY05
Peak luminosity [10³³ cm⁻² sec⁻¹]      2.5     5     8    10    13    24
Yearly integrated luminosity [fb⁻¹]     25    40    80   115   135   225
Total integrated luminosity [fb⁻¹]      25    65   145   260   395   620
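As a check, the total row is just the running sum of the yearly row: 25, 25 + 40 = 65, 65 + 80 = 145, 145 + 115 = 260, 260 + 135 = 395, 395 + 225 = 620 fb⁻¹.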
4/29
Changes
4 -> 21 streams
– >5 times more files and locks
– no data duplication (streams not self-contained)
Smaller files
– 2 -> 0.5 GB and 10 -> 2 GB
Using Objy 6.1, read-only dbs
Clustering hint server and conditions OID server
Migrating production to Linux (now)
Introducing multi-fds (now)
Cannot afford a large test-bed anymore
5/29
OPR (Online Prompt Reconstruction)
In general keeps up with the data
– ~150 pb⁻¹ per day
– faster than at the end of Run 1, in spite of 5x the load
– will have to deal with 300 pb⁻¹ soon
6/29
Current OPR Configuration
Hardware
– 6 4-CPU data servers, lock server, journal server, catalog server, clustering hint server + conditions OID server
– 220 clients
Software
– Objy 6.1, Solaris 7
– about to migrate to Linux
7/29
OPR – Short-Term Future
Use multi-fds
– 2 event store fds, 1 conditions fd
– 6 + 6 data servers
– new federation approx. every week
Migrate clients to Linux
– 2.2x faster CPU, more memory
Use faster machines for lock servers
– now: Sun Netra T1, 440 MHz
– planned: Sun Blade 1000, 750 MHz UltraSPARC III
Discussions about storing all digis in Objy and reprocessing from Objy, not from xtc
8/29
REPRO
Hardware configuration similar to OPR
Occasionally up to 3 repro farms
– 150+150+200 nodes
– over 300 pb⁻¹ on a good day
– conditions merging nightmare
9/29
REPRO – Near Future
Use multi-fds
– 2 event store fds, 1 conditions fd
– 5 + 5 data servers
– new federation ~every other week
– same slow lock servers
Move to Linux
Run in Italy; timescale ~mid-2002
10/29
Robustness
Db creation (a weak point) removed
– precreation in background by CHS, automatic recovery (see the retry sketch below), new C++ API in 6.0
AMS crash
– ¾ of the farm continues, unless it is the “default” AMS (used by CHS)
CHS – new central point of failure
– entirely in our hands, very stable so far
One event store fd down (e.g. lock server crash)
– the second should finish processing the current run
Cleanup server – being worked on
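As an illustration of the automatic-recovery idea only (not BaBar's actual code): a fallible step such as attaching to a precreated database can be wrapped in a retry with backoff, so a transient server failure does not kill the client.

```cpp
#include <chrono>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <thread>

// Hypothetical sketch of the "automatic recovery" idea: retry a fallible
// database operation with exponential backoff instead of killing the client.
// This is illustrative only, not the BaBar/Objectivity code.
bool retryWithBackoff(const std::function<void()>& op,
                      int maxAttempts = 5,
                      std::chrono::milliseconds delay = std::chrono::milliseconds(100)) {
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        try {
            op();                       // e.g. attach to a precreated database
            return true;
        } catch (const std::exception& e) {
            std::cerr << "attempt " << attempt << " failed: " << e.what() << "\n";
            std::this_thread::sleep_for(delay);
            delay *= 2;                 // back off before the next try
        }
    }
    return false;                       // give up; the rest of the farm continues
}

int main() {
    int failuresLeft = 2;               // simulate two transient failures
    bool ok = retryWithBackoff([&] {
        if (failuresLeft-- > 0) throw std::runtime_error("server not ready");
    });
    std::cout << (ok ? "recovered\n" : "gave up\n");
}
```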
11/29
12/29
13/29
Analysis
200 CPUs (~Sun Netra T1 class)
17 servers, 24 TB disk cache
On-demand staging turned off
Read-only dbs
– starting to see the effect now
Disk space – always a problem
– micro – 5.4 KB/event (aod, col, tag, evt, evshdr)
– mini – 4.7 KB/event (esd)
14/29
Analysis – cont…
Veritas File System reconfiguration
– direct I/O instead of buffered I/O more than doubles the effective data rate (see the sketch below)
Lock server memory leak
– grows to 600 MB in a week
– switching every week
Kanga (ROOT based) will become deprecated
– recent computing model: enhance Objy, deprecate Kanga (freeze by mid-2002, produce files till late 2002)
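As background, not BaBar's actual change: the VxFS switch was presumably done with mount options (e.g. mincache=direct), but Solaris also exposes per-file advisory direct I/O through directio(3C). A minimal sketch, assuming a Solaris build:

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

// Illustration of per-file advisory direct I/O on Solaris via directio(3C).
// The VxFS reconfiguration above was more likely done with mount options;
// this just shows the analogous per-file call.
int main(int argc, char** argv) {
    if (argc < 2) { std::fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { std::perror("open"); return 1; }
#ifdef __sun
    // Ask the filesystem to bypass the page cache for this descriptor, so
    // large sequential reads are not double-copied through buffered I/O.
    if (directio(fd, DIRECTIO_ON) != 0) std::perror("directio");
#endif
    char buf[1 << 16];
    while (read(fd, buf, sizeof buf) > 0) { /* stream the data */ }
    close(fd);
    return 0;
}
```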
15/29
AMS (Advanced Multithreaded Server)
Known (but not fixed) problem
– a file used immediately after being closed crashes the AMS (in 6.1 it kills the client)
Ported to Linux
– no performance figures yet
New feature – compression
Redesigning the front-end part
– got OK from Objy
16/29
A Word on Conditions
Using the OID server to find the time interval (modeled in the sketch below)
– only in REPRO so far, about to put it into OPR
Staircase problem
– incorrect design
– purging every 2 weeks; ~15 min per rolling calibration (35 in total), run in parallel
Finalize problem
– based on a genealogy object (all objects named); result of iteration in unpredictable order; just slow
Conditions merging problem
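The time-interval lookup above can be modeled as an interval-of-validity map: each condition object is valid from its start time until the next entry begins. A minimal sketch; ConditionsIndex, Time, and Oid are hypothetical stand-ins, not the BaBar schema or the Objectivity API.

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>
#include <string>

// Hypothetical model of the conditions lookup: each OID is valid from its
// start time until the next entry begins. Not the BaBar schema.
using Time = std::uint64_t;
using Oid  = std::string;

class ConditionsIndex {
    std::map<Time, Oid> byStart_;       // start-of-validity -> object id
public:
    void insert(Time start, Oid oid) { byStart_[start] = std::move(oid); }

    // Return the OID whose validity interval contains t, if any.
    std::optional<Oid> find(Time t) const {
        auto it = byStart_.upper_bound(t);      // first entry starting AFTER t
        if (it == byStart_.begin()) return std::nullopt;
        return std::prev(it)->second;           // latest entry starting <= t
    }
};
```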
17/29
Conditions – cont.
Index problem
– occasionally the index becomes inconsistent (does not return all objects in a given range); solution: rebuild. Happens ~once every 2 months. Not reported to Objy yet.
Index scaling
– range query (the way we use it) does not scale: response time is linear in size (100 K entries -> 0.5 sec; see the sketch below)
Will extend the OID server – now read-only access
Will redesign & re-implement conditions
– and address all the problems; timescale: end of ’01
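The linear response time quoted above is what a full scan yields; over sorted keys a range query should cost O(log N + matches). A generic illustration of the scalable form, not the Objectivity index API:

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

// A range query over sorted keys via binary search: O(log N + matches),
// versus the O(N) scan that produces a linear response time.
std::vector<std::uint64_t> rangeQuery(const std::vector<std::uint64_t>& sortedKeys,
                                      std::uint64_t lo, std::uint64_t hi) {
    auto first = std::lower_bound(sortedKeys.begin(), sortedKeys.end(), lo);
    auto last  = std::upper_bound(first, sortedKeys.end(), hi);
    return {first, last};               // all keys in [lo, hi]
}

int main() {
    std::vector<std::uint64_t> keys = {1, 3, 5, 8, 13, 21};
    for (auto k : rangeQuery(keys, 4, 14)) std::cout << k << " ";   // 5 8 13
}
```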
18/29
Data Distribution
Micro-level data mirrored @ IN2P3
Run 2 – mirror raw as well
Current tools do not scale with the increased data volume
– a lot of manual work
Will try using data-grid based tools soon
19/29
Operations
2 DBAs, 3rd coming soon
Many manual tasks slowly being automated
20/29
Some Numbers
Total size of data – 300+ TB
# files – 128K
# users in analysis ~220
10 active production federations
– this includes 5 analysis fds
Cond dbs – 12 GB
21/29
Tuning
Performance
Scalability
22/29
4 -> 20 Streams Was Non-trivial
[chart: OPR event rate vs. farm size]
– 20 streams, with duplication: 160-node run, ~8 Hz
– 4 streams: 100 nodes ~60 Hz; 200 nodes ~115 Hz
23/29
Clustering Hint Server
CORBA based, multithreaded
Precreates dbs and containers in the background, distributes OIDs to clients (toy model below)
Many other features:
– containers reused
– full integration with HPSS (precreated files pinned in cache, full dbs immediately migrated)
– file disparsification: file transfer to tape 1 MB -> 15-25 MB now
– db creation done locally, with pre-sizing: no container extensions on the client side
– round-robin load balancing
– automatic recovery, and so on
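A toy model of the precreation half of this design, assuming nothing about the real CORBA interface: a background thread keeps a pool of precreated databases topped up while clients draw from it, so no client ever blocks on the creation cost.

```cpp
#include <condition_variable>
#include <deque>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>

// Toy model: a background thread keeps a pool of "precreated databases"
// topped up; clients draw from it and never pay the creation cost.
// Names are illustrative stand-ins, not the BaBar CORBA interface.
class PrecreatePool {
    std::deque<std::string> ready_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stop_ = false;
    int next_ = 0;
    std::thread worker_;                // declared last so it starts last

    void precreateLoop() {
        while (true) {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [this] { return stop_ || ready_.size() < 8; });
            if (stop_) return;
            // Stand-in for the expensive step: creating and pre-sizing a db.
            ready_.push_back("db_" + std::to_string(next_++));
            cv_.notify_all();
        }
    }
public:
    PrecreatePool() : worker_([this] { precreateLoop(); }) {}
    ~PrecreatePool() {
        { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
        cv_.notify_all();
        worker_.join();
    }
    // Client side: take the next precreated database.
    std::string acquire() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !ready_.empty(); });
        std::string db = ready_.front();
        ready_.pop_front();
        cv_.notify_all();               // wake the worker to refill the pool
        return db;
    }
};

int main() {
    PrecreatePool pool;
    for (int i = 0; i < 3; ++i)
        std::cout << "client got " << pool.acquire() << "\n";
}
```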
24/29
Others
commitAndHold (pattern sketched below)
– significant reduction in lock traffic
Initial transaction for conditions
– one transaction instead of 50
Cache authorization
– rather than checking on every event
Tune the clients' file descriptor limit
– hit the 8K limit on the AMS side; reduced client fd limit 196 -> 32; AMS response improved, AMS CPU usage decreased
Increase transaction granularity
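The commitAndHold and transaction-granularity items can be illustrated with a stand-in transaction type: checkpoint once per batch of events while holding the transaction open, instead of a full commit per event. Txn below is a mock, not the Objectivity/DB API.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Mock transaction type illustrating the batching pattern; NOT the
// Objectivity/DB API. commitAndHold() stands for "make updates durable but
// keep the transaction context (locks, open handles) alive".
struct Txn {
    int commits = 0;
    void commitAndHold() { ++commits; } // no lock release/reacquire round trip
    void commit()        { ++commits; } // full commit: drops all locks
};

int main() {
    std::vector<int> events(1000, 0);
    const std::size_t batch = 50;       // coarser granularity: ~20 checkpoints

    Txn txn;
    for (std::size_t i = 0; i < events.size(); ++i) {
        // ... process event i, write its objects ...
        if ((i + 1) % batch == 0)
            txn.commitAndHold();        // checkpoint without tearing down txn
    }
    txn.commit();                       // one final full commit per job
    std::cout << "commits: " << txn.commits << "\n";   // 21 instead of 1000
}
```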
25/29
Bottlenecks
Lock server
– first signs of saturation with ~200 nodes
– use faster CPU
– use Objy 7 (33% lock traffic reduction), scheduled for October 2001
– more event store fds per farm
CPU on data servers
– buy more – expensive
– improve AMS, reduce event size
26/29
Use Faster CPU…
27/29
Miscellaneous
64 K pages?
– unfortunately not working with multi-fds
Maybe precreate/purge dbs only between runs?
David is stepping down as head of the BaBar DB group
28/29
Future Looks Bright
Lock server bottleneck
– multi-fds – can always add one more event store fd
– Objy 7 will feature a faster lock server
– CPUs are getting faster
Data server CPU saturation
– AMS redesign should help
– size of the event (rec) being reduced by ~10% now, looking for more
– can always buy more servers
29/29
Summary
No serious problems– conditions need to be redesigned
OPR will likely keep up
Working in the BaBar DB group is fun!