Panda Grid Status. Kilian Schwarz, GSI, on behalf of the PANDA GRID Group (slides to a large extent from Radoslaw Karabowicz)



Page 1: Panda Grid Status

Panda Grid Status

Kilian Schwarz, GSI, on behalf of the PANDA GRID Group

(slides to a large extent from Radoslaw Karabowicz)

Page 2: Panda Grid Status

Central services, LDAP, DB and ML transfers

Phone meeting on 1st Feb 2012

By the end of February the GRID management center has to be moved out of Glasgow, including:

Lightweight Directory Access Protocol (LDAP) -> GSI

MySQL DataBases (DB) -> GSI, Torino

AliEn2 Central Services (CS) -> GSI

PANDA GRID MonaLisa (ML) -> Jülich

Page 3: Panda Grid Status

Panda GRID @ GSI

Central Services installation status after the May Panda GRID meeting:

Lightweight Directory Access Protocol (LDAP) -> GSI

MySQL DataBases (DB) -> GSI, Torino

AliEn2 Central Services (CS) -> GSI

PANDA GRID MonaLisa (ML) -> Jülich / Torino

Recent changes in AliEn required direct intervention by the CERN people in our MySQL and machine settings; we are still working to bring the Panda GRID back.

Page 4: Panda Grid Status

Panda GRID Map

~12 sites

~1400 CPUs

CS, LDAP, DB at GSI

Page 5: Panda Grid Status

Jobs share

+------------+--------+
| status     | jobs   |
+------------+--------+
| DONE       | 204271 |
| DONE_WARN  |   4833 |
| ERROR_E    |  11026 |
| ERROR_IB   |   1931 |
| ERROR_RE   |  14766 |
| ERROR_SV   |  14273 |
| ERROR_V    |     59 |
| EXPIRED    |   6338 |
| INTERRUPTE |     31 |
| OVER_WAITI |   1408 |
| SAVED      |    338 |
+------------+--------+

+---------------------------+-------+-------+-------+------+---------+---------+------+--------+-------+
| site                      | jobs  | DONE  | ERROR | WAIT | STARTED | RUNNING | SAVE | ZOMBIE | OTHER |
+---------------------------+-------+-------+-------+------+---------+---------+------+--------+-------+
|                           |  1573 |     0 |     0 |    0 |       0 |       0 |    0 |    165 |  1408 |
| PANDA::Bucharest::panda01 | 31141 | 25978 |  4892 |    0 |       0 |       0 |    0 |    271 |     0 |
| PANDA::Dubna::pbs         |  9570 |  8212 |   251 |    0 |       0 |       0 |   69 |   1038 |     0 |
| PANDA::GSI::lxgrid8       | 88322 | 74471 | 12005 |    0 |       0 |       0 |    0 |   1815 |    31 |
| PANDA::Juelich::ce642     |  1382 |  1201 |   169 |    0 |       0 |       0 |    0 |     12 |     0 |
| PANDA::KVI::PBS           | 36445 | 32052 |  3784 |    0 |       0 |       0 |  242 |    367 |     0 |
| PANDA::Mainz::himster     | 64449 | 47635 | 14444 |    0 |       0 |       0 |    0 |   2370 |     0 |
| PANDA::Torino::CREAM      |  9414 |  8502 |   758 |    0 |       0 |       0 |    0 |    154 |     0 |
| PANDA::Torino::PBS        |  3963 |  2686 |  1276 |    0 |       0 |       0 |    0 |      1 |     0 |
| PANDA::Vienna::smigrid02  |  9123 |  8367 |   584 |    0 |       0 |       0 |   27 |    145 |     0 |
+---------------------------+-------+-------+-------+------+---------+---------+------+--------+-------+

TOTAL NUMBER OF JOBS IN THE LAST 6 MONTH:
+--------+
| 259274 |
+--------+

Because of the database changes, information about old jobs is accessible only from MySQL and is not available from MonaLisa. Also, the job counter started from 0 again.
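As a rough illustration (not part of the original slides), the share of each job state can be recomputed from the status counts listed on this slide. The Python sketch below simply hard-codes those counts; the names STATUS_COUNTS, status_shares and error_total are made up for this example and do not correspond to any AliEn or MonaLisa interface.

```python
# Job counts per status, copied from the status table on this slide.
STATUS_COUNTS = {
    "DONE": 204271,
    "DONE_WARN": 4833,
    "ERROR_E": 11026,
    "ERROR_IB": 1931,
    "ERROR_RE": 14766,
    "ERROR_SV": 14273,
    "ERROR_V": 59,
    "EXPIRED": 6338,
    "INTERRUPTE": 31,
    "OVER_WAITI": 1408,
    "SAVED": 338,
}


def status_shares(counts):
    """Return each status' fraction of the total number of jobs."""
    total = sum(counts.values())
    return {status: n / total for status, n in counts.items()}


def error_total(counts):
    """Sum the jobs of all states whose name starts with ERROR_."""
    return sum(n for status, n in counts.items() if status.startswith("ERROR_"))


if __name__ == "__main__":
    total = sum(STATUS_COUNTS.values())
    for status, share in status_shares(STATUS_COUNTS).items():
        print(f"{status:<10} {share:6.1%}")
    print(f"all ERROR_* states: {error_total(STATUS_COUNTS) / total:.1%} of {total} jobs")
```

The counts add up to the quoted total of 259274 jobs, which is a quick consistency check on the table.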

Page 6: Panda Grid Status

PandaRoot @ GRID

Installed:

panda_extern: apr08, jul08, jul09, may11, jan12

pandaroot: may11, july11, august11, nov11, stable, trunk (updated every Tuesday with results published in pandaroot cdash)

Page 7: Panda Grid Status

GRID Disk Usage

Page 8: Panda Grid Status

needed

more GRID users

and we have to regain the users' trust after a longer period of only partial functionality

http://panda-wiki.gsi.de/cgi-bin/view/Computing/PandaGridAliEn2ClientInstall

more sites

http://panda-wiki.gsi.de/cgi-bin/view/Computing/PandaGridAliEn2SiteInstall

GRID developers

Page 9: Panda Grid Status

ALICE & PANDA

The PANDA-ALICE relationship:

we use middleware written by ALICE

we have our own requirements and requests

we are supposed to give back:

allocate dedicated manpower for middleware development and user support

manpower will also come via LSDMA

develop in-house expertise with this middleware, and not only as users

debug and develop AliEn: Oracle Interface, Slurm Interface, PoD interface, VO-VO interface

PANDA already uses AliEn v2-20 and is debugging it for ALICE

Page 10: Panda Grid Status

Issues

masterjob –printsite does not work

fquota does not work properly for many users

“services” command not working

packman install –everywhere does not work

job triggered installation is not sufficient for PANDA since we compile on site

AliEn installer installation works only with manual fixes (Gnu.so ...)

masterSE replicate

Page 11: Panda Grid Status

Issues #2

some sites still do not take jobs

Deletion of files

inter site data transfer/mirror

ROOT API

packages list in ML

activation of backup DB

Page 12: Panda Grid Status

wish list

• JAliEn

• To be able to install a specific revision number via the AliEn installer

Page 13: Panda Grid Status

Plans

PANDA wants to do a large-scale production at the beginning of next year. By then everything has to be fixed.

Page 14: Panda Grid Status

conclusion

• ALICE/FAIR collaboration, also in the context of Grid computing, works quite well

• Still there is room for improvement

• PANDA cannot be a beta tester within its production environment

• common testbed maintained by ALICE and PANDA?

• information flow needs to be improved. We cannot keep being taken by surprise whenever there is a major change in the AliEn DB

• how to solve all the existing issues? Currently we put them all in the GSI ticketing system. Who is responsible for what?