LHCOPN Update: John Shade / CERN IT-CS, September 2010 GDB


Page 1: LHCOPN Update

LHCOPN Update
John Shade / CERN IT-CS

September 2010 GDB

Page 2: LHCOPN Update

J. Shade/GDB LHCOPN Update 2

Working Groups

• LHCOPN Operations and Monitoring WGs
• F2F LHCOPN meetings:
  – London 8/9 March (http://indico.cern.ch/conferenceDisplay.py?confId=80755)
  – Barcelona 28/29 June (http://indico.cern.ch/conferenceDisplay.py?confId=88698)

Operations WG
• Quarterly phone conferences
  – Track correlation between outages and GGUS tickets

Monitoring WG
• Conference calls in May/June, numerous e-mail exchanges
  – perfSONAR MDM setup & deployment
  – LHCOPN Dashboard design

Mailing list: [email protected]

08-SEP-2010
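The Operations WG task of tracking outages against GGUS tickets can be sketched as a simple time-window match. This is a minimal illustration under stated assumptions: the `Outage` and `Ticket` records, the per-link identifiers and the one-hour slack after an outage ends are all hypothetical, not the WG's actual procedure or the GGUS data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    link: str          # hypothetical link identifier
    start: datetime
    end: datetime


@dataclass
class Ticket:
    ticket_id: str     # hypothetical GGUS ticket reference
    link: str
    opened: datetime


def correlate(outages, tickets, slack=timedelta(hours=1)):
    """Map each outage (by index) to the IDs of tickets on the same link that
    were opened between the outage start and `slack` after its end.
    The matching rule itself is an assumption."""
    return {
        i: [
            t.ticket_id
            for t in tickets
            if t.link == o.link and o.start <= t.opened <= o.end + slack
        ]
        for i, o in enumerate(outages)
    }


def untracked(outages, tickets, slack=timedelta(hours=1)):
    """Return the outages for which no matching ticket was found."""
    matches = correlate(outages, tickets, slack)
    return [outages[i] for i, ids in matches.items() if not ids]
```

Outages returned by `untracked` are the ones a quarterly review would want to follow up on.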

Page 3: LHCOPN Update


Monitoring

• Working with DANTE to get a robust MDM solution in place (the perfSONAR rollout had stalled)
• Clarified how to access performance data, and defined requirements for a visualisation dashboard: see https://twiki.cern.ch/twiki/bin/view/LHCOPN/MonWG
• Comments on the Requirements document are still welcome!


Page 4: LHCOPN Update


Monitoring

• Missing a central view of the LHCOPN
• Weathermap and e2emon applications are restricted to the GEANT portal
• HADES data:
  – Achievable bandwidth (BWCTL, Bandwidth Test Controller: automated 1 Gbit/s TCP bandwidth test)
  – One Way Delay (OWD)
  – One Way Delay Variance / Jitter (OWDV)
  – Packet loss
  – Traceroute (number of hops between two HADES nodes)
  – Duplicate packets
  – Out-of-order packets

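To make the HADES packet metrics concrete, here is a minimal sketch that derives median OWD, a jitter figure, packet loss, duplicates and out-of-order arrivals from one batch of probe packets. The input format (send times keyed by sequence number, arrivals as `(seq, recv_time_ms)` pairs in arrival order) is an assumption, as is the jitter definition: the mean absolute delay difference between consecutive sequence numbers, in the spirit of RFC 3393 IPDV. HADES's own definitions may differ.

```python
import statistics
from dataclasses import dataclass


@dataclass
class ProbeSummary:
    owd_median_ms: float
    owdv_ms: float        # assumed jitter: mean |delay difference| of consecutive packets
    loss_fraction: float
    duplicates: int
    out_of_order: int


def summarise(sent, received):
    """sent: {seq: send_time_ms}; received: [(seq, recv_time_ms), ...] in arrival order.
    Every received seq is assumed to appear in `sent`."""
    delays = {}           # one-way delay of the first copy of each packet
    duplicates = 0
    out_of_order = 0
    highest = None        # highest sequence number seen so far
    for seq, recv_ms in received:
        if seq in delays:                 # a second copy of a packet already seen
            duplicates += 1
            continue
        if highest is not None and seq < highest:
            out_of_order += 1             # arrived after a later-sent packet
        else:
            highest = seq
        delays[seq] = recv_ms - sent[seq]
    ordered = [delays[s] for s in sorted(delays)]
    diffs = [abs(b - a) for a, b in zip(ordered, ordered[1:])]
    return ProbeSummary(
        owd_median_ms=statistics.median(ordered) if ordered else float("nan"),
        owdv_ms=statistics.mean(diffs) if diffs else 0.0,
        loss_fraction=1 - len(delays) / len(sent) if sent else 0.0,
        duplicates=duplicates,
        out_of_order=out_of_order,
    )
```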

Page 5: LHCOPN Update


Initial (simple) algorithm

a) Site status is up when OWD is within ±15% of the baseline and packet loss is below 0.1% over a five-minute interval
b) Site status is down when packet loss is 100% over a five-minute interval
c) Site status is degraded when the measurements fall between a) and b), i.e. they meet neither condition

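The initial algorithm translates directly into a classifier for one five-minute window. A sketch, assuming the baseline OWD per link is already known (how it is derived is not specified here) and that loss is given as a fraction between 0 and 1. Rule b) is checked first because no OWD can be measured when every packet is lost.

```python
from enum import Enum


class Status(Enum):
    UP = "up"
    DEGRADED = "degraded"
    DOWN = "down"


def site_status(owd_ms, baseline_owd_ms, loss_fraction):
    """Classify one five-minute measurement window.

    owd_ms: one-way delay measured in the window (may be None if all packets were lost)
    baseline_owd_ms: assumed reference OWD for the link
    loss_fraction: fraction of probe packets lost in the window (0.0 to 1.0)
    """
    if loss_fraction >= 1.0:                                   # rule b): 100% loss
        return Status.DOWN
    within_band = abs(owd_ms - baseline_owd_ms) <= 0.15 * baseline_owd_ms
    if within_band and loss_fraction < 0.001:                  # rule a): OWD ±15%, loss < 0.1%
        return Status.UP
    return Status.DEGRADED                                     # rule c): everything in between
```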

Page 6: LHCOPN Update


Prototype Dashboard


Page 7: LHCOPN Update


Prototype Dashboard


Page 8: LHCOPN Update


Where do we go from here?

• DANTE baulked at the idea of developing their prototype further and supporting it
• SARA and CERN have picked up the gauntlet:
  – SARA developers have tested XML queries and responses against the central HADES repository at DFN
  – TOM team leader is evaluating how best to develop/integrate the LHCOPN dashboard
• Sites already have local monitoring, but we need to provide a central view!
• Nagios probes for sites are also expected

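Since Nagios probes for sites are expected, one possible shape is a plugin that maps the dashboard's up/degraded/down states onto the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The script name `check_lhcopn.py`, its arguments and the state mapping are hypothetical; a real probe would fetch the status from the central repository rather than take it on the command line.

```python
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Assumed mapping from dashboard site states to Nagios states.
STATE_TO_EXIT = {"up": OK, "degraded": WARNING, "down": CRITICAL}


def check_site(site, status):
    """Return (exit_code, one-line plugin output) for one site's LHCOPN status."""
    code = STATE_TO_EXIT.get(status, UNKNOWN)
    return code, f"LHCOPN {NAMES[code]} - {site} link status is {status}"


def main(argv):
    # Hypothetical invocation: check_lhcopn.py <site> <status>
    if len(argv) != 3:
        print("LHCOPN UNKNOWN - usage: check_lhcopn.py <site> <status>")
        return UNKNOWN
    code, message = check_site(argv[1], argv[2])
    print(message)   # Nagios reads the first line of stdout as the status text
    return code


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```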

Page 9: LHCOPN Update


Upcoming Events

• Next F2F LHCOPN meeting will take place at CERN on 7th-8th October
  Agenda: http://indico.cern.ch/conferenceDisplay.py?confId=102716
  Includes participants from Internet2, DANTE, T1s, etc.
  Topics to be covered include:
  – Tier2 Connectivity Requirements
  – Service Level Definition
  – GGUS
  – Monitoring
  – Operations

