IceCube Live
Status and Overview
John [email protected]
IceCube Live review, August 12, 2010
Review Materials: http://bit.ly/live-review
I3Live Website: http://live.icecube.wisc.edu
Tuesday, August 10, 2010
IceCube Live
Background
An abbreviated history of Experiment Control (IC9, IC22, IC40, IC59, IC79)

[Timeline, 2005-2010: "Portlets" (LBNL: Patton, Stouffer, Day), "Roadrunner" (LBNL: Jacobsen), "Anvil" (LBNL: Patton, Day), then IceCube Live (Jacobsen et al). The chart tracks subsystems (TestDAQ, rDAQ, pDAQ, SPADE, PnF Plan A/Plan B, SN-DAQ, I3Moni, Verif., ICL Temperatures, Meteorology, ITS, Meteor Radar) moving from merely Operational to Integrated with Experiment Control.]
Challenges
• Remote site: design for low bandwidth
• Distributed collaboration: use Web for maximum accessibility
• Low (hu)manpower and shifting requirements:
  - leverage existing technologies/services
  - use Python
  - deliver code early and improve incrementally (Agile)
• Many, diverse subsystems:
  - provide plug-in architecture
  - help people to integrate their own systems
Requirements
Requirements? What requirements?
No formal requirements document was ever made...
...but the overall "vision" was sketched out early on:
https://docushare.icecube.wisc.edu/dsweb/Get/Document-45721/
OK. There are some basic, high-level requirements:
• Show status of the detector, at Pole and NH:
  - current status, in near real time
  - historical status
• Allow operation of DAQ, including flashers
• Allow operation of other subsystems
• Alert operators when problems occur
• Maximize uptime
• Provide public interface for outreach (proposed early on, now actively sought)
• Do all the above securely
1. Display Detector Status and History

1.a. Show current status in LiveView ('Status' page)
1.a.i. Show current run status, including DAQ state, run number, time elapsed in run, run configuration, DAQ release, event count, and number of active DOMs.
1.a.ii. Show current 'light mode' (dark / LID)
1.b. Provide command line tools for checking current status
1.b.i. 'livecmd check' shows current run, subrun, run configuration, run start time, number of DAQ events, DAQ release, current light mode, and status of stand-alone components
1.c. Show detector history in LiveView
1.c.i. Graph quantities of interest on 'Status' page, over the past 24 hours: event rate, number of active DOMs, outside ambient temperature, CPU load on main experiment control machine, and deviation of current rate
1.c.ii. Provide 'Recent' page showing detector run history
1.c.ii.1. Present overview of recent runs in tabular format, showing run number, start time, duration, trigger rate, "zombie" DOMs, run configuration, pDAQ release, and light mode (LID/dark)
1.c.ii.2. Allow users to specify start and end times of run list (default is one week), back to the beginning of 2007 (IC22)
1.c.ii.3. Show detector live time for selected runs
1.c.ii.4. Show missing runs and dead time (e.g., 27 minutes), highlighted as a colored gap
1.c.ii.5. Allow operators to add and delete comments on individual runs. Run comments are synchronized immediately between hemispheres via ITS. Only operators can delete comments. 'icecube' user can view but cannot add comments.
1.c.ii.6. Interface with DAQ logs transmitted to the Data Warehouse to show detailed DAQ log information for all available runs; link to DAQ monitoring/logging files (with file sizes), Verification plots (with abbreviated summary shown) and I3Moni analysis pages.
1.c.iii. Show global event rate history ('Rates' page)
1.c.iii.1. Show three different time scales (last two hours, last 24 hours, last 60 days)
1.c.iv. Show detector uptime ('Uptime' page)
1.c.iv.1. Provide year-by-year graphs of monthly uptimes going back to 2007 (IC22)
1.c.v. Provide alert history
1.c.v.1. Show "ticker-like" graphical history snapshot over 24 hours on main (status) page
1.c.v.2. Show detailed alert history on separate 'alerts' page
1.c.v.3. Show complete triggering history of each individual alert on its own page, as well as who is to be notified by email when the alert is triggered and whether the alert sends pages to the Winter-Overs
1.c.vi. Provide paginated, low-level view of individual monitored quantities
1.c.vi.1. Implement search by service, variable name, variable type, priority; allow wildcards for service and variable name
1.c.vii. Provide paginated view of log messages sent by LiveControl as well as pDAQ, SNDAQ and other subsystems, color-coded by subsystem
1.d. Show current UTC time on all pages

2. Control Data Acquisition and Stand-alone Systems

2.a. Enable control of detector via LiveView or 'livecmd'
2.a.i. Start/stop runs, specifying duration, number of runs, run configuration, filtering options, and one or more optional run comments
2.a.ii. Multiple DAQs ('synchronous components') supported (e.g. TWR, Radio, ...)
2.a.iii. Scriptable interface via 'livecmd', with options to block (wait) until run finishes (used for DAQ testing and flasher operation)
2.a.iv. Perform single run, a specified number of runs, or run indefinitely. Attempt to recover and restart runs up to eight times in case of DAQ error
2.b. Allow for control of non-DAQ components
2.b.i. Components which run independently of DAQ can be started or stopped, or reset in case of error
2.c. Enable control of 'light mode' (dark / LID) via LiveView or 'livecmd lightmode'
2.c.i. When light mode changes, all light-sensitive components (e.g. DAQ) are stopped and restarted for clean separation of "dark" and LID running periods
2.c.ii. Indicate light mode status via XML-RPC function, as well as via a disk file on the Experiment Control node. Transmit light mode changes to LiveView
2.d. Enable flasher operations via 'livecmd flasher' command

3. Alert triggering and transmission features (alert history features described above)

3.a. LiveControl alerts
3.a.i. Provide facility for alerting when individual runs fail, and when runs repeatedly fail (i.e. the detector cannot be restarted)
3.a.ii. Allow users to define new alerts based on minimum or maximum values received from monitoring stream
3.a.ii.1. Implement Schmitt trigger on same to prevent flapping
3.a.iii. Alerts can be configured to email one or more recipients when triggered
3.a.iv. Alerts can be configured to page the Winter-Overs
3.a.v. Persist LiveControl alert settings so that they come into effect every time LiveControl is restarted
3.a.vi. Alerts are transmitted to LiveView
3.b. Implement Subsystem Alerts
3.b.i. Individual subsystems can send alerts at will, optionally notifying recipients by email, with relevant payload data included in the alert.
3.b.ii. Alerts are transmitted to LiveView
3.c. Implement Northern Data Flow Alert
3.c.i. Cron job in North verifies data is flowing through Live and emails the current Live administrator if there is an outage
3.c.ii. When data stops flowing, clearly indicate that fact on the main 'Status' page of LiveView

4. System Integration Features (features implemented in IceCube Live core system in support of other systems; see also "Infrastructure Features," below)

4.a. Integrate with IceCube Online Database ("I3OmDb")
4.a.i. Update run_summary table with run start/stop times and DAQ event counts, continuously as data arrives from DAQ
4.a.ii. Update flasher configuration tables when flasher subruns are started
4.a.iii. Transmit SQL query statements to North via SPADE for synchronization with Northern database
4.b. Implement flexible plug-in architecture for control and monitoring of subsystems
4.b.i. Components can be brought in 'hot' or removed at any time via the 'livecmd control' and 'livecmd ignore' commands
4.b.ii. Components can run on any machine inside the SPS network
4.b.iii. Components can send monitoring data to IceCube Live for alerting and/or for transmission to the LiveView Web sites
4.b.iii.1. Support basic scalar data types (int, float, str, None)
4.b.iii.2. Support structured data types (JSON)

5. Subsystem-Specific Features

5.a. pDAQ (does not include the DAQ functions described above)
5.a.i. Monitor rate deviation
5.a.i.1. Collect current event counts from pDAQ in real time via monitoring stream
5.a.i.2. Calculate fractional deviation from the current rate versus the last twenty-four hours
5.a.i.3. Alert when rate deviation exceeds 10%, except during the first 15 minutes of runs
5.a.i.4. Show alerts and graph of rate deviation on main Status page in LiveView
5.b. SNDAQ
5.b.i. Start/stop component via 'livecmd' or LiveView
5.b.ii. 'SNDAQ' page in LiveView
5.b.ii.1. Show current status
5.b.ii.2. Show recent alarms (highlight alarms less than 24 hours old in red)
5.b.ii.3. Show machine-specific performance metrics for sps-2ndbuild
5.b.ii.4. Show tabular summaries of recent indicators of SNDAQ performance
5.c. ICL/B2 Monitoring Page
5.c.i. Show graphs of minimum and maximum temperatures as measured by ICL "weather geese"
5.c.ii. Show humidity in B2 science lab and air flow readout from Goose 2
5.c.iii. Show historical minima and maxima for all quantities.
5.c.iv. Generate alert when temperatures go out of range.
5.d. PnF
5.d.i. Start/stop/recover nine subsystems via 'livecmd' or LiveView
5.d.ii. (Detailed PnF displays in LiveView are in progress)
5.g. I3Moni
5.g.i. Start/stop seven I3Moni components via LiveView or 'livecmd'
5.g.ii. Display graphs and statistics for current disk usage on sps-i3moni
5.g.iii. (Full integration of I3Moni Web site is in progress)
5.h. Weather
5.h.i. Show graphs and current values for South Pole temperature, pressure, wind speed and direction
5.i. Meteor Radar
5.i.i. Show graphs of Meteor Radar signal strength as picked up by the DOM 'Discworld', over two hour, 24 hour and 7 day timescales.

6. Infrastructure Features

6.a. Transmit data between detector/test systems and LiveView Web sites at South Pole and Madison
6.a.i. Make use of ITS (northbound & southbound), SPADE email and SCP queues (northbound only) and direct TCP/IP (local only) when available
6.a.ii. Allow messages to be sent according to 4 "priority" levels:
6.a.ii.1. 1 (ITS) ~2 minute latency
6.a.ii.2. 2 (SPADE email) ~5 minute latency
6.a.ii.3. 3 (SPADE SCP) ~0-12 hour latency
6.a.ii.4. 4 (TCP/IP direct only) immediate
6.a.iii. Higher priority messages are copied to lower priority queues for redundancy
6.a.iv. Bandwidth limits apply to each queue
6.a.iv.1. When bandwidth is saturated, the "noisiest" services lose data first
6.b. Implement hooks for custom data flows
6.b.i. "Filewatcher" program scans for incoming files of interest in Data Warehouse (DW), can be extended to look for new file types/locations
6.b.ii. "DBServer" program receives incoming data stored in DW or transmitted directly over TCP/IP or ITS. A handler for structured JSON data can be added for special handling of incoming data (e.g. for directly notifying a service running in Madison)
6.c. Implement direct access to LiveView database
6.c.i. Django admin panel allows administrative users (e.g. Run Coordinator) to browse and edit data in database
6.c.ii. Direct, read-only MySQL access to most tables available to users on UW cluster
6.d. Multi-cluster support for LiveView
6.d.i. Both SPS and SPTS clusters supported on production server in Madison
6.d.ii. 'Localhost' cluster used for end-to-end testing on e.g. laptops
6.d.iii. Cluster selectable via 'Comms->Settings' menu option; connection testable via ITS to South Pole or directly via TCP/IP to SPTS
6.e. "Messages" page in LiveView shows control messages sent to/from LiveView with alert, control, and run annotation information
6.f. "Stats" page in LiveView shows communications and database statistics and status of Data Warehouse files used for Live
6.g. "Chat" feature allows users to communicate via ITS if the Jabber chat room is not working.
6.h. Show IceCube Live user status and history in LiveView
6.h.i. Show current users on status page
6.h.ii. 'User' page
6.h.ii.1. Currently active users are highlighted
6.h.ii.2. 'Operators' can see privileges of other users
6.h.iii. Detailed user info pages show login history for individual users
6.i. Online Documentation
6.i.i. 'Overview' page
6.i.ii. 'Installation' how-to
6.i.iii. 'Monitoring and alerts' how-to
6.i.iv. 'Subsystem control' how-to
6.i.v. Developer guidelines
6.i.vi. Advanced topics
6.i.vii. Supporting infrastructure
6.i.viii. User manual (PDF) for winter-over scientists and other detector operators
6.j. Repeatable, turn-key installation procedure for Web servers (kickstart based)
6.j.i. Includes database import, software dependencies, Python 2.6 installation, security settings, Apache configuration, SSL
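Item 3.a.ii.1 deserves a word: a single alert threshold fires repeatedly as a noisy monitored value hovers around it. A Schmitt trigger adds hysteresis, so the alert fires once per excursion and only re-arms after the value recovers. A minimal sketch (the class, method names, and thresholds are illustrative, not the actual LiveControl code):

```python
class SchmittAlert:
    """Hysteresis ("Schmitt trigger") alert to prevent flapping.
    Fires when the monitored value crosses trip_level, then stays
    quiet until the value falls back below reset_level."""

    def __init__(self, trip_level, reset_level):
        assert reset_level < trip_level
        self.trip_level = trip_level
        self.reset_level = reset_level
        self.tripped = False

    def update(self, value):
        """Return True exactly once per excursion above trip_level."""
        if not self.tripped and value >= self.trip_level:
            self.tripped = True
            return True           # fire the alert
        if self.tripped and value <= self.reset_level:
            self.tripped = False  # re-arm
        return False


alert = SchmittAlert(trip_level=10.0, reset_level=8.0)
# Fires at 10.5; stays quiet at 10.2; re-arms at 7.9; fires again at 10.1.
fired = [alert.update(v) for v in [9.0, 10.5, 10.2, 7.9, 10.1]]
```

The gap between trip and reset levels is what suppresses repeated alerts from measurement noise near a single threshold.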
Feature List
~90 features. Full list on DocuSnare: https://docushare.icecube.wisc.edu/dsweb/Get/Document-55522/
Categories:
1. Display Detector Status and History
2. Control Data Acquisition and Stand-alone Systems
3. Alert triggering and transmission features
4. System Integration Features
5. Subsystem-Specific Features
6. Infrastructure Features
7. Security Features
8. Public Outreach Features

Examples:
5.h.i Show graphs and current values for South Pole temperature, pressure, wind speed and direction
1.c.ii.3 Show live time for selected runs on 'Recent' page
7.d User logins timeout after 24 hours of inactivity
Design
Design - Overview
LiveControl: control point for subsystems
• Simple interfaces for control, monitoring and logging
• Generates/transmits alerts

LiveView: Web-based UI
• Transmits control requests to LiveControl
• Shows current state and history

LiveCmd: command-line interface
• For direct access/special operations over SSH
[Diagram: Operators interact through LiveView (status, control, alerts) and through LiveCmd; both talk to LiveControl, which manages the subsystems (DAQ, TWR, PnF, GRB, SN, ...) on SPS.]
System Diagram

[Diagram: At SPS, LiveControl on sps-expcont receives data from pDAQ and other subsystems and feeds the SPS Web Server (LiveView, MySQL + Django ORM, with DBServer and Filewatcher plus disk cache), used by B2 Science/Winter-Overs. Alerts and data travel north via ITS and via SPADE into the Data Warehouse; in the North, Filewatcher and DBServer (with disk cache) load them into the NH Web Server (LiveView, MySQL + Django ORM) for the Collaboration.]
Hardware
SPS: sps-expcont* (LiveControl, pDAQ, DB scripts, ...), HP DL385/G1, AMD Opteron 2.4 GHz, 2x2 cores, 8 GB RAM, 300 GB disk
SPS: sps-i3live / live.icecube.usap.gov (LiveView), HP DL380/G5, Intel Xeon 3 GHz, 2x4 cores, 16 GB RAM, 400 GB disk
222 Cygnus: live.icecube.wisc.edu (LiveView), HP DL385/G1, AMD Opteron 2.4 GHz, 2x4 cores, 8 GB RAM, 300 GB disk

Hardware configurations to change in '10-'11!!!
(*) Duplicated at SPTS
LiveView Implementation Stack
Your Browser
jQuery JavaScript Library
Apache Web server + mod_wsgi
Django Web Framework
Python (MySQLdb, ... other Python libraries ...)
MySQL
Linux

(LiveControl and LiveCmd: 100% pure Python...)
System Dependencies
[Diagram: dependency graph for LiveControl, LiveView and LiveCmd, distinguishing "firmly depends on" from "softly depends on". Dependencies shown include Python 2.3 (2.6), virtualenv, distutils, textile, simplejson, MySQLdb, MySQL, Django, Apache, mod_wsgi, jQuery, browsers, LDAP, ITS, SPADE, the Data Warehouse (D.W.), I3OmDb, NFS (perot, for SPTS), and the subsystems pDAQ, PnF, SNDAQ, ...]
Plug-in Integration of Sub-systems

Subsystems (pDAQ, SNDAQ, PnF, ITS, SPADE, Meteorology, ICL Temperatures) plug in via simple interfaces:
- Control
- Logging
- Monitoring
- Alerts
- Custom display(s)
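In practice a plugged-in subsystem mostly just ships (service, variable, value) tuples to LiveControl. The following is a rough sketch of what such a monitoring message might contain; the field names and helper functions are illustrative assumptions, not the actual I3Live wire format:

```python
import json
import time


def build_moni_message(service, varname, value, prio=3):
    # Field names here are assumptions for illustration only.
    # Values may be scalars (int, float, str, None) or JSON-able
    # structures, per feature 4.b.iii.
    return {
        "service": service,   # e.g. "sndaq"
        "varname": varname,   # e.g. "triggerRate"
        "value": value,
        "prio": prio,         # 1 (ITS) ... 4 (direct TCP/IP)
        "t": time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime()),
    }


def encode_moni(msg):
    # Serialize for transmission (the deck lists simplejson as a
    # dependency; the stdlib json module is equivalent here).
    return json.dumps(msg)
```

The point of the plug-in design is that a subsystem only needs to produce messages of roughly this shape; alerting, transport priority, and display on the LiveView sites are handled by the core system.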
Status
Current Status
• Production use started April 1, 2009 (end of IC40)
• 19 controllable services/components
• 27 services sending monitoring data
• Typical latencies ~5 min.
• ~100k quantities transmitted/stored per day
• ~12 users per (week)day
• 181 users to date
• 99.7% uptime of core control system
Code
Code Style Goals
- Modularity: reduce cross-dependencies
- Testability: use doctests; see testing slide
- Readability: use PEP-8 for new code & fixes
- Releasability: only commit "working" code
- Conciseness: KISS & DRY (maintainability)
See for yourself at http://code.icecube.wisc.edu/svn/projects/live/
A Hierarchy of Testing

Single module: Python 'doctests', run continuously (TDD)
Models and URLs: Django doctests, before commit
Simple integration: 'Livetests', before commit
End-to-end: laptop simulation + browser, as needed
Qualification: SPTS, before release
Production: SPS, after deployment

For highest modularity, confidence and ease of debugging, ensure correctness at the lowest possible level.
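The lowest rung, Python doctests, keeps each test next to the code it checks: the expected interpreter output lives in the docstring and is verified by the doctest runner. A minimal illustration in that style (the function is a made-up example, not an actual I3Live module):

```python
def fractional_deviation(current, baseline):
    """Fractional deviation of the current rate from a baseline rate.

    >>> fractional_deviation(110.0, 100.0)
    0.1
    >>> fractional_deviation(100.0, 100.0)
    0.0
    """
    return (current - baseline) / baseline


if __name__ == "__main__":
    import doctest
    doctest.testmod()  # reruns the examples above; silent if they pass
```

Because the examples double as documentation, running the module (or the project's test driver) catches regressions at the single-module level before anything reaches SPTS or SPS.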
Release discipline
- Revision tracking in SVN
- Issue tracking in Mantis
  • "Issue" = well-defined & tracked change to the code
  • 1 feature <=> 1-10 issues
  • 1 bug fix <=> 1-2 issues
- Roughly monthly releases
  • 18 so far (initial: v0.9.1, latest: v1.4.0)
  • As features/fixes needed at SPS
  • As resolved issues pile up in the codebase
  • See RELEASE_NOTES file in source tree....
Documentation
[Mock book cover: "IceCube Live: Control and Monitor your gigaton Antarctic neutrino detector, in near-real-time!" By the IceCube Collaboration, R. Abbasi et al.]
IceCube Live online documentation*
https://live.icecube.wisc.edu/doc/

- Beginner and advanced sections
- Developer how-to's and guidelines
- PDF Operator's Document for WOs, experts
• Separate Wiki pages for IT staff for kickstarting Web servers
• Documentation for non-operations physics users somewhat lacking?

(*) Documentation stored in SVN with I3Live source code
Security (see also Paul W. talk)

Assets/steps:
• Django security (autoescaping, XSS protection, SQL sanitizing, ...) (see http://www.djangobook.com/en/2.0/chapter20/)
• LDAP/PAM authentication of users
• HTTPS for all traffic
• Use 'pdaq' group to define operator privileges
• Limit connection between LiveView & LiveControl
• Periodic vulnerability scans by UW and USAP
• Keep system logs for > 5 months

Results:
• Roughly 2 automated attacks/week (since Mar. '10)
• No successful intrusions detected since first prototype brought online, April 2008.
LiveView User Hierarchy
1. unauthenticated user (redirected to login page)
2. 'icecube' user (cannot add run comments)
3. 'normal' user
4. 'operator' (pdaq group): can start/stop things and delete run comments
5. 'Django superuser': can use Django admin panel to edit LiveView database directly
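The tiers above boil down to a simple precedence rule. A sketch of that rule, using an illustrative dict-based user record (LiveView itself uses Django's auth models; the field names below are assumptions, but the 'pdaq' group as operator criterion comes from the security slide):

```python
def privilege_level(user):
    """Map a user record to the LiveView privilege tiers:
    1 unauthenticated, 2 'icecube' shared account (read-only, no run
    comments), 3 normal user, 4 operator ('pdaq' group), 5 Django
    superuser.  Field names are illustrative, not LiveView's model."""
    if not user.get("authenticated"):
        return 1
    if user.get("superuser"):
        return 5
    if "pdaq" in user.get("groups", ()):
        return 4
    if user.get("name") == "icecube":
        return 2
    return 3
```

Checking superuser before group membership, and group membership before the shared account, means each user lands in the most privileged tier they qualify for.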
Data Accumulation (not including DW files)
Pruned and archived manually every few months(see https://live.icecube.wisc.edu/doc/infrastructure/)
April, 2008 - July, 2010Includes SPS & SPTS
Assume a 10-fold increase in table sizes over the lifetime of the project (except moni/log).
Since a 10-fold increase in the moni tables alone does not currently bring the system to its knees, I believe we are OK. Further pruning of e.g. the Message or WarehouseFile tables should be possible.
Review History
System Review: you're in it!

Code Reviews:
• Christopher Webber, a Chicago Django expert, reviewed LiveView code. Suggestions:
  - Adhere to PEP-8 standard (adopted for new code)
  - Use South migration tool when changing DB tables (now done)
  - Other aesthetic/organizational suggestions not yet implemented.
Subsystem Integration Status
pDAQ: Dave G., Kael (Working)
SN-DAQ: Timo, JJ (Working)
ICL, B2 monitoring: JJ (Working)
Weather: Victor (Working)
Meteor Radar: JJ/Dave (Working)
ITS: JJ, Victor, ... (Working)
PnF: Torsten & Erik (Needs display, filter rates)
SPADE: Victor (Partial)
Verification: James Pepper (In progress)
I3Moni: Kai Schatto (In progress)
Optical Follow-Up: Anna, Torsten, JJ (Started)
Nagios: IT group, Victor, ...? (Needs decision)
Room for improvement (see also Sebastian Böser's talk)
Aesthetics - have not been a priority and could be improved
DAQ Error Handling - DAQ-side errors are fairly opaque and confusing for operators
Testability - there are many, many tests, but many core features, particularly in the Web UI, do not have automated tests
• Python test code: 2548 lines
• Overall Python: 19891 lines (13% ratio; ideally closer to 50%)
Graphing - the graphs could be more complex, more informative, more attractive, more readable (in progress)
Room for improvement, continued
Speed - some pages load slowly
Issue backlog - some 160+ issues remain in Mantis queue
....
Expertise & Manpower - this system, critical to detector performance and uptime, is used every day by the IceCube Collaboration, yet only one person has ‘enough’ expertise
Effort
Steady-State (hu)Manpower
Current level of effort:
• 60% Jacobsen (also contributing to Operations, DAQ, ...)
• 25% Bittorf
• ???% Frère
• + subsystem integration work (Schatto, Pepper)

Current rate of implementation of new features is slow (requests arrive faster than they can be satisfied) -> manage carefully (see Sebastian talk)

Best guess for long-term maintenance: at least 0.5 FTE (includes occasional feature additions based on science requirements and user requests).
Conclusion
Conclusion
• The core IceCube Live system is implemented and has been in production since April 2009.
• Uptime of the control system is 99.7%.
• The system meets the minimum requirements for operating and monitoring the detector.
• Most of the key subsystems have been fully or partially integrated.
• There is opportunity for improvement in both the core system and in subsystem integration.
• IceCube Live is a greatly improved Experiment Control which cost the project roughly 1/4 of what previous versions cost.
Spares
Why so many Experiment Control systems?
• 2005: Initial attempt ambitious but did not achieve goals in time (IC09 startup)
• 2006: Quick-and-dirty "pirate" implementation for rDAQ/IC09
• 2007: Third implementation (IC22, IC40) functioned OK for the most part, but:
  - Status displays meagre
  - Only 2-3 systems integrated
  - Flasher implementation flawed
  - Robustness and maintenance issues
• The good news: each implementation built on previous experience.
Steering Development
Multiple phases of development:
2007: Vision document
2008: Initial prototyping; "IceCube Live Advisory Group"; iterative testing/development at Pole
2009: Production start (IC59)
2010: Continued, iterative improvements

[Chart shows features accumulating over these phases.]
LiveControl Block Diagram
[Block diagram: LiveControl threads include Initialization, Monitoring & Alert Dispatch; DOMHub Monitor Interface; Idle DAQ monitor; Monitoring Listener; Alert Trigger; DAQ runner; LID state transition; and a Message Handler serving LiveCmd. Inputs: DAQ, other subsystems, domhub monitor status files, control state. Outputs: SPADE dropbox, ITS, LiveView (North), local Alert DB, I3OmDb; alerts page the WOs; Recent/Rates data flows north.]
Connecting LiveView <-> LiveControl
[Diagram: Modes of Connecting LiveView to LiveControl (view / transport / control, LiveView on port 7000 and LiveControl on port 7001): ITS (default NH) via its.icecube.wisc.edu:10801 and its.icecube.southpole.usap.gov:10878 to sps-expcont; ITS-A (SPTS) via its-a.icecube.wisc.edu:10801 and its-b.icecube.wisc.edu:10878 to spts64-expcont; SPS-Direct (default NPX, e.g. gull) to sps-expcont; SPTS-Direct from live.icecube.wisc.edu to spts64-expcont; and Localhost (7000 to 7001 on one machine) for end-to-end testing.]
Data Transport layer — Priority Queues
Priority 1: ITS, ~2 min latency, max 20 B/sec
Priority 2: SPADE email (Iridium), ~5 min latency, max 1 kB/sec
Priority 3: SPADE SCP (TDRSS), 0-12 hrs latency, max 2.3 kB/sec
Priority 4: Direct (TCP/IP), < 1 sec latency, > 1 MB/sec

Features:
- Priority 1 duplicated in streams P2 & P3; P2 duplicated in P3
- Respect max. data rates; discard "noisy" subsystems first if need be
- Monitor transport statistics on 'Stats' page

See proposed requirements at https://docushare.icecube.wisc.edu/dsweb/Get/Document-48537/
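The duplication rule ("Priority 1 duplicated in streams P2 & P3; P2 duplicated in P3") amounts to a fan-out at enqueue time: a message goes onto its own queue and every slower northbound fallback. A sketch with illustrative data structures (not I3Live's internals):

```python
# Queue names follow the table above, priorities 1..4.
QUEUES = ["ITS", "SPADE_EMAIL", "SPADE_SCP", "TCP_DIRECT"]


def enqueue(message, prio, queues):
    """Fan a message out for redundancy: P1 is copied to P2 and P3,
    P2 is copied to P3, P3 goes only to itself.  P4 (direct TCP/IP)
    is local-only and gets no fallback copies."""
    if prio == 4:
        queues["TCP_DIRECT"].append(message)
        return
    for name in QUEUES[prio - 1:3]:   # own queue through SPADE SCP
        queues[name].append(message)
```

With this rule a priority-1 message that is dropped from the saturated ITS stream still arrives, more slowly, via SPADE email or SCP.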
Example LiveCmd Operations on sps-expcont*

Check out code (e.g. on sps-expcont):
  $ svn co http://code.icecube.wisc.edu/svn/projects/live/trunk live
Install code:
  $ cd live && ./setup.py install
Stop DAQ:
  $ livecmd stop daq
(Re-)start LiveControl:
  $ livecmd launch
Start DAQ (default configuration):
  $ cd && ./starti3
Start individual component:
  $ livecmd start sndaq

* see also the Operator Documentation PDF
The Livetimes of Live

Different kinds of uptime:
• LiveControl core system: crashes on 7/30/09 and 8/5/09 (root cause fixed); ~1 day total down time; uptime = 99.7%
• Data slowdowns (SPADE, ITS, I3Live issues): 93% "prompt" uptime (June 2009 - July 2010), based on my email 'forensics'
• LiveView Web site (hostage to DW, LDAP, ...): don't know uptime; estimate > 90%
Contributors
~28k lines of code committed to trunk:
  24330 jacobsen  # JJ
   1998 vbittorf  # Victor
   1498 enielsen  # Tex
    499 mfrere    # Michael Frère
     37 dglo      # Dave G.
      3 jpepper   # James Pepper, UA
  28365 TOTAL     # ... only!!

649 Mantis issues completed.

[Pie charts: jacobsen 87% of lines of code and 96% of completed issues; remainder split among victor, michael, tex.]
Releases / Resolved Mantis Issues

[Chart: cumulative resolved Mantis issues (0-600), March 2009 through June 2010, annotated with releases from v0.9.1 (3/26/09) to v1.4.0 (6/14/10), IC59 start (5/20/09), and IC79 start (5/31/10).]
Genesis (including design docs & reports)
• Discussion with Albrecht, 12/2007
• "Vision" document, 12/07: https://docushare.icecube.wisc.edu/dsweb/Get/Document-45721/
• Prototyping and discussions at Pole, 1/08
• Preliminary design document: https://docushare.icecube.wisc.edu/dsweb/Get/Document-45723
• Talks at Tuesday Call and Collab. Mtg., 3-4/08:
  https://docushare.icecube.wisc.edu/dsweb/Get/Document-45839
  https://docushare.icecube.wisc.edu/dsweb/View/Collection-6147
• Development Phase, 2008; Advisory Panel: Azriel, Martin, Erik, Timo, Kael
• SCAP 2009 status report: https://docushare.icecube.wisc.edu/dsweb/Get/Document-48979/
Genesis, Contd.
• IC59 Run (2009); Anvil to I3Live transition 4/1/09
• Status document circulated to Collaboration, 9/09: https://docushare.icecube.wisc.edu/dsweb/Get/Document-51130/
• Roundtable discussion, Berlin, 9/09: https://docushare.icecube.wisc.edu/dsweb/Get/Document-51292/
• First draft of new feature schedule, Denise/JJ, 12/09: http://docs.google.com/View?id=dgs8xkph_12fh9b82d4
• Roundtable discussion in Annapolis, May 2010