Data, Data Everywhere
Why We Need Broadband Connectivity
By Ruzena Bajcsy
Who Generates the Data?
• Astronomers
• Biologists
• High Energy Physicists
• Geophysicists
• Archeologists and Anthropologists
• Psychologists
• Engineers
• Artists
A Year of Innovation and Accomplishment
UC Santa Cruz
Center for Information Technology Research in the Interest of Society
Solving Societal-Scale Problems
Energy Conservation
Emergency Response and Homeland Defense
Transportation Efficiency
Solving Societal-Scale Problems
Health Care
Monitoring Land and Environment
Education
Societal-Scale Systems
“Client”
“Server”
Clusters
Massive Cluster
Gigabit Ethernet
Secure, non-stop utility
Diverse components
Adapts to interfaces/users
Always connected
MEMS Sensors
Scalable, Reliable, Secure Services
Information Appliances
February 2000
February 2001
August 2001
February 2002
$8,000 each
Seismic Monitoring of Buildings: Before CITRIS
Seismic Monitoring of Buildings: With CITRIS Wireless Motes
$70 each
Ad-hoc sensor networks work
• 29 Palms Marine Base, March 2001
  – 10 motes dropped from an airplane landed, formed a wireless network, detected passing vehicles, and radioed information back
• Intel Developers Forum, Aug 2001
  – 800 motes running TinyOS hidden in auditorium seats started up and formed a wireless network as participants passed them around
• tinyos.millennium.berkeley.edu
Recent Progress:
Energy Efficiency and
Smart Buildings
Arens, Culler, Pister, Orens, Rabaey, Sastry
The Inelasticity of California’s Electrical Supply
[Chart: Power-exchange market price for electricity ($/MWh, 0–800) versus load (20,000–45,000 MW), California, Summer 2000]
How to Address the Inelasticity of the Supply
• Spread demand over time (or reduce peak)
  – Make the cost of energy:
    • visible to the end user
    • a function of the load curve (e.g., hourly pricing)
  – The “demand-response” approach
• Reduce average demand (demand side)
  – Eliminate wasteful consumption
  – Improve efficiency of equipment and appliances
• Improve efficiency of the generation and distribution network (supply side)
Enabled by Information!
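The demand-response idea above can be sketched in a few lines: expose a load-dependent hourly price to the end user and let flexible demand move off the peak. All numbers, the price curve, and the function names below are illustrative, not taken from the California data.

```python
# Sketch of "demand-response" pricing: price rises with load, so shifting a
# flexible block of demand off the peak hour lowers total cost.
# All constants are illustrative, not from the California Summer 2000 data.

def hourly_price(load_mw, base=30.0, knee=30000.0, slope=0.05):
    """Price ($/MWh): flat until load passes the 'knee', then rises linearly."""
    excess = max(0.0, load_mw - knee)
    return base + slope * excess

def shift_flexible_load(loads, flexible_mw):
    """Move a flexible block of demand from the peak hour to the cheapest hour."""
    peak = loads.index(max(loads))
    trough = loads.index(min(loads))
    shifted = loads[:]
    shifted[peak] -= flexible_mw
    shifted[trough] += flexible_mw
    return shifted

loads = [22000, 26000, 34000, 42000, 38000, 27000]   # MW, one value per hour
cost = sum(hourly_price(l) * l for l in loads)
cost_shifted = sum(hourly_price(l) * l for l in shift_flexible_load(loads, 3000))
print(cost_shifted < cost)   # True: same total energy, lower total cost
```

Because the price curve is convex, the same total energy costs less when consumption is flatter, which is exactly the incentive hourly pricing creates.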
Energy Consumption in Buildings (US 1997)
End Use                   Residential  Commercial
Space heating                 6.7          2.0
Space cooling                 1.5          1.1
Water heating                 2.7          0.9
Refrigerator/Freezer          1.7          0.6
Lighting                      1.1          3.8
Cooking                       0.6          -
Clothes dryers                0.6          -
Color TVs                     0.8          -
Ventilation/Furnace fans      0.4          0.6
Office equipment              -            1.4
Miscellaneous                 3.0          4.9
Total                        19.0         15.2
Source: Interlaboratory Working Group, 2000
(Units: quads per year; 1 quad = 1.05 EJ)
A Three-Phase Approach
• Phase 1: Passive Monitoring
  – The availability of cheap, connected (wired or wireless) sensors makes it possible for the end user to monitor the energy usage of buildings and individual appliances and act on it.
  – Primary feedback on usage
  – Monitor health of the system (30% inefficiency!)
• Phase 2: Quasi-Active Monitoring and Control
  – Combining the monitoring information with instantaneous feedback on the cost of usage closes the feedback loop between end user and supplier.
• Phase 3: Active Energy Management through Feedback and Control: Smart Buildings and Intelligent Appliances
  – Adding instantaneous and distributed control functionality to the sensing and monitoring functions increases energy efficiency and user comfort.
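The Phase 3 loop can be reduced to a minimal sketch: one sensor reading and one occupancy signal per zone drive an actuator command, conditioning a zone only when occupied and only as far as a comfort band requires. Function name, comfort band, and commands are illustrative assumptions, not the project's actual control law.

```python
# Minimal sketch of Phase 3 active energy management: a per-zone
# sensor/actuator feedback step. Constants and names are illustrative.

def control_step(temp_c, occupied, comfort=(20.0, 24.0)):
    """Return an HVAC command for one zone given one temperature reading."""
    low, high = comfort
    if not occupied:
        return "off"     # never condition an empty zone
    if temp_c < low:
        return "heat"
    if temp_c > high:
        return "cool"
    return "off"         # inside the comfort band: spend no energy

print(control_step(26.5, occupied=True))    # cool
print(control_step(26.5, occupied=False))   # off
```

The energy saving comes from the two "off" branches: occupancy sensing and a dead band keep the actuators idle most of the time.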
Cory Hall Energy Monitoring Network
50 nodes on 4th floor
30 sec sampling
250K samples to database over 6 weeks
Moved to Intel Lab – come play!
Smart Buildings
Dense wireless network of sensor, control, and actuator nodes
• Task/ambient conditioning systems allow conditioning in small, localized zones, to be individually controlled by building occupants and environmental conditions
• Joint projects among BWRC/BSAC, Center for the Built Environment (CBE), IEOR, Intel Lab, LBNL
Control of HVAC Systems
[Diagram: Underfloor Air Distribution vs. Conventional Overhead System]
Control of HVAC Systems
• An underfloor system can save energy because air near the ceiling can be allowed to get hotter
• Project with CBE (Arens, Federspiel)
• Need temperature sensors at different heights
• Simulation results
  – Hot August day in Sacramento
  – Underfloor HVAC saves 46% of energy
• Future: test in an instrumented room
More sensors – air velocity
• Uses time of flight of sound to determine 3D air velocity
• Significance
  – Heat transfer (energy)
  – Air quality
  – Perception of temperature
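The time-of-flight principle above has a neat property worth spelling out: for one axis with transducers a distance d apart, the downstream and upstream flight times give the air velocity independent of the (temperature-dependent) speed of sound, since t_down = d/(c+v) and t_up = d/(c−v) imply v = (d/2)(1/t_down − 1/t_up). The values below are illustrative; repeating the measurement on three axes yields 3D velocity.

```python
# One axis of a sound time-of-flight (sonic) air velocity sensor.
# The speed of sound c cancels out of the velocity estimate.

def axis_velocity(d, t_down, t_up):
    """Along-axis air velocity from two opposing times of flight."""
    return (d / 2.0) * (1.0 / t_down - 1.0 / t_up)

d = 0.15                       # transducer spacing (m), illustrative
c, v = 343.0, 2.0              # speed of sound and true air velocity (m/s)
t_down, t_up = d / (c + v), d / (c - v)   # simulated measurements
print(round(axis_velocity(d, t_down, t_up), 6))   # recovers 2.0
```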
Smart Dust Goes National
Academia: UCSD, UCLA, USC, MIT, Rutgers, Dartmouth, U. Illinois UC, NCSA, U. Virginia, U. Washington, Ohio State
Industry: Intel, Crossbow, Bosch, Accenture, Mitre, Xerox PARC, Kestrel
Government: National Center of Supercomputing, Wright Patterson AFB
Why Broadband Connectivity When Memory Is So Cheap?
• Because users want to interact with the data in real time
• Users need to access the data at the right time and at the right place
• They need to access data in the right format
• They want the right amount of data
Examples
• Distributed computation
• Cluster technology
• The Berkeley Millennium Project
Cluster Counts
• NOW (circa 1994): 4-proc HP → 36-proc SPARC10 → 100-proc Ultra1
• Millennium Central Cluster (Intel donation)
  – 99 Dell 2300/6400/6450 Xeon dual/quad: 332 processors
  – Total: 211 GB memory, 3 TB disk
  – Myrinet 2000 + 1000 Mb fiber Ethernet
• OceanStore/ROC cluster, Astro cluster, Math cluster, Cory cluster, more
• CITRIS Pilot Cluster: 3/2002 deployment (Intel donation)
  – 4 Dell Precision 730 Itanium duals: 8 processors
  – Total: 20 GB memory, 128 GB disk
  – Myrinet 2000 + 1000 Mb copper Ethernet
Current Network
CITRIS Network Rollout
Network Rollout
• Millennium Cluster
  – Keep existing Nortel 1200/1100/8600
  – New Foundry FastIron 1500
• CITRIS Cluster
  – New Foundry FastIron 1500
• Backbone
  – 2 Foundry BigIron 8000
• Cost of expansion $280K (SimMillennium)
Millennium Cluster Tools
• Rootstock Installation
• Ganglia Cluster Monitoring
• gEXEC – remote execution/load balancing
• Pcp – parallel copying/job staging
All in production, open source, cluster community development on sourceforge.net
Rootstock Installation Tool
• Installation configuration stored centrally
• Build local cluster-specific root from central root
• Install/reinstall cluster nodes from local rootstock
• http://rootstock.millennium.berkeley.edu/
• Has become the basis for the http://rocks.npaci.edu/ cluster distribution
Ganglia Monitoring
• Coherent distributed hash of cluster information
  – Static: CPU speed, total memory, software versions, boot time, upgrade time, etc.
  – Dynamic: load, CPU idle, memory available, system clock, etc.
  – Heartbeat
  – Customizable with a simple API for any other metric
• Data is exchanged in well-defined XML and XDR
• Lightweight – small memory footprint and minimal communication (tunable)
• Scalable – tested on several 512+ node clusters
• Trusted hosts – allows clusters of clusters to be linked within a single monitoring and execution domain
• Ported to Linux, FreeBSD, Solaris, AIX, and IRIX; active development by the community for other ports
• Dell Open Cluster Group seriously evaluating this as the basis for their cluster computing tool distribution: “The only monitoring that scales over 64 nodes”
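To make the "XML exchange" bullet concrete, here is a sketch of the kind of per-host metric report a monitoring daemon might publish. The element and attribute names are illustrative, not Ganglia's actual wire schema (real Ganglia also exchanges compact XDR on the LAN for efficiency).

```python
# Hypothetical per-host metric report, loosely in the spirit of Ganglia's
# XML exchange. Element/attribute names are assumptions, not the real schema.
import xml.etree.ElementTree as ET

def host_report(name, metrics):
    """Serialize one host's metrics as an XML fragment."""
    host = ET.Element("HOST", NAME=name)
    for metric, (value, kind) in metrics.items():
        # Static metrics (cpu_speed) change rarely; dynamic ones every heartbeat.
        ET.SubElement(host, "METRIC", NAME=metric, VAL=str(value), TYPE=kind)
    return ET.tostring(host, encoding="unicode")

report = host_report("node042", {
    "cpu_speed_mhz": (1400, "static"),
    "load_one": (0.37, "dynamic"),
    "mem_free_kb": (1048576, "dynamic"),
})
print(report)
```

A text format like this is easy to aggregate across a hierarchy of clusters, which is what makes the "clusters of clusters" bullet workable.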
gEXEC – remote execution
• History
  – GLUnix from NOW
  – rEXEC from Millennium
  – gEXEC: UCB/Caltech collaboration
• Lightweight – minimal number of threads on frontend + fanout
• Decentralized – no central point of failure
• Fault tolerant – fallback ability + failure checks at runtime
• Interactive – feels like a single machine
• Load balanced from Ganglia monitoring data
• Scalable to at least 512 nodes
• Unix authorization plus cluster keys
e.g.:
  gexec -n 3 hostname
  gexec -n 0 render -in input.${VNN} -out output.${VNN}
Pcp – parallel copy
• Newest addition to the cluster suite
• Fanout copy of files/directories to nodes
• Scalable
• Used for job staging
• The future of this tool is to wrap it up as an option in gEXEC
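The "fanout" bullet is the key to pcp's scalability: instead of the frontend pushing to all N nodes serially, each node that already has the data forwards it to k more, so the copy completes in O(log_k N) rounds. The sketch below only schedules the rounds; the helper name is hypothetical, and the real pcp handles transport, errors, and job staging.

```python
# Sketch of fanout scheduling behind a parallel copy (pcp-style).
# Hypothetical helper; real pcp does the actual data movement.

def fanout_rounds(nodes, k=2):
    """Plan copy rounds: each round maps a node that has the data
    to up to k nodes it forwards the data to."""
    have = [nodes[0]]            # the frontend/seed already has the file
    pending = list(nodes[1:])
    rounds = []
    while pending:
        sends = {}
        for src in list(have):   # snapshot: only nodes that had data last round
            batch, pending = pending[:k], pending[k:]
            if batch:
                sends[src] = batch
                have.extend(batch)
        rounds.append(sends)
    return rounds

rounds = fanout_rounds([f"node{i}" for i in range(8)], k=2)
print(len(rounds))   # 2: senders roughly triple each round with k=2
```

With k=2 the set of holders grows from 1 to 3 to 9, so 8 nodes are covered in two rounds; a flat frontend push would take 7 sequential transfers.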
Known Sites Using Ganglia Cluster Toolkit
Most popular cluster and distributed computing software on sourceforge.net
Over 7000 downloads since release of 1/2002
• Centre National De La Recherche Scientifique http://www.in2p3.fr
• SDSC http://www.sdsc.edu
• IE&M http://iew3.technion.ac.il/
• GMX http://www.gmx.fr
• CAS, Chemical Abstracts Service http://www.cas.org
• Keldysh Institute of Applied Mathematics (Russia) http://www.kiam1.rssi.ru
• LUCIE (Linux Universal Config. & Install Engine) http://matsu-www.is.titech.ac.jp/~takamiya/lucie/
• Mellanox Technologies http://www.mellanox.co.il/
• TerraSoft Solutions (PowerPC Linux) http://terraplex.com/tss_about.shtml
• Intel http://www.intel.com/
• BellSouth Internet Services http://services.bellsouth.net/external/
• ArrayNetworks http://www.clickarray.com/
• MandrakeSoft http://www.mandrakesoft.com
• Technische Universitat Graz http://www.TUGraz.at/
• GeoCrawler http://www.geocrawler.com/
• Cray http://www.cray.com/
• Unlimited Scale http://www.unlimitedscale.com/
• UCSF Computer Science http://cs.usfca.edu/
• RoadRunner http://www.houston.rr.com
• Veritas Geophysical Integrity http://www.veritasdgc.com
• Dow http://www.dow.com/
• The Max Planck Society for the Advancement of Science http://www.mpg.de
• Lockheed Martin http://www.lockheedmartin.com
• Duke University http://www.duke.edu
• Framestore Computer Film Company http://www.framestore-cfc.com
• nVidia http://www.nvidia.com/
• SAIC http://www.saic.com
• Paralogic http://www.plogic.com/
• Singapore Computer Systems Limited http://www.scs.com.sg/
• Hughes Network Solutions http://www.hns.com
• University of Washington, Computer Science http://www.cs.washington.edu
• Experian http://www.experian.com
• L'Universite de Geneva http://www.unige.ch
• Purdue Physics Department http://www.physics.purdue.edu/
• Atos Origin Engineering Services http://www.aoes.nl/
• Teraport http://www.teraport.se
• Daresbury Laboratory http://www.dl.ac.uk
• Clinica Sierra Vista http://www.clinicasierravista.org
• LondonTown http://www.londontown.com/
• National Hellenic Research Foundation http://www.eie.gr
• RightNow Technologies http://www.rightnow.com/
• Idaho National Engineering and Environmental Laboratory http://www.inel.gov
• WesternGeco http://www.westerngeco.com
• 80/20 Software Tools http://rc.explosive.net
• Optiglobe Brazil http://www.optiglobe.com.br
• Brunel University http://www.brunel.ac.uk
• Cinvestav Instituto Politecnico Nacional http://www.ira.cinvestav.mx
• Conexant http://www.hotrail.com
• Dell http://www.dell.com/
• SuSE Linux http://www.suse.de
• Arabic on Linux http://www.planux.com
• Delgado Community College, New Orleans http://www.dcc.edu
• Boeing http://www.boeing.com
• RedHat http://www.redhat.com/
• University of Pisa, Italy http://www.df.unipi.it
• Ecole Normale Superieure De Lyon http://www.ens-lyon.fr
• iMedium http://www.imedium.com
• Moving Picture Company http://www.moving-picture.com
• Professional Service Super Computers http://www.pssclabs.com
• AlgoNomics http://www.algonomics.com
• Ocimum Biosolutions http://www.ocimumbio.com
• Caltech http://www.caltech.edu
• VitalStream http://www.publichost.com
• Sandia National Laboratory http://www.sandia.gov/
• UC Irvine http://www.uci.edu
• Guide Corporation http://www.guidecorp.com/
• Matav http://www.matav.hu
• Math Tech, Denmark http://www.math-tech.dk
• Istituto Trentino Di Cultura http://www.itc.it
• Compaq http://www.compaq.com/
• National Research Council Canada http://www.nrc.ca
• Overture http://www.overture.com
• Petroleum Geo-Services http://www.pgs.com
• National Research Laboratory of the US Navy http://www.nrl.navy.mil
• White Oak Technologies, Inc. http://www.woti.com/
Grid computing
• Working with key cluster software developers from research and industry to standardize cluster tools within the Global Grid Forum (GGF)
CITRIS Cluster
• Goal is to build a production-level cluster environment that supports and is driven by CITRIS applications
  – NOW: mostly experimental
  – Millennium: ½ developmental, ½ production
• Clusters adopted as primary compute platform
  – ~800 current Millennium users
  – 65% average CPU utilization on the Millennium cluster, many times 100% utilization
  – 50% of top 20 PACI users compute on Linux clusters for development and production runs
[Diagram: CITRIS cluster network]
• 100 dual-Itanium compute nodes: 1 TFlop, 1.6 TB memory
• 10 storage nodes with 50 TB of Fibre Channel storage
• 2 frontend nodes
• Interconnect: Myrinet 2000 and Gigabit Ethernet via two Foundry 8000 switches and a Foundry 1500, uplinked to the campus core
Steve Brenner Project: Large Molecular Sequence and Structure Databases
• These databases are gigabytes in size
• They provide web services in which low latency is important
• They often work remotely
• The campus 70 Mbit limit is increasingly saturated, making it impossible to provide services effectively and get the work done
• They need tele/video conferencing over IP
Background of the Brain Imaging Center at Berkeley
• Campus-wide resource dedicated to functional Magnetic Resonance Imaging (fMRI) research
• Non-invasive “neuroimaging” technique used to investigate the blood-flow correlates of neural activity
• BIC houses a Varian 4 Tesla scanner and a Neuroimaging Computational Facility, enabling collaboration among neuroscientists, physicists, chemists, statisticians, and EE and CS scientists
Current LAN
• Due to the high volume of data, we established high-speed connections between computers in buildings around the campus
• The LAN consists of two Cisco Catalyst 6500 switches connected with optical fiber, communicating at Gigabit Ethernet speed
• Workstations connect to the network at Fast Ethernet speed (100 Mbit/s, full duplex)
WAN Needs
• Geographically distributed collaborating researchers and immense data sets make high-speed networking a priority
• Collaborations exist between researchers at UCSD, UCSF, UC Davis, Stanford, Varian Inc., and NASA Ames
• With spiral imaging, we will soon be capable of generating data in excess of 1 MB/s per scanner
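The 1 MB/s figure is worth translating into link terms. The back-of-the-envelope below uses the 70 Mbit/s campus limit cited on an earlier slide; the idea that several scanners or collaborating sites stream concurrently is an illustrative assumption.

```python
# Back-of-the-envelope: what 1 MB/s per scanner means for a shared link.
# 70 Mbit/s is the campus limit cited earlier; concurrency is illustrative.

MBIT_PER_MB = 8
link_mbit = 70
scanner_mbit = 1 * MBIT_PER_MB           # 1 MB/s sustained per scanner

print(scanner_mbit)                      # 8 Mbit/s per scanner
print(link_mbit // scanner_mbit)         # 8 concurrent streams fill the link
print(1 * 3600 / 1024)                   # ~3.5 GB per hour per scanner
```

A handful of sustained scanner streams would consume the entire shared campus link, which is the quantitative case for broadband connectivity made throughout this talk.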
NASDAQ vs. O'Reilly Tech Book Sales at Amazon, January 1, 1999 through September 30, 2001
[Chart: normalized O'Reilly unit sales at Amazon and normalized NASDAQ index value (0–1) plotted against date]
CITRIS Network in Smart Classroom