Physical Buildout of the OptIPuter at UCSD
What Speeds and Feeds Have Been Deployed Over the Last 10 Years
Scientific American, January 2001
[Figure: performance per dollar spent vs. number of years (0 to 10), with curves for uplink speed, DWDM capability, and endpoint speed. Annotations: 10 Mb, 1000 Mb, 10000 Mb; doublings: 7, 10, 13; 16 - 32 x. Labels: Wiglaf, Rockstar.]
OptIPuter Infrastructure
[Figure: campus map (scale: ½ mile). Sites labeled: SIO, SDSC, SDSC Annex, CRCA, Phys. Sci - Keck, SOM, JSOE, Preuss, 6th College, Node M, Collocation. Area labels: Earth Sciences, Medicine, Engineering, High School. External links: to CENIC and NLR.]
Source: Phil Papadopoulos, SDSC; Greg Hidley, Cal-(IT)2
The UCSD OptIPuter Deployment: UCSD is Prototyping a Campus-Scale OptIPuter
[Diagram labels: Calit2; Juniper T320, 0.320 Tbps backplane bandwidth; Chiaro Enstara; dedicated fibers between sites link Linux clusters; Cisco 6509 8 – 10GigE.]
UCSD Packet Test Bed: OptIPuter Year 2
[Network diagram. Core: Chiaro Enstara. Sites: SDSC, JSOE, CSE, SIO, SOM, CRCA, 6th College, Preuss. Switches: Extreme 400 (three), Dell 5224 (four), Dell 6024F. Clusters: 8-node cluster (shared); 9-node cluster (shared); IBM 9-node viz cluster; IBM 48-node storage cluster; IBM 128-node compute cluster; Sun 128-node compute cluster; Sun 17-node storage cluster; Sun 17-node compute cluster (two); Sun 22-node viz cluster; Sun 5-node viz cluster; 3-node viz cluster; HP 28-node cluster (shared); HP 4-node control; Fujitsu 7-node cluster (shared); Infiniband, 4 nodes; Infiniband, 64 nodes. Displays: IBM 9 mpixel display pairs (two); Geowall 2 tiled display (two); Dell Geowall; Dell Viz. External links: to UCI, ISI and StarLight via CalREN-XD and NLR; to StarLight via NLR; UCSD & CalREN-HPR shared IP network. Numeric link labels on the diagram: 1, 4, 6, 10.]
Different Kind of Experimental Infrastructure
• UCSD Campus Infrastructure
– A campus-wide experimental apparatus
• Different Kinds of Cluster Endpoints (scaling in the usual dimensions)
– Compute
– Storage
– Visualization
– 300+ nodes available for experimentation (ia32, Opteron, Linux)
– 7 different labs
• Clusters and network can be allocated and configured by the researcher at the lowest level
– Machine SW configuration: OS (kernel, networking modules, etc.), middleware, OptIPuter system software, application software
– Root access given to researchers when needed
– As close to chaos as we can get
• Networks
– Packet-oriented network. 10 Gbps/site. Multiple 10GigE where needed
– Adding lambda capability (Quartzite: Research Instrumentation Award)
What’s Coming Soon?
• 10 GigE Switching
– Force 10 e1200. Initially with sixteen 10GigE connections
– Expansion is $6K/port + optics ($2K for grey, $5K for DWDM)
– Line cards and grey optics are here. Awaiting chassis
– Force 10 S50 edge switches
– 48-port GigE + two 10GigE uplinks, ~$10K with grey optics
• 10 GigE NICs
– Neterion
– PCI-X (Intel OEM) with XFP (just received)
– Myrinet 10G (PCI Express)
– Ready to place order
• DWDM
– On order: four 10GigE XFPs, 40 km, channels 31 and 32 (2 each)
– Delayed: expect arrival in March (Sigh)
– Following NASA’s lead on the DWDM hardware (very good results on Dragon)
– Arrived: two 8-channel Mux/DeMux from Finisar
• DWDM Switching
– Expect wavelength-selective switch this summer
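The port pricing quoted above lends itself to a quick budget check. This is a minimal sketch, assuming the per-port price and the optics price simply add for each new port and that every port needs exactly one optic; the function name and the example port counts are illustrative and not from the talk.

```python
# Prices quoted on the slide, in thousands of US dollars.
PORT_COST_K = 6                           # e1200 expansion, per 10GigE port
OPTICS_COST_K = {"grey": 2, "dwdm": 5}    # per optic


def expansion_cost_k(ports, optics="grey"):
    """Cost in $K of adding `ports` 10GigE ports, one optic per port (assumption)."""
    if ports < 0:
        raise ValueError("ports must be non-negative")
    return ports * (PORT_COST_K + OPTICS_COST_K[optics])


if __name__ == "__main__":
    for kind in ("grey", "dwdm"):
        print(f"8 ports with {kind} optics: ${expansion_cost_k(8, kind)}K")
```

Under these assumptions a DWDM-equipped port costs $11K against $8K for a grey one, so the optics choice changes the per-port price by more than a third.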
What’s Changing II
• “Center Switching Complex” moving to Calit2
• Should be done by end of March
• A modest number of endpoints for OptIPuter research will be added
• A larger number of “production” resources (e.g. CAMERA) added
• Increasing emphasis on longer-haul connections
– Connections to UCI
Quartzite: Reconfigurable Networking
• NSF Research Instrumentation, Papadopoulos, PI
• Packet network is great
– Give me bigger and faster of what I already know
– Even though TCP is challenged on big pipes
– What about lambdas? And switching lambdas?
• Existing fiber plant is fixed
– Want to experiment with different topologies? -> “buy” a telecom worker to reconnect cables as needed
• Quartzite: Research Instrumentation Award (started 15 Sep)
– Hybrid network “switch stack” at our collocation point
– Packet switch
– Transparent optical switch
– Allows us to physically build new topologies without physical rewiring
– Wavelength-selective switch
– Experimental device from Lucent
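Building new topologies without rewiring can be illustrated with a toy model of a transparent optical switch: every site fiber terminates on a switch port, and a topology is just a set of port-to-port cross-connects. This is a sketch under that assumption only. The class, its methods, and the site-to-port assignments are hypothetical and do not describe any vendor's control interface.

```python
class OpticalCrossConnect:
    """Toy model: each port carries one fiber; a cross-connect joins two ports."""

    def __init__(self, port_to_site):
        self.port_to_site = dict(port_to_site)  # the fixed fiber plant
        self.peer = {}                          # port -> port, kept symmetric

    def connect(self, a, b):
        if a == b:
            raise ValueError("cannot connect a port to itself")
        for p in (a, b):
            if p not in self.port_to_site:
                raise KeyError(f"unknown port {p}")
            if p in self.peer:
                raise ValueError(f"port {p} already in use")
        self.peer[a] = b
        self.peer[b] = a

    def disconnect(self, a):
        b = self.peer.pop(a)
        del self.peer[b]

    def topology(self):
        """Site-level links implied by the current cross-connects."""
        links = set()
        for a, b in self.peer.items():
            sites = (self.port_to_site[a], self.port_to_site[b])
            links.add(tuple(sorted(sites)))
        return sorted(links)


if __name__ == "__main__":
    oxc = OpticalCrossConnect({1: "SDSC", 2: "JSOE", 3: "SIO", 4: "Calit2"})
    oxc.connect(1, 2)
    oxc.connect(3, 4)
    print(oxc.topology())
    # Re-map the same fibers into a different topology, no cables touched.
    oxc.disconnect(1)
    oxc.disconnect(3)
    oxc.connect(1, 3)
    oxc.connect(2, 4)
    print(oxc.topology())
```

The fiber plant (`port_to_site`) never changes; only the cross-connect table does, which is the property that replaces the telecom worker in the bullet above.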
Quartzite: DWDM
www.aurora.com
www.optoway.com
www.fibredyne.com
$5K/XFP
$2K/Channel (Mux/Demux)
$10K/switch
+
= $14K/Connected Pair
Single fiber pair
• Cheap uncooled lasers
• 0 W optical splitters/combiners
• 0.8 nm spacing for DWDM
• 1GigE, 10GigE, bonded or separate
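Two back-of-envelope checks on the figures above. The first converts the 0.8 nm channel spacing into a frequency spacing using Δf ≈ cΔλ/λ², assuming a 1550 nm operating wavelength (the slide does not state one). The second shows one way the quoted $14K per connected pair could arise; the breakdown into one XFP plus one mux/demux channel at each end is an assumption, since the slide gives only the unit prices and the total.

```python
C_M_PER_S = 299_792_458  # speed of light in vacuum


def spacing_ghz(delta_lambda_nm, center_nm=1550.0):
    """Frequency spacing in GHz for a small wavelength spacing near center_nm."""
    delta_m = delta_lambda_nm * 1e-9
    center_m = center_nm * 1e-9
    return C_M_PER_S * delta_m / center_m**2 / 1e9


def pair_cost_k(xfp_k=5, channel_k=2, ends=2):
    """Assumed breakdown: one XFP and one mux/demux channel at each end."""
    return ends * (xfp_k + channel_k)


if __name__ == "__main__":
    print(f"0.8 nm near 1550 nm is about {spacing_ghz(0.8):.1f} GHz")
    print(f"Connected pair: ${pair_cost_k()}K")
```

The spacing comes out just under 100 GHz, which matches the widely used 100 GHz DWDM channel grid.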
UCSD Quartzite Core at Completion (Year 5 of OptIPuter)
[Diagram: Quartzite Communications Core, Year 3. Elements: Quartzite Core; production OOO switch; wavelength-selective switch; Juniper T320; Chiaro Enstara; GigE switches with dual 10GigE uplinks, each leading to cluster nodes; links to 10GigE cluster node interfaces and other switches; links to other nodes; CalREN-HPR Research Cloud; Campus Research Cloud. Link labels: GigE; 10GigE; 4 GigE, 4 pair fiber; 32 10GigE.]
• Funded 15 Sep 2004
• Physical HW to Enable Optiputer and Other Campus Networking Research
• Hybrid Network Instrument
Reconfigurable Network and Endpoints
Scalable and automated network mapping for Optiputer/Quartzite Network
Optiputer AHM Meeting
San Diego, CA
January 17 2006
Praveen Jagadishprasad, Hassan Elmadi
Calit2, UCSD
Phil Papadopoulos, Mason Katz
SDSC
Network Map (01/16/2006)
Motivation
• Management
– Inventory
– Troubleshooting
• Programming the network
– Ability to view and manipulate the network as a single entity
– Aid network reconfiguration in a heterogeneous network
– Experimental networks have a high degree of reconfiguration
• Glimmerglass-based physical changes
• VLAN-based logical topology changes
– Final goal is to automate the reconfiguration process
• Focus on switch/router configuration process
Automated Discovery
• Minimal input needed
– One gateway might be sufficient
• SNMP-based discovery
– Not tied to vendor protocol
– Tested with Cisco, HP, Dell, Extreme, etc.
– Almost all major vendors support SNMP
• Fast
– Discovery process highly threaded
– 3 minutes for UCSD Optiputer network (~600 hosts and 20 switches)
• Framework based
– Extensible to include MIBs for specific switch/router models. For example:
– Cisco VLANs
– Extreme trunking
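The combination of a single seed gateway and a highly threaded discovery process can be sketched as a breadth-first crawl in which each wave of newly found devices is queried in parallel. This is an illustration only: `query_neighbors` and the `FAKE_NETWORK` table are hypothetical stand-ins for the SNMP queries, and nothing here reflects the actual tool's code.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for SNMP results: each device maps to the devices
# it reports. A real implementation would query the device over SNMP.
FAKE_NETWORK = {
    "gateway": ["sw-a", "sw-b"],
    "sw-a": ["gateway", "sw-c", "host-1"],
    "sw-b": ["gateway", "host-2"],
    "sw-c": ["sw-a", "host-3"],
}


def query_neighbors(device):
    """Placeholder for one per-device query; hosts report no neighbors."""
    return FAKE_NETWORK.get(device, [])


def discover(seed, query=query_neighbors, workers=8):
    """Return every device reachable from seed, querying each wave in parallel."""
    seen = {seed}
    frontier = [seed]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            next_frontier = []
            # pool.map runs one query per device concurrently and
            # yields the results in frontier order.
            for neighbors in pool.map(query, frontier):
                for name in neighbors:
                    if name not in seen:
                        seen.add(name)
                        next_frontier.append(name)
            frontier = next_frontier
    return seen


if __name__ == "__main__":
    print(sorted(discover("gateway")))
```

Because the queries are I/O-bound waits on remote devices, threads are a natural fit; the `seen` set is only updated in the main thread, so no locking is needed in this sketch.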
Design for discovery and mapping
• Phase 1 (Layer 3)
– Router discovery
– Subnet discovery
• Phase 2 (Layer 2)
– Switch discovery
– Host discovery
– Switch <---> Host mapping
– IP ARP mapping
• Phase 3
– Network mapping
– Form integrated map through novel algorithms
– Area of research
• Phase 4
– Web-based viz
– Database storage
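The Phase 2 switch-to-host mapping can be pictured as a join between two kinds of tables: an ARP table (IP to MAC) and each switch's forwarding table (MAC to port). The sketch below assumes those tables have already been collected and that inter-switch ports are known, so a host is placed on the one edge port where its MAC is seen. All names, addresses, and the data layout are made up for illustration.

```python
def map_hosts(arp, fdb, uplinks):
    """
    arp:     {ip: mac} taken from a router's ARP table
    fdb:     {switch: {mac: port}} taken from each switch's forwarding table
    uplinks: set of (switch, port) pairs known to lead to other switches
    Returns {ip: (switch, port)} for every host seen on an edge port.
    """
    location = {}
    for ip, mac in arp.items():
        for switch, table in fdb.items():
            port = table.get(mac)
            # A MAC is also learned on uplinks of other switches; skip those.
            if port is not None and (switch, port) not in uplinks:
                location[ip] = (switch, port)
                break
    return location


if __name__ == "__main__":
    arp = {"192.0.2.10": "aa:aa", "192.0.2.11": "bb:bb"}
    fdb = {
        "sw1": {"aa:aa": 3, "bb:bb": 24},
        "sw2": {"aa:aa": 24, "bb:bb": 5},
    }
    uplinks = {("sw1", 24), ("sw2", 24)}
    print(map_hosts(arp, fdb, uplinks))
```

Filtering on uplink ports is the key step: without it, a host would appear to be attached to every switch that has learned its MAC address.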
Future work
• Reliable discovery of logical topology (VLANs)
• Automate generation of switch/router configs
– Use physical topology information to aid config generation
– Fixed templates for each switch/router model
– Templates are extended depending on configuration needed
• Batch configuration of switches/routers
– Support custom VLANs with only end-host specification
– Constructing spanning tree of end hosts and intermediate switches/routers
– Schedule dependencies for step-by-step configuration
– Physical topology information essential
• Logical topology adds a VLAN table to the physical topology tables
– VLAN composed of trunks
– Each trunk can be a single or multiple port-to-port connection between the same set of switches
– Schema supports retaining VLAN id when modifying trunks and vice versa
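Supporting custom VLANs from only an end-host specification implies finding the switches and links that join those hosts. A minimal sketch, assuming the physical topology is available as an adjacency map and approximating the tree as the union of shortest paths from the first host to the others. This is a simple heuristic chosen for illustration, not necessarily the algorithm the authors intend; node names are invented.

```python
from collections import deque


def bfs_parents(graph, root):
    """Breadth-first search; returns {node: parent} for all reachable nodes."""
    parents = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return parents


def vlan_tree(graph, hosts):
    """Links (as sorted pairs) joining all hosts via shortest paths from hosts[0]."""
    parents = bfs_parents(graph, hosts[0])
    links = set()
    for host in hosts[1:]:
        if host not in parents:
            raise ValueError(f"{host} is unreachable from {hosts[0]}")
        node = host
        while parents[node] is not None:   # walk back toward the root
            links.add(tuple(sorted((node, parents[node]))))
            node = parents[node]
    return links


if __name__ == "__main__":
    graph = {
        "h1": ["sw1"], "h2": ["sw2"], "h3": ["sw3"],
        "sw1": ["h1", "core"], "sw2": ["h2", "core"], "sw3": ["h3", "core"],
        "core": ["sw1", "sw2", "sw3"],
    }
    for link in sorted(vlan_tree(graph, ["h1", "h2"])):
        print(link)
```

The devices that appear in the returned links are the ones that would need the VLAN configured; switches off the tree (here `sw3`) are left untouched.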
Optiputer Network Inventory Management – Logical View
LOGICAL TOPOLOGY (Single VLAN) GRAPH
Look at Parallel Data Serving
• 128-node Rockstar cluster (same as SC2003 build)
• 1 SCSI drive per file server node
[Diagram: four groups, each with 8 Lustre clients and two sets of 10 Lustre file servers on a 48-port GigE switch (48-port GigE + 10GigE uplink); each of the four switches is labeled 8.]
Basic Performance
• 32, 8, 16, 4 nodes reading the same 32 GB file
• Under these ideal circumstances, able to read more than 1.4 GB/sec from disk
• Writing a different 10 GB file from each node: about 700 MB/s
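For scale, the aggregate rates can be converted to network terms. 1.4 GB/s is 11.2 Gb/s, more than any single 10GigE link can carry. The sketch below assumes decimal units (1 GB = 10^9 bytes) and ignores protocol overhead; the helper names are illustrative.

```python
import math


def gbytes_to_gbits(rate_gb_per_s):
    """Convert gigabytes per second to gigabits per second."""
    return rate_gb_per_s * 8


def min_links(rate_gb_per_s, link_gbps=10):
    """Fewest links of link_gbps needed to carry the rate, ignoring overhead."""
    return math.ceil(gbytes_to_gbits(rate_gb_per_s) / link_gbps)


if __name__ == "__main__":
    for label, rate in (("read", 1.4), ("write", 0.7)):
        print(f"{label}: {gbytes_to_gbits(rate):.1f} Gb/s, "
              f"at least {min_links(rate)} x 10GigE")
```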
Why a Hybrid Structure
• Create different physical topologies quickly
• Change whether a site/node is connected via packet, lambda, or a hybrid combination
– Want to understand the practical challenges in different circumstances
• Circuits don’t scale in the Internet sense
• Packet switches will be congested for long-haul
– Real QoS is unreachable in the ossified Internet
• The engineering compromise is likely a hybrid network
– Packet paths always exist (Internet scalability argument)
– Circuit paths on demand
– Think private high-speed networks, not just point-to-point
Summary
• OptIPuter is addressing a subset of the research needed for figuring out how to waste (I mean utilize) bandwidth
• Work at multiple levels of the software stack: protocols, virtual machine construction, storage retrieval
• Trying to understand how lambdas are presented to applications
– Explicit?
– Hidden?
– Hybrid?
• Building an experimental infrastructure as large as our budget will allow
– OptIPuter is already international in scale at 10 gigabit
– Approximating the Terabit Campus with Quartzite