Network Developments and Network Monitoring in Internet2
Eric Boyd
Director of Performance Architecture and Technologies
Internet2
Overview
• Internet2 Network
• Performance Middleware: Supporting Network-Based Science
• Internet2 Network Observatory
Internet2 Network: An Asset for the Community
Universities
Researchers
Regional Networks
K-12
Industry
International
Internet2 Network
• Hybrid optical and IP network
• Dynamic and static wavelength services
• Fiber, equipment dedicated to Internet2; Level 3 maintains network and service level
• Platform supports production services and experimental projects
Internet2 Network - Layer 1
Internet2 Network Optical Switching Node
Level3 Regen Site
Internet2 Redundant Drop/Add Site
ESnet Drop/Add Site
Internet2 Network Deployment
Internet2 Network Optical Switching Node
Level3 Regen Site
Internet2 Redundant Drop/Add Site
ESnet Drop/Add Site
The New Internet2 Network
A New Wrinkle
• Internet2 exploring a merger with National Lambda Rail (NLR)
• Goal: Consolidate national higher education and research networking organizations
• Technical team is exploring what the merged technical infrastructure will look like
Overview
• Internet2 Network
• Performance Middleware: Supporting Network-Based Science
• Internet2 Network Observatory
Network-Based Science
• Science is a global community
• Networks link scientists
• Collaborative research occurs across network boundaries
• For the scientist, the value of the network is the achieved network performance
• Scientists should not have to focus on the network; good end-to-end performance should be a given
Large Hadron Collider
• International physics facility located at CERN, Switzerland
• Major US involvement
  • 2 major US data repositories (petabytes/year)
  • 17 US institutions provide data analysis and storage
  • 68 universities and national laboratories with scientists looking at the data
• Dedicated transatlantic networks connect US to CERN
• Advanced network services required over existing campus, connector/regional, and national networks
Cyberinfrastructure
[Diagram labels: Security, Middleware, Performance, End to End, Policy]
Achieving Good End-to-End Performance
• Internet2 consists of:
  • Campuses
  • Corporations
  • Regional networks
  • Internet2 backbone network
• Our members care about connecting with:
  • Other members
  • Government labs & networks
  • International partners
• The Internet2 community cares about making all of this work
Identifying the Problem
[Diagram: the end-to-end path runs Applications Developer, System Administrator, LAN Administrator, Campus Networking, Gigapop, Backbone, Gigapop, Campus Networking, LAN Administrator, System Administrator, Applications Developer]
How do you solve a problem along a path?
Hey, this is not working right!
The computer is working OK
Talk to the other guys
Everything is A-OK
No other complaints
The network is lightly loaded
All the lights are green
We don’t see anything wrong
Looks fine
Others are getting in ok
Not our problem
Status Quo
• Performance is excellent across backbone networks
• Performance is a problem end-to-end
  • Problems are concentrated towards the edge and in network transitions
• We need to:
  • Diagnose: Understand limits of performance
  • Address: Work with members and application communities to address those performance issues
Vision: Performance Information is …
• Available
  • People can find it (Discovery)
  • “Community of trust” allows access across administrative domain boundaries (AA)
• Ubiquitous
  • Widely deployed (Paths of interest covered)
  • Reliable (Consistently configured correctly)
• Valuable
  • Actionable (Analysis suggests course of action)
  • Automatable (Applications act on data)
Goal: No more mystery …
• Increase network awareness
  • Set user expectations accurately
• Reduce diagnostic costs
  • Performance problems noticed early
  • Performance problems addressed efficiently
  • Network engineers can see & act outside their turf
• Transform application design
  • Incorporate network intuition into application behavior
Strategy: Build & Empower the Community
Decouple the Problem Space:
• Analysis and Visualization
• Performance Data Sharing
• Performance Data Generation
Grow the Footprint:
• Clean APIs and protocols between each layer
• Widespread deployment of measurement infrastructure
• Widespread deployment of common performance measurement tools
[Diagram labels: Analysis & Visualization, API, Measurement Infrastructure, API, Performance Tools]
Tactics: Leverage position
• Internet2 is positioned to help provide diagnostic information for the “US backbone” portion of the problem
• Create *some* diagnostic tools (BWCTL, NDT, OWAMP)
• Make network data as public as is reasonable
• Work on efforts to make performance data more widely available (perfSONAR)
  • Contribute to ‘base’ perfSONAR development (partnership with ESnet, Europe, and Brazil)
  • Contribute to standards for performance information sharing (Open Grid Forum Network Measurement Working Group)
  • Integrate ‘our’ diagnostic tools as ‘good’ examples of perfSONAR services
From the scientist’s perspective
On behalf of the scientist, a network engineer or application can easily/automatically:
• Discover additional monitoring resources
• Authenticate locally
• Be authorized to use remote network resources to a limited extent
• Acquire performance monitoring data from remote sites via standard protocol
• Innovate where needed
• Customize the analysis and visualization
Internet2 End-to-End Performance Initiative (E2Epi)
• Includes:
  • Internet2 staff
  • Internet2 members
  • Federal partners
  • International partners
• Building:
  • Performance monitoring tools
  • Performance middleware frameworks
  • Performance improvement tools
Support for E2Epi
• Funded out of network revenues
• Partnerships
  • Leveraging GÉANT2, ESnet, and RNP resources through consortium leadership
• Grants
  • NSF Apps - Targeted Assistance and Instrumentation for Internet2 Applications
  • NSF SGER - Leveraging Internet2 Facilities for the Network Research Community
  • NSF SGER2 - Network Measurement for International Connections
  • NSF BTG - Bridging the Gap: End-to-End Networking for Landmark Applications
  • NLM Pilot - User Experience with the High Performance Internet Infrastructure: Critical Incidents of Success and Failure
  • NLM NDT - Enhancing the Web100-based Network Diagnostic Tool
Current Activities
• Analysis/Diagnostic tools
• Performance tools
• Software distributions to enable partner network organizations to participate
• Google Summer of Code
• New network deployment of measurement infrastructure on new observatory
Software Distributions
• NPToolkit (Network Performance Toolkit)
  • Will include much of the following eventually
• NDT (avail now)
• OWAMP (avail now)
• BWCTL/Thrulay (May)
• AMI (Fall?)
  • Regular testing and collection for OWAMP/BWCTL
• perfSONAR-PS (Earliest - Fall)
  • SNMP collection/archive
  • AMI archive
  • TopoS and L2Status
• perfSONAR UIs (Earliest - Winter)
Google Summer of Code
5 Projects
• NDT enhancements
• Phoebus protocol enhancements
• Chrolog (user-space timestamp)
• OWAMP (Java Client)
• perfSONAR/cacti interface
OWAMP (3.0c)
• One-way latencies
• Full support of RFC 4656
• Deployment Status
  • Abilene: all remaining nms4 hosts
  • New network: newy and chic (nms-rlat)
• Software available at: http://e2epi.internet2.edu/owamp/
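To make "one-way latencies" concrete, here is a minimal Python sketch of how send and receive timestamps might be turned into delay and loss figures. It is not the OWAMP implementation or its output format: the function name, the input layout, and the use of a max-minus-min spread as a jitter indicator are all assumptions for illustration. It also assumes sender and receiver clocks are synchronized, which one-way measurement depends on.

```python
from statistics import median

def one_way_summary(samples):
    """Summarize one-way delay for (send_ms, recv_ms) pairs.

    recv_ms is None when the packet never arrived (counted as loss).
    Hypothetical helper; the input layout is an assumption.
    """
    if not samples:
        raise ValueError("no samples")
    delays = [recv - send for send, recv in samples if recv is not None]
    lost = len(samples) - len(delays)
    summary = {"sent": len(samples), "loss": lost / len(samples)}
    if delays:
        summary["min_ms"] = min(delays)
        summary["median_ms"] = median(delays)
        # Assumption: spread (max - min) stands in for a jitter figure.
        summary["spread_ms"] = max(delays) - min(delays)
    return summary

# Hypothetical timestamps in milliseconds; the third packet is lost.
example = one_way_summary([(0, 20), (100, 122), (200, None), (300, 321)])
```

Integer millisecond timestamps keep the arithmetic exact in this toy example; a real tool would work with much finer resolution.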
BWCTL (1.2b)
• Throughput Test Controller
• Pending software release
  • Additional throughput tools
    • Iperf/thrulay/nuttcp
  • More tolerant of questionable clocks
• Deployment Status
  • Abilene: open TCP testing
  • New network - awaiting new software release
What is perfSONAR?
• Performance Middleware
• perfSONAR is an international consortium in which Internet2 and GÉANT2 are founders and leading participants
• perfSONAR is a set of protocol standards for interoperability between measurement and monitoring systems
• perfSONAR is a set of open source web services that can be mixed-and-matched and extended to create a performance monitoring framework
perfSONAR Design Goals
• Standards-based
• Modular
• Decentralized
• Locally controlled
• Open Source
• Extensible
• Applicable to multiple generations of network monitoring systems
• Grows “beyond our control”
• Customized for individual science disciplines
perfSONAR Integrates
• Network measurement tools
• Network measurement archives
• Discovery
• Authentication and authorization
• Data manipulation
• Resource protection
• Topology
perfSONAR Credits
• perfSONAR is a joint effort:
  • ESnet
  • GÉANT2 JRA1
  • Internet2
  • RNP
• ESnet includes:
  • ESnet/LBL staff
  • Fermilab
• Internet2 includes:
  • University of Delaware
  • Georgia Tech
  • SLAC
  • Internet2 staff
• GÉANT2 JRA1 includes:
  • Arnes
  • Belnet
  • Carnet
  • Cesnet
  • CYNet
  • DANTE
  • DFN
  • FCCN
  • GRNet
  • GARR
  • ISTF
  • PSNC
  • Nordunet (Uninett)
  • Renater
  • RedIRIS
  • Surfnet
  • SWITCH
perfSONAR Adoption
• R&E Networks
  • Internet2
  • ESnet
  • GÉANT2
  • European NRENs
  • RNP
• Application Communities
  • LHC
  • Roll-out to other application communities in late 2007
• Distributed Development
  • Individual projects (10 before first release) write components that integrate into the overall framework
  • Individual communities (5 before first release) write their own analysis and visualization software
perfSONAR-PS*
• perfSONAR (Perl Services)
Why?
• Adoption of Java Services difficult
  • Many network administrators don’t do Java, but are fluent in Perl
• Services more directly targeted at the data available from Internet2 observatory deployment.
perfSONAR Deployment Status
• Demo …
Overview
• Internet2 Network
• Performance Middleware: Supporting Network-Based Science
• Internet2 Network Observatory
History and Motivation
• Original Abilene racks included measurement devices
  • Included a single (somewhat large) PC
  • Early OWAMP, Surveyor measurements
  • Optical splitters at some locations
• Motivation was primarily operations, monitoring, and management - understanding the network and how well it operates
• Data was collected and maintained whenever possible
  • Primarily a NOC function
  • Available to other network operators to understand the network
  • It became apparent that the datasets were valuable as a network research tool
Rick Summerhill
The Abilene Upgrade Network
Upgrade of the Abilene Observatory
• An important decision was made during the Abilene upgrade process (Juniper T-640 routers and OC-192c)
  • Two racks, one of which was dedicated to measurement
  • Potential for research community to collocate equipment
• Two components to the Observatory
  • Collocation - network research groups are able to collocate equipment in the Abilene router nodes
  • Measurement - data is collected by the NOC, the Ohio ITEC, and Internet2, and made available to the research community
An Abilene router node
[Figure labels: Power; Out-of-band; Eth. Switch; T-640; (M-5); Power (48VDC); Measurement Machines (nms); Space for Collocation!; Measurement (Observatory) Rack]
Dedicated servers at each node
• Houston Router Node - In this picture:
  • Measurement machines
  • Collocated PlanetLab machines
Example Research Projects
• Collocation projects
  • PlanetLab – Nodes installed in all Abilene Router Nodes. See http://www.planet-lab.org
  • The Passive Measurement and Analysis Project (PMA) - The Router clamp. See http://pma.nlanr.net
• Projects using collected datasets. See http://abilene.internet2.edu/observatory/research-projects.html
  • “Modular Strategies for Internetwork Monitoring”
  • “Algorithms for Network Capacity Planning and Optimal Routing Based on Time-Varying Traffic Matrices”
  • “Spatio-Temporal Network Analysis”
  • “Assessing the Presence and Incidence of Alpha Flows in Backbone Networks”
The New Internet2 Network
• Expanded Layer 1, 2 and 3 Facilities
  • Includes SONET and Wave equipment
  • Includes Ethernet Services
  • Greater IP Services
• Requires a new type of Observatory
The New Internet2 Network
The New Internet2 Observatory
• Seek input from the community, both engineers and network researchers
• Current thinking is to support three types of services
  • Measurement (as before)
  • Collocation (as before)
  • Experimental Servers to support specific projects - for example, Phoebus (this is new)
• Support different types of nodes:
  • Optical Nodes
  • Router Nodes
The New York Node - First Installment
Existing Observatory Capabilities
• One way latency, jitter, loss
  • IPv4 and IPv6 (“owamp”)
• Regular TCP/UDP throughput tests – ~1 Gbps
  • IPv4 and IPv6; On-demand available (“bwctl”)
• SNMP
  • Octets, packets, errors; collected 1/min
• Flow data
  • Addresses anonymized by 0-ing the low order 11 bits
• Routing updates
  • Both IGP and BGP - Measurement device participates in both
• Router configuration
  • Visible Backbone – Collect 1/hr from all routers
• Dynamic updates
  • Syslog; also alarm generation (~nagios); polling via router proxy
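The flow-data anonymization rule above (zeroing the low-order 11 bits of each address) can be illustrated with a short Python sketch. This is not the Observatory's actual tooling: the function name is hypothetical, and the sketch assumes IPv4 addresses only, since the slide does not say how IPv6 flow records are treated.

```python
import ipaddress

LOW_BITS = 11  # number of low-order bits to zero, per the slide

def anonymize_ipv4(addr: str, low_bits: int = LOW_BITS) -> str:
    """Return addr with its low-order `low_bits` bits set to zero.

    Hypothetical helper for illustration; IPv4 only (an assumption).
    """
    value = int(ipaddress.IPv4Address(addr))
    # Build a 32-bit mask whose low `low_bits` bits are zero.
    mask = (0xFFFFFFFF >> low_bits) << low_bits
    return str(ipaddress.IPv4Address(value & mask))

# Documentation-range address used purely as an example input.
anonymized = anonymize_ipv4("198.51.100.7")
```

Zeroing 11 bits clears the whole last octet plus the low 3 bits of the third octet, so every address within the same 2048-address block maps to the same anonymized value.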
Observatory Functions

Device      Function      Details
Router Nodes
nms-rthr1   Measurement   BWCTL on-demand 1 Gbps router throughput, Thrulay
nms-rthr2   Measurement   BWCTL on-demand 10 Gbps router throughput, Thrulay
nms-rexp    Experimental  NDT/NPAD
nms-rpsv    Measurement   Netflow collector
nms-rlat    Measurement   OWAMP with locally attached GPS timing
nms-rpho    Experimental  Phoebus 2 x 10GE to Multiservice Switch
Optical Nodes
nms-octr    Management    Controls Multiservice Switch
nms-oexp    Experimental  NetFPGA
nms-othr    Measurement   On-demand Multiservice Switch 10 Gbps throughput
Observatory Hardware
• Dell 1950 and Dell 2950 servers
  • Dual Core 3.0 GHz Xeon processors
  • 2 GB memory
  • Dual RAID 146 GB disk
  • Integrated 1 GE copper interfaces
  • 10 GE interfaces
• Hewlett-Packard 10GE switches
• 9 servers at router sites, 3 at optical only sites
Observatory Databases – Data Types
• Data is collected locally and stored in distributed databases
• Databases
  • Usage Data
  • Netflow Data
  • Routing Data
  • Latency Data
  • Throughput Data
  • Router Data
  • Syslog Data
Uses and Futures
• Some uses of existing datasets and tools
  • Quality Control
  • Network Diagnosis
  • Network Characterization
  • Network Research
• Consultation with researchers
• Open questions
Observatory Deployment (July)
• NDT/NPAD servers
• OWAMP/BWCTL deployments
• perfSONAR services (perfSONAR-PS)
  • LS (Discovery)
  • SNMP collection/archive
  • OWAMP/BWCTL archive
  • TopoS and L2 status (GN2 E2EMon compatible)
Recall: Datasets
Dataset           Uses
Usage Data        ND, NR
Netflow Data      ND, NC, NR
Routing Data      NR
Latency Data      QC, ND, NR
Throughput Data   QC, ND, NR
Router Data       ND, NR
Syslog Data       NR

Key: QC = Quality Control, ND = Network Diagnosis, NC = Network Characterization, NR = Network Research
And, of course, most used for operations
Quality Control: e-VLBI
• When starting to connect telescopes, needed to verify inter-site paths
• Set up throughput testing among sites (using same Observatory tool: bwctl)
  • Kashima, JP
  • Onsala, SE
  • Boston, MA (Haystack)
• Collect and graph data; distribute via web
• Quick QC check before applications tests start
Quality Control: e-VLBI Network
Quality Control: e-VLBI Result
• Automated monitoring allowed view of network throughput variation over time
  • Highlights route changes, network outages
• Automated monitoring also helps to highlight any throughput issues at end points:
  • e.g. Network Interface Card failures, untuned TCP stacks
• Integrated monitoring provides overall view of network behavior at a glance
Network Diagnosis: e-VLBI
• Target at the time: 50 Mbps
• Oops: Onsala-Boston: 1 Mbps
• Divide and Conquer
  • Verify Abilene backbone tests look good
  • Use Abilene test point in Washington DC
  • Eliminated European and trans-Atlantic pieces
  • Focus on problem: found oversubscribed link
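The divide-and-conquer approach can be sketched as a bisection over test points along the path. The Python below is only an illustration under stated assumptions: the function names, the test point labels, and the per-segment capacities are all hypothetical, and the search assumes that once a path prefix falls below the target, every longer prefix does too.

```python
def locate_bottleneck(points, measure_mbps, target_mbps):
    """Bisect an ordered list of test points to find the failing segment.

    measure_mbps(src, dst) is a caller-supplied throughput test (for
    example, a wrapper around an on-demand test tool). Returns the
    adjacent pair (near, far) containing the problem, or None if the
    end-to-end path already meets the target.
    """
    if len(points) < 2:
        raise ValueError("need at least two test points")
    lo, hi = 0, len(points) - 1
    if measure_mbps(points[0], points[hi]) >= target_mbps:
        return None
    # Invariant: the prefix ending at lo is good, the one ending at hi is bad.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if measure_mbps(points[0], points[mid]) >= target_mbps:
            lo = mid
        else:
            hi = mid
    return points[lo], points[hi]

# Hypothetical path and per-segment capacities (Mbps), for illustration only.
POINTS = ["A", "B", "C", "D", "E"]
SEGMENT_MBPS = [900, 800, 1, 700]

def fake_measure(src, dst):
    """Stand-in for a real test: throughput is the slowest segment crossed."""
    i, j = POINTS.index(src), POINTS.index(dst)
    return min(SEGMENT_MBPS[i:j])

result = locate_bottleneck(POINTS, fake_measure, target_mbps=50)
```

Each probe from a mid-path test point rules out half of the remaining path, which is the role the Washington DC test point played in the case described above.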
Quality Control: IP Backbone
• Machines with 1GE interfaces, 9000 MTU
• Full mesh
• IPv4 and IPv6
• Expect > 950 Mbps TCP
• Keep list of “Worst 10”
• If any path < 900 Mbps for two successive testing intervals, throw alarm
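The alarm rule and the “Worst 10” list can be expressed in a few lines of Python. This is a sketch of the logic only, not the NOC's monitoring code: the class name, method names, and the throughput values in the example are hypothetical, while the 900 Mbps threshold and the two-interval rule come from the slide.

```python
from collections import defaultdict

ALARM_MBPS = 900   # threshold from the slide
WORST_N = 10       # size of the "Worst 10" list

class MeshMonitor:
    """Track per-path throughput across testing intervals (hypothetical)."""

    def __init__(self, threshold=ALARM_MBPS, worst_n=WORST_N):
        self.threshold = threshold
        self.worst_n = worst_n
        self.history = defaultdict(list)  # (src, dst) -> list of Mbps

    def record_interval(self, results):
        """Store one interval of results; return paths that should alarm.

        A path alarms when its two most recent results are both below
        the threshold.
        """
        alarms = []
        for path, mbps in results.items():
            hist = self.history[path]
            hist.append(mbps)
            if len(hist) >= 2 and max(hist[-2:]) < self.threshold:
                alarms.append(path)
        return sorted(alarms)

    def worst(self):
        """Return up to worst_n (path, Mbps) pairs, slowest first."""
        latest = {p: h[-1] for p, h in self.history.items() if h}
        return sorted(latest.items(), key=lambda kv: kv[1])[: self.worst_n]

# Hypothetical measurements between two nodes, in Mbps.
monitor = MeshMonitor()
first = monitor.record_interval({("newy", "chic"): 960, ("chic", "newy"): 880})
second = monitor.record_interval({("newy", "chic"): 955, ("chic", "newy"): 870})
```

Requiring two successive low intervals filters out a single bad test run, at the cost of delaying the alarm by one interval.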
Quality Control: Peerings
• Internet2 and ESnet have been watching the latency across peering points for a while.
• Internet2 and DREN have been preparing to do some throughput and latency testing
• During the course of this set up, found interesting routing and MTU size issues
Network Diagnosis: End Hosts
• NDT, NPAD servers
• Quick check from a host that has a browser
• Easily eliminate (or confirm) last mile problems (buffer sizing, duplex mismatch, …)
• NPAD can find switch limitations, provided the server is close enough
Network Diagnosis: Generic
• Generally looking for configuration & loss
  • Don’t forget security appliances
• Is there connectivity & reasonable latency? (ping -> OWAMP)
• Is routing reasonable? (traceroute, proxy)
• Is host reasonable? (NDT; NPAD)
• Is path reasonable? (BWCTL)
Network Characterization
• Flow data collected with flow-tools package
• All data not used for security alerts and analysis [REN-ISAC] is anonymized
• Reports from anonymized data available (see truncated addresses)
• Additionally, some Engineering reports
Network Research Projects
• Major consumption
  • Flows
  • Routes
  • Configuration
• Nick Feamster (while at MIT)
• Dave Maltz (while at CMU)
• Papers in SIGCOMM, INFOCOM
• Hard to track folks that just pull data off of web sites
Lots of Work to be Done
• Internet2 Observatory realization inside racks set for initial deployment, including new research projects (NetFPGA, Phoebus)
• Software and links easily changed
• Could add or change hardware depending on costs
• Researcher tools, new datasets
• Consensus on passive data
Not Just Research
• Operations and Characterization of new services
  • Finding problems with stitched together VLANs
  • Collecting and exporting data from Dynamic Circuit Service...
    • Ciena performance counters
    • Control plane setup information
    • Circuit usage (not utilization, although that is also nice)
• Similar for underlying Infinera equipment
• And consider inter-domain issues
Sharing Observatory Data
We want to make Internet2 Network Observatory Data:
• Available:
  • Access to existing active and passive measurement data
  • Ability to run new active measurement tests
• Interoperable:
  • Common schema and semantics, shared across other networks
  • Single format
  • XML-based discovery of what’s available
Internet2 Deployment Status
• Focus is on development of services for Internet2 new network and integration with Indiana NOC
• Submitting a proposal to NSF for additional funding
• Target: July 2007 as new Internet2 network goes operational
  • OWAMP MA
  • BWCTL MA/MP
  • IU-based Topology Service
  • Multi-LS
  • NOC Alarm Transformation Service
More Information
• Eric Boyd
  • [email protected]
  • 734-352-7032
• http://e2epi.internet2.edu/
• http://bwctl.internet2.edu
• http://ndt.internet2.edu/
• http://owamp.internet2.edu/
• http://vfer.internet2.edu/
• http://www.perfsonar.net/
• http://nwmg.internet2.edu