Internet2 R&D Update Eric Boyd Deputy Technology Officer 28 April 2009 Internet2 Spring Member Meeting





Overview

•  Strategic Planning
•  Network Cyberinfrastructure
   •  Information Services
   •  DCN
   •  perfSONAR
•  Support for Network Research
•  Next Generation Networking

Strategic Plan Implementation

•  Strategic Plan (Summer 2008)
•  Strategic Plan Task White Papers (April 2009)
   •  Each written independently
   •  Discuss current effort, environment, current tensions, future opportunities, and metrics for success
•  Strategic Plan Prioritization (next step)

R&D Strategic Plan Focus

•  Cyberinfrastructure (all councils)
•  Environment for Research (RAC)
•  Middleware (AMSAC)
•  Security (AMSAC)
•  Architecture and Operations (AOAC)

We welcome and encourage community suggestions and revisions.

Overview

•  Strategic Planning
•  Network Cyberinfrastructure
   •  Information Services
   •  DCN
   •  perfSONAR
•  Support for Network Research
•  Next Generation Networking

Cyberinfrastructure and the Internet2 Community

•  Operating advanced services by and for the community
   •  e.g. Networks, Observatories, Federations
•  Experimenting with developmental services
   •  e.g. Dynamic Circuits, Distributed Monitoring, Hybrid Networking
•  Adopting new technologies
   •  e.g. Workshops, Targeted Communities
•  Partnering with like-minded organizations

Integrated Systems Approach

•  What does "Integrated" mean?
   •  Interoperable
   •  Widely deployed
   •  Community best practices
   •  Extensible
•  Observation: building distributed systems that operate as a larger distributed system

Distributed System Design Goals

•  Take existing scientific applications, without recompilation or awareness of circuits, e.g.:
   •  Bulk file transfer
   •  Real time
   •  Video
•  Exploit performance possibilities of new networking technologies
•  Preserve the "current politics of business" (don't upset the apple cart)
•  Improve efficiency of problem diagnosis (eliminate reliance on the "old boy network")

Distributed System Requirements

•  These distributed systems share common requirements:
   •  Heterogeneous network architecture
   •  Multiple administrative entities; no central authority
   •  Local customization of operational environment
   •  Applications driven by orthogonal virtual organizations
•  Suggests a parallel design approach:
   •  Toolkit approach
   •  Web services / defined APIs

Distributed Systems for Networks

•  To build next-generation networks, we need distributed software systems on top of the network hardware:
   •  Session/application layer (session-layer tools [e.g. Phoebus], community-specific abstraction applications [e.g. Lambda Station, Terapaths], true applications)
   •  Dynamic Circuit Networks (DCN; e.g. Internet2 DCN, ESnet SDN, GÉANT2 AutoBAHN)
   •  Performance measurement framework (e.g. perfSONAR)
   •  Information Services (IS)
      •  Discovery
      •  Topology
   •  Authentication, Authorization, and Accounting (AAA; e.g. Shibboleth)

Multi-Layer Distributed System

•  Design is "parallel" for each system
•  Hierarchical dependency relationship
•  Suggests abstracting common components and a publication/polling architecture across boundaries

Multi-Layer Distributed System

[Diagram: Layers 1–3 (hardware and software/servers) stacked beneath the Session-Layer Abstraction, Control Plane Framework, Performance Monitoring, and Information Services / Federated Trust components, supporting scientific and collaborative applications and diagnostic analysis and visualization tools]

•  Design is "parallel" for each system
•  Hierarchical dependency relationship
•  Suggests abstracting common components and a publication/polling architecture across boundaries
•  Creates a common network abstraction toolkit to present to applications

Overview

•  Strategic Planning
•  Network Cyberinfrastructure
   •  Information Services
   •  DCN
   •  perfSONAR
•  Support for Network Research
•  Next Generation Networking

Unified Network Information Services

•  Network cyberinfrastructure services must be discoverable
   •  In the DCN world: "Who can I talk to?"
   •  In the perfSONAR world: "What network measurements can I find or make?"
•  Both DCN and perfSONAR use common technology for this functionality

Unified Network Information Services

•  Internet2 Information Services Working Group (IS-WG)
   •  Working to unify and extend these services
•  This effort will result in a common network cyberinfrastructure discovery and information service
   •  The "nameservers" of dynamic, visible networks

Overview

•  Strategic Planning
•  Network Cyberinfrastructure
   •  Information Services
   •  DCN
   •  perfSONAR
•  Support for Network Research
•  Next Generation Networking

DCN Summary

•  Provides short-term dedicated bandwidth
•  Similar and complementary to IP networking:
   •  Protocol-based connections
   •  Connect to anyone else on the network
•  Supports high-bandwidth and real-time applications
•  Being developed and deployed by a number of R&E networks
•  More flexible (and potentially more cost-effective) than long-term dedicated circuits

Example DCN Applications

•  LHC CMS (UNL -> FNAL)
•  LHC ATLAS (BU, UMich -> BNL)
•  LIGO (Syracuse -> U Wisc Milwaukee -> Caltech)
•  UltraGrid (LSU -> Masaryk)
•  Weather simulation (Northrop Grumman/Utah -> NG/MAX)

Potential DCN Applications

•  Distributed backup service
•  Automated data distribution service ("warm" local disk repositories)
•  Telepresence
•  etc.

What is DCN?

•  Dynamic Circuit Network
   •  Generic term; the Internet2 DCN is one example
•  Physical network
   •  "Proto-duction" service
•  DCN Software Suite (DCN SS)
   •  One implementation of a DCN
   •  Combination of DRAGON and OSCARS
   •  Jointly developed with ESnet, MAX, and USC ISI East
   •  Interoperable with other implementations speaking the InterDomain Control Protocol (IDCP)

What is IDCP?

•  InterDomain Control Protocol
•  Developed jointly with ESnet and GÉANT2
•  Enables creation of point-to-point circuits between users across multiple intermediate networks
   •  Parallel to the telephone network
•  How it works
   •  End-user or application sends a connection request to the control plane
   •  Control plane software automates authorization, reservation, setup, and teardown of circuits
•  Working with GLIF and OGF; expect IDCP to evolve as the GLIF and OGF community develops a common protocol
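The request flow above can be sketched in miniature as the data a control plane needs before it authorizes and provisions a circuit. This is a hypothetical illustration only, not the real IDCP message format (which is XML/SOAP-based); the `CircuitRequest` fields and `validate_reservation` helper are invented for this sketch.

```python
from dataclasses import dataclass

@dataclass
class CircuitRequest:
    """Hypothetical point-to-point circuit reservation (not the real IDCP schema)."""
    src_endpoint: str     # e.g. a topology identifier like "urn:ogf:network:..."
    dst_endpoint: str
    bandwidth_mbps: int
    start_time: int       # UNIX timestamps bounding the reservation window
    end_time: int

def validate_reservation(req: CircuitRequest, max_mbps: int = 10_000) -> bool:
    """Checks a control plane might automate before reserving resources."""
    if req.end_time <= req.start_time:
        return False                          # empty or inverted time window
    if not (0 < req.bandwidth_mbps <= max_mbps):
        return False                          # bandwidth outside policy limit
    return req.src_endpoint != req.dst_endpoint

# A one-hour, 1 Gbps request between two (made-up) domain endpoints:
req = CircuitRequest("urn:ogf:network:internet2.edu:chic",
                     "urn:ogf:network:es.net:sunn",
                     1000, 1_240_000_000, 1_240_003_600)
print(validate_reservation(req))
```

In the real system these checks sit behind the IDC's authorization step; the point here is only that setup and teardown are request-driven and automatable.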

What is DCN? DCN Software Suite

•  DCN SS is a matched pair of open software services
•  OSCARS (IDC)
   •  Open source project maintained by Internet2 and ESnet
•  DRAGON (DC)
   •  NSF-funded
   •  Open source project maintained by Internet2, MAX, USC ISI East, and George Mason University
•  Version 0.5 is deployed

DCN Software Suite

•  Policy Service (May)
   •  Define classes of users
   •  Options to limit provisioning based on parameters such as endpoints
•  Usage Statistics (May)
   •  Enhanced reporting of circuit activity for the Internet2 backbone
•  DCN Software Suite 0.6 (October)
   •  Modularization of functional components

Expansion / Modularization of OSCARS Architecture

[Diagram: hosts at Institutions A and B connected through routers and Layer 2 switches, across regional, Internet2, and peer IP networks (shared IP transport at Layer 3) and the corresponding regional, Internet2, and peer dynamic circuit networks at Layer 2]

Why Dynamic Circuits?

Dynamic                                            Static
Can select different end-points, bandwidths,       End-point parameters can't be changed
and durations
Resources are only used when needed                Resources are allocated even when not in use
Efficient – unused resources can be used           Inefficient – unused resources are unavailable
by others
No charge when not in use                          Charged even when not in use
Green!                                             Not green

Collaboration is inherently dynamic.

Internet2 DCN Demo

Photo by Steven S. Wallace

DCN and SCinet
* 10GE to USLHCnet/Caltech – Tier2 on floor to Caltech, UNL, Fermi, CERN
* 10GE to Northrop Grumman – weather simulation
* 10GE to Pionier – eVLBI
* 10GE to Internet2 – LONI (UltraGrid), LEARN (Texas A&M)
* 1GE to Dutch booth – Phosphorus/IDC demo
* 1GE to JGN2

Internet2 DCN Footprint

Feb 2009

Use of the DCN Software Suite

Connectors     Running IDC   Using DCN SS
CENIC          No            No
CIC OmniPoP    No            No
GPN            Planned       Planned
LEARN          Yes           Yes
LONI           Yes           Yes
MAX            Yes           Yes
Merit          Planned       Planned
NOX            No            No
NYSERNet       Yes           Yes
PNWGP          No            No

Use of the DCN Software Suite (cont.)

Networks          Running IDC   Using DCN SS
ESnet             Yes           Yes
AutoBAHN/GÉANT    Yes           No
NetherLight       Planned       No
JGN               Yes           Yes
USLHCnet          Yes           Yes

Local/Campus              Running IDC   Using DCN SS
Northrop Grumman          Yes           Yes
University of Amsterdam   Yes           Yes
Caltech                   Yes           Yes
University of Houston     Planned       Planned

Where can you learn more?

•  Internet2 DCN Working Group
   •  https://spaces.internet2.edu/display/DCN/Home
•  DCN Software Suite
   •  https://wiki.internet2.edu/confluence/display/DCNSS/Home
•  Java Client API
   •  https://wiki.internet2.edu/confluence/display/DCNSS/Java+Client+API
•  Test IDC Guide
   •  https://wiki.internet2.edu/confluence/display/DCNSS/Internet2%27s+Test+IDC
•  Obtaining a Test Certificate
   •  https://wiki.internet2.edu/confluence/display/CPD/How+to+Request+an+IDC+User+Certificate

Moving toward production….

•  Internet2 has been tasked, by its governance, to migrate the DCN from R&D towards a production service.

•  As a first step in that process, a staff team has begun to work with the community to define a loose set of goals for a Pilot Service to be instituted in July 2009.

•  DCN Pilot is intended to be a lightweight implementation that will allow the community to gain experience before defining the final DCN service offering at a later date.

DCN Pilot Service Status Update

•  Discussions underway for 8 months with AOAC, RAC, NTAC, and the DCN WG about a proposed DCN Pilot Service
•  Decision made to delay fee recovery until July 2010
•  Moving towards roll-out of the DCN pilot (focus on operations) by July 2009
•  Open issues:
   •  DCN fee recovery model
   •  How to make this an end-to-end community service, not just a backbone service
   •  How to build demand and usage while rolling it out
   •  Best practices guide for campus roll-out
   •  How does this fit in the context of network cyberinfrastructure and true hybrid networking?
   •  Which domain research communities should we work with?
•  Request connectors to try it over the next few months

Overview

•  Strategic Planning
•  Network Cyberinfrastructure
   •  Information Services
   •  DCN
   •  perfSONAR
•  Support for Network Research
•  Next Generation Networking

perfSONAR Motivation

•  Performance problems become actionable when performance characteristics can be seen by all interested parties

•  Multi-domain paths are the most difficult to diagnose

perfSONAR

•  perfSONAR is a multi-institution project to define a suite of services and tools to support multi-domain network measurement and monitoring activities
•  Tools in the perfSONAR-PS implementation: BWCTL, OWAMP, pS-B, PingER, SNMP, NDT, NPAD
•  Multiple independent implementations
•  Built on standards being defined in the OGF

perfSONAR Roadmap

•  Service software releases (May/June)
   •  perfSONAR-PS v3.1
      •  Service discovery
      •  Topology sharing
      •  Interface and circuit status
      •  Meshes of active latency and throughput tests (OWAMP/BWCTL)
      •  Alarms on exceptional results
   •  pS-NPToolkit v3.1
      •  Easily deployed single perfSONAR Performance Node (ISO); exists as v2
      •  Bug fixes / active latency grids / enhanced admin / alarms (v3.1, June/July)
•  perfSONAR software distribution service (June/July)
•  Performance analysis GUIs (Fall 09)
•  perfSONAR Performance Grids
   •  Manage a set of perfSONAR Performance Nodes

Demo Monitoring

•  Spring Member Meeting demo: Cisco Telepresence
•  Endpoints
   •  Harvard
   •  Crystal Gateway Hotel
•  Goals
   •  Measure delay/jitter/loss between these points
   •  Be able to fix any issues that come up

Regular Monitoring for Latency Sensitive Applications

•  Cisco Telepresence limits
   •  10 ms jitter
   •  160 ms delay
   •  0.05% loss
•  Polycom limits
   •  30–35 ms jitter
   •  300 ms delay
   •  <1% loss
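The limits above are exactly the kind of thresholds regular monitoring checks against. A minimal sketch, encoding only the numbers from this slide (the `within_limits` function itself is invented for illustration):

```python
# Application limits from the slide: jitter (ms), one-way delay (ms), loss (%).
# For Polycom the stricter end of the 30-35 ms jitter range is used here.
LIMITS = {
    "cisco_telepresence": {"jitter_ms": 10.0, "delay_ms": 160.0, "loss_pct": 0.05},
    "polycom":            {"jitter_ms": 35.0, "delay_ms": 300.0, "loss_pct": 1.0},
}

def within_limits(app: str, jitter_ms: float, delay_ms: float, loss_pct: float):
    """Return the list of metrics that exceed the application's published limits."""
    lim = LIMITS[app]
    violations = []
    if jitter_ms > lim["jitter_ms"]:
        violations.append("jitter")
    if delay_ms > lim["delay_ms"]:
        violations.append("delay")
    if loss_pct > lim["loss_pct"]:
        violations.append("loss")
    return violations

# A path with 12 ms jitter is fine for Polycom but breaks Telepresence:
print(within_limits("cisco_telepresence", 12.0, 90.0, 0.01))  # ['jitter']
print(within_limits("polycom", 12.0, 90.0, 0.01))             # []
```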

Demo Monitoring – Enabling Debugging

•  Path decomposition
   •  Deploy more hosts and run regular latency tests on smaller segments of the path between end hosts
   •  Shows where on the path to look for the problem's cause
•  Path measurements
   •  Obtain utilization statistics from routers along the end-to-end path
   •  Allow drilling down to better understand why problems are occurring
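Path decomposition is mechanical once the measurement hosts are ordered along the path: test each adjacent segment, and test progressively longer paths from one end to localize where a metric degrades. A sketch (the host names follow the demo deployment; the ordering and function are illustrative, not actual tooling):

```python
def decompose_path(hosts):
    """Given measurement hosts ordered along the path, return (a) the adjacent
    segments to test individually and (b) nested prefixes from the first host,
    mirroring the 'shorter path' tests in the slides that follow."""
    adjacent = [(hosts[i], hosts[i + 1]) for i in range(len(hosts) - 1)]
    prefixes = [(hosts[0], h) for h in hosts[2:]]  # progressively longer paths
    return adjacent, prefixes

path = ["Hotel", "MAX", "Internet2-WASH", "Internet2-NEWY", "NOX", "Harvard"]
adjacent, prefixes = decompose_path(path)
print(adjacent[0])   # ('Hotel', 'MAX')
print(prefixes[0])   # ('Hotel', 'Internet2-WASH')
```

If jitter appears on `('Hotel', 'Internet2-WASH')` but not on `('Hotel', 'MAX')`, the problem lies on the MAX-to-WASH leg, which is exactly the reasoning applied later in this deck.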

Demo Deployment

[Diagram: measurement hosts deployed at Harvard, Northern Crossroads, two Internet2 POPs, Mid-Atlantic Crossroads, and the Hotel along the end-to-end path]

Analysis Software

•  Software was written or modified to make it easy to view and understand the data
•  Provides a variety of views:
   •  Status of the entire network
   •  Status of a given host
   •  Status of a given path
•  Alerting mechanism when problems are seen

Network Health

•  A grid view of the network describing the latency, jitter and loss between all hosts
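The grid view is essentially a matrix of pairwise measurements. A small sketch of how such a view can be assembled from samples (the data structure and sample values are invented for illustration, not the demo's actual software):

```python
def health_grid(hosts, measurements):
    """Arrange pairwise (latency, jitter, loss) samples into a grid keyed by
    source and destination host; pairs with no sample yet show as None."""
    grid = {a: {b: None for b in hosts if b != a} for a in hosts}
    for (src, dst), metrics in measurements.items():
        grid[src][dst] = metrics
    return grid

hosts = ["Hotel", "MAX", "Harvard"]
samples = {
    ("Hotel", "MAX"):     {"latency_ms": 4.1,  "jitter_ms": 0.3,  "loss_pct": 0.0},
    ("Hotel", "Harvard"): {"latency_ms": 21.7, "jitter_ms": 11.2, "loss_pct": 0.1},
}
grid = health_grid(hosts, samples)
print(grid["Hotel"]["Harvard"]["jitter_ms"])  # 11.2
print(grid["MAX"]["Harvard"])                 # None (no sample yet)
```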

Path Status

•  Shows graphs of jitter and loss between hosts along with interface utilization for the path.

Nagios Alarms

•  Alerts administrators when problems are seen
•  Easy integration into NOC reporting systems
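Nagios integrates external checks through a simple convention: a check prints one status line and returns an exit code (0 OK, 1 WARNING, 2 CRITICAL). A minimal sketch of a jitter check in that style; the thresholds are illustrative, not the demo's actual configuration:

```python
OK, WARNING, CRITICAL = 0, 1, 2   # standard Nagios plugin exit codes

def check_jitter(jitter_ms, warn=5.0, crit=10.0):
    """Nagios-style check: print a status line, return the plugin exit code."""
    if jitter_ms >= crit:
        print(f"CRITICAL - jitter {jitter_ms:.1f} ms (>= {crit} ms)")
        return CRITICAL
    if jitter_ms >= warn:
        print(f"WARNING - jitter {jitter_ms:.1f} ms (>= {warn} ms)")
        return WARNING
    print(f"OK - jitter {jitter_ms:.1f} ms")
    return OK

# A real plugin would end with sys.exit(check_jitter(measured)); here we just call it:
status = check_jitter(11.2)
```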

Network Performance Analysis

•  Several potential issues identified:
   •  Highly utilized link in the path
   •  Cross traffic
   •  Test machine capability/quality
   •  Other software running on the hosts
   •  NTP drift

•  All were solved and verified through diagnostics and monitoring

Highly Utilized Link

•  Initial observation: high jitter values observed between the Hotel and Harvard

•  Process: isolate where the jitter is happening

High Jitter Seen

[Diagram: end-to-end path from the Hotel through Mid-Atlantic Crossroads, two Internet2 POPs, and Northern Crossroads to Harvard, with high jitter observed across the full path]

Highly Utilized Link – Path Decomposition

•  Hotel to Northern Crossroads: jitter still on the shorter path


Highly Utilized Link – Path Decomposition

•  Hotel to Internet2 (New York): jitter still on the shorter path


Highly Utilized Link – Path Decomposition

•  Hotel to Internet2 (Washington, DC): jitter still on the shorter path


Highly Utilized Link – Path Decomposition

•  Hotel to Mid-Atlantic Crossroads: clean between the Hotel and MAX; the jitter issue seems to be between MAX and Internet2 (WASH)


Highly Utilized Link

•  What we know via OWAMP:
   •  Jitter is not between the Hotel and MAX
   •  Jitter is somewhere between MAX and New York
•  Next steps
   •  Drill down using alternate data sources
   •  What do we have access to?
      •  SNMP on MAX and the Internet2 backbone (via perfSONAR, of course!)
      •  Can inquire about NOX/Harvard if necessary
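The SNMP data in question is interface octet counters; utilization is derived from the delta between two samples. A sketch of that standard calculation, assuming 64-bit counters (`ifHCInOctets`/`ifHCOutOctets`); the sampling itself, whether via perfSONAR or direct SNMP, is outside this sketch:

```python
def link_utilization(octets_t0, octets_t1, seconds, capacity_bps,
                     counter_bits=64):
    """Percent utilization from two SNMP interface octet counter samples.
    Handles a single counter wrap between the two samples."""
    delta = octets_t1 - octets_t0
    if delta < 0:                      # counter wrapped between samples
        delta += 2 ** counter_bits
    bps = delta * 8 / seconds          # octets -> bits per second
    return 100.0 * bps / capacity_bps

# 1.5 GB transferred in 60 s on a 2.5 Gbps uplink (the MAX-to-Internet2 capacity
# cited on the next slide):
print(round(link_utilization(0, 1_500_000_000, 60, 2_500_000_000), 1))  # 8.0
```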

Highly Utilized Link

•  Examine each leg of the path:
   •  Hotel to College Park
   •  College Park core
   •  College Park to Level3 (McLean, VA)
   •  Internet2 uplink
•  Identify points of congestion:
   •  1G uplink from the Hotel to College Park
   •  10G MAX core
   •  2.5G Internet2 uplink

Highly Utilized Link

•  Observed on Internet2 Backbone:

Highly Utilized Link - Results

•  Potential solutions
   •  Identify flows, re-engineer traffic
   •  Re-plumb the demo path
   •  Increase capacity
•  Results
   •  Increased MAX headroom to 10G

Overview

•  Strategic Planning
•  Network Cyberinfrastructure
   •  Information Services
   •  DCN
   •  perfSONAR
•  Support for Network Research
•  Next Generation Networking

General Support: Internet2 Observatory

•  Data collections
   •  Routing, NetFlow, latency, bandwidth, utilization, router, logging
   •  Interactive via proxy; also trouble tickets
   •  More being exported via perfSONAR
   •  Starting to expand to L2/DCN
•  Collocation for network research projects
   •  PlanetLab / VINI nodes at router nodes
   •  100x100 (NetFPGA) nodes

Making Support Less Ad-Hoc

•  The Research Advisory Council has commissioned a Network Research Review Committee to:
   •  Examine policies on passive data collection and potentially recommend changes
   •  Look at Internet2 network research priorities
   •  Review ad-hoc requests that require resources
•  http://www.internet2.edu/networkresearch/nrrc.html

Specific Support Updates

•  NSF-funded 100x100 project
•  GENI

100x100 Project

•  Original clean-slate project: get 100 Mbps to 100 million homes
•  In the last year, deployed NetFPGA PCs for Nick McKeown of Stanford
   •  NEWY, HOUS, LOSA, (WASH)

NetFPGA possibilities

•  Programmable routers: demonstrated reduced need for buffers in highly aggregated paths at SIGCOMM 08

•  Switches: demonstrated OpenFlow at GEC3 and SC|08

GENI: Internet2 Backbone Contribution

•  A dedicated 10 Gbps wave on the Internet2 DWDM system
   •  Participated with Emulab / University of Utah in the first (Spiral 1) solicitation
   •  Potential use by other projects whenever possible
•  Have an MoU with the GPO for use of the wave
   •  The GPO controls how the wave is used
•  Access is through regional and campus networks, except for collocated equipment
   •  IP network
   •  Dynamic Circuit Network (DCN)
   •  Direct connectivity for the control framework

Current GPO projects

•  ProtoGENI [ProtoGENI] – R. Ricci, Utah
•  Internet-Scale Overlay Hosting [PlanetLab] – J. Turner, WUSTL

ProtoGENI

•  Take Utah's Emulab software (computers + network in a room) and GENI-ize it; the work to do that is funded by NSF
•  Deploy this nationwide as a substrate example for GENI, including control of switches to slice the network

Internet-Scale Overlay Hosting

•  Similar to ProtoGENI: create a programmable node out of off-the-shelf parts, chosen to make a better network element – focus on routing/switching, not on complete experiments
•  Part of the PlanetLab framework (as opposed to ProtoGENI), so it exercises a different control plane

Target Internet2 Deployment

[Map: planned deployment of two or three nodes at each Internet2 site]

Year 1 (near term) deployment

•  Looking to deploy 3 nodes
   •  KANS, SALT, {WASH}
•  First two due by 1 July
•  Two projects have agreed to collocate and share equipment

SPP

Potential Future Projects

•  At the last GENI Engineering Conference, emphasis on "Layer 2 connectivity" to enable non-IP projects
   •  Specific need unclear; would require more equipment to multiplex onto 10GE; DCN possible
•  Unknown: results of the Spiral 2 solicitation (perhaps by end of June?)

Parting Thought

•  Utilize some of these testbeds to try out new network architecture ideas?

•  Partner with some research groups?
   •  For example, perhaps use 100x100 NetFPGAs configured as OpenFlow switches (or GENI) to look at hybrid or different IP network strategies

Overview

•  Strategic Planning
•  Network Cyberinfrastructure
   •  Information Services
   •  DCN
   •  perfSONAR
•  Support for Network Research
•  Next Generation Networking

What should the next Internet2 Network Look Like?

•  Beginning to be considered by Councils and WGs

•  Questions around how to fund it
   •  Sufficient depreciation?
   •  How to fund new, advanced services?
•  Evolution (series of small decisions) vs. a brand-new network

Role of R&D in Network Design

•  We have a range of options available
   •  Some are well understood
   •  Some are not
•  The role of the R&D effort is to explore the network design space
•  What might we consider?

Classic Network

Heterogeneous Layer 3 Networking

Manually Selected Layer 2 Networking Option

Manually Selected Layer 1 Networking Option

Common Application Programming Interface

Allow Application to drive choice of network transport technologies

Allow Network to target transport technologies at individual flows

Differentiate between the Control Plane and the Data Plane

Run virtual networks across common substrate

Next Generation Network

•  Many axes of differentiation:
   •  Heterogeneous vendors
   •  Heterogeneous transport technologies
   •  Application- vs. network-driven transport technology selection
      •  API (e.g. Phoebus, Lambda Station, Terapaths)
      •  Flow analysis (e.g. FNAL flow analysis)
•  Decouple control plane from data plane
   •  e.g. OpenFlow
•  Run multiple virtual networks (some experimental, some production) on the same infrastructure
   •  e.g. OpenFlow

Design Cycle

•  General design cycle:
   •  New ideas (Chief Scientist, network research community, Internet2 WGs)
   •  Implementation (CTO, Internet2 WGs)
   •  Operationalization (ED of Network Services, IU NOC)
•  Not every idea makes it through all 3 phases
•  Aspects of particular technologies may exist concurrently in different phases

Conclusion

•  Strategic planning prioritization and implementation underway

•  Development of network cyberinfrastructure well underway
   •  Calling on regionals and campuses to deploy advanced services
•  Support for network research
   •  Need for research in many arenas to inform next-generation network design
   •  Current funding paradigm does not map well to the design cycle