Science DMZ at Imperial
Phil Mayers <[email protected]>
Campus network engineering workshop, 19/10/2016
About Imperial
● 14,700 students, 8,000 staff
● Focused on science, engineering, medicine and business
● 6 major campuses in London, also Silwood Park, and medical sites
● Perhaps more centralised IT than many universities?
● Dual 2x10G connections to JANET
● Various sponsored a.k.a. BCE customers (NHM, Science Museum, NHS trust)
● GridPP / HEP work - close relationship with researchers
Campus network
● Decent-size network - ~2,400 switches, ~2,300 APs, 15k simultaneous wifi users, >60k devices on-net including PCs, wifi/BYOD, SCADA, VoIP, etc.
● Campus to internet throughput ~2Gbit/s average, ~6Gbit/s peak (Oct 2016)
● Fully dual-stack network - 20-40% IPv6 by throughput, 15% by flows
● Typical architecture - switch, dist, router, core, firewall, wan
HEP group
● Main HEP grid cluster processes data for the LHC experiments, other physics experiments/projects & non-physics communities
○ CMS, LHCb, ATLAS, LZ, COMET, biomed & pheno are the main users
● 275 compute nodes (~4000 cores) connected on 1GbE
● 55 storage nodes (~3.7PB of disk) connected on 10GbE
● Simple stacked top-of-rack switches for connectivity
● Majority of WAN traffic is CMS local-storage <-> remote-storage
○ Popular datasets are automatically placed at CMS sites
○ Users can also request data: 50TB+ dataset requests not uncommon
● Local compute nodes can read remote storage over WAN (and vice versa)
○ Generally low rates compared to storage-storage transfers
HEP growth - 1gig (April 2007) [throughput graph]
HEP growth - 10gig (Oct 2011) [throughput graph]
HEP growth - 20gig (Oct 2016) [throughput graph]
Issues faced
● Firewalls
○ Straight throughput
○ TCP window checking and other stateful inspection
○ Latency and jitter interfering with throughput (see the back-of-envelope sketch after this slide)
○ Impact on other traffic, e.g. Office 365 is quite latency-sensitive with the Outlook client
● Equipment costs
○ Need the right size pipe at every forwarding hop
○ Building edge -> dist -> router -> core -> firewall -> WAN edge
○ A lot of those devices are of a class where fast ports are disproportionately costly
■ “Typical” campus router - approx. £1-2k for a 10gig port
■ 1U 48-port 10G switch - approx. £200 for a 10gig port
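To put numbers on the latency bullet above: a single TCP flow cannot exceed its window divided by the round-trip time, and under packet loss the Mathis et al. approximation caps it harder still. A minimal Python sketch of the arithmetic, reusing the 2ms and 104ms RTTs seen later in this deck; the 4 MiB window and the loss figure are illustrative assumptions, not measurements from our network.

```python
# Rough single-flow TCP throughput bounds, showing why added firewall
# latency and jitter hurt bulk transfers.
import math

def window_limited_bps(window_bytes: float, rtt_s: float) -> float:
    """A flow capped by its TCP window cannot exceed window / RTT."""
    return window_bytes * 8 / rtt_s

def mathis_bps(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Mathis et al. approximation for steady-state TCP Reno throughput."""
    return (mss_bytes * 8 / rtt_s) * (1.22 / math.sqrt(loss_rate))

# An (assumed) 4 MiB window over a typical 2ms path, versus the same
# window at the 104ms RTT seen when the campus links were saturated:
for rtt in (0.002, 0.104):
    gbps = window_limited_bps(4 * 2**20, rtt) / 1e9
    print(f"RTT {rtt * 1000:>3.0f}ms -> {gbps:5.2f} Gbit/s")

# Even 0.01% loss caps a 1460-byte-MSS flow hard on a 20ms path:
print(f"{mathis_bps(1460, 0.020, 1e-4) / 1e6:.0f} Mbit/s")
```

This is also part of why fewer hops and shallower buffers (next slide) matter: less queueing means lower RTT and fewer drops for the same offered load.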
Solution - Science DMZ
● Had no idea it had a name when we built it!
● Separate L3 switch, outside firewall, routes HEP traffic straight onto core and
onward to JANET
● Simple stateless ACLs for outer tier of security (sketched after this slide)
● Fewer hops, shallower buffers, cheaper kit, wider pipes
● HEP @ Imperial - 4x10G ports to HEP, dual 2x10G ECMP to JANET
○ Split HEP into two subnets, use BGP communities outbound to split inbound traffic
○ Necessitates HEP managing which node IPs are used for transfer
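To make the stateless-ACL tier concrete: each packet is judged on its own 5-tuple, the first matching rule wins, and no connection state is kept, which is exactly what keeps the device cheap and fast at line rate. A minimal Python sketch under assumed rules; the prefixes and the GridFTP-style port are hypothetical, not Imperial's actual policy.

```python
# Minimal sketch of a stateless ACL: each packet is judged on its own
# 5-tuple, first matching rule wins, and no connection state is kept -
# return traffic must be matched explicitly (by port, prefix or TCP
# flags), which is the trade-off against a stateful firewall.
from ipaddress import ip_address, ip_network

# Hypothetical rules, not Imperial's actual policy.
RULES = [
    # (action, src net, dst net, proto or None, dst port or None)
    ("permit", "0.0.0.0/0", "198.51.100.0/24", "tcp", 2811),  # GridFTP control
    ("permit", "198.51.100.0/24", "0.0.0.0/0", None, None),   # outbound from DMZ
    ("deny", "0.0.0.0/0", "0.0.0.0/0", None, None),           # default deny
]

def check(src: str, dst: str, proto: str, dport: int) -> str:
    """Return the action of the first rule matching the packet."""
    for action, snet, dnet, rproto, rport in RULES:
        if (ip_address(src) in ip_network(snet)
                and ip_address(dst) in ip_network(dnet)
                and rproto in (None, proto)
                and rport in (None, dport)):
            return action
    return "deny"

print(check("192.0.2.7", "198.51.100.20", "tcp", 2811))  # permit
print(check("192.0.2.7", "198.51.100.20", "tcp", 22))    # deny
```

The trade-off is visible in the rule list: with no state table, anything that needs to get back in has to be matched explicitly rather than by "related/established".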
Results - recent past
[throughput graph]
● Quite capable of driving 4x10G at >99.5% utilisation
● Apologies for the graph - low resolution and hourly averages hiding peaks
○ Don’t be fooled - 30-second and 5-minute averages on all 4 10G links to JANET were >99% load
Architecture
[diagram: Janet - border - firewall - core/datacentre, with the Science DMZ attached outside the firewall; one further path marked "Possible"]
Benefits
● Works - capable of driving campus connectivity to capacity
● Cheap - equipment cost on our side manageable
○ As long as upstream connectivity exists, of course
● Easy - no need to poke at firewalls or building edge to improve throughput
Issues
● Works too well!
● At capacity, it can drive other traffic off the campus links
○ 64 bytes from ...: icmp_seq=856 ttl=49 time=104 ms
○ Up from a typical 2ms to the same site
○ Have seen 10gig links running at essentially 100% for >1 hour
● Need to ensure enough spare capacity for other uses
○ Rate-limiting port channels (shudder)
○ Rate-limit $here - sure it’ll be hashed to the same bundle members at $nexthop? (see the sketch below)
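On the hashing worry in the last bullet: each hop independently hashes the flow 5-tuple to pick a LAG/ECMP member, and different devices generally use different hash functions and seeds, so a per-member rate limit applied here says nothing about how the same flows stack up on members at the next hop. An illustrative Python sketch, with SHA-256 and arbitrary seeds standing in for the vendors' (undisclosed) hashes:

```python
# Two hops hash the same flows onto 2-member bundles; differing seeds
# stand in for differing vendor hash implementations.
import hashlib

def member(five_tuple: tuple, seed: int, n_members: int) -> int:
    """Pick a bundle member by hashing the flow 5-tuple."""
    data = repr((seed,) + five_tuple).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big") % n_members

# Eight flows differing only by source port (hypothetical addresses):
flows = [("192.0.2.1", "198.51.100.2", 6, 40000 + i, 2811) for i in range(8)]
for f in flows:
    print(f"sport {f[3]}: hop A member {member(f, 1, 2)}, hop B member {member(f, 2, 2)}")
```

With independent hashes, roughly half the flows land on a different member at the next hop, so a limit enforced per member here does not bound the load on any one member there.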
Results - Thu 13 Oct
[graph: latency across one leg of the default route, versus throughput on the same link]
● Noticeable to customers… not great. But very impressive throughput!
Issues - Mark 2
● Cheap switches are cheap for a reason
● Doesn’t solve distance and fibre issues
○ Want to run in excess of 10G at distances of >10km? Get ready for a lot of zeroes
○ Fibre capacity on inter-site links (install & recurrent costs)
○ Or use DWDM (skills & training, tools, monitoring) - we do this
● Question mark over dual-use systems - is it appropriate to attach them to the DMZ?
○ Can you do a Windows domain login from a DMZ?
● Our implementation requires the HEP team to split transfer nodes across two subnets, to make use of both inbound paths (sketched after this slide)
● Security policy - speak to your IT Security team first!
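A sketch of the two-subnet inbound split referenced above: each HEP subnet is announced to the upstream with a different BGP community, the upstream maps each community to a preferred ingress link, and inbound traffic then follows whichever subnet a transfer node lives in. The community values, prefixes and policy below are invented for illustration and are not Janet's actual scheme.

```python
# Toy model of community-driven inbound traffic engineering: we tag our
# announcements, the upstream turns each tag into a per-link preference.
ANNOUNCEMENTS = {
    "198.51.100.0/25":   "64496:101",  # HEP subnet A - ask for link 1
    "198.51.100.128/25": "64496:102",  # HEP subnet B - ask for link 2
}

# Hypothetical upstream policy: community -> link given higher local-pref.
UPSTREAM_POLICY = {"64496:101": "link-1", "64496:102": "link-2"}

for prefix, community in ANNOUNCEMENTS.items():
    print(f"{prefix} tagged {community} -> inbound via {UPSTREAM_POLICY[community]}")
```

Hence the operational cost in the bullet above: balancing inbound load means the HEP team choosing which subnet each transfer node sits in.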
Thoughts
● We are considering making Science DMZ a core part of network architecture
○ 100G still not cost-effective for widespread campus deployment - particularly if you are
geographically distributed
○ Build parallel cheap/fast DMZ network, hook together at JANET & datacentre?
○ Present DMZ where needed (distance & fibre issues though…)
● Considerations
○ Equipment in normal office/lab locations e.g. high-throughput gene sequencers
○ Separate switches in wiring closets - have to manage patching, labelling, training
○ Spurious requests - people think they can drive 10gig and cannot
● Only applicable for mature research efforts with good tooling, IMO
○ Took GridPP community many years to be able to drive these speeds
Recommendations
● Speak to researchers!
● Consider appropriate cost/benefit of implementation
○ Cheap vs. high-end routers
○ Fixed versus expandable
● How will you scale, monitor and manage it?
○ Counters, API, routing/switching capability
● Consider your upstream capacity
LHCONE - if we have time
● Overlay L3VPN - used to steer HEP traffic down separate links
○ Funding reasons
● Imperial already do L3VPN internally for network segmentation
○ JANET presented LHCONE as 802.1q-tagged subint & BGP peering, into L3VPN on core
○ Core presents as 2x “peerings” (internet & LHCONE) to Science DMZ router
○ DMZ router follows the routing table (401 IPv4 & 146 IPv6 BGP routes) - see the lookup sketch at the end
● Basically works
○ Very impressive throughput
● Reservations internally about ultimate scalability of this model
○ If we had a multi-researcher Science DMZ - how would that work?
○ Policy routing? Shoot me now please...
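For the "DMZ router follows the routing table" point: no policy routing is needed in the single-tenant case, because LHCONE prefixes learned over the L3VPN peering are more specific than the default route, so plain destination-based longest-prefix match steers HEP traffic onto the LHCONE leg. A toy Python lookup; the prefix and next-hop names are illustrative.

```python
# Destination routing does the steering: a /24 learned from the LHCONE
# peering beats the 0.0.0.0/0 default by longest-prefix match.
from ipaddress import ip_address, ip_network

RIB = [
    (ip_network("0.0.0.0/0"), "internet-peering"),
    (ip_network("192.0.2.0/24"), "lhcone-peering"),  # e.g. a remote HEP site
]

def lookup(dst: str) -> str:
    """Return the next-hop of the most specific matching route."""
    addr = ip_address(dst)
    matches = [(net, nh) for net, nh in RIB if addr in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(lookup("192.0.2.10"))   # -> lhcone-peering
print(lookup("203.0.113.5"))  # -> internet-peering
```

The scalability reservation above follows directly: this works only while "traffic to LHCONE prefixes" and "traffic that should use LHCONE" coincide. With multiple research groups sharing a DMZ, the source would start to matter too, which is where policy routing rears its head.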