A Study between Networks and General Purpose Systems for High Bandwidth Video Streaming

John Bresnahan, Ioan Raicu & Gohar Margaryan

June 3rd, 2004
Computer Architecture – Spring Quarter 2004
Computer Science Department, University of Chicago


Computer Architecture Presentation, 6/6/2004

Problem Description & Motivation

• Study interaction between network, memory and CPU
  – Hardware is improving at different rates
  – Flow of bytes between components
  – CPU's involvement in rate of flow
• Predict appropriate hardware for a given high bandwidth workload
• Identify bottlenecks
• Visualize applications in pseudo-realtime


Yearly Performance Improvement

[Figure: yearly performance improvement (0% to 100%) versus year (1980 to 2020). Series: Memory Bandwidth, Network Bandwidth, I/O Bus Bandwidth, Disk Bandwidth, Memory Latency, CPU Latency.]


Historical Trends

[Figure: bandwidth in Gb/s (log scale, 0.01 to 1,000,000) and latency in nanoseconds (log scale, 0.0001 to 1000) versus year (1980 to 2020). Series: Memory Bandwidth, Network Bandwidth, I/O Bus Bandwidth, Disk Bandwidth, Memory Latency, CPU Latency.]


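Network Application Model

[Diagram: data path from the network into a general purpose system. Labeled elements:]

• Network
• General Purpose System
  – NIC Buffer
  – DMA
  – CPU
  – OS: Kernel Buffer
  – Application: User Buffer, MPEG Decoder
• Protocol layers
  – Physical/Data Link Layer (Ethernet)
  – Network Layer (IP)
  – Transport Layer (TCP)
  – Transport Layer (UDP)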


Our Approach

• Create a discrete event simulator
  – Model network app components
  – Flow of data between components
  – Configurable parameters…
• Empirical study to collect component performance
• Profiling jobs
  – Visualize use of components
    • Achieved throughput, dropped packets
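The discrete event approach listed above can be illustrated with a minimal sketch. This is not the authors' simulator: it models a single bounded buffer drained by one server (standing in for a CPU copying packets out of a buffer), and the function name, parameters, and drop policy are all assumptions made for illustration.

```python
import heapq
from collections import deque


def simulate(arrival_interval, service_time, buffer_slots, n_packets):
    """Single-stage discrete event simulation of a bounded packet buffer.

    Hypothetical model: packets arrive every `arrival_interval` seconds and
    one server needs `service_time` seconds per packet. A packet arriving
    while the server is busy waits in the buffer, or is dropped if all
    `buffer_slots` are taken. Returns (delivered, dropped, finish_time).
    """
    events = []  # min-heap of (time, sequence number, kind)
    for i in range(n_packets):
        heapq.heappush(events, (i * arrival_interval, i, "arrival"))
    seq = n_packets  # sequence numbers break ties between equal times
    waiting = deque()
    busy = False
    delivered = 0
    dropped = 0
    now = 0.0

    while events:
        now, _, kind = heapq.heappop(events)
        if kind == "arrival":
            if not busy:
                # Server idle: start processing immediately.
                busy = True
                heapq.heappush(events, (now + service_time, seq, "done"))
                seq += 1
            elif len(waiting) < buffer_slots:
                waiting.append(now)
            else:
                dropped += 1
        else:  # "done": the server finished one packet
            delivered += 1
            if waiting:
                waiting.popleft()
                heapq.heappush(events, (now + service_time, seq, "done"))
                seq += 1
            else:
                busy = False
    return delivered, dropped, now
```

Calling `simulate(1.0, 0.5, 4, 10)` models a server that keeps up with arrivals, while `simulate(1.0, 2.0, 1, 6)` models an overloaded one that drops packets. Achieved throughput can be estimated as delivered packets divided by the finish time.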


Component Modeling

• NIC
  – Tests over 10/100/1000 Mb/s running TCP and UDP
• Memory
  – Cache Burst 32
  – Measures L1 cache, L2 cache, and main memory read, write, and copy throughput & latency
• CPU
  – Network processing
    • Packets/second → CPU cycles per byte of header processing
    • 2 copies: NIC buffer → Kernel buffer → User buffer
    • Iperf over local loopback address
  – MPEG
    • CPU cycles per byte of processing
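The conversion from a measured packet rate to a per-byte CPU cost can be sketched as follows. The slides do not give the exact formula, so this assumes the CPU is fully busy during the measurement and that every packet carries the same number of bytes; the function names are hypothetical.

```python
def cycles_per_byte(cpu_hz, packets_per_sec, bytes_per_packet):
    """Estimate CPU cycles spent per byte processed.

    Assumes the CPU is saturated, so all `cpu_hz` cycles per second go to
    processing `packets_per_sec` packets of `bytes_per_packet` bytes each.
    """
    return cpu_hz / (packets_per_sec * bytes_per_packet)


def cpu_bound_throughput_mbps(cpu_hz, cost_cycles_per_byte):
    """Throughput ceiling in Mb/s (10^6 bits/s) for a given per-byte cost."""
    bytes_per_sec = cpu_hz / cost_cycles_per_byte
    return bytes_per_sec * 8 / 1e6
```

For example, a 1 GHz CPU handling 100,000 packets per second of 1,000 bytes each spends about 10 cycles per byte under these assumptions, which in turn caps throughput at about 800 Mb/s.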


Main Memory Performance

[Figure: throughput in MB/sec (0 to 3000) by type of memory: SD-RAM @ 66 MHz, SD-RAM @ 133 MHz, DDR-RAM @ 332 MHz. Series: Memory Bandwidth, Memory Read Throughput, Memory Write Throughput, Memory Copy Throughput.]


TCP, UDP, and Memory Copy Performance

[Figure: throughput in Mbit/s (0 to 4000) by processor speed: 466 MHz, 1300 MHz, 2166 MHz. Series: Achieved TCP Throughput, Theoretical TCP Throughput, Achieved UDP Throughput, Theoretical UDP Throughput, Memory Copy Throughput.]


TCP and UDP Performance vs. Processing Power

[Figure: packets/second (thousands) and cycles/byte (shared scale, 0 to 400) by processor speed: 466 MHz, 1300 MHz, 2166 MHz. Series: TCP Processing Power (packets/sec), UDP Processing Power (packets/sec), TCP Processing Overhead (cycles/byte), UDP Processing Overhead (cycles/byte).]


Benchmarks

• MPEG_sw
  – Software decoding (CPU intensive) and variable network traffic
• MPEG_hw
  – Hardware decoding and variable network traffic
• RAW
  – Constant network traffic


Video 1 Required Variable Throughput (Mb/s)

[Figure: required throughput in Mb/s (0 to 5) versus time in seconds (0 to 3, in 0.1 s steps). Series: Required Throughput (Mb/s).]
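A required-throughput trace of this kind can be derived from per-frame sizes, as in the rough sketch below. The slides do not say how the trace was produced, so the frame rate, frame sizes, and function name here are hypothetical; only the 0.1 s time step comes from the figure's axis. Mb/s is taken as 10^6 bits per second.

```python
def required_throughput_mbps(frame_bytes, frames_per_sec, window_sec):
    """Required throughput (Mb/s) for each consecutive time window.

    `frame_bytes` is a list of encoded frame sizes in bytes, played back at
    a constant `frames_per_sec`. Frames are grouped into windows of
    `window_sec` seconds; the last window may hold fewer frames.
    """
    per_window = max(1, round(frames_per_sec * window_sec))
    trace = []
    for start in range(0, len(frame_bytes), per_window):
        chunk = frame_bytes[start:start + per_window]
        duration = len(chunk) / frames_per_sec  # seconds covered by chunk
        trace.append(sum(chunk) * 8 / duration / 1e6)
    return trace
```

With hypothetical frames of 12,500 bytes followed by frames of 25,000 bytes at 30 frames per second and a 0.1 s window, the trace steps from about 3 Mb/s up to about 6 Mb/s, which is the kind of variability that distinguishes the MPEG benchmarks from the constant-rate RAW benchmark.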


Benchmark Results

sysSIM Validation

[Figure: throughput in Mb/s (0 to 4000) by processor speed: 466 MHz, 1300 MHz, 2166 MHz. Series: sysSIM UDP Throughput, Achieved UDP Throughput, Theoretical UDP Throughput, Memory Copy Throughput.]


Bottleneck Shifting in Time

[Figure: throughput in Gb/s (log scale, 0.000001 to 1,000,000) versus year (1980 to 2020). Series: Network Bandwidth, Memory Copy Throughput, sysSIM Throughput, UDP Theoretical Throughput (CPU bound).]
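The idea of a bottleneck shifting over time can be sketched with compound-growth arithmetic: if two components improve at different yearly rates, the faster-growing one eventually overtakes the other. The functions below are an illustrative assumption about how such a projection could be computed, not the method used to produce the figure, and any input values are hypothetical.

```python
import math


def project(value, yearly_improvement, years):
    """Value after `years` of compound growth (0.5 means 50% per year)."""
    return value * (1.0 + yearly_improvement) ** years


def crossover_years(a, rate_a, b, rate_b):
    """Years until quantity `a` catches up with quantity `b`.

    Returns 0.0 if `a` is already at least `b`, and None if `a` does not
    grow faster than `b` and therefore never catches up.
    """
    if a >= b:
        return 0.0
    if rate_a <= rate_b:
        return None
    return math.log(b / a) / math.log((1.0 + rate_a) / (1.0 + rate_b))
```

For instance, a quantity that doubles every year catches a fixed quantity four times its size after two years. Past that point, the slower-growing component becomes the limiting one.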


Assumptions and Weaknesses

• LAN environment
  – NO out of order arrival of packets
  – NO "lost" packets
  – NO erroneous packets
• Unidirectional traffic
  – OK for modeling UDP, but an oversimplification for TCP
• TCP/UDP/IP: 2 copies of data in protocol stack
• Future trends will follow past trends
• Empirical studies sampled only 3 machines
• Many details about network protocol stack and OS left untouched


Related work

• Simulators
  – SimOS: complete machine simulation environment that runs commercial OS
  – M5: simulation system targeting network intensive workloads that runs unmodified commercial OS
  – CSIM: discrete event simulator for describing parallel processor architectures and software mappings
• Visualizations
  – Visualization Tool (VT)
  – FlowScan: A Network Traffic Flow Reporting and Visualization Tool
• Empirical Studies
  – The Architectural Costs of Streaming I/O: A Comparison of Workstations, Clusters, and SMPs
  – Server Network I/O Acceleration: Fundamental to the Data Center of the Future
  – lmbench: Portable Tools for Performance Analysis


Conclusions

• Memory is not a bottleneck yet, but the gap is closing
• The CPU is the bottleneck, but at the current rate of increase in CPU speeds, it will not be a bottleneck for long
• At the current rate of network speed increases, we do not foresee the network becoming a bottleneck


Solutions and Open Problems

• Multiple memory banks
• TCP offloading / Network processors
• Hardware threads
• Multiple processors (SMP)
• Use high speed cache memory for buffers
• 0-copy scheme


References

[1] S. McCanne and S. Floyd. "NS-2 Network Simulator". http://www.isi.edu/nsnam/ns/.

[2] Mendel Rosenblum, Edouard Bugnion, Scott Devine, and Stephen A. Herrod. "Using the SimOS Machine Simulator to Study Complex Computer Systems". ACM Transactions on Modeling and Computer Simulation, Vol. 7, No. 1, January 1997, pages 78–103.

[3] Carl Hein. "CSIM - Parallel Process and Diagrams Simulator". Lockheed-Martin ATL, 2004. http://www.atl.lmco.com/proj/csim/simulator/csim_doc.html.

[4] Nathan L. Binkert, Erik G. Hallnor, and Steven K. Reinhardt. "Network-Oriented Full-System Simulation using M5". Sixth Workshop on Computer Architecture Evaluation using Commercial Workloads (CAECW), February 2003.

[5] R. H. Arpaci-Dusseau, A. C. Arpaci-Dusseau, D. E. Culler, J. M. Hellerstein, and D. A. Patterson. "The Architectural Costs of Streaming I/O: A Comparison of Workstations, Clusters, and SMPs". Proc. 4th Symposium on High-Performance Computer Architecture (HPCA-4), pages 90–101, February 1998.

[6] Mellanox Technologies. "Comparative I/O Analysis: InfiniBand Compared with PCI-X, Fiber Channel, Gigabit Ethernet, Storage over IP, HyperTransport, and RapidIO". Mellanox Technologies White Paper, 2001. http://www.mellanox.com/technology/shared/IOcompare_WP_140.pdf.

[7] Andrea Emilio Rizzoli. "A Collection of Modelling and Simulation Resources on the Internet". April 2004. http://www.idsia.ch/~andrea/simtools.html.

[8] Min Xu, Milo Martin, Doug Burger, and Mark Hill. "WWW Computer Architecture Page". 2004. http://www.cs.wisc.edu/~arch/www/tools.html.

[9] John R. Mashey. "Big Data and the Next Wave of InfraStress Problems, Solutions, Opportunities". Invited Talk, USENIX 1999. http://www.usenix.org/events/usenix99/invited_talks/mashey.pdf.

[10] Ajay Tirumala, Feng Qin, Jon Dugan, Jim Ferguson, and Kevin Gibbs. "Iperf". March 2003. http://dast.nlanr.net/Projects/Iperf/#whatis.

[11] Lawrence A. Rowe, Steve Smoot, Ketan Patel, and Brian Smith. "MPEG Video Software Statistics Gatherer". Computer Science Division-EECS, Univ. of Calif. at Berkeley, February 1st, 1995.

[12] Vladimir Afanasiev. "Cache Burst 32". October 17, 2002. http://user.rol.ru/~dxover/cburst/.

[13] "A Beginner's Guide for MPEG-2 Standard". http://www.fh-friedberg.de/fachbereiche/e2/telekom-labor/zinke/mk/mpeg2beg/beginnzi.htm.

[14] J. Postel. "User Datagram Protocol". Request for Comments 768, Internet Engineering Task Force, August 1980.

[15] DARPA Internet Program. "Transmission Control Protocol". Request for Comments 793, Internet Engineering Task Force, September 1981.

[16] Alessandro Rubini and Jonathan Corbet. "Linux Device Drivers, 2nd Edition, Chapter 14, Network Drivers". June 2001.

[17] Glenn Herrin. "Linux IP Networking: A Guide to the Implementation and Modification of the Linux Protocol Stack". May 31, 2000.