Page 1: Assessment of Data Path Implementations for Download and Streaming

Pål Halvorsen 1,2, Tom Anders Dalseng 1 and Carsten Griwodz 1,2

1 Department of Informatics, University of Oslo, Norway

2 Simula Research Laboratory, Norway

International conference on distributed multimedia systems (DMS’05), Banff, Canada, September 2005

Page 2: Overview

2005 Pål Halvorsen, Tom Anders Dalseng & Carsten Griwodz

DMS’05, Banff, Canada. September 2005

Motivation

Existing mechanisms in Linux

Possible enhancements

Summary and Conclusions

Page 3: Delivery Systems

[Figure: delivery systems; labels: network, bus(es)]

Page 4: Delivery Systems

[Figure: delivery system; labels: application, file system, communication system, user space, kernel space, bus(es)]

Page 5: Delivery Systems

[Figure: Intel Hub Architecture; labels: Pentium 4 processor (registers, cache(s)), memory controller hub, I/O controller hub, RDRAM, PCI slots, network card, disk, file system, communication system, application]

Several in-memory data movements and context switches

Page 6: Motivation

Data copy operations are expensive

consume CPU, memory, hub, bus and interface resources (proportional to data size)

profiling shows that ~40% of CPU time is consumed by copying data between user and kernel space

the gap between memory and CPU speeds increases

different access times to different banks

System calls make a lot of switches between user and kernel space

Page 7: Basic Idea of Zero-Copy Data Paths

[Figure: data path through file system, application and communication system; labels: user space, kernel space, bus(es), data_pointer (shown twice)]

Page 8: Motivation

A lot of research has been performed in this area, but what is the status today of commodity operating systems?

Page 9: Existing Linux Data Paths

Page 10: Content Download

[Figure: delivery system; labels: application, file system, communication system, user space, kernel space, bus(es)]

Page 11: Content Download: read / send

[Figure: data path between page cache and socket buffer (kernel) via the application buffer (application). Calls: read, send. Two copies and two DMA transfers shown.]

2n copy operations

2n system calls
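The read / send path can be sketched in user space as follows. This is a minimal illustration, not the authors' test program: it assumes n is the number of chunks sent, and the chunk size, the function names `read_send` and `demo_read_send`, and the use of a local socket pair instead of a real network connection are all assumptions made for the example.

```python
import os
import socket
import tempfile

CHUNK = 1024  # assumed chunk size; the slides do not state one


def read_send(path, sock, chunk=CHUNK):
    """Send a file with one read and one send per chunk; return the chunk count n."""
    n = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            buf = os.read(fd, chunk)   # copy: page cache -> application buffer
            if not buf:                # an empty read marks end of file
                break
            sock.sendall(buf)          # copy: application buffer -> socket buffer
            n += 1
    finally:
        os.close(fd)
    return n


def demo_read_send():
    """Push a small temporary file through a local socket pair (hypothetical setup)."""
    payload = os.urandom(4 * CHUNK)    # small enough to fit in the socket buffer
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(payload)
    sender, receiver = socket.socketpair()
    try:
        n = read_send(f.name, sender)
        sender.close()                 # end of stream for the receiver
        received = bytearray()
        while True:
            data = receiver.recv(65536)
            if not data:
                break
            received += data
    finally:
        sender.close()
        receiver.close()
        os.unlink(f.name)
    return n, bytes(received) == payload
```

Each loop iteration issues one read and one send, which is where the 2n system calls and 2n copies on the slide come from (the final empty read that detects end of file is extra).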

Page 12: Content Download: mmap / send

[Figure: data path between page cache and socket buffer (kernel). Calls: mmap, send. One copy and two DMA transfers shown.]

n copy operations

1 + n system calls
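A rough user-space sketch of the mmap / send path is shown below. It assumes n is the number of chunks sent; the chunk size and the names `mmap_send` and `demo_mmap_send` are illustrative and do not come from the slides. The file is mapped once, and each send then reads directly from the mapped pages, so the intermediate application buffer disappears.

```python
import mmap
import os
import socket
import tempfile

CHUNK = 1024  # assumed chunk size; the slides do not state one


def mmap_send(path, sock, chunk=CHUNK):
    """Map the file once, then send it chunk by chunk; return the chunk count n."""
    n = 0
    with open(path, "rb") as f:
        # The single mmap call: file pages become visible in the address space.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            with memoryview(mm) as view:
                for off in range(0, len(view), chunk):
                    # Slicing a memoryview does not copy in user space;
                    # the only copy is mapped page -> socket buffer.
                    sock.sendall(view[off:off + chunk])
                    n += 1
    return n


def demo_mmap_send():
    """Push a small temporary file through a local socket pair (hypothetical setup)."""
    payload = os.urandom(4 * CHUNK)    # small enough to fit in the socket buffer
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(payload)
    sender, receiver = socket.socketpair()
    try:
        n = mmap_send(f.name, sender)
        sender.close()                 # end of stream for the receiver
        received = bytearray()
        while True:
            data = receiver.recv(65536)
            if not data:
                break
            received += data
    finally:
        sender.close()
        receiver.close()
        os.unlink(f.name)
    return n, bytes(received) == payload
```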

Page 13: Content Download: sendfile

[Figure: data path between page cache and socket buffer (kernel). Call: sendfile. Labels: append descriptor, gather DMA transfer, DMA transfer.]

0 copy operations

1 system call
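The sendfile path can be tried from Python through `os.sendfile`, which wraps the system call. The sketch below assumes Linux and uses a local socket pair as a stand-in for a network connection; `sendfile_all` and `demo_sendfile` are hypothetical helper names. The loop only guards against a short transfer; for a small file a single call is normally enough.

```python
import os
import socket
import tempfile


def sendfile_all(path, sock):
    """Send a whole file with sendfile; return (bytes sent, number of calls)."""
    calls = 0
    offset = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while offset < size:
            # The kernel moves data from the page cache to the socket;
            # nothing passes through a user-space buffer.
            sent = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
            calls += 1
            if sent == 0:
                break
            offset += sent
    return offset, calls


def demo_sendfile():
    """Push a small temporary file through a local socket pair (hypothetical setup)."""
    payload = os.urandom(4096)         # small enough to fit in the socket buffer
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(payload)
    sender, receiver = socket.socketpair()
    try:
        total, calls = sendfile_all(f.name, sender)
        sender.close()                 # end of stream for the receiver
        received = bytearray()
        while True:
            data = receiver.recv(65536)
            if not data:
                break
            received += data
    finally:
        sender.close()
        receiver.close()
        os.unlink(f.name)
    return total, calls, bytes(received) == payload
```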

Page 14: Content Download: Results

[Figure: content download results; panels: UDP, TCP]

Tested transfer of 1 GB file on Linux 2.6

Both UDP (with enhancements) and TCP

Page 15: Streaming

[Figure: delivery system; labels: application, file system, communication system, user space, kernel space, bus(es)]

Page 16: Streaming: mmap / send

[Figure: data path between page cache, application buffer and socket buffer. Calls: mmap once, then cork, send, send, uncork per packet. Two copies and two DMA transfers shown.]

2n copy operations

1 + 4n system calls
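The cork, send, send, uncork sequence can be sketched for a UDP socket as below. This assumes Linux, where the `UDP_CORK` socket option makes the kernel accumulate the data from several send calls into one datagram. The option value is defined by hand because Python's `socket` module may not export it, and the helper names, the 12-byte placeholder header and the loopback setup are assumptions made for the example.

```python
import socket

UDP_CORK = 1  # value from <linux/udp.h>; assumed, since the socket module may not export it


def corked_send(sock, header, payload):
    """Send header and payload as one datagram; return the number of system calls used."""
    sock.setsockopt(socket.IPPROTO_UDP, UDP_CORK, 1)  # cork: start accumulating
    sock.send(header)                                 # copy: header -> socket buffer
    sock.send(payload)                                # copy: payload -> socket buffer
    sock.setsockopt(socket.IPPROTO_UDP, UDP_CORK, 0)  # uncork: flush as one datagram
    return 4


def demo_corked_send():
    """Send one corked packet over the loopback interface (hypothetical setup)."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))       # let the OS pick a free port
        receiver.settimeout(2.0)              # avoid hanging if nothing arrives
        sender.connect(receiver.getsockname())
        header = b"H" * 12                    # placeholder for an RTP header
        payload = b"P" * 100                  # placeholder for media data
        calls = corked_send(sender, header, payload)
        datagram = receiver.recv(2048)
    finally:
        sender.close()
        receiver.close()
    return calls, datagram == header + payload
```

The four calls per packet in `corked_send` correspond to the 4n term on the slide.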

Page 17: Streaming: mmap / writev

[Figure: data path between page cache, application buffer and socket buffer. Calls: mmap once, then writev per packet. Two copies and two DMA transfers shown.]

2n copy operations

1 + n system calls

Three fewer calls per packet than the previous solution
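A gather write can be sketched with `os.writev`, which hands the kernel a list of buffers in a single call. The example below builds a 12-byte fixed RTP header (layout per RFC 3550) and sends it together with a payload as one datagram. The payload type 96, the field values, the helper names and the use of a local datagram socket pair are assumptions for illustration only.

```python
import os
import socket
import struct


def rtp_header(seq, timestamp, ssrc, payload_type=96):
    """Build a 12-byte fixed RTP header: version 2, no padding, extension, CSRC or marker."""
    return struct.pack(
        "!BBHII",
        0x80,                    # V=2, P=0, X=0, CC=0
        payload_type & 0x7F,     # M=0 plus 7-bit payload type (96 is an assumed dynamic type)
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )


def writev_packet(sock, header, payload):
    """Send header and payload with one writev call; return the bytes written."""
    return os.writev(sock.fileno(), [header, payload])


def demo_writev():
    """Send one packet over a local datagram socket pair (hypothetical setup)."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        header = rtp_header(seq=1, timestamp=3000, ssrc=0x1234ABCD)
        payload = b"P" * 100     # placeholder for media data
        written = writev_packet(sender, header, payload)
        datagram = receiver.recv(2048)
    finally:
        sender.close()
        receiver.close()
    return len(header), written, datagram == header + payload
```

Both buffers are still copied into the socket buffer, which matches the 2n copies on the slide; only the number of system calls per packet drops.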

Page 18: Streaming: sendfile

[Figure: data path between page cache, application buffer and socket buffer. Calls per packet: cork, send, sendfile, uncork. Labels: copy, append descriptor, gather DMA transfer, DMA transfer.]

n copy operations

4n system calls

Page 19: Streaming: Results

Tested streaming of 1 GB file on Linux 2.6

RTP over UDP

[Figure: streaming results; TCP sendfile (content download) is labeled in the plot]

Compared to not sending an RTP header over UDP, we get an increase of 29% (additional send call)

More copy operations and system calls are required, so there is potential for improvement

Page 20: Enhanced Streaming Data Paths

Page 21: Enhanced Streaming: mmap / msend

[Figure: data path between page cache, application buffer and socket buffer. Calls: mmap once, then cork, send, msend, uncork per packet (msend replaces one of the two send calls). Labels: copy, append descriptor, gather DMA transfer, DMA transfer.]

n copy operations

1 + 4n system calls

msend allows sending data from an mmap’ed file without a copy

The previous solution requires one more copy per packet

Page 22: Enhanced Streaming: mmap / rtpmsend

[Figure: data path between page cache, application buffer and socket buffer. Calls: mmap once, then rtpmsend per packet (replacing the corked send sequence). Labels: copy, append descriptor, gather DMA transfer, DMA transfer.]

n copy operations

1 + n system calls

RTP header copy integrated into the msend system call

The previous solution requires three more calls per packet

Page 23: Enhanced Streaming: mmap / krtpmsend

[Figure: data path between page cache and socket buffer with an RTP engine in the kernel. Call: krtpmsend (replacing rtpmsend). Labels: append descriptor, gather DMA transfer, DMA transfer.]

0 copy operations

1 system call

An RTP engine in the kernel adds RTP headers

The previous solution requires one more call and one more copy per packet

Page 24: Enhanced Streaming: rtpsendfile

[Figure: data path between page cache, application buffer and socket buffer. Call per packet: rtpsendfile (replacing cork, send, sendfile, uncork). Labels: copy, append descriptor, gather DMA transfer, DMA transfer.]

n copy operations

n system calls

RTP header copy integrated into the sendfile system call

The existing solution requires three more calls per packet

Page 25: Enhanced Streaming: krtpsendfile

[Figure: data path between page cache and socket buffer with an RTP engine in the kernel. Call: krtpsendfile (replacing rtpsendfile). Labels: append descriptor, gather DMA transfer, DMA transfer.]

0 copy operations

1 system call

An RTP engine in the kernel adds RTP headers

The previous solution requires one more call and one more copy per packet
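The copy and system call counts quoted on the streaming slides can be collected in a small cost model for comparison. The figures are taken from the slides as written; the table layout, the `cost` helper and the interpretation of n as the number of packets are assumptions made for this sketch.

```python
# mechanism: (copies per packet, fixed system calls, system calls per packet)
# Figures as quoted on the streaming slides; n is assumed to be the packet count.
STREAMING_COSTS = {
    "mmap/send":      (2, 1, 4),
    "mmap/writev":    (2, 1, 1),
    "sendfile":       (1, 0, 4),
    "mmap/msend":     (1, 1, 4),
    "mmap/rtpmsend":  (1, 1, 1),
    "mmap/krtpmsend": (0, 1, 0),
    "rtpsendfile":    (1, 0, 1),
    "krtpsendfile":   (0, 1, 0),
}


def cost(mechanism, n):
    """Return (copy operations, system calls) for streaming n packets."""
    copies_per_packet, fixed_calls, calls_per_packet = STREAMING_COSTS[mechanism]
    return copies_per_packet * n, fixed_calls + calls_per_packet * n


def cost_table(n):
    """Return one (mechanism, copies, calls) row per mechanism, in slide order."""
    return [(name, *cost(name, n)) for name in STREAMING_COSTS]
```

For example, `cost("mmap/send", 1000)` gives 2000 copies and 4001 system calls, while `cost("krtpsendfile", 1000)` gives 0 copies and 1 system call.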

Page 26: Enhanced Streaming: Results

Tested streaming of 1 GB file on Linux 2.6

RTP over UDP

[Figure: results for mmap based mechanisms and sendfile based mechanisms, with TCP sendfile (content download) and the existing mechanism (streaming) marked; annotations: ~27% improvement, ~25% improvement]

Page 27: Conclusions

Current commodity operating systems still pay a high price for streaming services

However, small changes in the system call layer might be sufficient to remove most of the overhead

In conclusion, commodity operating systems still have potential for improvement with respect to streaming support

What can we hope to be supported?

Road ahead: optimize the code, make a patch and submit it to kernel.org

Page 28: Questions?