Assessment of Data Path Implementations for Download and Streaming
Pål Halvorsen (1,2), Tom Anders Dalseng (1) and Carsten Griwodz (1,2)
(1) Department of Informatics, University of Oslo, Norway
(2) Simula Research Laboratory, Norway
International conference on distributed multimedia systems (DMS’05), Banff, Canada, September 2005
2005 Pål Halvorsen, Tom Anders Dalseng & Carsten Griwodz
DMS’ 05, Banff, Canada. September 2005
Overview
Motivation
Existing mechanisms in Linux
Possible enhancements
Summary and Conclusions
Delivery Systems
[Figure: a delivery system attached to the network. The application runs in user space; the file system and the communication system run in kernel space; bus(es) connect the disk and the network card.]
[Figure: Intel Hub Architecture. Pentium 4 processor (registers, caches), memory controller hub with RDRAM banks, I/O controller hub with PCI slots, disk and network card.]
Several in-memory data movements and context switches
Motivation
Data copy operations are expensive
  consume CPU, memory, hub, bus and interface resources (proportional to data size)
  profiling shows that ~40% of CPU time is consumed by copying data between user and kernel space
  the gap between memory and CPU speeds is increasing
  different access times to different memory banks
System calls cause many switches between user and kernel space
Basic Idea of Zero-Copy Data Paths
[Figure: the file system and the communication system in kernel space exchange a data_pointer; the application stays in user space; bus(es) below.]
A lot of research has been performed in this area! But what is the status today in commodity operating systems?
Existing Linux Data Paths
Content Download
[Figure: application in user space; file system and communication system in kernel space; bus(es) below.]
Content Download: read / send
[Figure: a DMA transfer fills the page cache (kernel); read copies the data into the application buffer; send copies it into the socket buffer (kernel); a second DMA transfer takes it out of the socket buffer.]
2n copy operations, 2n system calls
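The read / send path can be sketched in a few lines. This is only an illustration of the call pattern, not the authors' benchmark code: the function name `download_read_send`, the chunk size and the choice of Python are assumptions, and n is taken to be the number of chunks.

```python
import os
import socket


def download_read_send(path: str, sock: socket.socket,
                       chunk_size: int = 64 * 1024) -> int:
    """Send a file with one read and one send per chunk; returns bytes sent."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            # read: the kernel copies from the page cache into a user-space buffer
            data = os.read(fd, chunk_size)
            if not data:
                break  # end of file (this final read is one extra system call)
            # send: the kernel copies from the user-space buffer into the socket
            # buffer; sendall may issue more than one send if the socket is full
            sock.sendall(data)
            total += len(data)
    finally:
        os.close(fd)
    return total
```

Each chunk crosses the user/kernel boundary twice, which is where the 2n copies and 2n system calls on the slide come from.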
Content Download: mmap / send
[Figure: mmap gives the application access to the page cache; send copies the data into the socket buffer; DMA transfers into the page cache and out of the socket buffer.]
n copy operations, 1 + n system calls
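A minimal sketch of the mmap / send pattern, under the same assumptions as before (hypothetical function name, arbitrary chunk size, Python instead of C): the file is mapped once, and each chunk is sent straight from the mapping.

```python
import mmap
import socket


def download_mmap_send(path: str, sock: socket.socket,
                       chunk_size: int = 64 * 1024) -> int:
    """Map the file once, then send it chunk by chunk; returns bytes sent."""
    total = 0
    with open(path, "rb") as f:
        # One mmap call makes the file's pages addressable from user space.
        # Note: mmap raises ValueError for an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            with memoryview(mapped) as view:
                for offset in range(0, len(view), chunk_size):
                    # Slicing a memoryview does not copy; the only copy is the
                    # kernel moving the data into the socket buffer during send.
                    with view[offset:offset + chunk_size] as piece:
                        sock.sendall(piece)
                        total += len(piece)
    return total
```

Compared with read / send, the copy into the application buffer disappears, at the price of one extra system call for the mapping.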
Content Download: sendfile
[Figure: sendfile appends a descriptor for the page-cache data to the socket buffer; a DMA transfer fills the page cache and a gather DMA transfer sends the data.]
0 copy operations, 1 system call
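The sendfile path can be sketched with Python's `os.sendfile`, which wraps the Linux system call. The function name `download_sendfile` is an assumption, and the loop is a defensive addition: a single call may transfer fewer bytes than requested, so in practice the count can be more than the one call shown on the slide.

```python
import os
import socket


def download_sendfile(path: str, sock: socket.socket) -> int:
    """Send a whole file with sendfile; returns the number of bytes sent."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            # The data never enters user space: the kernel hands the
            # page-cache pages to the socket directly.
            sent = os.sendfile(sock.fileno(), fd, offset, size - offset)
            if sent == 0:
                break  # file was truncated while sending
            offset += sent
        return offset
    finally:
        os.close(fd)
```

Whether the transfer is truly copy-free also depends on the network card supporting gather DMA, as the figure indicates.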
Content Download: Results
Tested transfer of a 1 GB file on Linux 2.6
Both UDP (with enhancements) and TCP
[Figure: results for UDP and TCP]
Streaming
[Figure: application in user space; file system and communication system in kernel space; bus(es) below.]
Streaming: mmap / send
[Figure: after one mmap, the application issues cork, send, send and uncork for each packet; one send copies from the application buffer and one from the mapped page cache into the socket buffer; DMA transfers into the page cache and out of the socket buffer.]
2n copy operations, 1 + 4n system calls
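The per-packet sequence (cork, send header, send payload, uncork) might look as follows on a connected UDP socket. Several things here are assumptions rather than content from the slides: the numeric value of `UDP_CORK` (taken from Linux's `<linux/udp.h>`, since Python's socket module is not assumed to export it), the hypothetical helper `rtp_header` with payload type 96, and the way the timestamp is derived.

```python
import mmap
import socket
import struct

UDP_CORK = 1  # assumption: value from Linux <linux/udp.h>


def rtp_header(seq: int, timestamp: int, ssrc: int, payload_type: int = 96) -> bytes:
    """12-byte RTP fixed header (RFC 3550): version 2, no padding/extension/CSRC."""
    return struct.pack("!BBHII", 0x80, payload_type & 0x7F, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def stream_mmap_send(path: str, sock: socket.socket,
                     payload_size: int = 1024, ssrc: int = 0x1234) -> int:
    """Send one RTP packet per payload_size bytes; returns the packet count."""
    packets = 0
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        with memoryview(mapped) as view:
            for seq, offset in enumerate(range(0, len(view), payload_size)):
                # cork: hold back data until the datagram is complete
                sock.setsockopt(socket.IPPROTO_UDP, UDP_CORK, 1)
                # first send: header copied from the application buffer
                # (timestamp = byte offset is a placeholder, not a real media clock)
                sock.send(rtp_header(seq, offset, ssrc))
                # second send: payload copied from the mapped file
                with view[offset:offset + payload_size] as payload:
                    sock.send(payload)
                # uncork: header and payload leave as one datagram
                sock.setsockopt(socket.IPPROTO_UDP, UDP_CORK, 0)
                packets += 1
    return packets
```

The four calls inside the loop, plus the single mmap, give the 1 + 4n system calls on the slide.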
Streaming: mmap / writev
[Figure: after one mmap, a single writev per packet gathers data from the application buffer and from the mapped page cache; both parts are copied into the socket buffer; DMA transfers into the page cache and out of the socket buffer.]
2n copy operations, 1 + n system calls
Three fewer system calls per packet than the previous solution
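A sketch of the writev variant: header and payload are handed to the kernel in one gather write, so cork and uncork are no longer needed. The function name and the `header_for` callback (which returns the header bytes for a given sequence number) are hypothetical; on a connected UDP socket, one writev produces one datagram.

```python
import mmap
import os
import socket
from typing import Callable


def stream_mmap_writev(path: str, sock: socket.socket,
                       header_for: Callable[[int], bytes],
                       payload_size: int = 1024) -> int:
    """Send header + payload with one writev per packet; returns packet count."""
    packets = 0
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        with memoryview(mapped) as view:
            for seq, offset in enumerate(range(0, len(view), payload_size)):
                with view[offset:offset + payload_size] as payload:
                    # One system call; the kernel still copies both buffers
                    # into the socket buffer, hence 2n copies overall.
                    os.writev(sock.fileno(), [header_for(seq), payload])
                packets += 1
    return packets
```

The copy count is unchanged from mmap / send; only the number of user/kernel crossings drops.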
Streaming: sendfile
[Figure: for each packet the application issues cork, send, sendfile and uncork; send copies from the application buffer into the socket buffer; sendfile appends a descriptor for the page-cache data; a DMA transfer fills the page cache and a gather DMA transfer sends the data.]
n copy operations, 4n system calls
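The cork / send / sendfile / uncork sequence can be sketched as below. As before, the `UDP_CORK` value, the function name and the `header_for` callback are assumptions, and the sketch presumes a Linux kernel on which sendfile can append to a corked, connected UDP socket; it does not handle partial transfers.

```python
import os
import socket
from typing import Callable

UDP_CORK = 1  # assumption: value from Linux <linux/udp.h>


def stream_sendfile(path: str, sock: socket.socket,
                    header_for: Callable[[int], bytes],
                    payload_size: int = 1024) -> int:
    """Send header via send and payload via sendfile; returns packet count."""
    packets = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        for seq, offset in enumerate(range(0, size, payload_size)):
            count = min(payload_size, size - offset)
            sock.setsockopt(socket.IPPROTO_UDP, UDP_CORK, 1)   # cork
            sock.send(header_for(seq))                         # header: one copy
            os.sendfile(sock.fileno(), fd, offset, count)      # payload: no user-space copy
            sock.setsockopt(socket.IPPROTO_UDP, UDP_CORK, 0)   # uncork
            packets += 1
    finally:
        os.close(fd)
    return packets
```

Only the header is copied from user space, which matches the n copy operations on the slide, but every packet still costs four system calls.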
Streaming: Results
Tested streaming of a 1 GB file on Linux 2.6, RTP over UDP
[Figure: streaming results, with TCP sendfile (content download) shown for comparison]
Compared to not sending an RTP header over UDP, we get an increase of 29% (additional send call)
More copy operations and system calls are required, so there is potential for improvement
Enhanced Streaming Data Paths
Enhanced Streaming: mmap / msend
msend allows sending data from an mmap'ed file without a copy
[Figure: after one mmap, the application issues cork, send, msend and uncork for each packet; send copies from the application buffer into the socket buffer; msend appends a descriptor for the page-cache data instead of copying it; a DMA transfer fills the page cache and a gather DMA transfer sends the data.]
n copy operations, 1 + 4n system calls
The previous solution (mmap / send) requires one more copy per packet
Enhanced Streaming: mmap / rtpmsend
RTP header copy integrated into the msend system call
[Figure: after one mmap, a single rtpmsend per packet replaces cork, send, msend and uncork; the header is copied from the application buffer into the socket buffer and a descriptor for the page-cache data is appended; a DMA transfer fills the page cache and a gather DMA transfer sends the data.]
n copy operations, 1 + n system calls
The previous solution requires three more system calls per packet
Enhanced Streaming: mmap / krtpmsend
An RTP engine in the kernel adds the RTP headers
[Figure: a single krtpmsend call replaces the per-packet rtpmsend calls; the in-kernel RTP engine supplies the headers and descriptors for the page-cache data are appended to the socket buffer; a DMA transfer fills the page cache and a gather DMA transfer sends the data.]
0 copy operations, 1 system call
The previous solution requires one more system call and one more copy per packet
Enhanced Streaming: rtpsendfile
RTP header copy integrated into the sendfile system call
[Figure: a single rtpsendfile per packet replaces cork, send, sendfile and uncork; the header is copied from the application buffer into the socket buffer and a descriptor for the page-cache data is appended; a DMA transfer fills the page cache and a gather DMA transfer sends the data.]
n copy operations, n system calls
The existing solution requires three more system calls per packet
Enhanced Streaming: krtpsendfile
An RTP engine in the kernel adds the RTP headers
[Figure: a single krtpsendfile call replaces the per-packet rtpsendfile calls; the in-kernel RTP engine supplies the headers and descriptors for the page-cache data are appended to the socket buffer; a DMA transfer fills the page cache and a gather DMA transfer sends the data.]
0 copy operations, 1 system call
The previous solution requires one more system call and one more copy per packet
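The copy and system-call counts quoted on the mechanism slides can be collected in one small table for comparison. The numbers below are copied from the slides; the dictionary layout, the labels and the helper `cost` are illustrative assumptions.

```python
from typing import Dict, Tuple

# (copies per packet, constant copies, calls per packet, constant calls),
# as stated on the slides for n packets
COSTS: Dict[str, Tuple[int, int, int, int]] = {
    "download: read/send":     (2, 0, 2, 0),
    "download: mmap/send":     (1, 0, 1, 1),
    "download: sendfile":      (0, 0, 0, 1),
    "streaming: mmap/send":    (2, 0, 4, 1),
    "streaming: mmap/writev":  (2, 0, 1, 1),
    "streaming: sendfile":     (1, 0, 4, 0),
    "enhanced: mmap/msend":    (1, 0, 4, 1),
    "enhanced: mmap/rtpmsend": (1, 0, 1, 1),
    "enhanced: mmap/krtpmsend": (0, 0, 0, 1),
    "enhanced: rtpsendfile":   (1, 0, 1, 0),
    "enhanced: krtpsendfile":  (0, 0, 0, 1),
}


def cost(mechanism: str, n: int) -> Tuple[int, int]:
    """Return (copy operations, system calls) for n packets."""
    copies_n, copies_0, calls_n, calls_0 = COSTS[mechanism]
    return copies_n * n + copies_0, calls_n * n + calls_0


if __name__ == "__main__":
    for name in COSTS:
        copies, calls = cost(name, 1000)
        print(f"{name:28s} copies={copies:5d} calls={calls:5d}")
```

For example, `cost("streaming: mmap/send", 1000)` gives 2000 copies and 4001 system calls, while both kernel-RTP variants stay at zero copies and one call regardless of n.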
Enhanced Streaming: Results
Tested streaming of a 1 GB file on Linux 2.6, RTP over UDP
[Figure: results for the mmap-based and sendfile-based mechanisms, shown alongside TCP sendfile (content download) and the existing streaming mechanism; annotated improvements of ~27% and ~25%]
Conclusions
Current commodity operating systems still pay a high price for streaming services
However, small changes in the system call layer might be sufficient to remove most of the overhead
In conclusion, commodity operating systems still have potential for improvement with respect to streaming support
What can we hope to be supported?
Road ahead: optimize the code, make a patch and submit it to kernel.org
Questions??