1
YouTube Traffic Characterization:
A View From the Edge
Phillipa Gill¹, Martin Arlitt²,¹, Zongpeng Li¹, Anirban Mahanti³
¹Dept. of Computer Science, University of Calgary, Canada
²Enterprise Systems & Software Lab, HP Labs, USA
³Dept. of Computer Science and Engineering, IIT Delhi, India
2
Introduction
The way people use the Web is changing.
Creation and sharing of media: Fast, easy, cheap!
Volume of data associated with extremely popular online media.
3
What is Web 2.0?
User generated content
  Text: Wordpress, Blogspot
  Photos: Flickr, Facebook
  Video: YouTube, MySpace
Social Networking
  Facebook, MySpace
Tagging
  Flickr, YouTube
4
YouTube: Facts and Figures
Founded in February 2005
Enabled users to easily share movies by converting them to Flash
Largest video sharing Website on the Internet [Alexa2007]
Sold to Google for $1.65 billion in November 2006
5
How YouTube Works (1/2)
GET: /watch?v=wQVEPFzkhaM
OK (text/html)
GET: /vi/fNaYQ4kM4FE/2.jpg
OK (image/jpeg)
6
How YouTube Works (2/2)
GET: swfobject.js
OK (application/x-javascript)
GET: /p.swf
OK (application/x-shockwave-flash)
GET: /get_video?video_id=wQVEPFzkhaM
OK (video/flv)
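The request sequence on these two slides can be sketched as a small classifier that buckets each HTTP response by content type (text, image, application, video). This is only an illustration: the function name `classify_content` is hypothetical, and the content type paired with each URL below is an assumption (the player served as Flash, the `get_video` response served as FLV video).

```python
# Hypothetical helper: bucket an HTTP response by its Content-Type header.
def classify_content(content_type: str) -> str:
    # Drop any parameters such as "; charset=utf-8" and normalise case.
    main = content_type.split(";")[0].strip().lower()
    for prefix, label in (
        ("video/", "video"),
        ("image/", "image"),
        ("text/", "text"),
        ("application/", "application"),
    ):
        if main.startswith(prefix):
            return label
    return "other"

# One page view, in request order. The content types are assumed values.
transactions = [
    ("/watch?v=wQVEPFzkhaM", "text/html"),
    ("/vi/fNaYQ4kM4FE/2.jpg", "image/jpeg"),
    ("swfobject.js", "application/x-javascript"),
    ("/p.swf", "application/x-shockwave-flash"),
    ("/get_video?video_id=wQVEPFzkhaM", "video/flv"),
]
categories = [classify_content(ct) for _, ct in transactions]
```

Only the last request carries the video itself; the four before it are the page, a thumbnail, a script, and the player.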
7
Our Contributions
Efficient measurement framework
One of the first extensive characterizations of Web 2.0 traffic
  File properties
  File access patterns
  Transfer properties
Implications for network and content providers
8
Outline
Introduction & Background
Contributions
Methodology
Results
Implications
Conclusions
9
Our View Points
Edge (University Campus)
  28,000 students
  5,300 faculty & staff
  /16 address space
  300 Mb/s full-duplex network link
Global
  Most popular videos
10
Campus Data Collection
Goals:
  Collect data on all campus YouTube usage
  Gather data for an extended period of time
  Protect user privacy
Challenges:
  YouTube’s popularity
  Monitor limitations
  Volume of campus Internet usage
11
Our Methodology
Identify servers providing YouTube content
Use bro to summarize each HTTP transaction in real time
Restart bro daily and compress the daily log
Map visitor identifier to a unique ID
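The last step, mapping each visitor identifier to a unique ID, might look like the sketch below. The slides do not say how the mapping was implemented; the class name `VisitorIdMapper` and the choice of sequential integers held in memory are assumptions made for illustration.

```python
import itertools

class VisitorIdMapper:
    """Hypothetical mapper: replaces each visitor identifier with a small integer."""

    def __init__(self):
        self._ids = {}                      # visitor identifier -> assigned ID
        self._counter = itertools.count(1)  # next unused ID

    def anonymize(self, visitor_identifier: str) -> int:
        # Assign a fresh ID on first sight; reuse it on every later sighting.
        if visitor_identifier not in self._ids:
            self._ids[visitor_identifier] = next(self._counter)
        return self._ids[visitor_identifier]

mapper = VisitorIdMapper()
first = mapper.anonymize("visitor-A")
second = mapper.anonymize("visitor-B")
again = mapper.anonymize("visitor-A")
```

Because the same identifier always maps to the same ID, per-visitor statistics remain possible while the log itself no longer carries the original identifier.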
12
Categories of Transactions
Complete – the entire transaction was parsed successfully
Interrupted – TCP connection was reset
Gap – monitor missed a packet
Failure – transaction could not be parsed
13
Categories of Transactions (2)
Status        % of Total   % of Video
Complete      90.82        24.66
Interrupted    1.88        24.25
Gap            1.56        51.09
Failure        5.75        -
14
Our Traces
Start Date: Jan. 14, 2007
End Date: Apr. 8, 2007
Total Valid Transactions: 23,250,438
Total Bytes: 6.54 TB
Total Video Requests: 625,593
Total Video Bytes: 6.45 TB
Unique Video Requests: 323,677
Unique Video Bytes: 3.26 TB
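A few ratios follow directly from these totals. The sketch below assumes that "unique" counts each distinct video once, so the remainder represents repeat requests for videos already seen on campus; the variable names are illustrative.

```python
# Figures copied from the trace summary.
total_bytes_tb = 6.54
total_video_requests = 625_593
total_video_bytes_tb = 6.45
unique_video_requests = 323_677
unique_video_bytes_tb = 3.26

# Share of video requests and video bytes that were repeats (assumed reading of "unique").
repeat_request_fraction = 1 - unique_video_requests / total_video_requests
repeat_byte_fraction = 1 - unique_video_bytes_tb / total_video_bytes_tb

# Share of all transferred bytes that belonged to video content.
video_share_of_bytes = total_video_bytes_tb / total_bytes_tb
```

Under that reading, roughly 48% of video requests and about 49% of video bytes were repeats, and video accounted for about 99% of all bytes in the trace.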
15
HTTP Response Codes
Code                    % of Responses   % of Bytes
200 (OK)                75.80            89.78
206 (Partial Content)    1.29            10.22
302 (Found)              0.05             0.00
303 (See Other)          5.33             0.00
304 (Not Modified)      17.34             0.00
4xx (Client Error)       0.19             0.00
5xx (Server Error)       0.01             0.00
16
Global Data Collection
Crawling all videos is infeasible
Focus on top 100 most popular videos
Four time frames: daily, weekly, monthly and all time
Two-step data collection:
  Retrieve pages of most popular videos
  Use YouTube API to get details on these videos
17
Outline
Introduction & Background
Contributions
Methodology
Results
Implications
Conclusions
18
Results
Campus Usage Patterns
File Properties
File Access Patterns
Transfer Properties
19
Campus Usage Patterns
[Figure annotation: Reading Break]
20
Results
Campus Usage Patterns
File Properties
File Access Patterns
Transfer Properties
21
Unique File Sizes
Video data is significantly larger than the other content types
22
Time Since Modification
Videos and images are rarely modified
Text and application data are modified more frequently
23
Video Durations
Spike around 3 minutes, likely music videos
Campus videos are relatively short: μ = 3.3 min
24
Summary of File Properties
Video content is much larger than other content types
Image and video content is more static than application and text content
Video durations are relatively short
Videos viewed on campus tend to be more than 1 month old
25
Results
Campus Usage Patterns
File Properties
File Access Patterns
Transfer Properties
26
Relative Popularity of Videos
Video popularity follows a weak Zipf distribution
Possibly due to edge network point of view
β = 0.56
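One common way to estimate a Zipf exponent is a least-squares line through log(frequency) against log(rank). The slides do not say how β was obtained, so the sketch below is an assumption about method, and the name `fit_zipf_exponent` is hypothetical. It is checked here only against synthetic counts generated with the reported exponent, not against the campus trace.

```python
import math

def fit_zipf_exponent(counts):
    """Return beta, the negated slope of log(count) versus log(rank)."""
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Popularity decays with rank, so the slope is negative; beta is its magnitude.
    return -sxy / sxx

# Synthetic request counts following an exact power law with exponent 0.56.
synthetic_counts = [1000.0 / rank ** 0.56 for rank in range(1, 501)]
beta = fit_zipf_exponent(synthetic_counts)
```

A small β such as 0.56 means popularity falls off slowly with rank, so requests are spread across many videos rather than concentrated on a few.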
27
Commonality of Videos
~10% commonality between consecutive days during the week
~5% commonality between consecutive days on the weekend
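Commonality between consecutive days can be computed with a set intersection. The exact definition is not given on the slide; the sketch below assumes it is the fraction of one day's distinct videos that were also requested on the previous day. The function name and the toy video IDs are illustrative.

```python
def commonality(previous_day: set, current_day: set) -> float:
    """Fraction of the current day's distinct videos also seen the day before."""
    if not current_day:
        return 0.0
    return len(previous_day & current_day) / len(current_day)

# Toy data: ten distinct videos per day, one of them shared.
monday = {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j"}
tuesday = {"j", "k", "l", "m", "n", "o", "p", "q", "r", "s"}
overlap = commonality(monday, tuesday)
```

With one shared video out of ten, the toy example gives a commonality of 10%, the same order as the weekday figure on the slide.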
28
Summary of File Referencing
Zipf distribution is weak when observed from the edge of the network
There is some overlap between videos viewed on consecutive days
Significant amount of content viewed on campus is non-unique
29
Results
Campus Usage Patterns
File Properties
File Access Patterns
Transfer Properties
30
Transfer Sizes
Flash player (p.swf, player2.swf)
JavaScript files
31
Transfer Durations
Video transfers have significantly longer durations than other content types
32
Summary of Transfer Properties
JavaScript and Flash objects have an impact on the size of files transferred
Video transfers have significantly larger sizes and durations
33
Outline
Introduction & Background
Contributions
Methodology
Results
Implications
Conclusions
34
Implications for Network Providers
Web 2.0 poses challenges to caching
  Larger multimedia files
  More diversity in content
Metadata may be used to improve caching efficiency
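The effect of large files and diverse content on a cache can be explored with a small simulation. The slides do not name a caching policy; LRU is used below only as a familiar baseline, and the function name, request format, and sizes are assumptions for illustration.

```python
from collections import OrderedDict

def lru_hit_ratio(requests, capacity_bytes):
    """requests: iterable of (video_id, size_bytes). Returns the request hit ratio."""
    cache = OrderedDict()  # video_id -> size, ordered least to most recently used
    used = 0
    hits = 0
    total = 0
    for video_id, size in requests:
        total += 1
        if video_id in cache:
            hits += 1
            cache.move_to_end(video_id)  # mark as most recently used
            continue
        if size > capacity_bytes:
            continue  # object can never fit, so it is not cached
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)  # evict least recently used
            used -= evicted_size
        cache[video_id] = size
        used += size
    return hits / total if total else 0.0

# Toy request stream: three equally sized videos, some requested twice.
stream = [("a", 10), ("b", 10), ("a", 10), ("c", 10), ("b", 10)]
small_cache = lru_hit_ratio(stream, capacity_bytes=20)
large_cache = lru_hit_ratio(stream, capacity_bytes=30)
```

In this toy stream the cache that holds only two videos serves one of five requests, while the one that holds all three serves two of five, which illustrates why larger objects and a wider spread of content both reduce the benefit of a fixed-size cache.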
35
Implications for Content Providers
Multimedia content is large!
  65,000 videos/day × 10 MB/video = 19.5 TB/month
Long tail effect -> much of the content will be unpopular
  Cheap storage solutions
Longer transfer durations for video files
  More CPU cycles required for transfers
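The monthly storage figure can be reproduced with a short calculation. The 30-day month and the decimal unit conversion (1 TB = 1,000,000 MB) are assumptions that make the slide's numbers agree.

```python
videos_per_day = 65_000
mb_per_video = 10
days_per_month = 30  # assumption: the slide's total implies a 30-day month

mb_per_day = videos_per_day * mb_per_video   # new upload volume per day, in MB
mb_per_month = mb_per_day * days_per_month
tb_per_month = mb_per_month / 1_000_000      # decimal units: 1 TB = 1,000,000 MB
```

That is 650 GB of new video per day, or 19.5 TB per month, before any replication or re-encoding is considered.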
36
Conclusions
Multimedia content has much larger transfer sizes and durations than other content types
From the edge of the network, video popularity follows a weak Zipf distribution
Web 2.0 facilitates diversity in content which poses challenges to caching
New approaches are needed to efficiently handle the resource demands of Web 2.0 sites