Scalability

1. Scalability
Erik Schultink
International Week of Tech Innovation, 21 Apr 2010

2. What is Tuenti.com?
3. Tuenti.com
Started 2007
1:6 pages, 1:10 minutes
Based in Madrid
~130 employees, 60 engineers
4. 5. Intro
What is a scalable system?
6. Scalability is throughput, not response time
7. What is a scalable system?
[Plot: response time vs. requests/second]
8. The Problem: Concurrency
25k pageviews/second at peak
9. What is a scalable system?
[Plot: response time vs. requests/second]
Variables:
code / architecture
machines
11. What is a scalable system?
[Plot: response time vs. requests/second, with curves for x machines and 2x machines]
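The plot's point (adding machines moves the throughput limit, not the response time) can be illustrated with a toy model. This is only a sketch: it assumes load is split evenly over independent servers that each behave like an M/M/1 queue, which is a modelling assumption and not something stated in the slides. The function name and the capacity figure are hypothetical.

```python
def response_time(requests_per_sec, machines, capacity_per_machine):
    """Mean response time in seconds under the assumed M/M/1-per-machine model."""
    load_per_machine = requests_per_sec / machines
    if load_per_machine >= capacity_per_machine:
        return float("inf")  # saturated: the queue grows without bound
    return 1.0 / (capacity_per_machine - load_per_machine)

x = 10  # hypothetical number of machines
for rate in (100, 150, 190):
    # Doubling both the request rate and the machine count leaves
    # the response time unchanged in this model.
    print(rate, response_time(rate, x, 20.0), response_time(2 * rate, 2 * x, 20.0))
```

In this model, 2x machines serve 2x requests/second at the same response time, which is the sense in which the system scales.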
12. 13. The Database Tier
14. The Solution: Partition
15. The Solution: Partition
16. The Solution: Partition
17. Technologies
MySQL
simple RDBMS
InnoDB
Memcache
Lighttpd
PHP
18. The Solution: Partition
Work must be structured such that each resource can complete it independently
Overhead to divide workload
19. Data architecture
Look at queries you perform.
Divide data such that each query can be answered by querying no more than 1 partition.
20. Comments on a profile
Comments (user_id, author_id, comment)
Post a comment on a user's profile
Get list of comments on a user's profile
Delete a comment from a user's profile
Give up for now:
Comments written by a user
21. Comments on a profile
Partition by user
Costs:
Determining partition of a user: constant
Consistency check on access that the author still exists: linear in the number of comments to display
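A minimal in-memory sketch of partitioning by user, assuming a simple modulo scheme. The partition count, the `existing_users` set, and all function names are hypothetical stand-ins; the slides do not say how Tuenti maps users to partitions. Each operation touches exactly one partition, and the author check on read is linear in the number of comments shown.

```python
NUM_PARTITIONS = 4            # hypothetical partition count
existing_users = {1, 2, 3}    # stand-in for a lookup against the user table

# One dict per partition: user_id -> list of (author_id, comment) rows.
partitions = [{} for _ in range(NUM_PARTITIONS)]

def partition_of(user_id):
    """Constant-time mapping from a user to a partition (assumed scheme)."""
    return user_id % NUM_PARTITIONS

def post_comment(user_id, author_id, comment):
    rows = partitions[partition_of(user_id)].setdefault(user_id, [])
    rows.append((author_id, comment))

def get_comments(user_id):
    rows = partitions[partition_of(user_id)].get(user_id, [])
    # Consistency check on access: skip comments whose author no longer exists.
    return [(a, c) for a, c in rows if a in existing_users]

def delete_comment(user_id, author_id, comment):
    partitions[partition_of(user_id)][user_id].remove((author_id, comment))

post_comment(1, 2, "hola")
post_comment(1, 3, "nice photo")
print(get_comments(1))
```

Listing the comments written by a given author would require scanning every partition, which is why that query is given up in this design.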
22. The Solution: Partition
Constant overhead
23. Alternative Solution
Partition by user, duplicate by author
Comments(user_id, author_id, comment)
AuthoredComments(author_id, user_id, comment_id)
24. Alternative Solution
Comments(user_id, author_id, comment)
AuthoredComments(author_id, user_id, comment_id)
Costs:
double writes
extra storage
delete by author still very expensive
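The trade-off can be sketched as follows, again with a hypothetical modulo partitioning and in-memory dicts standing in for the two tables. Every post becomes two writes, and deleting everything by one author fans out to the partitions of every profile that author commented on.

```python
import itertools

NUM_PARTITIONS = 4          # hypothetical partition count
_next_id = itertools.count(1)

# Comments: partition -> user_id -> {comment_id: (author_id, text)}
comments = [{} for _ in range(NUM_PARTITIONS)]
# AuthoredComments: partition -> author_id -> [(user_id, comment_id)]
authored = [{} for _ in range(NUM_PARTITIONS)]

def partition_of(key):
    return key % NUM_PARTITIONS

def post_comment(user_id, author_id, text):
    comment_id = next(_next_id)
    # Write 1: the profile owner's partition.
    comments[partition_of(user_id)].setdefault(user_id, {})[comment_id] = (author_id, text)
    # Write 2: the author's partition (the duplicate index).
    authored[partition_of(author_id)].setdefault(author_id, []).append((user_id, comment_id))
    return comment_id

def comments_by_author(author_id):
    """Answered from a single partition thanks to the duplicate."""
    return list(authored[partition_of(author_id)].get(author_id, []))

def delete_by_author(author_id):
    """Returns the set of Comments partitions that had to be touched."""
    touched = set()
    for user_id, comment_id in authored[partition_of(author_id)].pop(author_id, []):
        p = partition_of(user_id)
        comments[p].get(user_id, {}).pop(comment_id, None)
        touched.add(p)
    return touched

for profile_owner in (1, 2, 3):
    post_comment(profile_owner, author_id=7, text="hi")
print(comments_by_author(7))
```

Calling `delete_by_author(7)` here touches three different partitions, one per profile, which illustrates why delete by author stays expensive.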
25. The Web Server Tier
26. Traditional Systems Architecture
[Diagram: www.tuenti.com, a load balancer, and three web server farms]
27. Concurrency
28. The Solution: Partition
29. Traditional Systems Architecture
[Diagram: www.tuenti.com with addresses 12.45.34.179 and 12.45.34.178, two load balancers, and four web server farms]
30. AJAX
What is AJAX?
Asynchronous JavaScript and XML
Paradigm for client-server interaction
Change state on client, without loading a complete HTML page
31. Traditional HTML Browsing
User clicks link
Browser sends request
Server receives, parses request, generates response
Browser receives response and begins rendering
Dependent objects (images, js, css) load and render
Page appears
32. AJAX Browsing
User clicks link
Browser sends request
Server receives, parses request, generates response
Browser receives response and begins rendering
Dependent objects (images, js, css) load and render
Page appears
33. How does Tuenti use AJAX?
The only full page loads are login and the home page
Loader pulls in all JS/CSS
Afterwards stay within one HTML page, rotating canvas area content
34. Balancing Load
Top-level requests to www.tuenti.com
Each request tells client which farm it should be using, based on a mapping
Mapping can be changed to balance load, perform maintenance, etc.
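One way to picture the mapping is a small table from user buckets to farm hostnames that the top-level response hands back to the client. The hostnames are the ones shown in the routing slides; the bucket scheme, function names, and response shape are assumptions made for illustration, not Tuenti's actual implementation.

```python
# Hypothetical mapping from user bucket to farm hostname.
farm_mapping = {
    0: "wwwb1.tuenti.com",
    1: "wwwb2.tuenti.com",
    2: "wwwb3.tuenti.com",
    3: "wwwb4.tuenti.com",
}

def farm_for(user_id, mapping=farm_mapping):
    """Pick the farm a client should send its subsequent requests to."""
    return mapping[user_id % len(mapping)]

def top_level_response(user_id, mapping=farm_mapping):
    """What a top-level request could return alongside the page (assumed shape)."""
    return {"user_id": user_id, "farm": farm_for(user_id, mapping)}

def drain(mapping, host, replacement):
    """Return a new mapping that sends `host`'s buckets to `replacement`."""
    return {bucket: (replacement if h == host else h) for bucket, h in mapping.items()}

print(top_level_response(5))
# Take wwwb2 out for maintenance by remapping its bucket.
maintenance = drain(farm_mapping, "wwwb2.tuenti.com", "wwwb3.tuenti.com")
print(top_level_response(5, maintenance))
```

Because the client follows whatever the mapping says, shifting load is a data change on the server side rather than a DNS or load balancer change.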
35. Client-side Routing
[Diagram: www.tuenti.com plus wwwb1.tuenti.com through wwwb4.tuenti.com, with four load balancers and four web server farms]
Linearly scalable
36. Client-side Routing
[Diagram: www.tuenti.com plus wwwb1.tuenti.com through wwwb4.tuenti.com, with four load balancers and four web server farms]
Linearly scalable except for top level
37. Client-side Routing
[Diagram: www.tuenti.com plus wwwb1.tuenti.com through wwwb4.tuenti.com, with four load balancers and four web server farms]
lots of content creation
= lots of dynamic data
38. Client-side Routing
[Diagram: www.tuenti.com plus wwwb1.tuenti.com through wwwb4.tuenti.com, with four load balancers, four web server farms, and one cache farm]
lots of dynamic data
= lots of cache
= internal network traffic
39. Client-side Routing
[Diagram: www.tuenti.com plus wwwb1.tuenti.com through wwwb4.tuenti.com, with four load balancers, four web server farms, and four cache farms]
Partition cache
Route requests to a farm near the cache needed to respond
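The effect on internal traffic can be shown with a small simulation. It assumes four web farms, each next to one cache partition, and a cache partitioned by user id modulo four; these details, the request stream, and the function names are hypothetical. It counts requests that land on a farm other than the one next to the cache partition they need.

```python
import random

NUM_FARMS = 4  # assumed: one cache partition next to each web farm

def cache_partition(user_id):
    return user_id % NUM_FARMS

def cross_farm_lookups(requests, choose_farm):
    """Count requests served by a farm that is not next to the needed cache."""
    crossings = 0
    for i, user_id in enumerate(requests):
        if choose_farm(i, user_id) != cache_partition(user_id):
            crossings += 1
    return crossings

def round_robin(i, user_id):
    return i % NUM_FARMS             # ignores where the data lives

def cache_aware(i, user_id):
    return cache_partition(user_id)  # route to the farm near the cache

rng = random.Random(0)               # fixed seed, synthetic request stream
requests = [rng.randrange(1_000_000) for _ in range(10_000)]
print("round robin:", cross_farm_lookups(requests, round_robin))
print("cache aware:", cross_farm_lookups(requests, cache_aware))
```

With cache-aware routing the count is zero by construction, whereas routing that ignores data placement sends most lookups across farms.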
40. Internal network savings
41. SERVER-SIDE GAIN?
42. 43. 44. 45. 46. CONTENT DELIVERY
47. Image Serving
Tuenti serves ~2.5 billion images/day
At peak, this is >6 Gbps and >70k hits/sec
We use CDNs
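A quick back-of-the-envelope check of what these figures imply. Because the peak numbers on the slide are lower bounds, the derived values are rough estimates only; the variable names are illustrative.

```python
images_per_day = 2.5e9        # from the slide (approximate)
peak_hits_per_sec = 70_000    # from the slide (lower bound)
peak_bits_per_sec = 6e9       # from the slide (lower bound)

seconds_per_day = 24 * 60 * 60
avg_hits_per_sec = images_per_day / seconds_per_day
peak_to_average = peak_hits_per_sec / avg_hits_per_sec
bytes_per_image_at_peak = peak_bits_per_sec / peak_hits_per_sec / 8

print(f"average hits/sec:      {avg_hits_per_sec:,.0f}")
print(f"peak / average:        {peak_to_average:.1f}")
print(f"bytes per image (est): {bytes_per_image_at_peak:,.0f}")
```

So the average is roughly 29k images/second, the peak is about 2.4 times the average, and the typical image served at peak is on the order of 10 KB.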
48. What is a CDN?
Content Delivery Network
49. What is a CDN?
Examples: Akamai, Limelight
also dozens more, including Amazon
Big, distributed object cache
Pay per use
either per request, per TB transfer, or per peak Mbps
50. What is a CDN?
Advantages:
Outsource dev and infrastructure
Geographically distributed
Economies of scale
Disadvantages:
High cost
Less control and transparency
Commitments
51. What affects image load time?
Client internet connection
Response time of CDN
CDN cache hit rate
52. What affects image load time?
Client internet connection
Response time of CDN
CDN cache hit rate
53. 54. Monitor Performance from Client
Closer to performance experienced by end-user
Only way to get a view of network issues faced by users (i.e. the last mile)
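A sketch of how client-reported timings might be aggregated to spot a slow ISP. The sample values, ISP labels, and function name are made up for illustration; the slides do not describe how the measurements are collected or stored.

```python
from collections import defaultdict
from statistics import median

def median_load_time_by_isp(samples):
    """samples: iterable of (isp, image_load_time_ms) reported by clients."""
    by_isp = defaultdict(list)
    for isp, load_ms in samples:
        by_isp[isp].append(load_ms)
    return {isp: median(times) for isp, times in by_isp.items()}

# Hypothetical measurements sent back by browsers.
samples = [
    ("isp_a", 120), ("isp_a", 180), ("isp_a", 150),
    ("isp_b", 900), ("isp_b", 1100),
]
print(median_load_time_by_isp(samples))
```

A median is used here rather than a mean so that a few very slow clients do not dominate the picture; that choice is also an assumption.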
55. 56. How to fix slow ISP?
Choose better transit provider
Set up peering (or get CDN too)
Traffic management
57. What affects image load time?
Client internet connection
Response time of CDN
CDN cache hit rate
58. 59. 60. Quality of End-User Experience vs. Cost
61. We use multiple CDNs, and shift content based on price/performance.
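One possible selection rule, sketched under heavy assumptions: pick the cheapest CDN whose measured median response time meets a target, and fall back to the fastest one if none does. The prices, latencies, CDN names, and the rule itself are hypothetical; the slides do not say how Tuenti weighs price against performance.

```python
from dataclasses import dataclass

@dataclass
class CdnQuote:
    name: str
    price_per_tb: float   # hypothetical price
    median_ms: float      # hypothetical client-measured response time

def pick_cdn(quotes, max_median_ms):
    """Cheapest CDN that is fast enough, else the fastest available."""
    fast_enough = [q for q in quotes if q.median_ms <= max_median_ms]
    if fast_enough:
        return min(fast_enough, key=lambda q: q.price_per_tb)
    return min(quotes, key=lambda q: q.median_ms)

quotes = [
    CdnQuote("cdn_a", price_per_tb=40.0, median_ms=80.0),
    CdnQuote("cdn_b", price_per_tb=25.0, median_ms=140.0),
    CdnQuote("cdn_c", price_per_tb=15.0, median_ms=400.0),
]
print(pick_cdn(quotes, max_median_ms=150.0).name)
```

Different content could be given different latency targets, so that latency-tolerant content shifts to cheaper providers.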
62. Know your content
63. Know your content
64. Know your content
65. Know your content
30
75
200
66. Know your content
600
67. Know your content
120
68. Know your content
69. Pre-fetch Content
Exploit predictable user behavior
Ex: clicking to next photo in an album
Simple solution: load next image hidden
Client browser will cache it (next response < 100 ms)
Increase tolerance for slow response time
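In a browser this would be done in JavaScript with a hidden image element; the following is only a Python simulation of the idea, with a dict standing in for the browser cache and a hypothetical `fetch` callable standing in for the network. Showing photo i also fetches photo i+1, so the next click is served from cache.

```python
class AlbumViewer:
    """Simulates a client that pre-fetches the next photo in an album."""

    def __init__(self, photo_urls, fetch):
        self.photo_urls = photo_urls
        self.fetch = fetch        # hypothetical network call: url -> image bytes
        self.cache = {}           # stand-in for the browser cache

    def _load(self, url):
        if url not in self.cache:
            self.cache[url] = self.fetch(url)
        return self.cache[url]

    def show(self, index):
        """Return (image, was_cached) and pre-fetch the following photo."""
        url = self.photo_urls[index]
        was_cached = url in self.cache
        image = self._load(url)
        if index + 1 < len(self.photo_urls):
            self._load(self.photo_urls[index + 1])  # predictable next click
        return image, was_cached

fetched = []
def fake_fetch(url):
    fetched.append(url)           # record every simulated network request
    return b"image:" + url.encode()

viewer = AlbumViewer(["p1.jpg", "p2.jpg", "p3.jpg"], fake_fetch)
print(viewer.show(0)[1], viewer.show(1)[1], viewer.show(2)[1])
```

Only the first photo is a cache miss; every later click on Next finds its image already loaded, at the cost of one wasted fetch if the user leaves the album early.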
70. Pre-fetch Content
More complex solution
Pre-fetch next canvas (full HTML), render in background, rotate in on Next
Even more complex
Instantiate HTML template with data on client
Pre-fetch data X photos in advance, render Y templates in advance with this data
71. Pre-fetch Content
Problems:
Rendering still takes time
Increases browser load
Need to set cache headers correctly
72. Image delivery
Small images: High request, low volume
Most cost-effective to cache in memory
Large images: High volume, low requests, greater tolerance for latency
73. What affects image load time?
Client internet connection
Response time of CDN
CDN cache hit rate
74. Monitor Performance from Client
cold servers online
75. More
jobs.tuenti.com
dev.tuenti.com
76. Q & A
