Google and Cloud Computing
Wang Yonggang (王咏刚)
Senior Engineer, Google
Agenda
• The Internet: From Hardware to Community
• The Innovation: A Computing Cloud
• Breakthroughs for Cloud Computing
• Google Apps for Cloud Computing
• Google Infrastructure for Cloud Computing
The Internet: From Hardware to Community
MySpace
Kaixin001 (开心网), Xiaonei (校内网), …
What Do Today’s Users Want?
• Accessibility
– Access from anywhere and from multiple devices
• Shareability
– Make sharing as easy as creating and saving
• Freedom
– Users don’t want their data held hostage
• Simplicity
– Easy to learn, easy to use
• Security
– Trust that data will not be lost or seen by unwanted parties
The Innovation: A Computing Cloud
Cloud Computing
Attributes of Cloud Computing
• Data stored on the cloud
• Software & services on the cloud
– Access via web browser
• Based on standards and protocols
– Linux, AJAX, LAMP, etc.
• Accessible from any device
Hardware Centric: Personal PC
Software Centric: Client-Server
Service Centric: Cloud Computing
Breakthroughs for Cloud Computing
1. User-Centric
2. Task-Centric
3. Powerful
4. Intelligent
5. Affordable
6. Programmable
User-Centric
Data stored in the “Cloud”
Data follows you & your devices
Data accessible anywhere
Data can be shared with others
[Figure: kinds of data kept in the cloud: music, preferences, maps, news, contacts, messages, mailing lists, photos, e-mails, calendar, phone numbers, investments]
Example: Gmail
– Just a web browser and your account with a password!
– Once you log in, the device is “yours”.
– Data is stored on remote servers in the “cloud” (with large capacity)
Beijing, on travel
San Francisco, Monday
Home, Wednesday
Use Google Docs to Solve a Task
Access your docs from anywhere
Chat with others in real time
Changes instantly appear to other collaborators
Task = “Teachers creating a departmental curriculum”
Communication Task – Email, Chat, Contacts, Chat History
Task: Collaborate on Spreadsheet – Communicate
Chat with others editing the spreadsheet
Task: Collaborate on Spreadsheet – Collaborate
Invite others to collaborate on the spreadsheet
Task: Collaborate on Spreadsheet – Publish
Invite others to view the spreadsheet
You can also easily organize all your common tasks
Cloud Computing is Powerful: It can do what no PC can do
Is Google Search faster than search in Windows/Outlook/Word?
• And Google Search must be a much harder problem…
How much storage does it take to store all of the web pages?
• 100 billion pages × 10 KB per page = 1,000 TB (about 1 PB) of disk!
Cloud computing has at its disposal
• Essentially infinite amount of disk
• Essentially infinite amount of computation
• (Assuming the work can be parallelized)
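The storage estimate above is easy to check by hand. The sketch below just redoes the multiplication; it assumes decimal units (1 KB = 10^3 bytes, 1 TB = 10^12 bytes), which the slide does not spell out.

```python
# Back-of-the-envelope check of the storage estimate on the slide.
# Assumption: decimal units (1 KB = 10**3 bytes, 1 TB = 10**12 bytes).
PAGES = 100 * 10**9          # 100 billion pages (figure from the slide)
BYTES_PER_PAGE = 10 * 10**3  # 10 KB per page (figure from the slide)

total_bytes = PAGES * BYTES_PER_PAGE
total_tb = total_bytes // 10**12
total_pb = total_bytes // 10**15

print(f"{total_tb} TB, or about {total_pb} PB")
```

No single PC disk comes close to that size, which is the point of the slide: the data has to be spread over many machines.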
Example: Google Search
Web Page Search Universal Search
1st Generation: era of single search – not diverse
2nd Generation: era of vertical search – too complex
3rd Generation: era of Universal Search
From vertical search to universal search
Integration of user experience
Universal Search Example
Cloud Computing Infrastructure
GFS Architecture
Google 48%
MSN 19%
Yahoo 33%
• Files broken into chunks (typically 64 MB)
• Master manages metadata
• Data transfers happen directly between clients/chunkservers
[Figure: GFS architecture. Clients talk to the GFS master (with master replicas) and to Chunkserver 1, Chunkserver 2, … Chunkserver N. Chunk placement shown: Chunkserver 1 holds C0, C1, C2, C5; Chunkserver 2 holds C1, C3, C5; Chunkserver N holds C0, C2, C5.]
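The division of labor on the GFS slide (master keeps metadata, chunkservers keep data) can be sketched as a toy model. This is only an illustration, not the GFS API: `ToyMaster`, `register_file`, `locate`, the round-robin placement, and the replication factor of 3 are all assumptions made for the example. Only the 64 MB chunk size comes from the slide.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the typical chunk size from the slide


@dataclass
class ToyMaster:
    """Hypothetical stand-in for a master: holds metadata only, never file data."""
    chunkservers: list
    replicas: int = 3  # assumed replication factor, not stated on the slide
    # file name -> list of (chunk_index, [chunkserver names])
    files: dict = field(default_factory=dict)

    def register_file(self, name: str, size_bytes: int) -> None:
        num_chunks = max(1, -(-size_bytes // CHUNK_SIZE))  # ceiling division
        layout = []
        for idx in range(num_chunks):
            # Round-robin placement; a real system would weigh load and locality.
            servers = [self.chunkservers[(idx + r) % len(self.chunkservers)]
                       for r in range(self.replicas)]
            layout.append((idx, servers))
        self.files[name] = layout

    def locate(self, name: str, offset: int):
        """Return (chunk_index, servers) for a byte offset; the client would then
        contact one of those chunkservers directly for the data."""
        return self.files[name][offset // CHUNK_SIZE]


master = ToyMaster(chunkservers=["cs1", "cs2", "cs3", "cs4"])
master.register_file("crawl.log", 200 * 1024 * 1024)  # 200 MB -> 4 chunks
print(master.locate("crawl.log", 130 * 1024 * 1024))  # offset falls in chunk 2
```

Because the master answers only the small "where is it?" question, bulk data never flows through it, which keeps it from becoming a bandwidth bottleneck.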
Typical Cluster
[Figure: typical cluster. Cluster-wide services: scheduling masters, GFS master, lock service. Each machine (Machine 1, Machine 2, … Machine N) runs Linux, a GFS chunkserver, a scheduler slave, and one or more user apps.]
MapReduce
More specifically…
• Programmer specifies two primary methods:
– map(k, v) → <k', v'>*
– reduce(k', <v'>*) → <k', v'>*
• All v' with the same k' are reduced together, in order.
• Usually also specify:
– partition(k', total partitions) → partition for k'
  • often a simple hash of the key
  • allows reduce operations for different k' to be parallelized
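The three functions above can be sketched as a single-process word count. This is a minimal illustration of the programming model, not Google's MapReduce library: `run_mapreduce` is a hypothetical driver, everything runs in memory, and CRC32 is just one possible "simple hash of the key".

```python
from collections import defaultdict
from zlib import crc32


def map_fn(key, value):
    """map(k, v) -> <k', v'>*: emit (word, 1) for every word in a line."""
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """reduce(k', <v'>*) -> <k', v'>*: sum the counts for one word."""
    yield key, sum(values)


def partition(key, total_partitions):
    """A simple, deterministic hash of the key picks the partition."""
    return crc32(key.encode("utf-8")) % total_partitions


def run_mapreduce(records, total_partitions=2):
    # Map phase: group intermediate values by partition, then by key.
    partitions = [defaultdict(list) for _ in range(total_partitions)]
    for k, v in records:
        for k2, v2 in map_fn(k, v):
            partitions[partition(k2, total_partitions)][k2].append(v2)
    # Reduce phase: partitions are independent, so they could run in parallel;
    # within a partition, keys are processed in sorted order.
    results = {}
    for part in partitions:
        for k2 in sorted(part):
            for out_key, out_value in reduce_fn(k2, part[k2]):
                results[out_key] = out_value
    return results


lines = [(1, "the cloud is the computer"), (2, "the web is the cloud")]
counts = run_mapreduce(lines)
print(counts["the"], counts["cloud"])
```

Since every occurrence of a key hashes to the same partition, each reduce call sees all values for its key, and no coordination is needed between partitions.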
BigTable
• Distributed multi-level map
– With an interesting data model
• Fault-tolerant, persistent
• Scalable
– Thousands of servers
– Terabytes of in-memory data
– Petabytes of disk-based data
– Millions of reads/writes per second, efficient scans
• Self-managing
– Servers can be added/removed dynamically
– Servers adjust to load imbalance
BigTable: Basic Data Model
• Distributed multi-dimensional sparse map
(row, column, timestamp) → cell contents
• Good match for most of our applications
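[Figure: the table drawn with three axes, ROWS, COLUMNS, and TIMESTAMPS. Row “www.cnn.com”, column “contents”, holds the cell value “<html>…” at timestamps t1, t2, t3.]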
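The data model on this slide, a sparse map from (row, column, timestamp) to cell contents, can be sketched with nested dictionaries. This is a toy in-memory illustration, not the Bigtable client API: `ToySparseTable`, `put`, and `get` are hypothetical names, and integer timestamps are an assumption.

```python
from collections import defaultdict


class ToySparseTable:
    """In-memory sketch of a sparse (row, column, timestamp) -> contents map."""

    def __init__(self):
        # Only cells that were written take up space: row -> column -> {ts: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the value at `timestamp`, or the newest version if omitted."""
        versions = self._rows.get(row, {}).get(column, {})
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)
        return versions.get(timestamp)


table = ToySparseTable()
table.put("www.cnn.com", "contents", 1, "<html>v1")
table.put("www.cnn.com", "contents", 3, "<html>v3")
print(table.get("www.cnn.com", "contents"))     # newest version
print(table.get("www.cnn.com", "contents", 1))  # a specific older version
print(table.get("www.cnn.com", "anchor"))       # cell that was never written
```

Keeping several timestamped versions per cell is what lets one row hold, for example, successive crawls of the same page.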
BigTable: System Architecture
A Bigtable cell consists of:
• Bigtable master: performs metadata ops, load balancing
• Bigtable tablet servers (several): serve data
It relies on three shared services:
• Cluster scheduling master: handles failover, monitoring
• GFS: holds tablet data, logs
• Lock service: holds metadata, handles master election
A Bigtable client reaches the cell through the Bigtable client library, starting with Open().
Thanks
Q&A