IoT @ Google Scale
James Chittenden, Google Cloud Platform Solutions
Manage the Entire Lifecycle of Big Data
[Diagram: lifecycle stages Capture, Store, Process, Analyze. Services shown: Cloud Logs, Google App Engine, Google Analytics Premium, Cloud Pub/Sub, BigQuery Storage (tables), Cloud Bigtable (NoSQL), Cloud Storage (files), Cloud Datastore, Cloud Dataflow (Batch, Stream), BigQuery Analytics (SQL), Cloud Monitoring, Real-time analytics and alerts.]
Device to Device Protocols
● Device Discovery
● Device to Device Authentication
● Device Configuration
● Protocol Routing
Machine Learning: Pattern Detection and Prediction
● Subscribers scan real-time streams and feed the data into the machine learning recognition algorithm
● Dataflow orchestrates streaming algorithms that compare data streams against the Experience Database
● Correlators detect known patterns and publish alerts using Cloud Pub/Sub
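The correlator step above can be sketched in plain Java. This is only an illustration under stated assumptions: the "known pattern" is taken to be a run of consecutive readings above a threshold, the class name `Correlator` is hypothetical, and a `Consumer<String>` stands in for the Cloud Pub/Sub publish call, which is not shown on the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical correlator for one device stream: flags a run of high readings. */
final class Correlator {
    private final double threshold;
    private final int runLength;
    private final Consumer<String> alertPublisher; // assumption: stand-in for a Pub/Sub publish
    private int currentRun = 0;

    Correlator(double threshold, int runLength, Consumer<String> alertPublisher) {
        this.threshold = threshold;
        this.runLength = runLength;
        this.alertPublisher = alertPublisher;
    }

    /** Feed one reading; publish a single alert when the run reaches runLength. */
    void onReading(String deviceId, double value) {
        currentRun = (value > threshold) ? currentRun + 1 : 0;
        if (currentRun == runLength) {
            alertPublisher.accept("ALERT " + deviceId + ": " + runLength
                    + " consecutive readings above " + threshold);
        }
    }

    /** Convenience for the example: replay readings and collect the alerts. */
    static List<String> detect(String deviceId, double[] readings,
                               double threshold, int runLength) {
        List<String> alerts = new ArrayList<>();
        Correlator correlator = new Correlator(threshold, runLength, alerts::add);
        for (double reading : readings) {
            correlator.onReading(deviceId, reading);
        }
        return alerts;
    }

    public static void main(String[] args) {
        double[] readings = {70, 91, 92, 95, 96, 60, 93};
        // One alert: the third reading in the run 91, 92, 95 completes the pattern.
        System.out.println(detect("sensor-1", readings, 90.0, 3));
    }
}
```

A real deployment would compare against learned patterns rather than a fixed threshold; the sketch only shows where detection ends and alert publishing begins.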
Cloud Storage Archival and Retrieval
● Data is periodically unloaded from Cloud Bigtable and stored in Cloud Storage for archival
● Data in Cloud Storage can be quickly reloaded into Cloud Bigtable if it needs to be reprocessed
Messaging is a shock-absorber
Availability
• Buffer new requests during outages
• Prevent overloads that cause outages
• Redirect requests to recover from outages
Throughput
• Smooth out spikes in new request rate
• Balance load across multiple workers
• Balance arrival rate with service rate
Latency
• Accept requests closer to the network edge
• Optimize message flow across regions
• Leverage shared efforts to improve protocols
Images by Connie Zhou
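The "shock-absorber" idea of balancing arrival rate with service rate can be illustrated with a small simulation. This is a sketch, not Pub/Sub itself: the class `ShockAbsorber`, the tick-based model, and the numbers are all made up for illustration.

```java
import java.util.Arrays;

/** Hypothetical queue model: bursts are buffered, the worker drains at a fixed rate. */
final class ShockAbsorber {
    /**
     * arrivals[t] messages arrive during tick t; the worker handles at most
     * serviceRate messages per tick. Returns the backlog at the end of each tick.
     */
    static int[] backlogPerTick(int[] arrivals, int serviceRate) {
        int[] backlog = new int[arrivals.length];
        int queued = 0;
        for (int t = 0; t < arrivals.length; t++) {
            queued += arrivals[t];                   // buffer the new requests
            queued -= Math.min(queued, serviceRate); // worker drains at its own pace
            backlog[t] = queued;
        }
        return backlog;
    }

    public static void main(String[] args) {
        int[] arrivals = {2, 10, 0, 0, 3, 0}; // a spike of 10 in tick 1
        // prints [0, 7, 4, 1, 1, 0]
        System.out.println(Arrays.toString(backlogPerTick(arrivals, 3)));
    }
}
```

The worker never sees more than 3 messages per tick even though 10 arrive at once; the queue absorbs the spike and drains it over the following ticks.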
Pub/Sub is a change-absorber
Images by Connie Zhou
Sources
• New data sources can plug into old data flows
• New data sources can use new schemas
• Common security policies for all sources
Sinks
• Data can be sent to new destinations
• Push and Pull delivery are both available
• Spans organizational boundaries
Transforms
• Select subsets of messages that matter
• Helps manage schema and version changes
• Can merge streams into new topics
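Two of the points above, sending data to new destinations and selecting subsets of messages, can be sketched with an in-memory topic. `MiniTopic` and its methods are hypothetical names for illustration and are not the Cloud Pub/Sub client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical in-memory topic with filtered subscriptions. */
final class MiniTopic {
    /** A subscription keeps its own copy of every message that passes its filter. */
    static final class Subscription {
        final Predicate<String> filter;
        final List<String> delivered = new ArrayList<>();

        Subscription(Predicate<String> filter) {
            this.filter = filter;
        }
    }

    private final List<Subscription> subscriptions = new ArrayList<>();

    /** New destinations attach without any change on the publisher side. */
    Subscription subscribe(Predicate<String> filter) {
        Subscription subscription = new Subscription(filter);
        subscriptions.add(subscription);
        return subscription;
    }

    /** Fan the message out to every subscription whose filter accepts it. */
    void publish(String message) {
        for (Subscription subscription : subscriptions) {
            if (subscription.filter.test(message)) {
                subscription.delivered.add(message);
            }
        }
    }

    public static void main(String[] args) {
        MiniTopic topic = new MiniTopic();
        Subscription all = topic.subscribe(m -> true);
        Subscription temps = topic.subscribe(m -> m.startsWith("temp:"));
        topic.publish("temp:21");
        topic.publish("door:open");
        Subscription late = topic.subscribe(m -> true); // added after the first messages
        topic.publish("temp:22");
        System.out.println(all.delivered);   // prints [temp:21, door:open, temp:22]
        System.out.println(temps.delivered); // prints [temp:21, temp:22]
        System.out.println(late.delivered);  // prints [temp:22]
    }
}
```

The publisher only knows the topic, so adding the `late` subscription changes nothing upstream.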
Pub/Sub at Google
Chat & Mobile: Every time your Gmail inbox pops up a new message, it’s because of a push notification to your browser or mobile device.
Ads & Budgets: One of the most important real-time information streams in the company is advertising revenue. We use Pub/Sub to broadcast budgets to our entire fleet of search engines.
Push Notifications: Google Cloud Messaging for Android delivers billions of messages a day, reliably and securely, for Google’s own mobile apps and the entire developer community.
Instant Search: Updating search results as you type is a feat of real-time indexing that depends on Pub/Sub to update caches with breaking news.
[Diagram: Publisher, Topic, and Subscriptions inside the Pub/Sub System. Other labels: HTTP Server Subscriber, Webhook Delivery, HTTP Push Delivery, Google App Engine, Pull Subscriber, Google RPC Delivery, Cloud Dataflow, On-Prem/Cloud, Any Environment.]
[Diagrams: Push Subscription and Pull Subscription. In each, a Msg flows from the Pub/Sub System to the Subscriber and an Ack flows back. Arrow labels: RPC Send, RPC Return.]
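The message and ack exchange for a pull subscription can be sketched as below. This is a simplified model with hypothetical names: integer ack ids and a single `expireDeadlines()` call stand in for what a real service would track per message, and none of it is the Cloud Pub/Sub client API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical pull subscription: unacked messages are delivered again. */
final class PullSubscription {
    private final Deque<String> pending = new ArrayDeque<>();               // not yet handed out
    private final Map<Integer, String> outstanding = new LinkedHashMap<>(); // handed out, not acked
    private int nextAckId = 0;

    void enqueue(String message) {
        pending.addLast(message);
    }

    /** Hands out the next message and returns its ack id, or -1 if nothing is pending. */
    int pull() {
        String message = pending.pollFirst();
        if (message == null) {
            return -1;
        }
        outstanding.put(nextAckId, message);
        return nextAckId++;
    }

    /** Payload of a message that was pulled but not yet acked, or null. */
    String payload(int ackId) {
        return outstanding.get(ackId);
    }

    /** An acked message is forgotten and never redelivered. */
    void ack(int ackId) {
        outstanding.remove(ackId);
    }

    /** Simulates ack deadlines expiring: every unacked message goes back in the queue. */
    void expireDeadlines() {
        pending.addAll(outstanding.values());
        outstanding.clear();
    }

    public static void main(String[] args) {
        PullSubscription subscription = new PullSubscription();
        subscription.enqueue("a");
        subscription.enqueue("b");
        int first = subscription.pull();  // hands out "a"
        subscription.pull();              // hands out "b", which is never acked
        subscription.ack(first);
        subscription.expireDeadlines();   // "b" returns to the queue
        int retry = subscription.pull();
        System.out.println(subscription.payload(retry)); // prints b
    }
}
```

In the push case the roles flip: the system initiates delivery and treats a successful response from the subscriber as the ack.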
Google Technologies
[Timeline, 2002 to 2014+: GFS, MapReduce, Bigtable, Dremel, Colossus, FlumeJava, Spanner, MillWheel, More!]
Dataflow Goodies
Autoscaling mid-job
Fully managed - No-Ops
Intuitive Data Processing Framework
Batch and Stream Processing in one
Liquid sharding mid-job
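// Batch pipeline as shown on the slide (early Dataflow SDK style).
// ExtractTags and ExpandPrefixes are user-defined functions not shown here.
Pipeline p = Pipeline.create();
p.begin()
  .apply(TextIO.Read.from("gs://…"))       // read input text from Cloud Storage
  .apply(ParDo.of(new ExtractTags()))      // per-element processing step
  .apply(Count.create())                   // count occurrences
  .apply(ParDo.of(new ExpandPrefixes()))   // per-element processing step
  .apply(Top.largestPerKey(3))             // keep the 3 largest values per key
  .apply(TextIO.Write.to("gs://…"));       // write results to Cloud Storage
p.run();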
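To make the data flow of this pipeline concrete, here is a plain-Java sketch of what the steps appear to compute on a small in-memory input. It does not use the Dataflow SDK. The behavior of `ExtractTags` and `ExpandPrefixes` is not shown on the slide, so the sketch assumes that tags are whitespace-separated tokens starting with `#` and that each tag is keyed by every one of its prefixes; the alphabetical tie-break is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical in-memory equivalent of the slide's pipeline. */
final class TopCompletions {
    /** Returns, for each prefix, up to n tags with the highest counts. */
    static Map<String, List<String>> topPerPrefix(List<String> lines, int n) {
        // ExtractTags + Count (assumed): count each token that starts with '#'.
        Map<String, Long> counts = lines.stream()
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .filter(word -> word.startsWith("#") && word.length() > 1)
                .map(word -> word.substring(1))
                .collect(Collectors.groupingBy(tag -> tag, TreeMap::new, Collectors.counting()));

        // ExpandPrefixes (assumed): list each tag under every one of its prefixes.
        Map<String, List<String>> byPrefix = new TreeMap<>();
        for (String tag : counts.keySet()) {
            for (int end = 1; end <= tag.length(); end++) {
                byPrefix.computeIfAbsent(tag.substring(0, end), k -> new ArrayList<>()).add(tag);
            }
        }

        // Top.largestPerKey(n): keep the n most frequent tags per prefix.
        for (List<String> tags : byPrefix.values()) {
            tags.sort((a, b) -> {
                int byCount = Long.compare(counts.get(b), counts.get(a)); // descending count
                return byCount != 0 ? byCount : a.compareTo(b);           // ties: alphabetical
            });
            if (tags.size() > n) {
                tags.subList(n, tags.size()).clear();
            }
        }
        return byPrefix;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("#go #gopher #go", "#golang #go #gopher", "#rust");
        Map<String, List<String>> top = topPerPrefix(lines, 2);
        System.out.println(top.get("g"));   // prints [go, gopher]
        System.out.println(top.get("gop")); // prints [gopher]
    }
}
```

The Dataflow version expresses the same three stages as transforms so the service can distribute them across workers.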
Deploy
Schedule & Monitor
[Figure: load levels of 800 RPS, 1200 RPS, 5000 RPS, and 50 RPS]
// Streaming version of the same pipeline: the text source and sink are swapped
// for Pub/Sub topics and a fixed 5-minute window is added.
Pipeline p = Pipeline.create();
p.begin()
  .apply(PubsubIO.Read.from("input_topic"))
  .apply(Window.<Integer>by(FixedWindows.of(5, MINUTES)))
  .apply(ParDo.of(new ExtractTags()))
  .apply(Count.create())
  .apply(ParDo.of(new ExpandPrefixes()))
  .apply(Top.largestPerKey(3))
  .apply(PubsubIO.Write.to("output_topic"));
p.run();
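What a fixed 5-minute window does to an unbounded stream can be shown without the SDK: each event is assigned to the window containing its timestamp, and aggregation then runs per window. The class and method names below are hypothetical, and event times are assumed to be milliseconds since an arbitrary epoch.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of fixed (tumbling) windows. */
final class FixedWindowCount {
    static final long WINDOW_MILLIS = 5 * 60 * 1000L; // 5-minute windows, as on the slide

    /** Start of the fixed window that contains the given event timestamp. */
    static long windowStart(long timestampMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, WINDOW_MILLIS);
    }

    /** Counts events per window; keys are window start times in milliseconds. */
    static Map<Long, Integer> countPerWindow(long[] eventTimestamps) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long timestamp : eventTimestamps) {
            counts.merge(windowStart(timestamp), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        long[] events = {0, 1 * minute, 4 * minute + 59_999, 5 * minute, 12 * minute};
        // prints {0=3, 300000=1, 600000=1}
        System.out.println(countPerWindow(events));
    }
}
```

Because every event belongs to exactly one window, a count or top-N can be emitted per window even though the stream itself never ends.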
Enterprise Big Data Architecture on Google
[Diagram labels: Your Data; Pub/Sub; Cloud Storage; Bigtable; Dataflow; BigQuery; Fast ETL, Regex, JSON, UDFs; Spreadsheets; BI Tools; Coworkers; Applications + Reports]
Why Dataflow?
All the benefits of Hadoop-on-Google
Plus True Stream Processing
Plus Autoscaling and per-minute billing
Plus a Fully-Managed Service
Plus New, Intuitive Framework