I was invited to speak at the Google Startup Launch Summit in London about how we are using Google Cloud to power our startup.
javier ramirez @supercoco9
How we are using BigQuery and Apps Script
at teowaki
Set a distance.
Set an expiration time.
Bye bye noise.
Analytics flow
Analytics flow, by segment
Automatic Alerts
javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14
REST API (Ruby on Rails) +
Web on top (AngularJS)
Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures.
Edd Dumbill, program chair for the O’Reilly Strata Conference
1. non-intrusive metrics
2. keep the history
3. avoid vendor lock-in
4. interactive queries
5. cheap
6. extra ball: real time
Cloud Storage: cost-efficient storage of files
tools we considered:
Hadoop
Cassandra
Amazon Redshift
...
Our choice:
Google BigQuery
Data analysis as a service
http://developers.google.com/bigquery
Based on “Dremel”
Specifically designed for interactive queries over petabytes of real-time data
loading data
You just send the data in text (or JSON) format
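As a rough illustration of what "sending the data in JSON format" can look like: BigQuery load jobs accept newline-delimited JSON, that is, one JSON object per line. The sketch below only builds such a file in memory and gzips it; the row fields (`ts`, `user`, `path`, `status`) and the helper name `to_ndjson` are made up for the example and are not taken from the talk.

```python
import gzip
import json


def to_ndjson(rows):
    """Serialize an iterable of dicts as newline-delimited JSON (one object per line)."""
    return "".join(json.dumps(row, sort_keys=True) + "\n" for row in rows)


# Hypothetical request-log rows; the field names are illustrative only.
rows = [
    {"ts": "2014-03-01 10:00:00", "user": "alice", "path": "/api/links", "status": 200},
    {"ts": "2014-03-01 10:00:02", "user": "bob", "path": "/api/teams", "status": 404},
]

payload = to_ndjson(rows)

# Compress before uploading so the stored file stays small.
compressed = gzip.compress(payload.encode("utf-8"))
```

The resulting bytes could be written to a `.json.gz` file and uploaded; the upload and load-job steps themselves are left out of this sketch.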
SQL
select name from USERS order by date;
select count(*) from users;
select max(date) from USERS;
select sum(total) from ORDERS group by user;
specific extensions for analytics
within
flatten
nest
stddev
top
first
last
nth
variance
var_pop
var_samp
covar_pop
covar_samp
quantiles
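Several of these functions come in `_pop` / `_samp` pairs. As a reminder of the difference, here is a small sketch using Python's standard `statistics` module rather than BigQuery itself: the population variance divides by n, the sample variance by n - 1. The values are invented for the example.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up numbers, mean is 5

# Population variance: sum of squared deviations divided by n.
var_pop = statistics.pvariance(values)

# Sample variance: sum of squared deviations divided by n - 1.
var_samp = statistics.variance(values)
```

On large tables the two values are nearly identical; the distinction matters mostly for small groups.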
web console screenshot
our most active user
country segmented traffic
10 requests we should be caching
5 most created resources
new users per month
SELECT repository_name, repository_language, repository_description,
  COUNT(repository_name) as cnt, repository_url
FROM github.timeline
WHERE type="WatchEvent"
  AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC("#{yesterday} 20:00:00")
  AND repository_url IN (
    SELECT repository_url
    FROM github.timeline
    WHERE type="CreateEvent"
      AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('#{yesterday} 20:00:00')
      AND repository_fork = "false"
      AND payload_ref_type = "repository"
    GROUP BY repository_url
  )
GROUP BY repository_name, repository_language, repository_description, repository_url
HAVING cnt >= 5
ORDER BY cnt DESC
LIMIT 25
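The `#{yesterday}` placeholder in the query is Ruby string interpolation, so the date is filled in by the calling code before the query is sent. A minimal sketch of computing that cutoff, written here in Python and assuming the value is a `YYYY-MM-DD` date followed by the fixed `20:00:00` time; the helper name `window_start` is hypothetical.

```python
from datetime import date, timedelta


def window_start(today, hour=20):
    """Return 'YYYY-MM-DD HH:00:00' for the day before `today`."""
    yesterday = today - timedelta(days=1)
    return f"{yesterday.isoformat()} {hour:02d}:00:00"


# Example with a fixed date so the result is reproducible.
cutoff = window_start(date(2014, 3, 1))
```

Passing `date.today()` instead of a fixed date gives the rolling window a daily job would need.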
Automation with Apps Script
Read from BigQuery
Create a spreadsheet on Drive
E-mail it every day as a PDF
cloud storage pricing
$0.032 per GB
a gzipped 4.8 MB file stores 1MM rows
$0.000092 / month per 1MM rows
bigquery pricing
$26 per stored TB
1000000 rows => $0.00416 / month
£0.00243 / month
$5 per processed TB
1 full scan = 160 MB
1 count = 0 MB
1 full scan over 1 column = 5.4 MB
100 GB => $0.05 / month
£0.03
£0.054307 / month*
per 1MM rows
*the 1st 100GB every month are free of charge
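The per-table BigQuery figures can be rechecked with simple arithmetic. A minimal sketch, assuming decimal units (1 TB = 1,000,000 MB) and the rates quoted on the pricing slide ($26 per stored TB per month, $5 per processed TB); the function names are illustrative and not part of any Google API.

```python
MB_PER_TB = 1_000_000  # assumption: decimal units, 1 TB = 1,000,000 MB

STORAGE_USD_PER_TB_MONTH = 26.0  # rate quoted on the slide
QUERY_USD_PER_TB = 5.0           # rate quoted on the slide


def storage_cost_usd(size_mb):
    """Monthly storage cost in USD for a table of `size_mb` megabytes."""
    return size_mb / MB_PER_TB * STORAGE_USD_PER_TB_MONTH


def query_cost_usd(scanned_mb):
    """Cost in USD of a query that scans `scanned_mb` megabytes."""
    return scanned_mb / MB_PER_TB * QUERY_USD_PER_TB


table_mb = 160.0  # size of 1MM rows, taken from the full-scan figure

print(f"storage:            ${storage_cost_usd(table_mb):.5f} / month")
print(f"full scan:          ${query_cost_usd(table_mb):.6f}")
print(f"single-column scan: ${query_cost_usd(5.4):.6f}")
```

With the 160 MB table, storage comes to $0.00416 per month, which matches the slide. A full scan costs $0.0008 and a single-column scan about $0.000027, before any free quota is applied.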
1. non-intrusive metrics
2. keep the history
3. avoid vendor lock-in
4. interactive queries
5. cheap
6. extra ball: real time
Find related links at
https://teowaki.com/teams/javier-community/link-categories/bigquery-talk
Thanks!
Javier Ramírez @supercoco9