Dharmesh Vaya @DRVaya
http://drvaya.wordpress.com/
Agenda
● What is Big Data ?
● Available Big Data Solutions & Issues
● Why Google BigQuery ?
● Inside BigQuery
● Features & Components
● RESTful API
● Development with BigQuery (Live Demo)
○ Query History, Projects, DataSets, Public Datasets, Table Details, Writing Queries, Save Results.
○ Integration with Applications.
● BigQuery Tools
● Big Data Solution with BigQuery & Google Cloud Platform
● Pricing Model
● Any questions ?
What is Big Data ?
Is it a Data Type ? No
It's a buzzword for a massive volume of structured and/or unstructured data.
It is so large that it is difficult to process/analyze using traditional databases.
What is Big Data ?
Data that has the following attributes can be ‘Big Data’
So how Big is B - I - G ?
Library of Congress - Textual Data
20 Terabytes
(20 000 000 000 000 bytes)
So how Big is B - I - G ?
Amazon.com - Inventory & Customer Data
42 Terabytes
(42 000 000 000 000 bytes)
So how Big is B - I - G ?
YouTube.com - Media Data
100+ Terabytes
(100 000 000 000 000 bytes)
So how Big is B - I - G ?
Google.com - Search, Mail, Media & anything you can think of !!
850+ Terabytes
(850 000 000 000 000 bytes) (Speculated Figures)
So how Big is B - I - G ?
World Data Center for Climate - Meteorology Data
6.2 Petabytes
(6 200 000 000 000 000 bytes)
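The byte counts on these slides use decimal (SI) prefixes, where 1 Terabyte is 10^12 bytes and 1 Petabyte is 10^15 bytes. A minimal sketch of that arithmetic follows; the helper names `to_bytes` and `spaced` are made up for illustration.

```python
from decimal import Decimal

# Decimal (SI) prefixes, matching the byte counts shown on the slides.
UNITS = {"TB": 10**12, "PB": 10**15}


def to_bytes(amount: str, unit: str) -> int:
    """Convert e.g. ("20", "TB") into a whole number of bytes."""
    # Decimal avoids floating-point surprises for values such as "6.2".
    return int(Decimal(amount) * UNITS[unit])


def spaced(n: int) -> str:
    """Group digits in threes with spaces, the way the slides print them."""
    return f"{n:,}".replace(",", " ")


if __name__ == "__main__":
    for amount, unit in [("20", "TB"), ("42", "TB"), ("6.2", "PB")]:
        print(amount, unit, "=", spaced(to_bytes(amount, unit)), "bytes")
```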
Available Big Data Solutions & Issues
- Highly Scalable and Distributed Computing.
- Storage (HDFS) optimized for high throughput.
- Security is disabled by default.
- MapReduce is batch based, hence no real-time operations.
- Costly to maintain.
- Highly Scalable, claims to handle Petabytes.
- Elastic set of resources to return result sets.
- Almost 10x faster than Hadoop.
- High costs of Data Migration and integration.
- Operations/Maintenance cost may shoot up.
Why Google BigQuery ?
Hadoop (with Hive)
Amazon Redshift
Google BigQuery
= 1.4 TB
On average, it's within 8-10 seconds !!
Inside Google BigQuery
● BigQuery is based on Dremel, a technology pioneered by Google & used extensively within Google.
● It uses columnar storage & multi-level execution trees to achieve interactive performance for queries against multi-terabyte datasets.
● BigQuery's performance advantage comes from its parallel processing architecture.
● The query is processed by thousands of servers in a multi-level execution tree structure, with the final results aggregated at the root. BigQuery stores the data in a columnar format so that only data from the columns being queried is read.
● All this & more is now available as a public service for any business or developer to use. This release made it possible for those outside of Google to utilize the power of Dremel for their Big Data processing requirements.
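The two ideas above, columnar storage and a multi-level execution tree, can be illustrated with a toy sketch. This is not how Dremel or BigQuery is implemented; the table contents, the function names and the fan-in of 2 are all invented for illustration.

```python
from typing import Dict, List

# Toy column-oriented table: every column lives in its own list.
# Column names and values are made up for this example.
COLUMNS: Dict[str, List] = {
    "title": ["a", "b", "c", "d", "e", "f"],
    "language": ["en", "de", "en", "fr", "en", "de"],
    "views": [10, 20, 30, 40, 50, 60],
}


def scan(columns: Dict[str, List], wanted: List[str]) -> Dict[str, List]:
    """Return only the requested columns; the others are never touched."""
    return {name: columns[name] for name in wanted}


def split(values: List[int], parts: int) -> List[List[int]]:
    """Cut one column into contiguous shards, one per leaf worker."""
    size = max(1, -(-len(values) // parts))  # ceiling division, at least 1
    return [values[i:i + size] for i in range(0, len(values), size)]


def tree_sum(shards: List[List[int]], fan_in: int = 2) -> int:
    """Aggregate level by level: leaves, then intermediate nodes, then root."""
    level = [sum(shard) for shard in shards]  # partial sums at the leaves
    while len(level) > 1:
        # Each parent node combines the results of up to `fan_in` children.
        level = [sum(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0] if level else 0


if __name__ == "__main__":
    # Equivalent in spirit to: SELECT SUM(views) FROM table
    views = scan(COLUMNS, ["views"])["views"]
    print(tree_sum(split(views, parts=3)))
```

In a real system each shard would sit on a different server and the partial sums would travel over the network; here everything runs in one process to keep the shape of the computation visible.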
Columnar Storage & Trees
Inside Google BigQuery
Hey, you mentioned Dremel, isn’t MapReduce based on it ?
There’s a difference
● Dremel is designed as an interactive data analysis tool for large datasets.
● MapReduce is designed as a programming framework to batch process large datasets.
Features & Components
Features:
● Web GUI for BigQuery
● Affordable
● Run in Background
● Easy Data Importation
● Flexible (Addition of Columns, Native Support for Timestamp Type of Data)
● REST API Support
● More than just Standard SQL
Components:
● Project
● Tables
● DataSets
● Jobs
RESTful API
For Datasets
Method    HTTP Request
delete    DELETE /projects/projectId/datasets/datasetId
get       GET /projects/projectId/datasets/datasetId
insert    POST /projects/projectId/datasets
list      GET /projects/projectId/datasets
patch     PATCH /projects/projectId/datasets/datasetId
update    PUT /projects/projectId/datasets/datasetId
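A small sketch of how the dataset methods listed here map onto concrete HTTP requests. It only builds the verb and URL and sends nothing. The base URL `https://www.googleapis.com/bigquery/v2` is an assumption (it is not shown on the slide for datasets), and `dataset_request` is a hypothetical helper name.

```python
from typing import Optional, Tuple
from urllib.parse import quote

# Assumed base URL for the v2 REST API; not taken from the dataset slide.
BASE_URL = "https://www.googleapis.com/bigquery/v2"

# Method name -> (HTTP verb, path template), copied from the table.
DATASET_METHODS = {
    "delete": ("DELETE", "/projects/{projectId}/datasets/{datasetId}"),
    "get": ("GET", "/projects/{projectId}/datasets/{datasetId}"),
    "insert": ("POST", "/projects/{projectId}/datasets"),
    "list": ("GET", "/projects/{projectId}/datasets"),
    "patch": ("PATCH", "/projects/{projectId}/datasets/{datasetId}"),
    "update": ("PUT", "/projects/{projectId}/datasets/{datasetId}"),
}


def dataset_request(method: str, project_id: str,
                    dataset_id: Optional[str] = None) -> Tuple[str, str]:
    """Return (HTTP verb, full URL) for one of the dataset methods."""
    verb, template = DATASET_METHODS[method]
    if "{datasetId}" in template and not dataset_id:
        raise ValueError(f"'{method}' needs a dataset_id")
    path = template.format(
        projectId=quote(project_id, safe=""),
        datasetId=quote(dataset_id or "", safe=""),
    )
    return verb, BASE_URL + path


if __name__ == "__main__":
    # "my-project" and "my_dataset" are placeholder identifiers.
    print(dataset_request("list", "my-project"))
    print(dataset_request("get", "my-project", "my_dataset"))
```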
RESTful API
For Jobs
Method             HTTP Request
get                GET /projects/projectId/jobs/jobId
getQueryResults    GET /projects/projectId/queries/jobId
insert             POST https://www.googleapis.com/upload/bigquery/v2/projects/projectId/jobs and POST /projects/projectId/jobs
list               GET /projects/projectId/jobs
query              POST /projects/projectId/queries
Similar methods for -
● Projects
● Tables
● TableData
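As a rough sketch of the `query` and `getQueryResults` job methods, the snippet below prepares the two HTTP requests with the Python standard library without sending them. The base URL, the OAuth 2.0 bearer token header and the helper names are assumptions for illustration; `query` and `timeoutMs` are fields of the query request body.

```python
import json
import urllib.request

# Assumed base URL for the v2 REST API.
BASE_URL = "https://www.googleapis.com/bigquery/v2"


def build_query_request(project_id: str, sql: str, access_token: str,
                        timeout_ms: int = 10000) -> urllib.request.Request:
    """Prepare POST /projects/projectId/queries (nothing is sent)."""
    body = json.dumps({"query": sql, "timeoutMs": timeout_ms}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/projects/{project_id}/queries",
        data=body,
        headers={
            # Assumes an OAuth 2.0 access token obtained elsewhere.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_results_request(project_id: str, job_id: str,
                          access_token: str) -> urllib.request.Request:
    """Prepare GET /projects/projectId/queries/jobId (nothing is sent)."""
    return urllib.request.Request(
        f"{BASE_URL}/projects/{project_id}/queries/{job_id}",
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


if __name__ == "__main__":
    # Placeholder project, SQL and token; pass the request to
    # urllib.request.urlopen() to actually execute it.
    req = build_query_request("my-project", "SELECT 1", "TOKEN")
    print(req.get_method(), req.full_url)
```

If the query does not finish within `timeoutMs`, the second request can be used to poll for the results of the job.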
Demo using Web Interface
Demo : Excel Connector
BigQuery Tools
BigQuery Excel Connector
bq Command Line
BigQuery Browser Tool
Virtualization & BI Tools
ETL Tools
ODBC Connector
Big Data Solution with BigQuery
Data Pipeline - transforming and loading data into BigQuery
Uploading data into BigQuery through the Google Cloud Platform involves
uploading the CSV or JavaScript Object Notation (JSON) files to Google Cloud Storage and then
loading the data into BigQuery. Alternatively, the REST API can be used for programmatic
integration with the current computing environment.
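The second step of the pipeline, loading files from Google Cloud Storage into a table, is done by inserting a load job. The sketch below only builds the JSON body for such a job; it would be sent with POST /projects/projectId/jobs. The bucket, dataset and table names are placeholders, and `build_load_job` is a hypothetical helper.

```python
import json
from typing import Dict, List

ALLOWED_FORMATS = ("CSV", "NEWLINE_DELIMITED_JSON")


def build_load_job(project_id: str, dataset_id: str, table_id: str,
                   gcs_uris: List[str], source_format: str = "CSV",
                   skip_leading_rows: int = 0) -> Dict:
    """Build the request body for a load job reading from Cloud Storage."""
    if source_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported source format: {source_format}")
    for uri in gcs_uris:
        if not uri.startswith("gs://"):
            raise ValueError(f"not a Cloud Storage URI: {uri}")

    load = {
        "sourceUris": list(gcs_uris),
        "sourceFormat": source_format,
        "destinationTable": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        },
    }
    if source_format == "CSV":
        # Number of header rows to skip; only meaningful for CSV files.
        load["skipLeadingRows"] = skip_leading_rows
    return {"configuration": {"load": load}}


if __name__ == "__main__":
    job = build_load_job("my-project", "my_dataset", "trips",
                         ["gs://my-bucket/trips.csv"], skip_leading_rows=1)
    print(json.dumps(job, indent=2))
```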
Data Visualization - performing data analysis on BigQuery and visualizing the results
A custom, web-based dashboard can be built on Google App Engine, using the BigQuery REST
API to execute the queries and Google Chart Tools to visualize the results.
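A dashboard like this has to reshape the query response before charting it. The REST API returns a schema plus rows of `{"f": [{"v": ...}]}` cells with values encoded as strings, while Google Chart Tools can build a data table from a plain 2D array whose first row is the header. The sketch below does that conversion; the sample response and the helper name `rows_for_chart` are invented for illustration.

```python
from typing import Any, Dict, List

# Converters for a few schema types; anything else is left as a string.
_CASTS = {
    "INTEGER": int,
    "FLOAT": float,
    "BOOLEAN": lambda v: v == "true",
}


def rows_for_chart(response: Dict[str, Any]) -> List[List[Any]]:
    """Turn a query response into [[header...], [row...], ...]."""
    fields = response["schema"]["fields"]
    casts = [_CASTS.get(field["type"], str) for field in fields]
    table: List[List[Any]] = [[field["name"] for field in fields]]
    for row in response.get("rows", []):
        table.append([
            None if cell["v"] is None else cast(cell["v"])
            for cast, cell in zip(casts, row["f"])
        ])
    return table


if __name__ == "__main__":
    # Made-up response in the shape returned by the query methods.
    sample = {
        "schema": {"fields": [
            {"name": "language", "type": "STRING"},
            {"name": "views", "type": "INTEGER"},
        ]},
        "rows": [
            {"f": [{"v": "en"}, {"v": "90"}]},
            {"f": [{"v": "de"}, {"v": "80"}]},
        ],
    }
    print(rows_for_chart(sample))
```

The resulting list can be serialized to JSON and handed to the charting code in the browser.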
Pricing Model
Action            Example
Loading Data      Loading files/data into BigQuery
Exporting Data    Exporting data, Saving Results from BigQuery
Table Reads       Browsing through data
Table Copies      Copy existing table to new table

Storage Action       Cost
Storage              $0.020 per GB, per month
Streaming Inserts    Free until January 1, 2015. After January 1, 2015, $0.01 per 100,000 rows

Query Pricing        Cost
On-demand            $5 per TB
Reserved Capacity    5 GB per second: $20k/month
Wow, that’s like 800 MB for 1 Rupee, even Internet ain’t that cheap here.
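The prices above can be turned into a quick estimate. The sketch below uses the listed rates (storage at $0.020 per GB per month, on-demand queries at $5 per TB, streaming inserts at $0.01 per 100,000 rows) and checks the Rupee remark against the storage price. The exchange rate of 60 Rupees per US dollar is an assumption, not a figure from the slides, and the function names are made up.

```python
from decimal import Decimal

# Rates as listed on the pricing slide (US dollars).
STORAGE_PER_GB_MONTH = Decimal("0.020")
QUERY_PER_TB = Decimal("5")
STREAMING_PER_100K_ROWS = Decimal("0.01")


def monthly_cost(storage_gb: int, query_tb: int, streamed_rows: int) -> Decimal:
    """Estimate one month's bill in US dollars."""
    storage = STORAGE_PER_GB_MONTH * storage_gb
    queries = QUERY_PER_TB * query_tb
    streaming = STREAMING_PER_100K_ROWS * streamed_rows / 100000
    return storage + queries + streaming


def storage_mb_per_rupee(rupees_per_dollar: Decimal) -> int:
    """How many MB (1 GB = 1000 MB) one Rupee stores for a month."""
    dollars = Decimal(1) / rupees_per_dollar
    return int(dollars / STORAGE_PER_GB_MONTH * 1000)


if __name__ == "__main__":
    # 1000 GB stored, 10 TB queried, 1 million rows streamed.
    print(monthly_cost(1000, 10, 1_000_000))
    # Assumed exchange rate: 60 Rupees per US dollar.
    print(storage_mb_per_rupee(Decimal("60")), "MB per Rupee per month")
```

At the assumed rate, one Rupee covers roughly 833 MB of storage for a month, which is in line with the "800 MB for 1 Rupee" remark.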
Where to use ?
● Not a replacement for traditional systems, but it complements the ecosystem !!
● Major strength is Handling Large DataSets
● Major usage in Data Analytics
● Important component of Google Cloud Platform
● People are interested in numbers/data, and they want them quickly.
Google BigQuery is the future of Analytics!!
Any questions ?
What we covered ...
✓ What is Big Data ?
✓ Available Big Data Solutions & Issues
✓ Why Google BigQuery ?
✓ Features, Components & Tools
✓ RESTful API
✓ Demo using Web Interface
✓ BigQuery Tools
✓ Big Data Solution with BigQuery
✓ Pricing Model
✓ Usage
https://bigquery.cloud.google.com
No registration, just sign in with your Google account
About the presenter
Follow Dharmesh Vaya on @DRVaya
or subscribe to my blog: http://drvaya.wordpress.com/
You can also add me on +DharmeshVaya
https://cloud.google.com/developers/articles/getting-started-with-google-bigquery
https://cloud.google.com/files/Redbus.pdf
http://www.reddit.com/r/bigquery/comments/28ialf/173_million_2013_nyc_taxi_rides_shared_on_bigquery/
http://www.datawrangling.com/some-datasets-available-on-the-web/
http://bigqueri.es/
https://developers.google.com/bigquery/pricing#data