avkash-chauhan
DESCRIPTION
In the age of Big Data and large-volume analytics, there is a lot to cover and a lot to learn. While at Microsoft developing Windows HDInsight, and now developing a one-of-a-kind Big Data product at my own company, Big Data Perspective, San Francisco, I have spent the last several years covering Big Data at various levels. This talk is customized to help database and business intelligence (BI) professionals, programmers, Hadoop administrators, researchers, technical architects, operations engineers, data analysts, and data scientists understand the core concepts of Big Data analytics on Hadoop. This webinar will be useful for those who want to know what Hadoop is and how they can take advantage of it by spending just a few dollars to run a cluster. The webinar is also great for those who are looking to deploy their first data cluster and run MapReduce jobs to discover insights.
Let's Start and Define Big Data
How Hadoop Fits in This Scenario
http://www.packtpub.com/using-cloudera-impala/book
http://www.amazon.com/Simplifying-Windows-Azure-HDInsight-Service/dp/0735673802
http://blogs.msdn.com/b/microsoft_press/archive/2014/05/27/free-ebook-introducing-microsoft-azure-hdinsight.aspx
https://www.linkedin.com/in/avkashchauhan
Hadoop is an open source (Java-based), "scalable", "fault-tolerant" platform for storing and processing large amounts of unstructured data, distributed across machines.
Flexibility: A single repository for storing and analyzing any kind of data, not bounded by schema
Scalability: Scale-out architecture divides the workload across multiple nodes using a flexible distributed file system
Low Cost: Deployed on commodity hardware and an open source platform
Fault Tolerant: Continues working even if one or more nodes go down
A system that moves computation to where the data is.
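The idea of moving computation to the data, together with the fault-tolerance point above, can be illustrated with a toy scheduler. This is only a sketch under stated assumptions: the block names, node names, and the `assign_tasks` helper are hypothetical and are not part of Hadoop's API. It simply places each task on a live node that already holds a replica of the block it needs.

```python
from collections import defaultdict

# Hypothetical placement map: each data block is replicated on several nodes.
BLOCK_REPLICAS = {
    "block-1": ["node-a", "node-b"],
    "block-2": ["node-b", "node-c"],
    "block-3": ["node-a", "node-c"],
}


def assign_tasks(block_replicas, failed_nodes=()):
    """Assign one task per block to the least-loaded live node holding a replica."""
    load = defaultdict(int)  # number of tasks already given to each node
    assignment = {}
    for block in sorted(block_replicas):
        # Only nodes that store the block and are still alive are candidates.
        live = [n for n in block_replicas[block] if n not in failed_nodes]
        if not live:
            raise RuntimeError(f"no live replica for {block}")
        # Prefer the least-loaded candidate; break ties by node name.
        node = min(live, key=lambda n: (load[n], n))
        assignment[block] = node
        load[node] += 1
    return assignment


if __name__ == "__main__":
    print(assign_tasks(BLOCK_REPLICAS))
    # With node-a down, every block is still processed from another replica.
    print(assign_tasks(BLOCK_REPLICAS, failed_nodes={"node-a"}))
```

Because every block has more than one replica, losing a single node in this toy model changes where the work runs but not whether it completes.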
Hadoop Landscape
Hadoop Core Components
Data Storage: HDFS
Data Processing: MapReduce / YARN
Hadoop Common
Cloud
Applying Hadoop to Save $$
Concept of Data Lake
Hadoop in Cloud
EDW (enterprise data warehouse), OLAP (online analytical processing), ODS (operational data store)
Big Data Analytics With Hadoop
Area | Amazon | HDInsight | Directives
Data Storage | S3 | Azure Blobs | Direct access to the compute machines for very fast data delivery
Processing | EC2 | Azure Compute | Dedicated machines ready to run with a specific version of the Hadoop runtime
Processing Libraries | Java based, or any other language supported through Hadoop Streaming | .NET-based code | Users upload their own processing binaries/libraries
Results | S3 | Azure Blobs | Once a job completes, the results are stored back to the data storage used as the source
Visualization | Custom | Custom | Third-party applications can connect to the storage to perform visualization
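To make the "any other language supported through Hadoop Streaming" entry concrete, here is a minimal Python sketch of a word-count mapper and reducer in the Streaming style. Hadoop Streaming passes records to each stage as lines on standard input, expects tab-separated key/value lines on standard output, and sorts mapper output by key before the reducer sees it. The function names `mapper` and `reducer` and the sample input are assumptions made for illustration, not something taken from the webinar.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; input must already be sorted by key."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Simulate map, sort (shuffle), and reduce in a single process."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for record in run_locally(["Big data on Hadoop", "big clusters"]):
        print(record)
```

In an actual Streaming job, each stage would typically live in its own script that reads `sys.stdin` and prints its records, and the two scripts would be passed to the streaming jar through its `-mapper` and `-reducer` options. The local simulation here only exists so the logic can be checked without a cluster.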