Big Data presentation, 9 February 2012
Big Data
Eufris 2012
Why should I care?
McKinsey:
• $250 billion in annual savings in the EU alone by enhancing the public sector
• $600 billion in annual consumer surplus from using personal location data globally

• Annual growth of data is remarkable
• Data is the most valuable thing most companies have
• Data is massively underutilized
Forecast
There will be a shortage of talent necessary for organizations to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.
What is Big Data?
"Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis"
IDC
"Big Data is a technology that helps extract value from the digital universe."
IDC
"Techniques and technologies that make handling data at extreme scale economical."
Forrester
ABC of Big Data
Analytics
• Making sense of your data, in real time, in an easy way
Bandwidth
• Ingesting, processing and delivering large amounts of data
Content
• Storing, managing and retaining large amounts of data
www.netapp.com
3 V’s of Big Data
Variety
• Big Data extends beyond structured data, including unstructured data of all varieties: text, audio, video, click streams, log files and more
Velocity
• Often time sensitive, Big Data must be used as it is streaming in to the enterprise in order to maximize its value to the business
Volume
• Big Data comes in one size: large. Enterprises are awash with data, easily amassing terabytes and even petabytes of information
A few core concepts
Hadoop
• The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model.
• Three subprojects:
  • Hadoop Common
  • Hadoop Distributed Filesystem (HDFS)
  • Hadoop MapReduce
MapReduce
•Introduced by Google in 2004
[Diagram: input values passing through a Map step and then a Reduce step]
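The Map and Reduce steps can be illustrated with a word count, the usual introductory example. This is a minimal single-process sketch, not Hadoop or Google code: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are made up for illustration, and a real framework would run the phases in parallel across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key (done by the framework between phases)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of strings."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is valuable"]))
```

Because each map call sees only one record and each reduce call sees only one key, both phases can be spread over many machines without shared state.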
MapReduce on App Engine
• MapReduce is an experimental, innovative, and rapidly changing new feature for App Engine
NoSQL
• Definition 1
“Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply as: schema-free, easy replication support, simple API, eventually consistent, a huge data amount, and more.”
nosql-database.org
NoSQL
• Definition 2
“In computing, NoSQL (sometimes expanded to "not only SQL") is a broad class of database management systems that differ from the classic model of the relational database management system (RDBMS) in some significant ways. These data stores may not require fixed table schemas, usually avoid join operations, and typically scale horizontally.”
Wikipedia
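Two of the properties named in these definitions, schema-free records and horizontal scaling, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the design of any particular NoSQL product: the `ShardedStore` class and its methods are hypothetical, and each "shard" here is just an in-memory dictionary standing in for a separate server.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that spreads schema-free documents over N shards."""

    def __init__(self, shard_count):
        # Each dict stands in for one server in a real deployment.
        self.shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, key):
        # Stable hash, so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, document):
        self.shards[self._shard_for(key)][key] = document

    def get(self, key):
        return self.shards[self._shard_for(key)].get(key)


store = ShardedStore(shard_count=4)
# No fixed table schema: the two documents have different fields.
store.put("user:1", {"name": "Ada"})
store.put("user:2", {"name": "Linus", "tags": ["kernel"]})
print(store.get("user:2"))
```

Lookups go straight to one shard by key, which is one reason such stores usually avoid joins: a join would have to touch every shard.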
From ACID to BASE
ACID: Atomicity, Consistency, Isolation, Durability
BASE: Basically available, Soft state, Eventually consistent
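The "eventually consistent" part of BASE can be sketched with two replicas that accept writes independently and reconcile later. This is a simplified illustration only: the `Replica` class is hypothetical, version numbers are supplied by hand as a stand-in for a logical clock, and conflicts are resolved by the "highest version wins" rule, which is one of several strategies real systems use.

```python
class Replica:
    """One node holding key -> (version, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, version):
        # Accept the write locally without coordinating with other replicas.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        # Pull every entry from the other replica, keeping the newer version.
        for key, (version, value) in other.data.items():
            self.write(key, value, version)


a, b = Replica("a"), Replica("b")
a.write("cart", ["book"], version=1)
b.write("cart", ["book", "pen"], version=2)

stale = a.read("cart")  # before syncing, replica a still has the old value
a.sync_from(b)
b.sync_from(a)
converged = a.read("cart") == b.read("cart")
print(stale, converged)
```

Both replicas stay available for reads and writes the whole time; the cost is that a reader may briefly see the older value, which an ACID database would not allow.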
Big Data and cloud
Big Data on AWS
MapReduce on AWS
• Not yet Hadoop 1.0.0
MapReduce on AWS
S3, EC2
+ DynamoDB
Google BigQuery
Features
• Speed - Analyze billions of rows(!) in seconds
• Scale - Terabytes of data, trillions of records
• Simplicity - SQL-like query language, hosted on Google infrastructure
• Sharing - Powerful group- and user-based permissions using Google accounts
• Security - Secure SSL access
• Multiple access methods - Can be used by REST API, a command-line tool, a browser-based graphical interface, and Google Apps Script
BigQuery example
Big Data outside of cloud
Oracle Big Data Appliance
18 Oracle Sun Servers
• 864 GB main memory
• 216 CPU cores
• 648 TB of raw disk storage
• 40 Gb/s InfiniBand connectivity between nodes and engineered systems
• 10 Gb/s Ethernet connectivity
About $500,000
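To put the rack totals in perspective, the figures can be broken down per server and per terabyte. The short calculation below assumes the resources are spread evenly over the 18 servers and treats the roughly $500,000 price as exact, so the results are ballpark numbers only.

```python
SERVERS = 18
TOTAL_MEMORY_GB = 864
TOTAL_CPU_CORES = 216
TOTAL_RAW_DISK_TB = 648
APPROX_PRICE_USD = 500_000  # rough figure from the slide, not a list price

# Assumption: every server has an identical share of the rack's resources.
per_server = {
    "memory_gb": TOTAL_MEMORY_GB / SERVERS,
    "cpu_cores": TOTAL_CPU_CORES / SERVERS,
    "raw_disk_tb": TOTAL_RAW_DISK_TB / SERVERS,
}
price_per_raw_tb = APPROX_PRICE_USD / TOTAL_RAW_DISK_TB

print(per_server)               # 48 GB, 12 cores, 36 TB per server
print(round(price_per_raw_tb))  # about 772 USD per raw TB
```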
Autonomy IDOL 10
"For far too long, organizations have confined structured data to relational databases and unstructured data to simplistic keyword matching technologies..."
“IDOL 10 brings these worlds together, allowing organizations to automatically process, understand, and act on 100 percent of their data, in real-time. The results will be dramatic, as businesses can develop entirely new applications that explore the richness and color of Human Information that live in unstructured, semi-structured, and structured forms.”
Price?
Thank you!