Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
BCS Open Source SIG | London | 1 May 2013
Open Source Drives Innovation & Adoption in Big Data
Realising Value from DataTogetherwith
Timings
6:00 - 6:30pm. Register / Refreshments6:30 - 8:00pm, Presentation Session8:00 - 8:30pm, Networking
Objectives• What is Big Data?• Evolution of Open Source Hadoop and its influence on the Big Data
phenomenon• The Open Source ecosystem around Big Data• What is the importance of Open Source for Hadoop?• What business challenges does Hadoop address?• What does a Hadoop architecture look like?• How have MapR built a unique offering on top of Hadoop?
1© 2013 Onepoint IQ Limited | London
Introduction
Big Data & Apache Hadoop
MapR / Drill Demo
Summary & Further Learning
2© 2013 Onepoint IQ Limited | London
About Onepoint IQOnepoint IQ empowers individuals and organisations to discover and deliverreal value from big (and small) data
3© 2013 Onepoint IQ Limited | London
About MapROne Platform for Big Data
4© 2013 Onepoint IQ Limited | London
…
Batch
99.999% HA DataProtection
DisasterRecovery
Scalability&
PerformanceEnterpriseIntegration Multi-tenancy
MapReduce
File-BasedApplications SQL Database Search Stream
Processing
Interactive Real-time
…Broad
range ofapplications
Recommendation Engines Fraud Detection Billing Logistics
Risk Modeling Market Segmentation Inventory Forecasting
Introduction – Onepoint IQ & MapRPresentation Team Today
5© 2013 Onepoint IQ Limited | London
Michael HausenblasChief Data Engineer - EMEA
Shashin ShahTechnology Zen Master | Founding Dir
Introduction
Big Data & Apache Hadoop
MapR / Drill Demo
Summary & Further Learning
6© 2013 Onepoint IQ Limited | London
Big Data DefinitionsAlthough there is not a universally accepted definition for ‘Big Data’, all acknowledge thetechnology advances that are now available to handle data (big and small).
7© 2013 Onepoint IQ Limited | London
“Datasets whose size isbeyond the ability oftypical database softwaretools to capture, store,manage, and analyse.”
“Big data technologiesdescribe a newgeneration oftechnologies andarchitectures, designedto economically extractvalue from very largevolumes of awide variety of data, byenabling high-velocitycapture, discovery,and/or analysis.”
“Big data is data thatexceeds the processingcapacity of conventionaldatabase systems. Thedata is too big, moves toofast, or doesn't fit thestrictures of yourdatabase architectures.To gain value from thisdata, you must choosean alternative way toprocess it.”
Business Needs Spurn InnovationWeb search engines were among the first to confront the ‘Big Data’ problem. Today, socialnetworks, mobile phones, sensors and science contribute to petabytes of data created daily.
8
advertising optimisationmail anti-spam
video & audio processingad selection
web search
user interest prediction
customer trendanalysis
analysing web logs
content optimisation
data analytics
machine learning
data mining
textmining
social media
Source: Hortonworks, Apache Lucene Eurocon, Barcelona, 2011
© 2013 Onepoint IQ Limited | London
Big Data AdvancesThe technology advances around ‘Big Data’ can be broadly grouped into two categories.
9© 2013 Onepoint IQ Limited | London
Datamanagement
advances
Analyticsadvances
Big, fast data Big, better analytics
Traditional vs. Big Data – Data Management AdvancesThe advances donot replace existing enterprise data warehousing (EDW), businessintelligence (BI), or analytics approaches –they instead enhance andextend them.
10© 2013 Onepoint IQ Limited | London
• Integrated data sources • Virtualised and blended data sources
• Structured data • Unstructured, semi-structured, multi-structureddata
• Aggregated and granular data (with limitations) • Large volumes of granular data (without limits)
• Relational EDW with at-rest data • Non-relational EDW with at-rest data
• Dimensional cubes / marts with at-rest data • Streaming systems with in-motion data
• One-size-fits-all data management • Flexible and optimised data management
Traditional decision-making environment
Big data innovationsand extensions
Datamanagement
advances
Traditional vs. Big Data – Analytics advancesThe advances donot replace existing enterprise data warehousing (EDW), businessintelligence (BI), or analytics approaches –they instead enhance andextend them.
11© 2013 Onepoint IQ Limited | London
• Reporting and OLAP • Advanced analytic functions and predictivemodels
• Dashboards and scorecards • Sophisticated visualisation of large data sets
• Structured navigation (drill down / up; slice /dice)
• Flexible exploration of large data sets
• Humans interpret results, patterns and trends • Sophisticated trend and pattern analysisthrough machine learning
• Manual analyses, decisions, actions • Model / rules-driven decisions & actions
Analyticsadvances
Traditional decision-making environment
Big data innovationsand extensions
Our View‘Big Data’ is data that becomes large enough that it cannot be processed usingconventional methods. A fantastic array of advances aim to address this challenge.
12© 2013 Onepoint IQ Limited | London
“True business value comes from:
• properly choosing and applying• appropriate ‘Big Data’ technology
advances to• specific data challenges (or
opportunities) of an organisation• whether big or small.”
Use Case Clusters (or Deployment Patterns)To speed up the discovery of benefits and value from Big (and small) Data, we group thenearly endless permutations of Big Data use cases into a few key clusters.
13© 2013 Onepoint IQ Limited | London
3. Real-timemonitoring &
analytics
4. Near real-time
analytics
1. Dataintegration
hub
5. Investigativecomputing
2. Analyticsaccelerator
6. Newmarketsenabler
These clusters are not mutually exclusive. A given client use case is likely to be a combination of theseclusters. For example, a data integration hub (1) may be a pre-requisite to investigative computing (5).
Big Data – A Brief HistoryBig Data technologies have developed over time to meet increasing need to process largedata volumes
© 2013 Onepoint IQ Limited | London 14
Open source webcrawler projectcreated by Doug
Cutting
Publishes MapReduceand GFS Paper
Open SourceMapReduce and HDFS
project created byDoug Cutting
Runs 4,000-nodeHadoop cluster
Hadoop winsTerabyte sortbenchmark
Launches SQL supportfor Hadoop
Other HadoopDistributions
Cloudera MapR /Hortonwworks
2002 2007 2012
Growth of Hadoop - Examples
15© 2013 Onepoint IQ Limited | London
Source : http://hadoopblog.blogspot.co.uk/2010/05/facebook-has-worlds-largest-hadoop.html
Source: http://nosql.mypopescu.com/post/42441081929/hadoop-at-yahoo-2013-update
2009: 10 & 28 node clusters2010: Hundreds node cluster / multi PB2011: Thousands node cluster(s) / 10s PB
Source: eBay’s Hadoop Stack: Evolution and Revolution, Juhan Lee, ebay
“There’s this wonderful technology at Google. I would love to be able to use it but I can’t because Idon’t work at Google. There are probably a lot of other people who feel that same way, and opensource is a great way to get technology to everyone.”.
I’ve always loved open source because it’s such a tremendous lever.
What I look for is a way to find the smallest thing I can do, with the least amount of work thatwill have the most impact. Where is the leverage point?Hadoop came out of that. We needed to do some vast computing, but I also saw a lot of otherworkloads that could benefit from this.
-Doug Cutting, Director Apache Software Foundation (2006)
Source : Forbes http://www.forbes.com/sites/netapp/2013/01/16/big-data-hadoop-doug-cutting/
© 2013 Onepoint IQ Limited | London 16
Big Data AdvancesReal value comes from taking advantage of these advances to develop data products andservices.
17© 2013 Onepoint IQ Limited | London
Datamanagement
advances
Analyticsadvances
Big, fast data Big, better analytics
Data Services &Apps
Real business value
Emerging Big Data Reference Stack
18
• As the foundational layer in the bigdata stack, the cloud provides thescalable persistence and computepower needed to manufacture dataproducts.
• At the middle layer of the big datastack is analytics, where features areextracted from data, and fed intoclassification and predictionalgorithms.
• Finally, at the top of the stack areservices and applications. This is thelevel at which consumers experience adata product, whether it be a musicrecommendation or a traffic routeprediction.
Source: O’Reilly Strata
Big, fastdata
Big, betteranalytics
Data Services& Apps
© 2013 Onepoint IQ Limited | London
Core Hadoop ArchitectureHighly scalable, distributed, fault tolerant Architecture running on off the shelfhardware; two core components –HDFS&MapReduce
19© 2013 Onepoint IQ Limited | London
Hadoop within existing Enterprise ArchitectureWe seeHadoop as complimentary to current Enterprise Architecture to extract valuefrom newer data sources both structured and unstructured
20© 2013 Onepoint IQ Limited | London
Dat
a An
alyt
ics
& A
pplic
atio
nsD
ata
Stor
age
& M
anag
emen
t
A Simplified Comparison
21
The ‘Old’ way: Store some of the data. Process and analyze some of the
data. Setup specific schemas and queries. Huge effort when schemas have to
change.
The ‘Big Data’ way: Store all the data you want. Process and Analyze all Your Data. Ask new Questions for further analysis. Ask more Questions Get Answers faster Get clearer Insight Make better business decisions
CRM
Normalized DataODS or
TraditionalData
Warehouse
Billing
Finance
Data ETLData QualityData
© 2013 Onepoint IQ Limited | London
Evolving Big Data Landscape
© 2013 Onepoint IQ Limited | London 22
Source: Bloomberg Ventures (Matt Truck @matttruck & Shivon Zillis @shivonz)
Practical experience with:
Hadoop infrastructure
Big data management
NoSQL & other databases
Big data search
Analytic discovery
Analytic visualisation
Cloud platforms
Systems integrators
Business consultancies
Sector & point specialists
EcosystemOurpractical experience with an ever-evolving ecosystem andour vendor independencehelp us to quickly define atailored solution that is ‘right’ for our clients.
23© 2013 Onepoint IQ Limited | London
Shark
(This is a highly simplified characterisation of the landscape, as solution stacks often cut across multiple categories.)
Introduction
Big Data & Apache Hadoop
MapR / Drill Demo
Summary & Further Learning
24© 2013 Onepoint IQ Limited | London
Introduction
Big Data & Apache Hadoop
MapR / Drill Demo
Summary & Further Learning
25© 2013 Onepoint IQ Limited | London
Summary / Wrap Up
26© 2013 Onepoint IQ Limited | London
• There is more to Big Data than the hype• Many of the advances are powered by Open Source• The ecosystem around Big Data is evolving rapidly• Most organisations can experimentation with Big (and small) Data• MapR have built a unique offering on top of Open Source
Further LearningLearning can take many shapes, depending on organisational and individual needs.
27
EcosystemWorkshops
1-2 days
AdministratorWorkshops
3-5 days
TechnicalProgramme
BusinessProgramme
ExecutiveProgramme
TechnicalBriefing1-2 hours
BusinessBriefing45-90 mins
IdeationWorkshop
2-3 hours
DiscoveryWorkshop
½-1 day
ExecutiveBriefing45-90 mins
‘CXO’Workshop
3-4 hours
DeveloperWorkshops
3-5 days
© 2013 Onepoint IQ Limited | London
Cheatsheet & Mythbuster
28© 2013 Onepoint IQ Limited | London
Thank you
29© 2013 Onepoint IQ Limited | London
Shashin Shah | Technology Zen Master | Founder Director | [email protected] Kulasingam | Chief Stratnologist | Founder Director | [email protected] (Sasha) Polev | (The Data Professor) Chief Architect | [email protected] Hausenblas | Chief Data Engineer EMEA at MapR Technologies | [email protected]