The Modern Data Warehouse
Marc Schöni
Technology Solution Professional BI
Microsoft Switzerland
Karl-Heinz Sütterlin
Technology Solution Professional App Platform
Unlocking Insights on Any Data
Breakthrough Data Platform Performance with SQL Server 2014
Enabling Familiar, Powerful Business Intelligence
The Modern Data Warehouse
Big Data
Big data solutions deal with the complexities of:
VOLUME (Size)
VARIETY (Structure)
VELOCITY (Speed)
VALUE
Industry Trends
How the challenges are tackled, technology-wise:
VOLUME (Size): MPP
VARIETY (Structure): Hadoop
VELOCITY (Speed): In-Memory
VALUE: Excel/Power BI
Hadoop = MapReduce + HDFS
• MapReduce (job scheduling/execution system)
• HDFS (Hadoop Distributed File System)
• HBase (column DB)
• Related projects and tools: Hive, Pig, Mahout, Oozie, Sqoop, Flume, Avro, ZooKeeper, HCatalog, Ambari, Cascading, R
• NoSQL stores: HBase/Cassandra/CouchDB/MongoDB
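To illustrate “Hadoop = MapReduce + HDFS”, here is a minimal in-process Python sketch of the MapReduce programming model (map, shuffle, reduce) using the classic word count. It does not use Hadoop at all: the function names are hypothetical, the input lines merely stand in for HDFS blocks, and a real job would run on the cluster through Hadoop’s Java API or Hadoop Streaming.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    # Map every input line independently (on a cluster, each mapper reads its own block).
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    # Reduce each key group; sorting keys makes the output order deterministic.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    blocks = ["big data big value", "data at speed"]
    print(word_count(blocks))
```

Because each mapper only sees its own input and each reducer only sees one key group, both phases can be spread across many nodes; that independence is what lets Hadoop scale out.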
[data + analytics + people] @ speed
Microsoft Analytics Platform System
Relational MPP database (PDW) + Hadoop (HDInsight), joined by PolyBase
MPP
• Pre-built, performance-tuned appliance for analytical workloads
• Scale-out to petabytes of data
In-Memory
• Up to 100x speed improvement
• Huge storage savings with columnstore
Hadoop
• Dedicated region for Hadoop
• Joining relational and non-relational data with PolyBase
Great Query Performance at Scale
Concurrency and mixed workloads
• Massively Parallel Processing (MPP) parallelizes queries
• [Diagram: MPP query execution, from query to results]
• Handles query complexity and concurrency at scale
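A minimal sketch of the scatter/gather idea behind MPP query execution, under stated assumptions: the table is already split into hypothetical distributions (one list per compute node), every “node” computes a local SUM(amount) … GROUP BY region, and a control step merges the partial results into one result set. Threads only stand in for nodes; this is not how PDW actually plans or schedules queries and says nothing about its performance.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(rows):
    """Runs on every compute node: local SUM(amount) GROUP BY region over its own rows."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return totals


def merge(partials):
    """Runs on the control node: combine per-node partial sums into the final result."""
    result = {}
    for part in partials:
        for region, amount in part.items():
            result[region] = result.get(region, 0) + amount
    return result


def mpp_sum_by_region(distributions):
    # Scatter: run the same query fragment against every distribution concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(distributions))) as pool:
        partials = list(pool.map(partial_aggregate, distributions))
    # Gather: merge the partial results into a single result set.
    return merge(partials)


if __name__ == "__main__":
    # Hypothetical (region, amount) rows already spread over three compute nodes.
    nodes = [[("north", 10), ("south", 5)], [("north", 7)], [("south", 3), ("east", 1)]]
    print(mpp_sum_by_region(nodes))
```

The design works because SUM can be computed piecewise and then combined; aggregates like AVG would need each node to return a sum and a count instead.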
Scaling out relational data to petabytes
Start small and scale as you grow
Scale-out with PDW/HDInsight, from 0 TB to 6 PB
• Dedicated CPU, memory and storage
• Incrementally add hardware for near-linear scale
• Integrates HDInsight and PDW
• No “forklift” of the prior warehouse to increase capacity
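Near-linear scale depends on spreading rows evenly, so that each node’s dedicated CPU, memory and storage handle roughly 1/N of the data. The sketch below assumes hash distribution on a key column; PDW does support hash-distributed tables, but the SHA-256-modulo placement and the function names here are illustrative assumptions, not the appliance’s algorithm. It shows the largest per-node share shrinking as nodes are added.

```python
import hashlib
from collections import Counter


def node_for(key, node_count):
    """Pick a compute node by hashing the distribution key (deterministic across runs)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def rows_per_node(keys, node_count):
    """Count how many rows each node would own under hash distribution."""
    counts = Counter(node_for(k, node_count) for k in keys)
    return [counts.get(n, 0) for n in range(node_count)]


if __name__ == "__main__":
    customer_ids = range(100_000)  # hypothetical distribution key values
    for nodes in (2, 4, 8):
        load = rows_per_node(customer_ids, nodes)
        print(nodes, "nodes -> max rows on one node:", max(load))
```

Modulo placement is the simplest choice for illustration; it remaps most rows whenever the node count changes, so it says nothing about how the appliance redistributes data when hardware is added.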
Driver for In-Memory
[Chart: Intel CPU trends and memory prices ($/GB), 1970 to 2015, log scale]
• Computing power still follows Moore’s Law (due to parallelism)
• CPU clock frequency has stalled
• Memory has gotten a LOT cheaper
Up to 100x faster queries
Updatable clustered columnstore vs. a table with conventional indexing
Up to 15x more compression
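Why does a columnstore compress so well? Storing each column separately puts similar values next to each other, where encodings such as dictionary encoding and run-length encoding are very effective. This toy Python sketch (hypothetical column and helper names) applies both steps to one sorted column; it is not SQL Server’s actual storage format and does not reproduce the 15x figure.

```python
from itertools import groupby


def dictionary_encode(values):
    """Replace each distinct value with a small integer id; return (dictionary, ids)."""
    dictionary = sorted(set(values))
    index = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [index[v] for v in values]


def run_length_encode(ids):
    """Collapse consecutive repeats into (id, run_length) pairs."""
    return [(key, sum(1 for _ in group)) for key, group in groupby(ids)]


def decode(dictionary, runs):
    """Expand the runs and map ids back to the original values."""
    return [dictionary[i] for i, length in runs for _ in range(length)]


if __name__ == "__main__":
    # Hypothetical 'country' column of a sales table, stored column-wise and sorted.
    country = ["CH"] * 6 + ["DE"] * 3 + ["FR"] * 3
    dictionary, ids = dictionary_encode(country)
    runs = run_length_encode(ids)
    print(dictionary, runs)  # 12 values shrink to 3 dictionary entries + 3 runs
    assert decode(dictionary, runs) == country
```

In a row store the same country values would be interleaved with every other column of each row, so long runs of identical values never form.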
Integrate relational data and Hadoop
Query relational and non-relational data with PolyBase
[Diagram: a single SELECT… issued to the Analytics Platform System returns one result set; through PolyBase it reaches Hadoop on Hortonworks (Windows, Linux), Cloudera, Microsoft HDInsight and Microsoft Azure HDInsight]
• PDW and HDInsight in a single appliance
• Single query model
• Enterprise-ready Hadoop (security, manageability and HA)
• Hybrid: spans Hadoop on-premises or in the cloud
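The PolyBase idea, one query over relational and Hadoop data, can be sketched locally with Python’s sqlite3: a warehouse table is joined with rows parsed from a delimited text file that stands in for data in HDFS. All table names, customers and file contents are hypothetical. Unlike this sketch, which copies the file into a temporary table, PolyBase exposes Hadoop data through a T-SQL external table (CREATE EXTERNAL TABLE) and queries it with regular SELECT statements.

```python
import csv
import io
import sqlite3


def load_external_rows(fileobj):
    """Parse delimited text, much as an external table definition describes its format."""
    return [(int(r["customer_id"]), r["page"]) for r in csv.DictReader(fileobj)]


def join_relational_and_hadoop(fileobj):
    conn = sqlite3.connect(":memory:")
    # Relational side: an ordinary warehouse table.
    conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Anna"), (2, "Beat")])
    # Non-relational side: expose the file's rows as a table so one query can join both.
    conn.execute("CREATE TEMP TABLE clickstream (customer_id INTEGER, page TEXT)")
    conn.executemany("INSERT INTO clickstream VALUES (?, ?)", load_external_rows(fileobj))
    rows = conn.execute(
        "SELECT c.name, COUNT(*) FROM customer c "
        "JOIN clickstream k ON k.customer_id = c.customer_id "
        "GROUP BY c.name ORDER BY c.name"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # Stand-in for a clickstream file sitting in HDFS.
    hdfs_file = io.StringIO("customer_id,page\n1,/shoes\n2,/bikes\n1,/socks\n")
    print(join_relational_and_hadoop(hdfs_file))
```

The point of the single query model is that the analyst writes one familiar SQL join and never has to hand-code a MapReduce job to combine the two sources.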
Analytics Platform System: Evolution
Extending the Data Warehouse further
[Diagram: one Analytics Platform System with a SQL Server control node, compute nodes (PDW) and Hadoop nodes (HDInsight), connected through PolyBase]
What did Coop do?
Before: traditional SMP
• Traditional approach, a single server
• 32 physical cores
• 256 GB memory
• Shared SAN
• HA with clustering
After: scale-out (MPP) = APS/PDW
• Modern way of data warehousing
• 32 physical cores (active)
• 512 GB memory
• Direct-attached storage
• HA built-in
• Connected to Hadoop
Modernized from SMP to MPP and became “Big Data” ready
[Diagram: SQL Instance #1 and SQL Instance #2 with their storage]
Customer Scoring Procedure (Example)
60x or more performance improvement
• More data, more accurate results
• From nightly batch to intraday, ad-hoc/ongoing scoring
• Do the impossible
• Immediate responses to ad-hoc queries and model improvements
Business need & result
Business need: faster analytics using more data from point-of-sale and loyalty programs
Business result:
• Improved supply chain and time to market
• Optimized, better-targeted marketing
• Faster price adjustments, even at regional level
Data platform
[Diagram: internal data (SQL, Oracle, SAP …, Hadoop/HDFS) and external data (Azure HDInsight, Hadoop on Azure) are collected and reduced in APS with PolyBase, which feeds multiple BI apps]
Driving the IT decision maker’s (ITDM) decision: cost/performance profile
Driving the business decision maker’s (BDM) decision: time to value
Another customer example
Past → Now
VOLUME (Size): huge SQL Servers with a massive tuning & partitioning effort → Analytics Platform System, PDW region (MPP)
VELOCITY (Speed): complex event processing software and .NET know-how → Analytics Platform System, clustered columnstore index (in-memory)
VARIETY (Structure): Hadoop without PolyBase, requiring massive MapReduce & Java/.NET know-how → Analytics Platform System, Hadoop region & PolyBase
Forever: Excel & the Power BI tools