#mstrworld
How to Build MicroStrategy Projects on Top of Big Data Sources in the Cloud
Jochen Demuth, Director, Partner Engineering
Use Cases for Big Data in the Cloud
Four broad categories and their value
Traditional sources moving online
• SOURCE: Company, Government, Financial sector, Business and consumer studies, Surveys, Polls
• VALUE: All business performance drivers – Operational efficiency, Revenue management, Strategic planning
Digital exhaust from interactions
• SOURCE: Online click-stream, Application logs, Call/service records, ID scans, Security cameras
• VALUE: New revenue sources, Consumer promotions, Risk management, Fraud detection
Web 2.0 phenomenon
• SOURCE: Content generated from social media posts, tweets, blogs, pictures, videos, ratings
• VALUE: Customer engagement, Customer service, Brand management, Viral marketing
Internet of things
• SOURCE: Machine generated sensor data and machine to machine communication
• VALUE: Operational efficiency, Cost control, Risk avoidance
Traditional sources moving online
How to take advantage of new technologies
Traditional relational data sources in the cloud
• RDBMS installed in the cloud (e.g. HP Vertica on Amazon EC2)
• Managed RDBMS in the cloud (e.g. Amazon RDS)
Relational database technology built for the cloud, e.g.
• Amazon AWS (EMR, Redshift, Aurora)
• Google BigQuery
• RDBMS vendor cloud services (e.g. Microsoft, Oracle, Teradata, HP, IBM, SAP, …)
→ Cloud services simplify and automate many aspects of data management, but there are still application-specific aspects that need conscious control
Some Database Features Require Conscious Design Choices
Query time is often dominated by data access, so these choices have a significant performance impact
Data organization
• Columnar vs. row based
Minimize data access
• Partitioning key selection
• Data sorting
• (Index selection/strategy)
• Compression (on/off; algorithm)
• Approximate calculation (e.g. HyperLogLog)
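To illustrate the approximate calculation bullet, the following is a minimal sketch of a HyperLogLog distinct counter. It is not taken from any database's implementation; the class name, the register count (p = 12), and the use of SHA-1 as the hash function are assumptions made for illustration only.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch for approximate distinct counts (illustrative)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p                      # number of bits used to pick a register
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant, valid for m >= 128
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value: str) -> None:
        # 64-bit hash taken from SHA-1 (an assumption; any well-mixed hash works)
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                    # first p bits pick the register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> float:
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting)
            return self.m * math.log(self.m / zeros)
        return raw


# Usage: every hypothetical user id is added twice; duplicates do not change the estimate
hll = HyperLogLog()
for i in range(10000):
    hll.add(f"user-{i}")
    hll.add(f"user-{i}")
estimate = hll.count()
```

The sketch keeps only 4,096 small registers regardless of how many values are added, which is why databases can offer approximate distinct counts at a fraction of the cost of an exact count.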
Access and process data in parallel
• Data distribution in MPP databases to minimize data movement
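The effect of the distribution key on data movement can be sketched as follows. This is a simplified model, not the behavior of any specific MPP database: the table contents, the four-node cluster, and the CRC32-based placement function are all assumptions for illustration.

```python
import zlib
from collections import defaultdict


def node_for(key: str, num_nodes: int) -> int:
    """Map a distribution-key value to a node using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(rows, key, num_nodes):
    """Place each row (a dict) on the node chosen by its distribution key."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(str(row[key]), num_nodes)].append(row)
    return nodes


def rows_to_move(fact_nodes, join_key, num_nodes):
    """Count fact rows that must be shipped to another node before a join,
    assuming the other table is distributed on join_key."""
    moved = 0
    for node, rows in fact_nodes.items():
        for row in rows:
            if node_for(str(row[join_key]), num_nodes) != node:
                moved += 1
    return moved


# Hypothetical fact table: 100 orders placed by 10 customers
orders = [{"order_id": f"O{i}", "customer_id": f"C{i % 10}"} for i in range(100)]

by_customer = distribute(orders, "customer_id", num_nodes=4)
by_order = distribute(orders, "order_id", num_nodes=4)

moved_when_colocated = rows_to_move(by_customer, "customer_id", 4)
moved_when_not_colocated = rows_to_move(by_order, "customer_id", 4)
```

When the fact table is distributed on the join key, every row already sits on the node that holds its join partner, so the join can run locally on each node.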
→ Existing best practices for developing MicroStrategy applications apply
→ Make sure to take advantage of database features designed for analytical workloads
→ Look for best practices to take advantage of data source strengths in MicroStrategy Community
Identifying Value in Data Requires Utmost Flexibility
Static data models get in the way of analysis at the speed of thought
Digital exhaust from interactions
• SOURCE: Online click-stream, Application logs, Call/service records, ID scans, Security cameras
• VALUE: New revenue sources, Consumer promotions, Risk management, Fraud detection
Technical Characteristics:
• Unknown data sources are analyzed for potential new business value.
• Analysis necessary to support the development of new business models
• Data models don’t exist (yet).
MicroStrategy Supports All Analytic Needs
Some People Produce Analytics While Others Consume Analytics
[Figure: user groups ranging from Back Office to Front Line; axes: Analytical Complexity, User Scale]
Data Scientists
• Trained in modeling and coding
• Use a variety of tools
• Want their favorite tools
• Look for the truth
Business Analysts
• Analytical amateurs
• Power users of BI tools
• Want to use the right tool
• Look for the business edge
Business Users
• Make the daily decisions
• Some may be power users
• Most need simple tools
• Look for actionable information
MicroStrategy Provides Flexible Data Modeling Options
Choose how to access and analyze data
[Diagram: ID scans, Online click-stream, Application logs, and Call/service records feed Reports, Visual Insight, and Dashboards through a Modeled path or a Direct path]
Modeled: Unified MicroStrategy Metadata
• Reusable Data
• Reusable Objects
• Reusable Design
Direct: Flexible data access
• Schema on read
• Supports quick iterations
• Reusable Objects
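The schema-on-read idea can be sketched in a few lines: the raw data stays untouched, and each analysis declares which fields it wants at read time. The log format, field names, and the `read_with_schema` helper below are hypothetical and only illustrate the principle; they are not a MicroStrategy API.

```python
import re
from collections import Counter

# Hypothetical raw click-stream lines, stored as-is without a predefined model
RAW_LOG = [
    "2015-01-27T10:15:02 user=alice action=view page=/products/42",
    "2015-01-27T10:15:09 user=bob action=view page=/home",
    "2015-01-27T10:16:30 user=alice action=purchase page=/checkout amount=19.99",
]


def read_with_schema(lines, fields):
    """Yield one dict per line, extracting only the requested fields at read time.

    `fields` maps a field name to the type used to cast its value.
    """
    for line in lines:
        timestamp, _, rest = line.partition(" ")
        pairs = dict(re.findall(r"(\w+)=(\S+)", rest))
        row = {"timestamp": timestamp}
        for name, cast in fields.items():
            row[name] = cast(pairs[name]) if name in pairs else None
        yield row


# First iteration: who did what?
activity = list(read_with_schema(RAW_LOG, {"user": str, "action": str}))
views_per_user = Counter(r["user"] for r in activity if r["action"] == "view")

# Second iteration: add a revenue field without reloading or remodeling the data
revenue = sum(r["amount"] or 0.0 for r in read_with_schema(RAW_LOG, {"amount": float}))
```

Changing the question only changes the field mapping passed in, which is what makes quick iterations possible before a data model exists.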
The Web 2.0 Phenomenon Introduces Specific Challenges
Data access, data structure, and data meshing
Web 2.0 phenomenon
• SOURCE: Content generated from social media posts, tweets, blogs, pictures, videos, ratings
• VALUE: Customer engagement, Customer service, Brand management, Viral marketing
Access data where it exists
• Web 2.0 data stored in relational data sources
• Online services that also provide data services
  • E.g. Salesforce.com
• Online services that provide data
  • Social
  • Government
  • Weather
→ MicroStrategy offers three ways to access Web 2.0 data
→ Data often requires structuring or flattening for analysis
→ For optimal value, data from multiple sources needs to be put in context
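As a rough illustration of what structuring or flattening means for Web 2.0 data, the sketch below turns one nested social media post into flat rows that a reporting tool could consume. The post structure and the helper names `flatten` and `explode` are made up for this example.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def explode(record, list_key):
    """Emit one flat row per element of a list-valued field."""
    base = {k: v for k, v in record.items() if k != list_key}
    for item in record.get(list_key, []):
        yield flatten({**base, list_key: item})


# Hypothetical post as it might arrive from an online service
post = json.loads("""
{"id": 1,
 "user": {"name": "alice", "location": {"country": "US"}},
 "text": "Great service!",
 "hashtags": ["support", "happy"]}
""")

rows = list(explode(post, "hashtags"))
```

Each resulting row has the same flat set of columns (id, user.name, user.location.country, text, hashtags), so it can be joined with data from other sources on shared attributes such as country.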
No Data Left Behind
Optimized connectors to your entire Big Data ecosystem
Bring All Relevant Data to Decision Makers
• User / Departmental Data
• Data Warehouse Appliances
• Big Data & NoSQL
• Relational Databases
• Multidimensional Databases
• Columnar Databases
• SaaS-Based App Data
[Diagram labels: HANA, BigInsights, Parallel Data Warehouse, Elastic Map Reduce, Analysis Services, Redshift, Distribution]
Three Ways to Query Multi-structured Data
DATA PROCESSING, ANALYTICS & DELIVERY
MicroStrategy Analytics Platform: Dashboards, Reports and Statements, Self-Service Analytics, OLAP Analysis
1. Direct connection to source
• Parse structure with lightweight “Schema-on-read” functions
• Import data or create a modeled environment
2. Using Web Services
• Requires data to be exposed as a Web Service
• Data will need to be structured prior to access
3. Offline “Process and Store”
• Data is processed with specialty analytics (text, streaming, image processing) and stored as structured data
• Text Analytics Module
DATA STORAGE
Semi-Structured Data and Unstructured Data: Web Logs, Social media posts, Surveys, Server Logs, Geo-spatial, E-mail, Image, Audio, Video, Sensor + Machine Data, Documents
MicroStrategy Offers Several Paths to Mesh Data For Analysis
Integrating Modeled BI and Self-Service BI
Structured BI Content Consumption
• Structured Data: Architect
• Structured Join: Multi-Source Model
• Multi-Source Pushdown Joins
• Corporate Data Sources
• Cubes from Model
• Dashboards and MicroApps
Self Service BI Content Creation
• Self Service Data: Data Import
• Self Service Join: Document Data Blending
• Join Datasets in Documents
• Local / Dept Data Sources
• Cubes from Import
• Ad Hoc / Visual Insight
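Conceptually, joining datasets from different sources means bringing each one to a shared attribute and then combining the measures. The sketch below shows that idea with two small in-memory datasets. It is a generic illustration under assumed data and join semantics (a full outer join on the shared attribute), not a description of how MicroStrategy performs data blending internally.

```python
from collections import defaultdict

# Hypothetical dataset from a modeled corporate source
sales = [
    {"region": "East", "month": "2015-01", "revenue": 120.0},
    {"region": "East", "month": "2015-02", "revenue": 80.0},
    {"region": "West", "month": "2015-01", "revenue": 200.0},
]

# Hypothetical dataset imported by a business user, e.g. from a spreadsheet
targets = [
    {"region": "East", "target": 250.0},
    {"region": "West", "target": 150.0},
    {"region": "North", "target": 100.0},
]


def aggregate(rows, key, measure):
    """Sum a measure at the level of the shared attribute."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[measure]
    return dict(totals)


def blend(left, right, key, left_measure, right_measure):
    """Aggregate both datasets to the shared attribute, then outer-join them."""
    left_totals = aggregate(left, key, left_measure)
    right_totals = aggregate(right, key, right_measure)
    return [
        {key: k, left_measure: left_totals.get(k), right_measure: right_totals.get(k)}
        for k in sorted(set(left_totals) | set(right_totals))
    ]


blended = blend(sales, targets, "region", "revenue", "target")
```

Aggregating before joining avoids double counting when the two datasets are stored at different levels of detail (here, monthly sales against a single target per region).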
Find Insights in Vast Amounts of Machine Generated Data
Internet of things
• SOURCE: Machine generated sensor data and machine to machine communication
• VALUE: Operational efficiency, Cost control, Risk avoidance
Machine generated data often does not lend itself to traditional OLAP analysis
Apply the methods of predictive analytics and data mining to machine generated data
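One simple example of such a method is flagging sensor readings that deviate strongly from their recent history. The sketch below uses a trailing window mean and standard deviation. The readings, the window size, and the three-sigma threshold are assumptions chosen for illustration, not recommendations from the presentation.

```python
from collections import deque
from statistics import mean, pstdev


def find_anomalies(readings, window=5, threshold=3.0):
    """Return (index, value) pairs that deviate from the trailing window mean
    by more than `threshold` standard deviations."""
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append((i, value))
        history.append(value)
    return anomalies


# Hypothetical temperature readings from one sensor, with a single spike
temperatures = [20.1, 20.3, 19.9, 20.0, 20.2, 20.1, 35.0, 20.0, 20.2, 19.8]
spikes = find_anomalies(temperatures)
```

Rather than aggregating readings along predefined dimensions, the analysis looks for patterns in the raw sequence, which is the kind of question OLAP-style slicing does not answer well.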
MicroStrategy Support for Predictive Analytics
All of the most commonly used techniques are supported
[Chart: Primary Work Horses of Data Mining, “Which Techniques Do You Use Most”; each technique is marked as supported via MicroStrategy Native, via PMML, or via R]
Source: 2013 Rexer Data Miner Surveys, www.RexerAnalytics.com
Over 1,250 Data Miners from 75 Countries
Predictive Analytics Are Part of MicroStrategy Function Library
Reporting: Average, Mean, Count, Sum, Maximum, Minimum, Median, Mode, Product, Rank, Percentile, “N”-Tile, N-tile by Step, N-tile by Value, N-tile by Step and Value
Date and Time: Add Days, Add Months, Current Date, Current Date & Time, Current Time, Day of Month, Day of Week, Day of Year, Days Between, Month Start Date, Month End Date, Months Between, Year Start Date, Year End Date
Statistical Aggregate: Standard Deviation, Standard Deviation of a Population, Variance, Variance of a Population, Geometric Mean, Average Deviation, Kurtosis, Skew
OLAP Functions: Running Total, Running Std Deviation, Running Std Deviation of Population, Running Minimum, Running Maximum, Running Count, Moving Difference, Moving Maximum, Moving Minimum, Moving Average, Moving Sum, Moving Count, Moving Std Deviation, Moving Std Deviation of Population, First or Last Value in Range, Exponential Weight Moving Avg, Exponential Weight Running Avg
Statistical: Beta Distribution, Beta Inverse, Binomial Distribution Probability, Chi Distribution, Chi Inverse, Confidence, Correlation Coefficient, Covariance, Critical Binomial Distribution, Chi Test (Independence), Cumulative Binomial Distribution, Exponent Distribution, F-Probability Distribution, F-Test, Fisher Transformation, Gamma Distribution, Gamma Inverse, Gamma Logarithm, Homoscedastic Ttest, Heteroscedastic Ttest, Hypergeometric Distribution, Intercept Point, Inverse of Lognormal Cumulative Distribution, Inverse of F Probability Distribution, Inverse of Fisher, Inverse of the Std Normal Cumulative Distribution, Inverse of the T-Distribution, Lognormal Cumulative Distribution, Mean T-Test, Negative Binomial Distribution, Normal Cumulative Distribution, Normal Distribution Inverse, Number of Permutations for a Given Object, Paired T-test, Poisson Distribution (Predict Number of Events), Pearson Product Moment Correlation Coefficient, RSQ (Square of Pearson), Slope of Linear Regression, STEYX (Standard Error of Predicted “y” Value), Standardize, Standard Normal Cumulative Distribution, T-Distribution, Variance Test, Weibull Distribution (Reliability Analysis)
Financial: Accrued Interest, Accrued Interest Maturity, Amount Received at Maturity, Bond-equivalent Yield for T-BILL, Convert Dollar Price from Fraction to Decimal, Convert Dollar Price from Decimal to Fraction, Cumulative Interest Paid on Loan, Cumulative Principal Paid on Loan, Depreciation for each Accounting Period, Days In Coupon Period to Settlement Date, Days In Coupon Period with Settlement Date, Days from Settlement Date to Next Coupon, Double-Declining Balance Method, Discount Rate For a Security, Effective Annual Interest Rate, Fixed-Declining Balance Method, Future Value, Future Value of Initial Principal with Compound Interest Rates, Interest Rate, Interest Payment, Internal Rate of Return, Interest Rate per Annuity, Macaulay Duration, Modified Duration, Modified Internal Rate of Return, Next Coupon Date After Settlement Date, No of Coupons Settlement and Maturity Date, Nominal Annual Interest Rate, No of Investment Periods, Net Present Value, Odd First period Yield, Odd Last Period, Prev Coupon Date Before Settlement Date, Price Per $100 Face Value w Odd First Period, Payment, Payment on Principal, Price, Price Discount, Price at Maturity, Present Value, Prorated Depreciation for each Period, Straight Line Depreciation, Sum-Of-Years' Digits Depreciation, T-BILL Price, T-BILL Yield, Variable Declining Balance, Yield, Yield for Discounted Security, Yield at Maturity
Math Functions: Absolute, A-cosine, Hyp A-cos, A-sine, Hyp A-sine, A-tan, A-tan2, Hyp A-tan, Ceiling, Combine, Cosine, Hyp Cosine, Degrees, Exponent, Factorial, Floor, Integer, Ln, Log, Log10, Mod, Power, Quotient, Radians, Randbetween, Round, Sine, Hyp Sine, Square Root, Tan, Hyp Tan, Truncate
Data Mining: Association Rules, Clustering, General Regression, Mining, Neural Network, Regression, Rule Set, Support Vector Machine, Time Series, Train Association, Train Clustering, Train Decision Tree, Train Regression, Train Time Series, Tree Model, Variants
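To make a few of these function names concrete, the sketch below gives textbook definitions of Slope of Linear Regression, Intercept Point, RSQ, and Moving Average. These are generic formulas written for illustration; the function names are hypothetical, and details such as how MicroStrategy treats incomplete windows are not stated in the slides and are not implied here.

```python
def linear_fit(xs, ys):
    """Least-squares slope, intercept, and R-squared (square of Pearson's r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def moving_average(values, window):
    """Average over each complete trailing window of `window` values."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical data that lies exactly on the line y = 2x + 1
slope, intercept, r_squared = linear_fit([1, 2, 3, 4], [3, 5, 7, 9])
smoothed = moving_average([1, 2, 3, 4, 5], window=3)
```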
Easy Integration with Third Party Analytical Models
As a MicroStrategy metric, use models and functions in any report or dashboard
ƒApply(X)
• MicroStrategy R Integration Pack: Deploy Any of 5000+ Open Source R Analytics
• MicroStrategy Custom Function Plug-in: Create Your Own Custom Functions
• PMML Model: Import Predictive Models from Popular Packages
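The idea of importing a model and applying it to each row can be sketched with a tiny PMML regression model. The model below is hypothetical, and the scorer handles only a single RegressionTable with NumericPredictor elements; it ignores most of the PMML specification and says nothing about how MicroStrategy itself evaluates PMML.

```python
import xml.etree.ElementTree as ET

# Hypothetical model as it might be exported by a data mining package
PMML_DOC = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header description="Hypothetical score model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="calls" optype="continuous" dataType="double"/>
    <DataField name="tenure" optype="continuous" dataType="double"/>
    <DataField name="score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="calls"/>
      <MiningField name="tenure"/>
      <MiningField name="score" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="calls" coefficient="0.25"/>
      <NumericPredictor name="tenure" coefficient="-0.1"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def local(tag):
    """Strip the XML namespace from an element tag."""
    return tag.rsplit("}", 1)[-1]


def load_regression(pmml_text):
    """Read the intercept and numeric predictors of the first RegressionTable."""
    root = ET.fromstring(pmml_text)
    table = next(el for el in root.iter() if local(el.tag) == "RegressionTable")
    intercept = float(table.get("intercept"))
    coefficients = {
        el.get("name"): (float(el.get("coefficient")), int(el.get("exponent", "1")))
        for el in table
        if local(el.tag) == "NumericPredictor"
    }
    return intercept, coefficients


def apply_model(model, row):
    """Score one row: intercept + sum(coefficient * value ** exponent)."""
    intercept, coefficients = model
    return intercept + sum(c * row[name] ** e for name, (c, e) in coefficients.items())


model = load_regression(PMML_DOC)
score = apply_model(model, {"calls": 4, "tenure": 10})
```

Because the model travels as a document rather than as code, it can be trained in one tool and scored in another, which is the point of the PMML import path.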
The Full Range of Advanced Analytics from One Place
Analytical Maturity
• Optimization: What do we want to happen?
• Predictions: What is likely to happen based on past history?
• Relationship Analysis: What factors influence activity or behavior?
• Benchmarking: How are we doing versus comparables?
• Trend Analysis: What direction are we headed in?
• Data Summarization: What is happening in the aggregate?
Industry’s most powerful SQL Engine and 300+ native analytical functions
World’s most popular advanced analytics tool. Free, open source.
More Specialty Tools
MicroStrategy Supports All Use Cases for Big Data in the Cloud
Analytical platform that provides the flexibility to enable modern analysis