© Copyright 2000-2016 TIBCO Software Inc.
• What is Machine Learning?
• Decision Tree Models
• Customer Analytics Examples
• Manufacturing Examples
• Fraud Use Examples
• Machine Learning on the TIBCO Community
Agenda
Machine Learning
Machine learning is a method of data analysis that automates analytical
model building. Using algorithms that iteratively learn from data, machine
learning allows computers to find hidden insights without being explicitly
programmed where to look.
Enabled by exponentially
increasing compute power –
doubling every 2 years
Why use machine learning algorithms?
• Good Results
• Machine learning algorithms + Big Data sets can produce models that accurately fit complex data patterns.
• Can make predictions for complex processes & systems
• Can handle systems with hundreds or thousands of variables
• Easy to use / Simple user interface
• Computer algorithm does the heavy lifting
• Results presented with easy-to-understand visualizations
• Supervised – Solve known problems
• Build a model that predicts something
• What factors are driving fraud or customer behavior or manufacturing defects?
• Decision Trees, Random Forest, Gradient Boosting Machine
• Unsupervised – Identify new patterns, Detect anomalies
• Are there new fraud clusters or buying patterns or failure modes emerging?
• Clustering, Principal Components, Neural Networks, Support Vector Machines
• Optimization – Support Decision-making
• Find best solution even when there are complex constraints
• What is the optimum route to take or allocation of resources or equipment maintenance schedule?
• Genetic Algorithm
Types of Machine Learning
• Customer Analytics - Prediction of customer behavior: customer
segmentation, customer churn, cross-sell/up-sell, propensity
• Fraud & Financial crime – Money laundering, credit card fraud,
medical fraud, insurance fraud
• Manufacturing - Optimization of manufacturing equipment,
processes and product yield
• Energy - Completions optimization, Blend optimization, Predictive
maintenance
• Transportation & Logistics - routing optimization, fuel efficiency,
predictive maintenance and warehouse distribution / space
optimization
Use Cases that leverage Machine Learning
Decision Tree – Titanic Survival Rate
[Figure: decision tree of Titanic passenger survival, including a split on family size. Source: Wikipedia]
Classical Statistics – Fit parameters to a well-defined model
Decision Tree – Product Pass / Fail by Process & Equipment
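[Figure: decision tree classifying Good Product vs. Bad Product (Peeling Clearcoat). Split variables shown: Clearcoat Bake Temperature (>= 132 C vs. < 132 C), Sanding Station (1, 2, 4 vs. 3), Basecoat Thickness; deeper branches are elided (…).]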
Automobile Paint Process
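As a rough illustration of how a tree arrives at a split such as "Clearcoat Bake Temperature >= 132 C", the sketch below searches for the threshold that best separates bad from good parts using Gini impurity. The data values and the helper names `gini` and `best_split` are invented for illustration; they are not taken from the slides or from any TIBCO product.

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 labels (0 = good, 1 = bad)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, weighted impurity) of the best 'value >= threshold' split."""
    best = (float("nan"), float("inf"))
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x < t]
        right = [y for x, y in zip(values, labels) if x >= t]
        if not left or not right:
            continue  # a split must put observations on both sides
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Invented bake temperatures (C) and outcomes (1 = peeling clearcoat).
bake_temp = [125.0, 128.0, 130.0, 131.0, 132.0, 134.0, 136.0, 140.0]
bad = [1, 1, 1, 1, 0, 0, 0, 0]

threshold, impurity = best_split(bake_temp, bad)
print(f"split at >= {threshold} C, weighted Gini impurity {impurity}")
```

A full tree repeats this search over every candidate variable and then recurses into each branch, which is how further splits such as sanding station or basecoat thickness would be added.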
Ensemble Tree Algorithms
• Random Forest, Gradient Boosting Machine (GBM)
• Method – Average many simple trees
• Sample the data: fit a simple tree
• Re-sample the data; up-weighting the observations that weren’t fitted well in
previous model
• Continue adding trees until fit is good
• Save all the trees and average them
• Better fit + prediction than single trees
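The steps above can be sketched in a few lines. This is a simplified, gradient-boosting-style illustration, not the slide's exact procedure: instead of re-sampling and up-weighting poorly fitted observations, each new one-split tree (a "stump") is fitted to the current residuals, which similarly focuses on what earlier trees missed, and the trees are combined as a shrunken sum rather than a plain average. The data, the learning rate, and all function names are assumptions made for the example.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, value if x < threshold, value otherwise)


def fit_stump(x: List[float], r: List[float]) -> Stump:
    """Fit a one-split regression tree to residuals r by least squares."""
    best, best_err = (x[0], 0.0, 0.0), float("inf")
    for t in sorted(set(x))[1:]:  # skip the smallest value so both sides are non-empty
        lo = [ri for xi, ri in zip(x, r) if xi < t]
        hi = [ri for xi, ri in zip(x, r) if xi >= t]
        lo_mean, hi_mean = sum(lo) / len(lo), sum(hi) / len(hi)
        err = sum((ri - lo_mean) ** 2 for ri in lo) + sum((ri - hi_mean) ** 2 for ri in hi)
        if err < best_err:
            best, best_err = (t, lo_mean, hi_mean), err
    return best


def predict(base: float, stumps: List[Stump], rate: float, xi: float) -> float:
    """Base prediction plus the shrunken sum of all saved stumps."""
    return base + rate * sum(lo if xi < t else hi for t, lo, hi in stumps)


def boost(x: List[float], y: List[float], n_trees: int = 50, rate: float = 0.3):
    """Keep adding stumps, each fitted to what the current ensemble gets wrong."""
    base = sum(y) / len(y)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        residuals = [yi - predict(base, stumps, rate, xi) for xi, yi in zip(x, y)]
        stumps.append(fit_stump(x, residuals))
    return base, stumps


def mse(pred: List[float], y: List[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


# Invented data: a step with a little noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

base, stumps = boost(x, y)
mse_mean = mse([base] * len(y), y)
mse_boosted = mse([predict(base, stumps, 0.3, xi) for xi in x], y)
print(f"MSE predicting the mean: {mse_mean:.4f}, MSE after boosting: {mse_boosted:.4f}")
```

The learning rate trades speed for stability: smaller values need more trees but make each individual tree's mistakes matter less.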
Consumer Analytics
• Segmentation
• Propensity
• Affinity & Association
• Social: Sentiment & Intent
• Churn
• Loyalty
• Cross-sell / Up-sell
• Test & Learn (A|B testing)
• Online Analytics (Path,
Cart Abandonment, …)
Market Analytics
• Pricing
• Promotion
• Campaign Effectiveness
• Forecasting
• Market Mix
• Media Attribution
Customer Lifecycle: Customer Acquisition, Customer Retention, Relationship Growth
PoS, Panel
Loyalty Data
Market (Syndicated) Data
Store & Distribution Analytics
• Store Clustering; geospatial
modeling
• Store Performance
• Forecasting
• Effects: Price, Promotion
• Distribution: Pick, Pack, Ship
Store and DC Data
Customer & Marketing Analytics
Customer Segmentation
• Top Shopper: 27% of customers & 35% of revenues. Broad purchase behavior
• Budget Minded: 34% of customers & 29% of revenues. Highly focused on core building categories
• Outdoor Plus: 15% of customers & 16% of revenues. Mainly outdoor, but other spending
• Gardener: 10% of customers & 5% of revenues. Primarily garden
• Seasonal Shopper: 11% of customers & 12% of revenues. Very “event” oriented
• Pool Customer: 3% of customers & 4% of revenues. Very focused on pool and patio categories
Objectives:
• Select most important “Response Products” to highlight in 2015
Holiday season direct marketing
• Identify and quantify predictive significance of “Driver Products”
based on historical data from 2014 sales
• Build relevant campaigns for as many people as possible
Propensity to Buy – Customer Success Story
Results
• Same year repeat visits are 3x higher for customers targeted in the
campaign
• Average order value is much higher
• Year-over-year repeat visitors are double
Predicted Prob(attrition) = f (X, b)
• Y variable
• Attrition (Y/N over time period)
• X variables
• How long a member
• Website interactions - section
• Prior spend
• Time since last interaction
• Experian: demog, …
• f function
• Additive Model
• Random Forest, Gradient Boosting
Variable Names
Redacted
Attrition and Value Models
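To make Predicted Prob(attrition) = f(X, b) concrete, here is a minimal sketch of one possible additive form of f: a logistic model in which each X variable contributes a weighted term. The coefficient values and feature names are invented for illustration (the real variable names are redacted in the deck); a Random Forest or Gradient Boosting model would replace the weighted sum with an ensemble of trees.

```python
import math
from typing import Dict

# Hypothetical coefficients b; a real model would estimate these from data.
COEFFICIENTS: Dict[str, float] = {
    "intercept": -1.0,
    "tenure_years": -0.30,                # how long a member
    "web_visits_last_90d": -0.05,         # website interactions
    "prior_spend_hundreds": -0.10,        # prior spend
    "days_since_last_interaction": 0.02,  # time since last interaction
}


def attrition_probability(x: Dict[str, float], b: Dict[str, float] = COEFFICIENTS) -> float:
    """Additive (logistic) model: P(attrition) = 1 / (1 + exp(-(b0 + sum of b_i * x_i)))."""
    score = b["intercept"] + sum(b[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-score))


# Two invented customers.
engaged = {"tenure_years": 5, "web_visits_last_90d": 20,
           "prior_spend_hundreds": 8, "days_since_last_interaction": 3}
lapsed = {"tenure_years": 1, "web_visits_last_90d": 0,
          "prior_spend_hundreds": 1, "days_since_last_interaction": 120}

p_engaged = attrition_probability(engaged)
p_lapsed = attrition_probability(lapsed)
print(f"engaged: {p_engaged:.3f}, lapsed: {p_lapsed:.3f}")
```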
Real-Time Customer Interactions / Offers
No Login = No Customer History => Offer based on Product Association
Sarah Login = Sarah’s Customer History => Offer based on Propensity Model scored for Sarah
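The routing rule on this slide can be sketched as a simple fallback: use the customer's propensity scores when a login identifies them, otherwise fall back to a product association. The lookup tables below are hypothetical stand-ins for trained models, and `choose_offer` is an illustrative name, not a Spotfire or StreamBase API.

```python
from typing import Dict, Optional

# Hypothetical stand-ins for a product-association model and a scored propensity model.
ASSOCIATED_PRODUCTS: Dict[str, str] = {"grill": "charcoal", "paint": "brushes"}
PROPENSITY_SCORES: Dict[str, Dict[str, float]] = {
    "sarah": {"patio set": 0.62, "charcoal": 0.35, "brushes": 0.10},
}


def choose_offer(viewed_product: str, customer_id: Optional[str] = None) -> Optional[str]:
    """Pick an offer: propensity-based if the customer is known, association-based otherwise."""
    scores = PROPENSITY_SCORES.get(customer_id) if customer_id else None
    if scores:
        # Logged in with history: offer the product with the highest propensity score.
        return max(scores, key=scores.get)
    # No login, so no customer history: offer a product associated with the one viewed.
    return ASSOCIATED_PRODUCTS.get(viewed_product)


print(choose_offer("grill"))           # anonymous visitor
print(choose_offer("grill", "sarah"))  # logged-in customer
```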
Correlate Product or Equipment Results to Process & Supplier Data
• Supplier - Incoming Materials and Components
• Measured electrical, chemical, physical characteristics
• batch-id, lot_id
• Manufacturing Process
• Physical, chemical or electrical measurements
• WIP / MES: track-in / track-out date, process equipment id, recipe, operator, …
• Process equipment sensor data
• Equipment Maintenance logs
• Defect Inspections
• Cost of labor, materials, machines and facilities
• Product Quality and Reliability Test
• Measured product functional and performance characteristics
• Accelerated life test results
• Product Field Returns
• Failure mode, unit / batch / lot ID
• Failure analysis root cause results
• Warranty / Repair claim, call center and cost – structured & unstructured
• Problem
• Product & Equipment problems difficult to accurately diagnose for complex manufacturing processes
• Big Data problem – millions of units, hundreds / thousands of predictors
• Response: Product, Process or Equipment Fail data
• Predictors: in-process equipment, process and product measurements or attributes
• Value
• Being used by customers to find previously undetected problems. Reduces time-to-market and increases profit.
• Method
• GBM analysis template to identify significant predictors, interactions and nonlinearities
• For large datasets, hybrid data access used to perform variable reduction step in-DB
• Simple interface – easy for business analyst to run and interpret results
GBM results for semiconductor yield as a function of in-process equipment & product measurements
Machine Learning to Predict Equipment or Product Fails
Real-time Predictive Analytics for Process Cost reduction
Goal: Scrap parts as early as possible to reduce costs in a manufacturing process.
Question: When to scrap a part in Station 1 instead of sending it to Station 2?
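Process flow: Station 1, then Station 2, with a "Scrap?" decision at each station
• Cost before Station 1: 9€
• Cost of Station 1: 7€
• Cost of Station 2: 13€
• Total cost: 29€ (or more)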
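One way to frame the scrap question is as an expected-value comparison. The sketch below reads the slide's figures as 9€ spent before Station 1, 7€ at Station 1, and 13€ at Station 2 (29€ in total), and assumes a hypothetical value of 40€ for a good finished part; that value and the function names are illustrative assumptions, and the failure probability would come from the predictive model.

```python
COST_BEFORE = 9.0      # EUR spent before Station 1 (as read from the slide)
STATION_1_COST = 7.0   # EUR
STATION_2_COST = 13.0  # EUR
GOOD_PART_VALUE = 40.0  # EUR, hypothetical value of a good finished part


def should_scrap(p_fail: float,
                 value: float = GOOD_PART_VALUE,
                 next_cost: float = STATION_2_COST) -> bool:
    """Scrap after Station 1 if continuing has negative expected net value.

    Money already spent is sunk, so only the next station's cost and the
    chance of ending up with a good part enter the comparison.
    """
    expected_net_if_continue = (1.0 - p_fail) * value - next_cost
    return expected_net_if_continue < 0.0


def break_even_fail_probability(value: float = GOOD_PART_VALUE,
                                next_cost: float = STATION_2_COST) -> float:
    """Failure probability above which scrapping is the cheaper choice."""
    return 1.0 - next_cost / value


total_cost = COST_BEFORE + STATION_1_COST + STATION_2_COST
print(f"total cost: {total_cost} EUR")
print(f"scrap when predicted P(fail) > {break_even_fail_probability():.3f}")
print(should_scrap(0.8), should_scrap(0.3))
```

Under these assumptions the threshold depends only on the ratio of the next station's cost to the value of a good part, so it can be recomputed in real time when either changes.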
Deploy real-time model: TIBCO Live Datamart & Streambase
Operational Intelligence (“Monitor the manufacturing process and change rules in real time!”)
Live Datamart Desktop Client
Step 1 – Catching New Fraud Like Old Fraud – Supervised Learning
Model to predict credit card fraud based on customer information: Variable Importance chart
Step 2 - Find unusual transactions - Unsupervised learning
Fraudulent
Good
Algorithm examples:
• Principal Component Analysis
• Auto-encoder Neural Network
• Single-class Support Vector Machine
• Clustering (e.g. K-means, Hierarchical)
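As a minimal illustration of the unsupervised idea, the sketch below scores each transaction by its distance from the centroid of all transactions, relative to the typical (median) distance. This is the one-cluster special case of the clustering approach, chosen for brevity; it is not K-means proper, nor any of the other listed algorithms. The feature values, the threshold of 3, and the function name are invented for the example.

```python
import math
from statistics import mean, median
from typing import List, Tuple


def anomaly_scores(points: List[Tuple[float, float]]) -> List[float]:
    """Distance of each point from the overall centroid, divided by the median distance."""
    cx = mean(p[0] for p in points)
    cy = mean(p[1] for p in points)
    dists = [math.hypot(x - cx, y - cy) for x, y in points]
    typical = median(dists) or 1.0  # avoid dividing by zero if most points coincide
    return [d / typical for d in dists]


# Invented, already-scaled transaction features; the last one is unusual.
transactions = [
    (1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (1.0, 0.95),
    (1.05, 1.05), (0.95, 1.0), (5.0, 4.8),
]

scores = anomaly_scores(transactions)
flagged = [i for i, s in enumerate(scores) if s > 3.0]  # hypothetical threshold
print(flagged)
```

No fraud labels are used here: the transaction is flagged only because it sits far from the bulk of the data, which is what lets this kind of method surface patterns a supervised model was never trained on.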
Step 3 – apply models in real-time with Streambase
Deploy models in real-time with a click from Spotfire
Learn & Do More: Machine Learning on the TIBCO Community
Wiki page
Component Exchange:
• Data functions
• Accelerators
• Templates
https://community.tibco.com/wiki/machine-learning-tibco-spotfire-and-streambase
https://community.tibco.com/exchange/tags/machine-learning-12816