All inquiries:
Paolo Gaudiano Icosystem Corporation 10 Fawcett Street Cambridge, MA 02138 USA +1-617-520-1070 [email protected]
Paolo Gaudiano, President & CTO Icosystem Corporation
Prepared for: Boston Hadoop Users Group
April 26, 2012
©2000-2012 Icosystem Corp., all rights reserved 2
Predictive modeling is the process of creating or selecting a model to predict the likelihood of an event
or outcome.
Why is this useful?
Use predictive modeling to figure out what we are about to get into
before we actually get into it.
Traditional predictive models are data-centric:
• Collect observations (Data)
• Identify mathematical relationships (Model)
• Extrapolate to new conditions (Prediction)
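As a rough illustration of the Data / Model / Prediction steps above, here is a minimal sketch in plain Python. The observations (hour of day versus cars per minute) and the helper name `fit_line` are made up for this example; they are not from the talk.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Data: hypothetical observations (hour of day, cars per minute).
hours = [6, 7, 8, 9]
cars_per_minute = [10.0, 20.0, 30.0, 40.0]

# Model: identify a mathematical relationship in the observations.
slope, intercept = fit_line(hours, cars_per_minute)

# Prediction: extrapolate to a condition that was never observed.
predicted_at_10 = slope * 10 + intercept
print(slope, intercept, predicted_at_10)
```

The fit to the four observations is perfect, yet nothing in the model says whether traffic keeps growing after 9 o'clock; the extrapolation is only as good as the assumption that the trend continues.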
[Diagram: Data → Model → Prediction]
• Accurate fit to historical data does not guarantee predictive accuracy!
• Many types of data-centric models require the assumption of statistical stationarity
In other words:
Tomorrow will be just like yesterday!
• There is only one case where this holds true:
• Simulate behavior of individuals (agents)
• Capture key elements of agents
• Simulate interactions between agents
• Let the simulation unfold over time
• Look for patterns and trends
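The five steps above can be sketched in a few lines of Python. This is a toy example, not a model from the talk: the agents, the "adopted" flag, the meeting rule and every parameter value are assumptions chosen only to show the shape of an agent-based simulation loop.

```python
import random


class Agent:
    """One individual; the only key element captured is an 'adopted' flag."""

    def __init__(self):
        self.adopted = False


def run(num_agents=200, steps=30, adopt_prob=0.5, seed=1):
    rng = random.Random(seed)
    agents = [Agent() for _ in range(num_agents)]
    agents[0].adopted = True  # start with a single adopter
    history = []
    for _ in range(steps):  # let the simulation unfold over time
        newly_adopted = []
        for agent in agents:
            # Interaction: each agent meets one randomly chosen agent.
            other = rng.choice(agents)
            if (not agent.adopted and other.adopted
                    and rng.random() < adopt_prob):
                newly_adopted.append(agent)
        # Apply changes after all meetings so the step is synchronous.
        for agent in newly_adopted:
            agent.adopted = True
        history.append(sum(a.adopted for a in agents))
    return history


history = run()
print(history)  # look for patterns: number of adopters at each step
```

No equation for the overall adoption curve is written anywhere; whatever pattern shows up in `history` comes from the individual meetings.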
The data-centric approach:
• Collect data from current situation
• Identify correlations with key variables (time of day, number of lanes, weather, ...)
• Extrapolate to new conditions
The ABS approach:
• Simulate behavior of individual drivers (driving style, start/end points, response to weather, ...)
• Adjust behaviors until overall traffic patterns are replicated accurately
• Test results of changing conditions
• Most of us are experts when it comes to driving: accelerate, decelerate, change lanes
• This domain expertise is sufficient to build a surprisingly accurate simulation of traffic jams!
[Figure panels: NetLogo simulation; real-world experiment]
• Build the structure of the simulation using domain expertise (e.g., people accelerate, decelerate, change lanes)
• Run the simulation to determine what factors really matter (e.g., deceleration causes more jams than acceleration)
• Gather quantitative data to improve the accuracy of the simulation, e.g.:
  • How hard do people decelerate?
  • How often do they change lanes?
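A minimal sketch of the first two steps, in Python. It is not the NetLogo model shown in the talk: it uses a single-lane ring road in the style of the Nagel-Schreckenberg cellular automaton (so no lane changes), and the names `mean_speed`, `accel` and `brake_prob` and all default values are assumptions made for this example.

```python
import random


def mean_speed(brake_prob, accel=1, max_speed=5, num_cars=30,
               road_length=100, steps=200, seed=0):
    """Average speed (cells per step) of cars on a single-lane ring road."""
    rng = random.Random(seed)
    # Cars never overtake, so index i + 1 is always the car ahead of car i.
    positions = sorted(rng.sample(range(road_length), num_cars))
    speeds = [0] * num_cars
    total = 0
    for _ in range(steps):
        new_speeds = []
        for i in range(num_cars):
            ahead = positions[(i + 1) % num_cars]
            gap = (ahead - positions[i] - 1) % road_length  # empty cells ahead
            v = min(speeds[i] + accel, max_speed)  # accelerate
            v = min(v, gap)                        # decelerate to avoid a crash
            if v > 0 and rng.random() < brake_prob:
                v -= 1                             # occasional extra braking
            new_speeds.append(v)
        speeds = new_speeds
        positions = [(p + v) % road_length for p, v in zip(positions, speeds)]
        total += sum(speeds)
    return total / (steps * num_cars)


# Vary one behavior at a time to see which one moves the outcome most.
for accel in (1, 2):
    for brake_prob in (0.0, 0.25, 0.5):
        print(accel, brake_prob, round(mean_speed(brake_prob, accel=accel), 2))
```

The structure (accelerate, decelerate, brake) comes from everyday driving knowledge; the sweep at the end only indicates which parameters deserve careful measurement, and the measured values would then replace the placeholder defaults.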
Decision makers use simulations in two primary ways:
What-if scenarios
• Use simulation to test the impact of assumptions, actions and external factors
Strategy design
• Use simulations to identify strategies that will lead to success
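The two uses can be contrasted with a small Python sketch. The checkout-queue model below is a hypothetical toy, not one of the simulations from the talk; `average_wait`, `fewest_registers` and all parameter values are assumptions for illustration.

```python
import random


def average_wait(num_registers, arrival_prob, service_minutes=3,
                 minutes=2000, seed=7):
    """Mean minutes a shopper waits in line before reaching a register."""
    rng = random.Random(seed)
    queue = []                          # arrival minute of each waiting shopper
    busy_until = [0] * num_registers    # minute at which each register is free
    waits = []
    for now in range(minutes):
        if rng.random() < arrival_prob:  # at most one arrival per minute
            queue.append(now)
        for r in range(num_registers):
            if queue and busy_until[r] <= now:
                waits.append(now - queue.pop(0))
                busy_until[r] = now + service_minutes
    return sum(waits) / len(waits) if waits else 0.0


def fewest_registers(arrival_prob, max_wait, candidates=range(1, 9)):
    """Strategy design: smallest register count that meets the wait target."""
    for n in candidates:
        if average_wait(n, arrival_prob) <= max_wait:
            return n
    return None


# What-if scenario: same store, busier day.
print(average_wait(2, 0.4), average_wait(2, 0.6))

# Strategy design: search for a configuration that meets a goal.
print(fewest_registers(arrival_prob=0.5, max_wait=1.0))
```

A what-if run changes an input and reads off the outcome; strategy design wraps the same simulation in a search over the choices the decision maker controls. Because the seed is fixed and one random number is drawn per minute, every register count is tested against the same stream of arrivals.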
[Figure: supermarket layout. Area labels: parking lot, produce, deli, registers, entrance, frozen food, CSD. Markers: entrance, target]
Client: PepsiCo - a Fortune 500 consumer goods company
Challenge: Understand behavior of shoppers moving through a supermarket
• How do they navigate?
• What do they purchase?
• Where to place products?
Complexities:
• Correlate cart tracking data with consumer behavior
• Predict behavior for novel supermarket configurations
Client: W.K. Kellogg Foundation
Challenge:
• Identify non-traditional skills and experience to help disconnected youth find and retain entry-level positions while pursuing a successful career path
• Demonstrate value of non-traditional skills to employers
Outcome: Developed simulation of employer “path” through entry-level position; identified quantitative metrics to maximize success.
Client: Leading semiconductor manufacturer
Challenge: Allocate distributed computing resources across multiple data centers to minimize cost and maximize resource availability
Outcome: Developed scenario-testing tool that integrates most aspects of high-level decision process while simulating low-level details of project flows, resource distribution and connectivity.
• Your knowledge and domain expertise are just as valuable as (or more than) quantitative data.
• Would you throw out a good data set that you had taken the time to collect?
• If not, then why would you accept data-centric models that ignore your knowledge?
• Agent-based simulations combine domain expertise and quantitative data, e.g.:
  • Drivers change lanes when they can [domain expertise]
  • Drivers change lanes every 3 minutes [quantitative data]
Models that combine expertise and quantitative data will ALWAYS do better than those using only one or the other!
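The lane-change example can be written down as a short Python sketch showing where each kind of input lands in an agent's code. The 3-minute figure is the one quoted above; the time step, the chance that the adjacent lane is free, and the function name `count_lane_changes` are assumptions made for this example.

```python
import random

# Quantitative data: a driver wants to change lanes about every 3 minutes.
MEAN_MINUTES_BETWEEN_CHANGES = 3.0


def count_lane_changes(minutes=600, dt=0.1, gap_free_prob=0.5, seed=3):
    """Count lane changes made by one driver over the simulated period."""
    rng = random.Random(seed)
    changes = 0
    wants_to_change = False
    for _ in range(round(minutes / dt)):
        # The measured rate sets how often the wish to change lanes arises.
        if rng.random() < dt / MEAN_MINUTES_BETWEEN_CHANGES:
            wants_to_change = True
        # Domain expertise: drivers change lanes when they can, that is,
        # only when the adjacent lane happens to be free (assumed chance).
        if wants_to_change and rng.random() < gap_free_prob:
            changes += 1
            wants_to_change = False
    return changes


print(count_lane_changes())
print(count_lane_changes(gap_free_prob=0.0))  # lane never free: no changes
```

The rule supplies the structure of the behavior and the measurement supplies its parameter; dropping either leaves a model that is either unquantified or blind to why lane changes happen.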