All inquiries: Paolo Gaudiano Icosystem Corporation 10 Fawcett Street Cambridge, MA 02138 USA +1-617-520-1070 [email protected] Paolo Gaudiano, President & CTO Icosystem Corporation Prepared for: Boston Hadoop Users Group April 26, 2012

  • ©2000-2012 Icosystem Corp., all rights reserved 2

  • Predictive modeling is the process of creating or selecting a model to predict the likelihood of an event or outcome.

    Why is this useful?

    Use predictive modeling to figure out what we are about to get into before we actually get into it.

  • Traditional predictive models are data-centric:

    •  Collect observations (Data)

    •  Identify mathematical relationships (Model)

    •  Extrapolate to new conditions (Prediction)

    [Diagram: Data → Model → Prediction]
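The three data-centric steps above can be sketched in a few lines. This is only an illustration: the observations are made up, and a straight-line least-squares fit is an assumed stand-in for whatever mathematical relationship a real study would identify.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Data: made-up observations (cars per minute entering a road, average speed in km/h).
cars_per_minute = [10, 20, 30, 40]
avg_speed_kmh = [100, 90, 80, 70]

# Model: a straight line through the observations.
slope, intercept = fit_line(cars_per_minute, avg_speed_kmh)

# Prediction: extrapolate to a traffic level that was never observed.
predicted_speed = slope * 60 + intercept
print(f"speed = {slope:.1f} * flow + {intercept:.1f}; at 60 cars/min: {predicted_speed:.1f} km/h")
```

The fit is exact on these four points, yet nothing in the data says whether the same line still holds at 60 cars per minute.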

  • •  Accurate fit to historical data does not guarantee predictive accuracy!

    •  Many types of data-centric models require the assumption of statistical stationarity

    In other words:

    Tomorrow will be just like yesterday!

    •  Only one case where this holds true:

  • •  Simulate behavior of individuals (agents)

    •  Capture key elements of agents

    •  Simulate interactions between agents

    •  Let the simulation unfold over time

    •  Look for patterns and trends

  • The data-centric approach:

    •  Collect data from the current situation

    •  Identify correlations with key variables (time of day, number of lanes, weather, ...)

    •  Extrapolate to new conditions

    The ABS (agent-based simulation) approach:

    •  Simulate behavior of individual drivers (driving style, start/end points, response to weather...)

    •  Adjust behaviors until overall traffic patterns are replicated accurately

    •  Test results of changing conditions

  • •  Most of us are experts when it comes to driving: accelerate, decelerate, change lanes

    •  This domain expertise is sufficient to build a surprisingly accurate simulation of traffic jams!

    [Figure panels: NetLogo simulation; Real-world experiment]

  • •  Build the structure of the simulation using domain expertise (e.g., people accelerate, decelerate, change lanes)

    •  Run the simulation to determine what factors really matter (e.g., deceleration causes more jams than acceleration)

    •  Gather quantitative data to improve the accuracy of the simulation, e.g.:

       •  How hard do people decelerate?

       •  How often do they change lanes?
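A minimal sketch of such a simulation, assuming a Nagel-Schreckenberg-style cellular automaton on a single-lane ring road. This is a stand-in, not the NetLogo model from the talk: lane changes are left out, and the function name, parameters and default values are illustrative assumptions. Each driver follows only the rules any driver knows: speed up when possible, slow down to avoid the car ahead, and occasionally brake for no clear reason.

```python
import random

def simulate_ring_road(n_cells=100, n_cars=30, v_max=5, p_brake=0.3,
                       steps=200, seed=0):
    """Single-lane ring road; returns mean speed (cells/step) over the last half of the run."""
    rng = random.Random(seed)
    # Cars are kept in road order, so car i+1 is always the one ahead of car i.
    positions = sorted(rng.sample(range(n_cells), n_cars))
    speeds = [0] * n_cars
    recorded = []
    for step in range(steps):
        new_speeds = []
        for i in range(n_cars):
            ahead = positions[(i + 1) % n_cars]
            gap = (ahead - positions[i] - 1) % n_cells   # empty cells to the next car
            v = min(speeds[i] + 1, v_max)                # accelerate
            v = min(v, gap)                              # decelerate to avoid a collision
            if v > 0 and rng.random() < p_brake:         # occasional unprompted braking
                v -= 1
            new_speeds.append(v)
        speeds = new_speeds
        positions = [(p + v) % n_cells for p, v in zip(positions, speeds)]
        if step >= steps // 2:                           # skip the start-up transient
            recorded.append(sum(speeds) / n_cars)
    return sum(recorded) / len(recorded)

# Vary one behavioral factor to see how much it matters.
calm = simulate_ring_road(p_brake=0.0)
erratic = simulate_ring_road(p_brake=0.5)
print(f"mean speed, no random braking: {calm:.2f}; frequent random braking: {erratic:.2f}")
```

Here `p_brake` is the kind of quantity that measured data would pin down; until then it can be swept to see whether the outcome is sensitive to it.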

  • Decision makers use simulations in two primary ways:

    What-if scenarios

    •  Use simulation to test the impact of assumptions, actions and external factors

    Strategy design

    •  Use simulations to identify strategies that will lead to success
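The two uses differ mainly in what is held fixed. The sketch below illustrates this with a hypothetical stand-in simulation (a toy checkout queue). The function `toy_simulation`, its parameters, the service probability and the target value are all assumptions made for illustration and do not come from the talk.

```python
import random

def toy_simulation(n_registers, arrival_prob, minutes=480, seed=0):
    """Hypothetical stand-in for a real simulation: a single checkout queue.

    Returns the average number of shoppers waiting per simulated minute."""
    rng = random.Random(seed)
    waiting, total_waiting = 0, 0
    for _ in range(minutes):
        if rng.random() < arrival_prob:              # a shopper joins the queue
            waiting += 1
        for _ in range(n_registers):                 # each open register may finish one shopper
            if waiting > 0 and rng.random() < 0.3:   # assumed service probability
                waiting -= 1
        total_waiting += waiting
    return total_waiting / minutes

# What-if scenarios: hold the decision fixed and vary an external factor.
what_if = {label: toy_simulation(n_registers=3, arrival_prob=p)
           for label, p in [("quiet day", 0.4), ("normal day", 0.6), ("holiday rush", 0.9)]}

# Strategy design: search the decision space for the cheapest option that meets a goal.
TARGET_AVG_QUEUE = 2.0
best = next((n for n in range(1, 11)
             if toy_simulation(n_registers=n, arrival_prob=0.9) <= TARGET_AVG_QUEUE),
            None)

print(what_if)
print("fewest registers meeting the target:", best)
```

In a what-if run the analyst proposes the scenario and reads off the outcome; in strategy design the outcome is specified and the search proposes the decision.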

  • Client: PepsiCo - a Fortune 500 consumer goods company

    Challenge: Understand behavior of shoppers moving through a supermarket

    •  How do they navigate?

    •  What do they purchase?

    •  Where to place products?

    Complexities:

    •  Correlate cart tracking data with consumer behavior

    •  Predict behavior for novel supermarket configurations

    [Figure: supermarket floor plan. Labels: Parking lot, Produce, Deli, Registers, Entrance, Frozen food, CSD, entrance, target]

  • Client: W.K. Kellogg Foundation

    Challenge:

    •  Identify non-traditional skills and experience to help disconnected youth find and retain entry-level positions while pursuing a successful career path.

    •  Demonstrate value of non-traditional skills to employers

    Outcome: Developed simulation of employer “path” through entry-level position; identified quantitative metrics to maximize success.

  • Client: Leading semiconductor manufacturer

    Challenge: Allocate distributed computing resources across multiple data centers to minimize cost and maximize resource availability

    Outcome: Developed scenario-testing tool that integrates most aspects of high-level decision process while simulating low-level details of project flows, resource distribution and connectivity.

  • •  Your knowledge and domain expertise are just as valuable as (or more than) quantitative data.

    •  Would you throw out a good data set that you had taken the time to collect?

    •  If not, then why would you accept data-centric models that ignore your knowledge?

    •  Agent-based simulations combine domain expertise and quantitative data, e.g.:

       •  Drivers change lanes when they can [domain expertise]

       •  Drivers change lanes every 3 minutes [quantitative data]

    Models that combine expertise and quantitative data will ALWAYS do better than those using only one or the other!
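The lane-change example can be written as a single agent rule in which each source of knowledge supplies one part. This is a sketch under two assumptions that are not in the talk: "every 3 minutes" is read as a mean of 180 seconds between lane changes, and lane changes are treated as a Poisson process so that the rate can be converted into a per-time-step probability. The function names are illustrative.

```python
import math
import random

MEAN_SECONDS_BETWEEN_CHANGES = 180.0   # "every 3 minutes" [quantitative data]

def lane_change_probability(dt_seconds):
    """Chance of at least one lane-change attempt within a time step of dt_seconds.

    Assumes lane changes follow a Poisson process (an assumption of this sketch)."""
    return 1.0 - math.exp(-dt_seconds / MEAN_SECONDS_BETWEEN_CHANGES)

def changes_lane(adjacent_lane_free, dt_seconds, rng):
    """Drivers change lanes only when they can [domain expertise],
    and as often as the data says they do [quantitative data]."""
    return adjacent_lane_free and rng.random() < lane_change_probability(dt_seconds)

rng = random.Random(0)
print(f"per-second probability: {lane_change_probability(1.0):.5f}")
print("blocked driver changes lane:", changes_lane(False, 1.0, rng))
```

Dropping either part changes the model: without the rule, cars would switch into occupied lanes; without the rate, the modeler would have to guess how often drivers switch.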
