Scott Hoover Operationalizing Analytics To Scale

Operationalizing analytics to scale


Page 1: Operationalizing analytics to scale

Scott Hoover

Operationalizing Analytics To Scale

Page 2: Operationalizing analytics to scale

Operationalizing Analytics To Scale

Abstract

Many companies have invested time and money into building sophisticated data pipelines that can move massive amounts of data in (near) real time. However, for the analyst or data scientist who builds models offline, integrating their analyses into these pipelines for operational purposes can pose a challenge.

In this workshop, we will discuss some key technologies and workflows companies can leverage to build end-to-end solutions for automating analytical, statistical and machine learning work: from collection and storage to analysis and real-time predictions.

Page 9: Operationalizing analytics to scale

Agenda

● Introduction

● What Are We Talking About Exactly?

● The Problem at Hand

● Operationalizing Analytics

● Operationalizing Predictive Analytics

● Questions

Page 12: Operationalizing analytics to scale

Introduction

● I work on the Internal Data team at Looker.

● Before Looker, I worked in consulting and research.

● Looker is a business intelligence tool.

Page 14: Operationalizing analytics to scale

What are we talking about?

● What do I mean when I say “operationalizing”?

● Why is this important?

Page 16: Operationalizing analytics to scale

The Problem at Hand

● Analysts are providing basic reports for the entire business.

● Analysts and Data Scientists are building offline models.

Page 19: Operationalizing analytics to scale

The Problem With Offline Models

● Offline analyses aren’t associated with particularly quick turnaround times.

● Offline analyses aren’t particularly collaborative.

● Offline analyses aren’t particularly portable.

Page 20: Operationalizing analytics to scale

A Potential Set-up (Straw Man)

[Diagram: Data Sources -(http)-> Data Stores -(query)-> Analysis / Consumption]

Page 24: Operationalizing analytics to scale

Operationalizing Analytics - The Simple Case

● These metrics are vanilla.

● These metrics are critical.

● The business would probably be better served if Data Scientists and Analysts were spending their time answering questions that require deep technical knowledge.

Page 28: Operationalizing analytics to scale

Operationalizing Analytics - A How To

● Build or buy a workhorse ETL tool.

● Move toward an Operational Data Store (ODS), reducing the need for postprocessing and data “mashups.”

● Emphasize self-service wherever possible.

● Analytics should slot into the existing infrastructure with minimal friction.

Page 29: Operationalizing analytics to scale

Operationalizing Predictive Analytics

Page 32: Operationalizing analytics to scale

Where to Begin

● Out-of-the-box tools.

● Build from scratch.

● A mean between extremes.

Page 35: Operationalizing analytics to scale

A Model Standard - PMML (Predictive Model Markup Language)

● XML-based, model-storage format.

● Created and maintained by the Data Mining Group.

● Most commonly used statistical/machine learning models are supported.
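To make the XML format concrete, here is a minimal sketch of reading the field definitions from a PMML document in Python. The skeleton below is hypothetical and truncated: the field names are borrowed from the lead-scoring example later in the deck, the data types and PMML version are assumptions, and a real file would also contain the model element itself after the data dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical, truncated PMML skeleton. A complete file also carries the
# model element (for example a NaiveBayesModel) after the DataDictionary.
PMML_SKELETON = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header description="Hypothetical lead-scoring model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="country" optype="categorical" dataType="string"/>
    <DataField name="budget" optype="continuous" dataType="double"/>
    <DataField name="meeting" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>
"""

# PMML elements live in a versioned namespace, so lookups must be qualified.
NS = {"pmml": "http://www.dmg.org/PMML-4_2"}


def list_data_fields(pmml_text):
    """Return (name, optype, dataType) for every DataField in the document."""
    root = ET.fromstring(pmml_text)
    fields = root.findall("pmml:DataDictionary/pmml:DataField", NS)
    return [(f.get("name"), f.get("optype"), f.get("dataType")) for f in fields]


if __name__ == "__main__":
    for name, optype, data_type in list_data_fields(PMML_SKELETON):
        print(name, optype, data_type)
```

Because the file is plain XML, any tool that can export it (a producer) and any tool that can read it (a consumer) can exchange the same model.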

Page 36: Operationalizing analytics to scale

PMML Integrations

Producers | Consumers

Page 40: Operationalizing analytics to scale

JPMML

● JPMML is an open-source API for evaluating PMML files.

● In essence, we equip the JPMML application with our PMML file, serve it up with new data, and it provides us with predictions.

● Openscoring.io distributes various JPMML APIs and UDFs, for example a RESTful API, Heroku, Hive, Pig, Cascading and PostgreSQL.

● All we have to do is write some code that fetches new values, serves them up to the JPMML API, captures the predictions, then pushes them back to a database.
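The fetch, score, and write-back loop described in the last bullet could be sketched roughly as follows. This is an illustration under stated assumptions, not the speaker's code: the `leads` table, its columns, and the `fake_score` stand-in are hypothetical, and an in-memory SQLite database replaces whatever store is used in practice. In production, the `score` callable would wrap an HTTP request to the scoring API.

```python
import sqlite3


def score_new_leads(conn, score):
    """Fetch unscored leads, score each one, and write the result back.

    `score` is any callable that takes the model arguments as a dict and
    returns a number; in production it would call the scoring API.
    Returns the number of leads that were scored.
    """
    rows = conn.execute(
        "SELECT id, country, budget FROM leads WHERE score IS NULL"
    ).fetchall()
    for lead_id, country, budget in rows:
        probability = score({"country": country, "budget": budget})
        conn.execute(
            "UPDATE leads SET score = ? WHERE id = ?", (probability, lead_id)
        )
    conn.commit()
    return len(rows)


def fake_score(arguments):
    # Stand-in scorer for the demo only; it is not a real model.
    return 0.9 if arguments["country"] == "US" else 0.1


def demo():
    # Hypothetical schema, created in memory so the sketch is self-contained.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE leads (id TEXT PRIMARY KEY, country TEXT, budget REAL, score REAL)"
    )
    conn.executemany(
        "INSERT INTO leads (id, country, budget) VALUES (?, ?, ?)",
        [("001", "US", 7.8), ("002", "CA", 3.2)],
    )
    updated = score_new_leads(conn, fake_score)
    scores = conn.execute("SELECT id, score FROM leads ORDER BY id").fetchall()
    return updated, scores


if __name__ == "__main__":
    print(demo())
```

Passing the scorer in as a callable keeps the database logic independent of how predictions are obtained, so the same loop can be tested offline and then pointed at a live scoring endpoint.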

Page 41: Operationalizing analytics to scale

Example Architecture - Lead Scoring

[Diagram labels: API, API, GET lead, UPDATE lead, GET leads]

Page 45: Operationalizing analytics to scale

Deploy Model - PUT /model/${id}

Heroku:

    git push heroku master

REST:

    curl -X PUT \
      --data-binary @BayesLeadScore.pmml \
      -H "Content-type: text/xml" \
      http://ec2_endpoint/openscoring/model/BayesLeadScore

Page 46: Operationalizing analytics to scale

View Model - GET /model/${id}

Requesting http://heroku_endpoint/openscoring/model/BayesLeadScore or http://ec2_endpoint/openscoring/model/BayesLeadScore, with curl or in a browser, will display our PMML model.

Page 47: Operationalizing analytics to scale

Test Model - POST /model/${id}

newLead.json:

    {
      "id": "001",
      "arguments": {
        "country": "US",
        "budget": 7.8
      }
    }

Send request to JPMML API:

    curl -X POST \
      --data-binary @newLead.json \
      -H "Content-type: application/json" \
      http://ec2_endpoint/openscoring/model/BayesLeadScore

Page 48: Operationalizing analytics to scale

Example Response

    {
      "id": "001",
      "result": {
        "meeting": "1",
        "Probability_0": 0.33062906130485653,
        "Probability_1": 0.6693709386951435
      }
    }
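A response of this shape can be reduced to a single lead score with a few lines of Python. The sketch below assumes, based on the field names alone, that `Probability_1` is the model's probability for the class `meeting = "1"`; the function name is hypothetical.

```python
import json

# The example response from the slide, as a JSON string.
EXAMPLE_RESPONSE = (
    '{"id": "001", "result": {"meeting": "1", '
    '"Probability_0": 0.33062906130485653, '
    '"Probability_1": 0.6693709386951435}}'
)


def meeting_probability(response_text):
    """Return (lead id, probability of a meeting) from a scoring response.

    Assumes Probability_1 is the probability of the class meeting == "1".
    """
    body = json.loads(response_text)
    return body["id"], float(body["result"]["Probability_1"])


if __name__ == "__main__":
    print(meeting_probability(EXAMPLE_RESPONSE))
```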

Page 49: Operationalizing analytics to scale

Batch Request - POST /model/${id}/batch

batchLeads.json:

    {
      "id": "batch-1",
      "requests": [
        { "id": "001", "arguments": { "country": "US", "budget": 7.8 } },
        { "id": "002", "arguments": { "country": "CA", "budget": 3.2 } }
      ]
    }

Send request to JPMML API:

    curl -X POST \
      --data-binary @batchLeads.json \
      -H "Content-type: application/json" \
      http://ec2_endpoint/openscoring/model/BayesLeadScore/batch
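When leads arrive from a database query rather than a hand-written file, the batch body can be assembled programmatically. The following is a sketch that mirrors the JSON shape shown on this slide; the function name, the `batch_size` parameter, and the input record layout are assumptions made for illustration.

```python
import json


def build_batches(leads, batch_size=100):
    """Group leads into batch-request bodies of at most batch_size each.

    Each lead is a dict with "id", "country" and "budget" keys (an assumed
    layout). Batch ids follow the "batch-N" pattern from the slide.
    """
    batches = []
    for start in range(0, len(leads), batch_size):
        chunk = leads[start:start + batch_size]
        batches.append({
            "id": f"batch-{start // batch_size + 1}",
            "requests": [
                {
                    "id": lead["id"],
                    "arguments": {
                        "country": lead["country"],
                        "budget": lead["budget"],
                    },
                }
                for lead in chunk
            ],
        })
    return batches


if __name__ == "__main__":
    sample = [
        {"id": "001", "country": "US", "budget": 7.8},
        {"id": "002", "country": "CA", "budget": 3.2},
    ]
    # Each element serializes to a body that can be POSTed to .../batch.
    print(json.dumps(build_batches(sample)[0], indent=2))
```

Capping the batch size keeps each request body bounded, which matters once the number of leads grows beyond what a single request should carry.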

Page 52: Operationalizing analytics to scale

Scale Considerations

● Horizontal scaling.

● Vertical scaling.

Page 53: Operationalizing analytics to scale

What About Truly Big Data?

● For the rare few of us who need to make real-time predictions against millions of rows per second, there’s a popular Apache suite to handle this.

*Image borrowed from OryxProject

Page 54: Operationalizing analytics to scale

[Diagram labels: Applications, ODS, Analysis, APIs, Transactional DB / Event Storage, Business Intelligence, Scoring Server, Consumers, Review / Versioning]

Page 55: Operationalizing analytics to scale

Closing Thoughts

Page 56: Operationalizing analytics to scale

Questions

Page 57: Operationalizing analytics to scale

Learn more at looker.com/demo