Zementis © Alex Guazzelli, Ph.D. Director of Analytics - Zementis, Inc. Forum on Analytics November 12, 2008 Easy Expression and Execution of Data Mining Models through PMML

Easy Expression and Execution of Data Mining Models through PMML

Page 1: Easy Expression and Execution of Data Mining Models through PMML

Zementis ©

Alex Guazzelli, Ph.D.
Director of Analytics - Zementis, Inc.

Forum on Analytics
November 12, 2008

Easy Expression and Execution of Data Mining Models through PMML

Page 2: Easy Expression and Execution of Data Mining Models through PMML

Development, Deployment, and Execution of Predictive Models

Open Standards

  • Development: R allows for reliable data manipulation and model building.
  • Deployment: PMML allows for easy expression and deployment of data transformations and data-mining models.
  • Execution: Real-time execution of models via web-services calls.

Page 3: Easy Expression and Execution of Data Mining Models through PMML

The R Project

R: Model Development

  • R is an integrated suite of software facilities for data manipulation, calculation and graphical display.
  • R provides a wide variety of statistical techniques and is highly extensible.
  • R is similar to the S language and environment developed at Bell Labs.
  • It is Open Source and a GNU project.
  • R is available for free at http://www.r-project.org/

Page 4: Easy Expression and Execution of Data Mining Models through PMML

Predictive Model Markup Language (PMML)

PMML: Deployment

PMML is an XML-based language to
  • Define statistical and data mining models
  • Share models between compliant applications

Standard for exchange of models to
  • Avoid proprietary issues and incompatibilities
  • Deploy models in operational infrastructure

Clear separation of tasks
  • Model development vs. model deployment
  • Scientists focus on building the best model
  • Eliminates need for custom model deployment
  • Ensures scalability and reliability
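To make "an XML-based language" concrete, here is a minimal sketch of what a PMML 3.2 document can look like and how a consuming application might inspect it. The document is hand-written for illustration (the field names `age` and `risk` and the coefficients are assumptions, not taken from the talk), and the helper `summarize` is a hypothetical name; only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Hand-written, illustrative PMML 3.2 document (field names and numbers are assumptions).
PMML_DOC = """<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2">
  <Header description="Illustrative linear regression"/>
  <DataDictionary numberOfFields="2">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="risk" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="age" coefficient="0.25"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-3_2"}

# Top-level elements that are not models.
NON_MODEL = {"Header", "MiningBuildTask", "DataDictionary",
             "TransformationDictionary", "Extension"}


def summarize(pmml_text):
    """Return the PMML version, the declared data fields, and the model element names."""
    root = ET.fromstring(pmml_text)
    fields = [f.get("name")
              for f in root.findall("p:DataDictionary/p:DataField", NS)]
    # Tags look like "{namespace}Name"; keep only the local name.
    local_names = [child.tag.split("}")[1] for child in root]
    models = [name for name in local_names if name not in NON_MODEL]
    return {"version": root.get("version"), "fields": fields, "models": models}


if __name__ == "__main__":
    print(summarize(PMML_DOC))
```

Because the model is plain XML, any compliant application can read the same file without knowing which tool produced it, which is the separation between development and deployment that the slide describes.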

Page 5: Easy Expression and Execution of Data Mining Models through PMML

Matured and Supported by Industry

PMML Industry Support

  • Data Mining Group http://www.dmg.org
  • Mature standard
  • Current version 3.2
  • Active group and constant enhancements
  • Vendor independent consortium

Industry supporters
  • Major Players: IBM, Oracle, SAP, Microsoft
  • Analytics: SAS, SPSS, Fair Isaac, Zementis
  • Business Intelligence: MicroStrategy, Teradata
  • Open Source: R

Page 6: Easy Expression and Execution of Data Mining Models through PMML

Predictive Model Markup Language

PMML: Bringing Data and Models Together

PMML defines a standard not only to represent data-mining models, but also data handling and data transformations (pre- and post-processing).

Data transformations and data-mining models come together in PMML.

  • A Data Dictionary defines all the raw data fields (including missing value strategy and outlier treatment).
  • Several data transformation strategies allow for intelligent extraction of feature detectors from raw data (“data massaging”).
  • A comprehensive list of data-mining models offers power and flexibility.
  • Post-processing of results allows for tailored decisions.

[Diagram: Transformations and Models]
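The way data handling, transformations, a model, and post-processing can travel together in one file can be sketched as follows. Everything specific here is an assumption made for illustration: the fields `income` and `score`, the normalization range, the coefficients, and the helper names `score` and `decide`. The scorer handles only the handful of PMML elements used in this example and is nowhere near a full PMML engine; the final decision rule is plain Python rather than PMML.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://www.dmg.org/PMML-3_2"}

# Hand-written, illustrative PMML: one raw field, one derived field, one model.
PMML_DOC = """<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2">
  <Header description="Illustrative model with one transformation"/>
  <DataDictionary numberOfFields="2">
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <TransformationDictionary>
    <DerivedField name="income_norm" optype="continuous" dataType="double">
      <NormContinuous field="income">
        <LinearNorm orig="0" norm="0"/>
        <LinearNorm orig="100000" norm="1"/>
      </NormContinuous>
    </DerivedField>
  </TransformationDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="income" missingValueReplacement="50000"/>
      <MiningField name="score" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="0.2">
      <NumericPredictor name="income_norm" coefficient="0.6"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def score(pmml_text, record):
    """Apply missing-value handling, the derived field, and the regression table."""
    root = ET.fromstring(pmml_text)
    model = root.find("p:RegressionModel", NS)
    values = {}
    # 1. Data handling: read input fields, substituting the declared replacement.
    for mf in model.findall("p:MiningSchema/p:MiningField", NS):
        if mf.get("usageType") == "predicted":
            continue
        name = mf.get("name")
        raw = record.get(name)
        if raw is None:
            raw = mf.get("missingValueReplacement")
        if raw is None:
            raise ValueError("no value and no replacement for field " + name)
        values[name] = float(raw)
    # 2. Transformation: two-point linear normalization (pre-processing).
    for df in root.findall("p:TransformationDictionary/p:DerivedField", NS):
        norm = df.find("p:NormContinuous", NS)
        (x0, y0), (x1, y1) = [(float(p.get("orig")), float(p.get("norm")))
                              for p in norm.findall("p:LinearNorm", NS)]
        x = values[norm.get("field")]
        values[df.get("name")] = y0 + (x - x0) * (y1 - y0) / (x1 - x0)
    # 3. Model: intercept plus coefficient-weighted predictors.
    table = model.find("p:RegressionTable", NS)
    result = float(table.get("intercept"))
    for pred in table.findall("p:NumericPredictor", NS):
        result += float(pred.get("coefficient")) * values[pred.get("name")]
    return result


def decide(score_value, threshold=0.5):
    """Hypothetical post-processing rule: turn a score into a decision."""
    return "approve" if score_value >= threshold else "review"


if __name__ == "__main__":
    s = score(PMML_DOC, {"income": 75000})
    print(s, decide(s))
```

With an income of 75000 the normalized value is 0.75 and the score comes out at roughly 0.65; a record with no income falls back to the replacement value declared in the mining schema.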

Page 9: Easy Expression and Execution of Data Mining Models through PMML

[Diagram: Data Analysis, Statistical Model, PMML Export]

Got Models… What Now?
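The "PMML Export" step can be illustrated with a small sketch. In the talk the model is built in R; the Python below is only a stand-in to show the idea of fitting a model and then serializing it as PMML. The helper names `fit_line` and `export_pmml`, the field names `x` and `y`, and the data are all assumptions, and the output covers just the elements needed for a one-predictor regression.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-3_2"


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def export_pmml(intercept, slope, predictor="x", target="y"):
    """Serialize a fitted line as a PMML 3.2 RegressionModel (illustrative subset)."""
    # Registers the PMML namespace as the default prefix (a process-wide setting).
    ET.register_namespace("", PMML_NS)

    def tag(name):
        return "{%s}%s" % (PMML_NS, name)

    root = ET.Element(tag("PMML"), {"version": "3.2"})
    ET.SubElement(root, tag("Header"), {"description": "Illustrative export"})
    dd = ET.SubElement(root, tag("DataDictionary"), {"numberOfFields": "2"})
    for name in (predictor, target):
        ET.SubElement(dd, tag("DataField"),
                      {"name": name, "optype": "continuous", "dataType": "double"})
    model = ET.SubElement(root, tag("RegressionModel"),
                          {"functionName": "regression"})
    schema = ET.SubElement(model, tag("MiningSchema"))
    ET.SubElement(schema, tag("MiningField"), {"name": predictor})
    ET.SubElement(schema, tag("MiningField"),
                  {"name": target, "usageType": "predicted"})
    table = ET.SubElement(model, tag("RegressionTable"),
                          {"intercept": repr(intercept)})
    ET.SubElement(table, tag("NumericPredictor"),
                  {"name": predictor, "coefficient": repr(slope)})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    intercept, slope = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(export_pmml(intercept, slope))
```

Once the model exists as a PMML string or file, the question on the slide ("What Now?") becomes a deployment question rather than a modeling one.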

Page 10: Easy Expression and Execution of Data Mining Models through PMML

Predictive Analytics Scoring Engine

Execution: The ADAPA Example

  • Data transformations and model execution in real-time (via web-services calls) or batch-mode.
  • Environment to manage and deploy many predictive models or rule sets.
  • Framework for SOA-based IT integration.
  • Completely standards based and easily integrated with any existing infrastructure.
  • Not a model building environment.
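The difference between a real-time web-service call and batch-mode scoring can be sketched from the client side. This is not ADAPA's documented interface: the URL, the JSON payload layout, and the helper names `build_scoring_request` and `score_batch` are all hypothetical, and a web service of this kind may equally well use SOAP. The sketch only builds requests; the actual transport is passed in as a function so nothing is sent here.

```python
import json
import urllib.request

# Hypothetical endpoint pattern; a real scoring engine defines its own URLs.
SCORING_URL = "http://scoring.example.com/models/{model}/score"


def build_scoring_request(model_name, record):
    """Real-time mode: one HTTP POST per record, addressed to a deployed model."""
    body = json.dumps({"record": record}).encode("utf-8")
    return urllib.request.Request(
        SCORING_URL.format(model=model_name),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score_batch(model_name, records, send):
    """Batch mode: apply the same call to many records.

    `send` is a caller-supplied function that takes a Request and returns a
    score, for example a thin wrapper around urllib.request.urlopen.
    """
    return [send(build_scoring_request(model_name, r)) for r in records]


if __name__ == "__main__":
    req = build_scoring_request("risk_model", {"age": 40})
    print(req.get_method(), req.full_url, req.data)
```

Addressing the model by name in the request reflects the slide's point that the engine manages many deployed models, while model building stays outside it.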

Page 13: Easy Expression and Execution of Data Mining Models through PMML

1 through 6 – From Raw Data to Smart Decisions

1. Data Extraction and Analysis
2. Model Building
3. PMML Export
4. PMML Import
5. Web-Service Calls
6. Model Execution

Page 14: Easy Expression and Execution of Data Mining Models through PMML

Thank You!

E-mail: [email protected]

U.S.A.
6125 Cornerstone Court East, Suite 250
San Diego, CA 92121
Tel: +1 619 330-0780
Fax: +1 858 535-0227

Asia
19/F., Unit A, Ho Lee Commercial Building
38-44 D’Aguilar Street
Central, Hong Kong (S.A.R.)
Tel: +852 2868-0878
Fax: +852 2845-6027