Zementis ©
Alex Guazzelli, Ph.D.
Director of Analytics - Zementis, Inc.
Forum on Analytics
November 12, 2008
Easy Expression and Execution of Data Mining Models through PMML
Development, Deployment, and Execution of Predictive Models
Open Standards
Development: R allows for reliable data manipulation and model building
Deployment: PMML allows for easy expression and deployment of data transformations and data-mining models
Execution: Real-time execution of models via web-services calls
The R Project
R is an integrated suite of software facilities for data manipulation, calculation and graphical display.
R provides a wide variety of statistical techniques and is highly extensible.
R is similar to the S language and environment developed at Bell Labs.
It is Open Source and a GNU project.
R is available for free at http://www.r-project.org/
R
Model Development
Predictive Model Markup Language (PMML)
PMML
Deployment
PMML is an XML-based language to
  Define statistical and data mining models
  Share models between compliant applications
Standard for exchange of models to
  Avoid proprietary issues and incompatibilities
  Deploy models in operational infrastructure
Clear separation of tasks
  Model development vs. model deployment
  Scientists focus on building the best model
  Eliminates need for custom model deployment
  Ensures scalability and reliability
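To make the "XML-based language" point concrete, here is a minimal sketch of what a PMML document can look like and how a consuming application might read it. The model itself (the `age` and `risk` fields, the coefficients, the model name) is hypothetical and invented for illustration, and the document has not been validated against the PMML 3.2 schema. The reader uses only Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Illustrative PMML 3.2 document. The fields and coefficients are
# hypothetical; the document is not schema-validated.
PMML_DOC = """
<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2">
  <Header copyright="example" description="Hypothetical linear model"/>
  <DataDictionary numberOfFields="2">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="risk_model" functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="risk" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="age" coefficient="0.25"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

NS = {"pmml": "http://www.dmg.org/PMML-3_2"}


def load_regression(pmml_text):
    """Read field names, intercept, and coefficients from the document."""
    root = ET.fromstring(pmml_text.strip())
    fields = [
        field.get("name")
        for field in root.findall("pmml:DataDictionary/pmml:DataField", NS)
    ]
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("pmml:NumericPredictor", NS)
    }
    return fields, intercept, coefficients


def predict(intercept, coefficients, record):
    """Evaluate the linear model for one input record."""
    return intercept + sum(c * record[name] for name, c in coefficients.items())


fields, intercept, coefficients = load_regression(PMML_DOC)
prediction = predict(intercept, coefficients, {"age": 40.0})
print(fields, prediction)
```

The point of the sketch is the separation of tasks: the XML carries everything needed to score, so the consuming code needs no knowledge of the tool that built the model.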
Matured and Supported by Industry
PMML
PMML Industry Support
Data Mining Group http://www.dmg.org
Mature standard
  Current version 3.2
Active group and constant enhancements
Vendor independent consortium
Industry supporters
  Major Players: IBM, Oracle, SAP, Microsoft
  Analytics: SAS, SPSS, Fair Isaac, Zementis
  Business Intelligence: MicroStrategy, Teradata
  Open Source: R
Models
PMML defines a standard not only to represent data-mining models, but also data handling and data transformations (pre- and post-processing).
Data Transformations and Data-Mining Models come together in PMML.
Predictive Model Markup Language
A Data Dictionary defines all the raw data fields (including missing value strategy and outlier treatment).
Several Data Transformation strategies allow for intelligent extraction of feature detectors from raw data (“data massaging”).
A comprehensive list of Data-Mining Models offers power and flexibility.
Post-processing of results allows for tailored decisions.
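The four stages above can be sketched as a small pipeline. This is an illustration in plain Python, not PMML syntax: the field (`raw_income`), the replacement value, the clamping range, the linear score, the threshold, and the decision labels are all hypothetical.

```python
def prepare(value, missing_replacement, low, high):
    """Data dictionary stage: replace a missing value, then clamp outliers."""
    if value is None:
        value = missing_replacement
    return min(max(value, low), high)


def normalize(value, low, high):
    """Transformation stage: rescale the raw value to the range [0, 1]."""
    return (value - low) / (high - low)


def model(feature):
    """Model stage: a hypothetical linear score on the derived feature."""
    return 0.2 + 0.6 * feature


def decide(score, threshold=0.6):
    """Post-processing stage: turn the score into a tailored decision."""
    return "review" if score >= threshold else "approve"


def run(raw_income):
    """Chain all four stages for one raw input value."""
    low, high = 0.0, 100000.0
    income = prepare(raw_income, missing_replacement=50000.0, low=low, high=high)
    return decide(model(normalize(income, low, high)))


decisions = [run(None), run(10000.0), run(250000.0)]
print(decisions)
```

A missing value and an extreme value both pass through the same path as an ordinary one, which is the reason for describing the data handling alongside the model rather than in separate custom code.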
PMML: Bringing Data and Models Together
Transformations
Data Analysis
Statistical Model
PMML Export
Got Models…
What Now?
Predictive Analytics Scoring Engine
Data transformations and model execution in real-time (via web-services calls) or batch-mode.
Environment to manage and deploy many predictive models or rule sets.
Framework for SOA-based IT integration
Completely standards-based and easily integrated with any existing infrastructure.
Not a model building environment.
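The idea of executing a model through a web-service call, either one record at a time or in a batch, can be sketched with a local stand-in service. This is not the ADAPA interface: the JSON-over-HTTP protocol, the `/score` path, and the linear `score` function are all assumptions made for illustration, using only the Python standard library.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def score(record):
    """Hypothetical deployed model: a linear score on one input field."""
    return 1.5 + 2.0 * record["x"]


class ScoringHandler(BaseHTTPRequestHandler):
    """Stand-in scoring service: accepts a JSON list of records."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        records = json.loads(self.rfile.read(length))
        body = json.dumps([score(r) for r in records]).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def call_service(url, records):
    """Client side: send records to the service and return the scores."""
    request = urllib.request.Request(
        url,
        data=json.dumps(records).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Bypass any configured proxy, since the service runs on localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return json.loads(response.read())


def run_demo():
    """Start the stand-in service, score one record and a small batch."""
    server = HTTPServer(("127.0.0.1", 0), ScoringHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/score" % server.server_port
    try:
        single = call_service(url, [{"x": 2.0}])
        batch = call_service(url, [{"x": 0.0}, {"x": 1.0}])
    finally:
        server.shutdown()
        server.server_close()
    return single, batch


if __name__ == "__main__":
    print(run_demo())
```

The calling application only sends data and receives scores; the model lives entirely on the service side, which is what allows models to be managed and replaced without changing client code.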
ADAPA
Execution
The ADAPA Example
1 through 6 – From Raw Data to Smart Decisions
1 Data Extraction and Analysis
2 Model Building
3 PMML Export
4 PMML Import
5 Web-Service Calls
6 Model Execution
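Steps 2, 3, 4, and 6 can be sketched end to end in a few lines. This is a toy illustration, not the workflow of R or ADAPA: the data points are invented, the model is a one-variable least-squares line, and the exported XML is only a fragment of a PMML document (it omits the Header, DataDictionary, and MiningSchema elements a complete document would carry).

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-3_2"


def fit_line(xs, ys):
    """Step 2, model building: ordinary least squares for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return mean_y - slope * mean_x, slope


def export_pmml(intercept, slope):
    """Step 3, PMML export: write the line as a RegressionModel fragment."""
    ET.register_namespace("", PMML_NS)
    root = ET.Element("{%s}PMML" % PMML_NS, {"version": "3.2"})
    model = ET.SubElement(
        root, "{%s}RegressionModel" % PMML_NS, {"functionName": "regression"}
    )
    table = ET.SubElement(
        model, "{%s}RegressionTable" % PMML_NS, {"intercept": repr(intercept)}
    )
    ET.SubElement(
        table,
        "{%s}NumericPredictor" % PMML_NS,
        {"name": "x", "coefficient": repr(slope)},
    )
    return ET.tostring(root, encoding="unicode")


def import_pmml(text):
    """Step 4, PMML import: recover intercept and slope from the XML."""
    table = ET.fromstring(text).find(".//{%s}RegressionTable" % PMML_NS)
    predictor = table.find("{%s}NumericPredictor" % PMML_NS)
    return float(table.get("intercept")), float(predictor.get("coefficient"))


def execute(model, x):
    """Step 6, model execution: score one input value."""
    intercept, slope = model
    return intercept + slope * x


# Invented data lying exactly on y = 1 + 2x.
fitted = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
document = export_pmml(*fitted)
imported = import_pmml(document)
print(imported, execute(imported, 10.0))
```

The model that is executed is rebuilt only from the exported XML, so the building side and the scoring side share nothing but the document.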
Thank You!
E-mail: [email protected]

U.S.A.
6125 Cornerstone Court East, Suite 250
San Diego, CA 92121
Tel: +1 619 330-0780
Fax: +1 858 535-0227

Asia
19/F., Unit A, Ho Lee Commercial Building
38-44 D’Aguilar Street
Central, Hong Kong (S.A.R.)
Tel: +852 2868-0878
Fax: +852 2845-6027