Open Source Machine Learning
Open Source Probabilistic Network Library
Gary Bradski, Program Manager
Systems Technology Labs - Intel
Intel® Confidential
Open Source ML
What are we announcing today?
• Intel is releasing a library of Open Source Software for Machine Learning
• First library is Probabilistic Network Library (PNL); comprised of code for inference and learning using Bayesian Networks
• Research and Development was conducted in Intel research labs in US, Russia and China
• Software is released as part of Intel Open Research Program
• Tool for research in many application areas
• Open Source under a BSD license
• The code is free for academic and commercial use
• More info: http://www.intel.com/research/mrl/pnl
Why is Intel involved?
• Statistical Computing and Machine Learning can change computing applications in a considerable way
• Machine Learning requires high-powered processors
• Ties into Intel’s research in other areas such as wireless networking, sensor networks and Proactive Health
What is Machine Learning?
• Machine Learning allows computers to learn from their experiences and from gathered data
• We’ve known for > 200 years that probability theory is the right tool to model systems, but it has always been too hard to compute. Recent advances in computing allow calculation of complex models
• Machines are good at gathering data and performing complex analysis
• Machine Learning is a sea change in development of applications since it allows computers to be more proactive and predictive
Applications of Machine Learning
• Interface – Audio Visual Speech Recognition (AVSR); natural language processing, etc.
• AI – robotics, computer games, entertainment, etc.
• Data Analysis – information retrieval, data mining, etc.
• Biological – gene sequencing, genomics, computational pharmacology
• Computer – run time optimization
• Industrial – fault diagnosis
Applications of machine learning cover a broad range
• Genomics - matching of protein strands
• Collaborative Filtering - personal “Google”
• Drug Discovery – shortening of drug discovery cycle
• Patient and elder care – wireless camera and sensor network help monitor patients
Open ML Components & Plan
Key: Optimized, Implemented, Not implemented
[Chart: methods arranged by Modeless vs. Model based and by Unsupervised vs. Supervised]
• K-means
• K-NN
• Boosted decision trees
• SVM
• Agglomerative clustering
• Spectral clustering
• BayesNets: Classification
• Decision trees
• BayesNets: Parameter fitting
• Dependency Nets
• PCA
• Influence diagrams
• Bayesnet structure learning
Statistical Learning: OpenSL - 2004
Bayesian Networks: OpenPNL - 2003
OpenML
Model Based Machine Learning
• Machine Learning can be based on models (model-based) or it can be model-less
• In version 1.0 of OpenML Intel is focusing on Bayesian Networks and Probabilistic Networks, which fall under the model-based category
• The Bayesian approach provides a mathematical rule explaining how one should change existing beliefs in the light of new evidence
• Model-less approaches are used for clustering and classification
• Intel will release libraries using model-less approaches next year
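The rule for changing beliefs in the light of new evidence is Bayes' theorem: P(H | E) = P(E | H) P(H) / P(E). A minimal sketch in Python follows. The function name and all numbers are illustrative assumptions (a hypothetical fault-alarm scenario), not taken from PNL or from the slides.

```python
def bayes_update(prior, likelihood, likelihood_if_false):
    """Return P(H | E) given P(H), P(E | H) and P(E | not H)."""
    # Total probability of seeing the evidence at all.
    evidence = likelihood * prior + likelihood_if_false * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% of machines are faulty; an alarm fires for 90% of
# faulty machines and for 5% of healthy ones.
posterior = bayes_update(prior=0.01, likelihood=0.9, likelihood_if_false=0.05)
print(f"P(fault | alarm) = {posterior:.3f}")
```

With these made-up inputs the belief in a fault rises from 1% to roughly 15% after one alarm, which shows how strongly the prior still matters.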
Applications of Model-less ML
• Suitable for applications such as Fault Diagnosis
• The system does not have a model
• It collects data and clusters and classifies them
• Recognition is derived from these clusters
[Figure: Machine 18, Fab 11. Tolerance goes out when temperature > 87°]
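The collect, cluster, classify loop can be sketched with k-means, one of the clustering methods named in the components chart. This is a minimal illustration on made-up temperature readings; the data, function names and the choice of two clusters are assumptions, not measurements from the slide.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Plain k-means on scalar readings (k >= 2); returns sorted cluster centers."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    return sorted(centers)


def classify(value, centers):
    """Index of the nearest cluster center."""
    return min(range(len(centers)), key=lambda c: abs(value - centers[c]))


# Made-up readings: a normal regime and an overheated regime.
readings = [80.1, 81.3, 79.8, 82.0, 80.7, 89.5, 90.2, 91.0, 88.9]
centers = kmeans_1d(readings)
print(centers, classify(90.0, centers))
```

No physical model of the machine is involved: a new reading is recognized only by which cluster of past data it falls nearest to.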
Applications of Model-based ML
• Our research has focused on Bayesian Networks
• Hidden Markov Models (HMMs), a type of Bayesian Net, are widely used in speech recognition; coupled Hidden Markov Models are used in Audio Visual Speech Recognition (use of visual data in speech recognition)
• Open Source PNL is an optimized infrastructure for research and development in Model Based Machine Learning
[Figures: Audio Visual Speech Recognition; Face Recognition & Tracking]
Example: Vision Applications
Image super resolution - Use a Bayesian method to develop a clear image from a low-resolution picture
Intel Systems Technology Lab
Santa Clara, CA, USA: Graphics Lab, Machine Learning, Architecture Lab
Hillsboro, OR, USA: Wireless Systems, Media, 3D Graphics, Tech. Management
Beijing, PR China: China Research Center, Speech and Machine Learning
Nizhny Novgorod, Russia: Architecture for Machine Learning, Media, 3D Graphics, Computer Vision
• One of three major labs of Intel Corporate Technology Group
• 300 researchers worldwide
• Focus on impact on Intel Architecture
• Drive university and industry initiatives
Why Open Source?
• Expands our research base
• Allows Intel researchers to collaborate easily with thousands of colleagues worldwide
• Remove barriers, speed up collaboration
• Tap into a very large innovative community
• Ability to get feedback from a large number of developers to design future microprocessors
• Chance to explore innovative usage models
• Diffuse new technologies and usage models to a wide group of early adopters
Open Research Program
Currently four open source projects: http://www.intel.com/software/products/opensource/index.htm
• OpenCV – Computer Vision Library: http://www.intel.com/research/mrl/research/opencv/
• OpenRC - Open Research Compiler: http://ipf-orc.sourceforge.net/ORC-overview.htm
• OpenLF – Open Light Fields: http://www.intel.com/research/mrl/research/lfm/
• OpenAVSR – Audio Visual Speech Recognition: http://www.intel.com/research/mrl/research/avcsr.htm
Example: OpenCV
• Released in June 2000
• A library of 500+ computer vision algorithms, including applications such as Face Recognition, Face Tracking, Stereo Vision, Camera Calibration
• Highly tuned for IA
• Windows and Linux Versions
• Over 500,000 Downloads
• Broad use in academia (450) and Industry (360)
More Information
Visit the Open Source ML Web page & download at:
http://www.intel.com/research/mrl/pnl
Backup
Modeless and Model Based ML
• Modeless: Classifiers, Clustering, Kernel estimators
• Model Based: Bayesian Networks, Function fitters, Regression, Filters
We’ll use an example application from our current research to describe two basic approaches to machine learning:
[Figures: a tree splitting data labeled A, B, C; a graph over nodes A, B, C]
Quick view of Bayesian networks
What is a Bayesian Network?
A Bayesian network, or a belief network, is a graph in which the following holds:
• A set of random variables makes up nodes of the network.
• A set of directed links connects pairs of nodes to denote causality relations between variables.
• Each node has a conditional probability distribution (CPD) that quantifies the effects that the parents have on the node
• Graphical Models are more general, allowing undirected links, mixed directed/undirected connections, and loops within the graph
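To make the definition concrete, here is a minimal sketch of a three-node binary network A -> B, A -> C in plain Python. The structure, the probability values and the function names are illustrative assumptions; this does not use or imitate the PNL API. The joint probability is the product of each node's CPD entry, and a posterior is obtained by enumeration.

```python
from itertools import product

# Hypothetical binary network A -> B, A -> C; all numbers are made up.
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # p_b_given_a[a][b]
p_c_given_a = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}  # p_c_given_a[a][c]


def joint(a, b, c):
    """P(A=a, B=b, C=c) as the product of each node's CPD entry."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_a[a][c]


def posterior_a(b, c):
    """P(A | B=b, C=c) by enumerating the values of A and normalizing."""
    scores = {a: joint(a, b, c) for a in p_a}
    total = sum(scores.values())
    return {a: s / total for a, s in scores.items()}


# The factored joint is still a proper distribution over all 8 assignments.
total_mass = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))
print(round(total_mass, 10))
print(posterior_a(b=1, c=1))
```

Enumeration is only practical for tiny networks; it is used here because it keeps the link between CPDs and the joint distribution visible.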
Computational Advantages of Bayesian Networks
• Bayesian Networks graphically express conditional independence of probability distributions.
• Independencies can be exploited for large computational savings.
EXAMPLE:
Joint probability of a system of 3 discrete variables (A, B, C) with 5 possible values each:
P(A,B,C) = 5x5x5 table: 125 parameters
But a graphical model factors the probabilities, taking advantage of the independencies: 55 parameters
[Figures: graphs over nodes A, B, C]
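The 125 versus 55 count can be reproduced under one plausible reading of the figure: A is the parent of both B and C, so the joint factors as P(A) P(B|A) P(C|A). That structure is an assumption, since the diagram itself did not survive extraction. The helper below is a hypothetical sketch that counts table entries the same way the slide does.

```python
from math import prod


def table_sizes(parents, cardinality):
    """Number of table entries in each node's CPD, given its parents."""
    return {
        node: cardinality[node] * prod(cardinality[p] for p in pars)
        for node, pars in parents.items()
    }


cardinality = {"A": 5, "B": 5, "C": 5}

# Full joint table over all three variables: 5 x 5 x 5.
full_joint = prod(cardinality.values())

# Assumed structure A -> B, A -> C, i.e. P(A) P(B|A) P(C|A).
parents = {"A": [], "B": ["A"], "C": ["A"]}
sizes = table_sizes(parents, cardinality)

print(full_joint)                   # 125
print(sizes, sum(sizes.values()))   # 5 + 25 + 25 = 55
```

Both figures count raw table entries. The number of free parameters is slightly lower in each case because every distribution must sum to 1.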
Causality and Bayesian Nets
[Figure: circuit diagram with nodes Mains, Transf., Diode, Diode, Capac., Ammeter, Battery; legend: Observed, Un-Observed]
Think of Bayesian Networks as a “Circuit Diagram” of Probability Models
• The Links indicate causal effect, not direction of information flow.
• Just as we can predict effects of changes on the circuit diagram, we can predict consequences of “operating” on our probability model diagram.
Quick view of Decision Trees and Statistical Boosting
Statistical Classification
Cluster data to infer or predict properties
Example: Decision trees
• Find splits that most “purify” the labeled data [Figure: AACBAABBCBCC split into AACACB and CBABBC]
• All the way down … [Figure: fully grown tree]
• Prune the tree to minimize complexity [Figure: pruned tree]
• The split rules are used to classify future data
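One common way to measure how much a split "purifies" labeled data is Gini impurity; the slides do not say which criterion is intended, so that choice is an assumption. The sketch below searches one numeric feature for the threshold that gives the lowest size-weighted impurity. The data and function names are made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure set, larger for more mixed sets."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(samples):
    """Find the threshold on a single numeric feature that minimizes the
    size-weighted Gini impurity of the two sides.

    samples: list of (feature_value, label) pairs.
    Returns (threshold, weighted_impurity); threshold is None if no split helps.
    """
    values = sorted({x for x, _ in samples})
    best = (None, gini([label for _, label in samples]))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [label for x, label in samples if x <= threshold]
        right = [label for x, label in samples if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(samples)
        if score < best[1]:
            best = (threshold, score)
    return best


# Made-up data: one numeric feature with labels A, B, C as on the slide.
data = [(1, "A"), (2, "A"), (3, "A"), (4, "B"), (5, "C"), (6, "B"), (7, "C")]
print(best_split(data))
```

Growing a full tree repeats this search on each side of the chosen split until the leaves are pure, and pruning then removes splits that add complexity without enough gain.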
Statistical Classification: Boosting
• Use a weak classifier such as a 1 level tree [Figure: AACBAABBCBCC split into AACACB and CBABBC]
• Re-weight the error cases and classify again; record weight factor “Wi” for “ith” case.
• Repeat until you have a “forest”
• Use the error weighted forest to vote on the classification of new data
[Figure: a forest of 1-level trees, each splitting AACBAABBCBCC differently; Decision1 * W1, Decision2 * W2, ..., DecisionN * WN are combined into a Weighted Sum Decision]
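The re-weight, repeat and vote steps match the general shape of discrete AdaBoost with 1-level trees (decision stumps). The sketch below is one such instance under simplifying assumptions: two classes coded as -1 and +1 instead of the slide's A, B, C, a single numeric feature, and made-up data. The function names are hypothetical, and the slides do not confirm which boosting variant is planned for the library.

```python
import math


def train_stump(xs, ys, weights):
    """Best 1-level tree on one feature: predict `sign` if x > threshold,
    else -sign. Returns (weighted_error, threshold, sign)."""
    values = sorted(set(xs))
    thresholds = [values[0] - 1] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for threshold in thresholds:
        for sign in (1, -1):
            error = sum(
                w for x, y, w in zip(xs, ys, weights)
                if (sign if x > threshold else -sign) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, sign)
    return best


def adaboost(xs, ys, rounds):
    """Discrete AdaBoost for labels in {-1, +1}; returns [(W_i, threshold, sign)]."""
    n = len(xs)
    weights = [1.0 / n] * n
    forest = []
    for _ in range(rounds):
        error, threshold, sign = train_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)   # avoid log(0)
        w_i = 0.5 * math.log((1 - error) / error)   # vote weight of this stump
        forest.append((w_i, threshold, sign))
        # Increase the weight of misclassified cases, decrease the rest.
        weights = [
            w * math.exp(-w_i * y * (sign if x > threshold else -sign))
            for x, y, w in zip(xs, ys, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return forest


def predict(forest, x):
    """Weighted sum decision over all stumps."""
    score = sum(w_i * (sign if x > threshold else -sign) for w_i, threshold, sign in forest)
    return 1 if score > 0 else -1


# Made-up 1-D data that no single stump separates: + + - - + +
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
forest = adaboost(xs, ys, rounds=3)
print([predict(forest, x) for x in xs])
```

Each stump alone misclassifies at least two of the six points, but the weighted vote of three stumps can separate them, which is the point of combining weak classifiers.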
Application areas and libraries
Applications of ML
[Chart: application areas arranged around a central set of tools]
Key: Actively working on, External activity, Past work, Ramping
• Interface: Biometric ID, Lips+Speech AVSR, Vision Models, Speech, Audio Models, Text Recog., Natural Lang.
• AI: Action Planning, Cognitive Modeling, Game Play, Robotics, Mapping
• Data Analysis: Information Retrieval, Datamining, Sensor Fusion, Info Filtering, Collaborative Filtering
• Industrial: Fault Diagnosis, Process Control, Disposition, Supply Chain, Models of Manufacturing
• Biologic: Proteomics, Genomics, Metabolics, Gene Sequencing, Epidemiology, Computational Pharmacology
• Computer: Trace Compression, Compiler Optimization, Binary Trans Adaptation, Run Time Optimization
TOOLS:
• Neural Nets
• SVM
• Trees, Boosting, Random forest
• Reinforcement Learning
• Statistical Regression, ANOVA, …
• Stochastic Discrimination
• Adaptive Filters
• Relational Networks
• Decision Theory, Influence Diagrams
• Graphical Models/MRFs
• Bayesian Networks
• Genetic Algorithms
Probabilistic Network Library
Application Driven; Drive into Future Hardware
[Chart: application areas feed the Bayesian Network Engine, whose workload analysis drives architecture]
• Application areas (Interface, Data Mining, “AI”, Industrial): Game Play, Cognitive Modeling, Lips+Speech AVSR, Information Retrieval, Trace Compression, Learned Control, Vision Models, Gene Sequencing, Epidemiology, Genomics, Robotics, Info Filtering, Speech, Audio Models, Natural Lang., Biometric ID, Process Control, Disposition, Supply Chain, Fault Diagnosis, Models of Manufacturing
• Bayesian Network Engine (Intel, Universities)
• Theories & Algorithms: Structure Learning; Decision & Utility theory; Dynamic BN, MRFs; Gibbs Sampling, Particle Filter; Junction Tree, Factor Graph; EM, Reinforcement; Loopy Belief, Variational; Data Handling, Cross Validation, Plates
• Workload Analysis, Architecture
• Drive into hardware: Chipset, Platform, CPU, Instructions, cache
• Create New Architectures; Modify Existing Architectures
Open Source Computer Vision (OpenCV)
Machine Learning Library (OpenMLL)
[Figure: decision tree splitting the labeled data AACBAABBCBCC]
• CLASSIFICATION / REGRESSION: CART, Statistical Boosting, MART, Random Forests, Stochastic Discrimination, Logistic, SVM, K-NN
• CLUSTERING: K-Means, Spectral Clustering, Agglomerative Clustering, LDA, SVD, Fisher Discriminant
• TUNING/VALIDATION: Cross validation, Bootstrapping, Sampling methods
Alpha Q1’04, Beta Q4’04
Optimization (Lib ?)
[Chart: taxonomy of optimization problems and methods]
• Large-scale Optimizations: Continuous, Mixed, Discrete; Constrained, Unconstrained; Linear, Nonlinear
• Problem types: LP, QP, NLP
• Methods: Interior Point, Active Set, Branch and Bound, Conjugate Gradient, Newton, SQP, Simplex, Sim. Annealing, Genetic Alg., Stoch. Search, Network Programming, Dynamic Programming
• Combinatorial Optimizations: Domain Reduction, Constraints Propagation
Problems looking at: Circuit layout; Device geometry; Chemical binding synthesis