Kristian Kersting, Associate Professor for Computer Science, TU Dortmund University, Germany at MLconf SEA - 5/20/16

Declarative Programming for Statistical ML

Kristian Kersting

Martin Mladenov (TUD)

Babak Ahmadi (PicoEgo)

Amir Globerson (HUJI)

Martin Grohe (RWTH)

Sriraam Natarajan (U. Indiana)

Leonard Kleinhans (TUD)

Danny Heinrich (TUD)

Pavel Tokmakov (INRIA Grenoble)

and many more

Is there a -O1 flag for Statistical ML?

Kristian Kersting - Declarative Programming for Statistical ML

There is an arms race to deeply understand data

Take your spreadsheet

[Figure: data matrix with axes labeled Features and Objects]

and apply some ML

Latent Dirichlet Allocation

Matrix Factorization

Gaussian Processes

Decision Trees/Boosting

Autoencoder/Deep Learning

Support Vector Machines

and many more

Is it really that simple?

A simple example (Guy van den Broeck, UCLA)

What is the probability that the first card of a randomly shuffled deck with 52 cards is an Ace?

[Figure: ground model over the variables card(1,d2), card(1,d3), ..., card(1,pAce), ..., card(52,d2), card(52,d3), ..., card(52,pAce)]

No independencies. Fully connected. 2^2704 states (one Boolean variable per position-card pair, 52 x 52 = 2704).

A machine will not solve the problem

Positions and cards are exchangeable, but the machine is not aware of these symmetries

Let's use a high-level language, e.g. Markov Logic Networks (MLNs)

w1: ∀p,x,y: card(p,x) ∧ card(p,y) ⇒ x = y

w2: ∀c,x,y: card(x,c) ∧ card(y,c) ⇒ x = y

and symmetry- and language-aware inference

Faster modelling

Faster inference and learning

What are we missing?
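
To make the symmetry argument concrete, here is a minimal sketch that answers the card question twice: once by enumerating every shuffle of a toy deck, and once by using exchangeability (every card is equally likely to be on top). The function names and the toy deck are illustrative assumptions, not part of the talk.

```python
from fractions import Fraction
from itertools import permutations


def brute_force(deck):
    """Enumerate every shuffle and count those that start with an ace."""
    total = hits = 0
    for shuffle in permutations(deck):
        total += 1
        hits += shuffle[0] == "A"
    return Fraction(hits, total)


def lifted(deck):
    """Use exchangeability: each card is equally likely to be on top."""
    return Fraction(deck.count("A"), len(deck))


# Toy deck (assumption): 7 cards, 2 aces. Enumeration is still feasible here.
small_deck = ["A", "A", "2", "3", "4", "5", "6"]
assert brute_force(small_deck) == lifted(small_deck)

# Full deck: 52! shuffles are out of reach for enumeration,
# but the symmetric argument needs only a count.
full_deck = ["A"] * 4 + ["x"] * 48
print(lifted(full_deck))  # 1/13
```

The enumeration grows factorially with the deck size, while the symmetric computation stays constant-time, which is the gap that symmetry-aware inference aims to close.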


Symmetry-Aware Message Passing

Compress the model

Run message passing inference on the smaller model

[Figure: big model → small model]

[Singla, Domingos AAAI08; Kersting, Ahmadi, Natarajan UAI09; Ahmadi, Kersting, Mladenov, Natarajan MLJ13; Mladenov, Globerson, Kersting AISTATS14, UAI14; Mladenov, Kersting UAI15; ...]
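
One common way to find nodes that can share a representative in a compressed model is color refinement: nodes start with a color (for example, their local potential) and are repeatedly re-colored by the multiset of their neighbors' colors until nothing changes. The sketch below runs this on a plain undirected graph. It is an illustration under simplifying assumptions (no factor nodes, no message passing), not the algorithm of any of the cited papers, and all names are hypothetical.

```python
def color_refinement(adjacency, initial_colors):
    """Group nodes that cannot be told apart by their neighborhoods.

    adjacency: dict node -> list of neighbor nodes
    initial_colors: dict node -> comparable label (e.g. a potential name)
    Returns dict node -> integer color id of the stable partition.
    """
    colors = dict(initial_colors)
    while True:
        # Signature: own color plus the sorted multiset of neighbor colors.
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adjacency[v])))
            for v in adjacency
        }
        # Rename signatures to small integers.
        ids = {s: i for i, s in enumerate(sorted(set(signatures.values())))}
        new_colors = {v: ids[signatures[v]] for v in adjacency}
        # Refinement only splits classes, so an unchanged count means stable.
        if len(set(new_colors.values())) == len(set(colors.values())):
            return new_colors
        colors = new_colors


# Star graph (assumption): hub 0 connected to leaves 1..4, identical potentials.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
classes = color_refinement(star, {v: "f" for v in star})
print(len(set(classes.values())))  # 2 classes instead of 5 nodes
```

In this toy case the four leaves collapse into one class, so inference would only need to track two kinds of nodes.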


Statistical Relational AI

The study and design of intelligent agents that act in noisy worlds composed of objects and relations among the objects

[Figure labels: Scaling, Uncertainty, Logic, Graphs, Trees, Mining and Learning]

De Raedt, Kersting, Natarajan, Poole: Statistical Relational Artificial Intelligence, 2016

[Getoor, Taskar MIT Press 07; De Raedt, Frasconi, Kersting, Muggleton LNCS08; Domingos, Lowd Morgan Claypool 09; Natarajan, Kersting, Khot, Shavlik Springer Brief15; Russell CACM 58(7): 88-97 15]

But wait a minute! We want to use some ML, not just graphical models!

Latent Dirichlet Allocation

Matrix Factorization

Gaussian Processes

Decision Trees/Boosting

Autoencoder/Deep Learning

Support Vector Machines

and many more

Let's say we want to classify publications that cite each other

Standard ML approach: Support Vector Machines

This is a quadratic program. If you replace the ℓ2-norm by the ℓ1- or ℓ∞-norm, you get a linear program.

[Vapnik 79; Bennett 99; Mangasarian 99; Zhou, Zhang, Jiao 02; ...]
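
For reference, the textbook soft-margin SVM can be written as follows, together with an ℓ1-norm variant that is a linear program. The notation (weights w, bias b, slacks ξ_i, trade-off C, examples x_i with labels y_i in {-1, +1}, auxiliary bounds a_j) is standard textbook notation and an assumption here; the slide's own formula did not survive extraction.

```latex
% Quadratic program: l2-norm soft-margin SVM
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\,\lVert w\rVert_2^2 + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.}\quad & y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1,\dots,n .
\end{aligned}

% Linear program: l1-norm variant, with a_j bounding |w_j|
\begin{aligned}
\min_{w,\,b,\,\xi,\,a}\quad & \sum_{j=1}^{d} a_j + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.}\quad & y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1,\dots,n, \\
& -a_j \le w_j \le a_j, \qquad j = 1,\dots,d .
\end{aligned}
```

The only nonlinear term in the first program is the squared ℓ2-norm; once it is replaced by a sum of bounded absolute values, objective and constraints are all linear.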

Statistical Machine Learning via Declarative Programming

Write down the problem in paper form. The machine then automatically compiles it into algebraic solver form.

[Figure annotations on the example program:]

Logically parameterized variable (set of ground variables)

Logically parameterized constraint

Logically parameterized objective

Data stored externally

[Kersting, Mladenov, Tokmakov AIJ15; Mladenov, Heinrich, Kleinhans, Gonsio, Kersting DeLBP16]
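
The compilation step can be pictured as grounding: a logically parameterized constraint is instantiated once per matching fact in the external data, and the result is a row of a constraint matrix. The sketch below does this by hand for an SVM-style margin constraint. It is not the syntax or API of the language from the cited papers; the predicate names (`attribute`, `label`, `weight`, `slack`) and the toy facts are assumptions made for illustration.

```python
# Hypothetical external data: facts from a tiny publication database.
attribute = {("p1", "w_ml"): 1.0, ("p1", "w_db"): 0.0,
             ("p2", "w_ml"): 0.0, ("p2", "w_db"): 1.0}
label = {"p1": +1, "p2": -1}


def ground_margin_constraints(attribute, label):
    """Ground the parameterized constraint

        for all papers I:
            label(I) * (sum_J weight(J) * attribute(I, J) + b) + slack(I) >= 1

    into algebraic form  A x >= rhs.
    """
    papers = sorted(label)
    features = sorted({j for (_, j) in attribute})
    # A parameterized variable stands for a set of ground variables.
    variables = ([("weight", j) for j in features] + [("b",)]
                 + [("slack", i) for i in papers])
    index = {v: k for k, v in enumerate(variables)}
    rows, rhs = [], []
    for i in papers:  # one ground constraint per paper
        row = [0.0] * len(variables)
        for j in features:
            row[index[("weight", j)]] = label[i] * attribute.get((i, j), 0.0)
        row[index[("b",)]] = float(label[i])
        row[index[("slack", i)]] = 1.0
        rows.append(row)
        rhs.append(1.0)
    return variables, rows, rhs


variables, A, rhs = ground_margin_constraints(attribute, label)
```

Calling the same function on a different set of facts yields a different ground program without touching the constraint template, which is the sense in which the data can stay external.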

[Figure: Program 1 with Data 1, Program 2 with Data 2, Program 3 with Data 3, ...]

[Figure: one declarative program, combined with Data 1, Data 2, ..., Data n, yields MP 1, MP 2, ..., MP n]

A declarative program captures the essence of a problem and can be reused for several problems

[Kersting, Mladenov, Tokmakov AIJ15; Mladenov, Heinrich, Kleinhans, Gonsio, Kersting DeLBP16]

But wait, publications are citing each other. OMG, I have to use graph kernels! REALLY?

Simply program some additional constraints

[Kersting, Mladenov, Tokmakov AIJ15; Mladenov, Heinrich, Kleinhans, Gonsio, Kersting DeLBP16]


http://www-ai.cs.uni-dortmund.de/weblab/static/RLP/html/

Loops and relations get intertwined, and models can refer to each other

DBMS interface

Using a probabilistic programming language, we can even get stochastic relational mathematical programs

Finally, the -O1 flag

[Kersting, Mladenov, Tokmakov AIJ 2015; Mladenov, Kleinhans, Kersting 2016]

Reduce the QP via symmetries

Run any solver on the reduced QP
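
As a rough illustration of the reduction, the sketch below takes an unconstrained QP, min 1/2 x'Qx + c'x, together with a grouping of interchangeable variables, substitutes one shared variable per group, solves the smaller problem, and lifts the solution back. The grouping is assumed to be given (for example by a symmetry-detection step), constraints are left out for brevity, and the function names are hypothetical rather than taken from the cited work.

```python
def reduce_qp(Q, c, classes):
    """Substitute x[i] = y[classes[i]] into 1/2 x'Qx + c'x.

    classes[i] is the symmetry class of variable i (assumed given).
    Returns the coefficients (Q_red, c_red) of the reduced QP in y.
    """
    k = max(classes) + 1
    Q_red = [[0.0] * k for _ in range(k)]
    c_red = [0.0] * k
    for i, ci in enumerate(classes):
        c_red[ci] += c[i]
        for j, cj in enumerate(classes):
            Q_red[ci][cj] += Q[i][j]
    return Q_red, c_red


def lift(y, classes):
    """Map a solution of the reduced QP back to the original variables."""
    return [y[ci] for ci in classes]


def gradient(Q, c, x):
    """Gradient Qx + c of the original objective."""
    n = len(x)
    return [sum(Q[i][j] * x[j] for j in range(n)) + c[i] for i in range(n)]


# Toy QP (assumption): three fully interchangeable variables.
Q = [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]
c = [-4.0, -4.0, -4.0]
classes = [0, 0, 0]

Q_red, c_red = reduce_qp(Q, c, classes)  # a 1 x 1 problem
y = [-c_red[0] / Q_red[0][0]]            # closed-form minimizer in one variable
x = lift(y, classes)

# The lifted point is stationary for the original convex QP.
assert all(abs(g) < 1e-12 for g in gradient(Q, c, x))
```

Here the closed-form step stands in for "any solver"; the point is only that the solver sees one variable instead of three.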

and the -O2 flag

Algebraic Decision Diagrams

Formulae parse trees

Matrix-Free Optimization

Optimization with 60 million non-zeros, at 12 minutes per log-barrier iteration, and actually sublinear in the number of non-zeros

Conclusions

High-level languages for machine learning and optimization are a step towards the democratization of machine learning

Reduce the level of expertise necessary to build optimization applications, and make models faster to write and easier to communicate

Facilitate the construction of sophisticated models with rich domain knowledge

Speed up solvers by exploiting language properties, compression, and compilation


This is all inspired by Turing Award winner Jim Gray's grand challenge of Automated Programming

Meeting it requires the help of all of us: Machine Learning, Data Mining, Databases, SAT, Knowledge Representation, Constraint Programming, Game Theory, Graph Theory, Optimization!

Thanks for your attention!

