
Causal Modelling for Relational Data

Learning Markov Logic Networks with Many Descriptive Attributes

Causal Modelling for Relational Data
Oliver Schulte
School of Computing Science
Simon Fraser University
Vancouver, Canada

Outline
- Relational data vs. single-table data.
- Two key questions: the definition of nodes (random variables), and measuring the fit of a model to relational data.
- Previous work: Parametrized Bayes Nets (Poole 2003), Markov Logic Networks (Domingos 2005); the cyclicity problem.
- New work: the learn-and-join Bayes net learning algorithm; a pseudo-likelihood function for relational Bayes nets.

Single Data Table Statistics
- Traditional paradigm: a single population, with random variables = attributes of population members; flat data that can be represented in a single table.
- Example Students table:

  Name | intelligence | ranking
  Jack | 3 | 1
  Kim | 2 | 1
  Paul | 1 | 2

- (Figure: the student population Jack, Kim, Paul and a sample drawn from it.)

Organizational Databases/Science
- Structured data: multiple populations; taxonomies, ontologies, nested populations; relational structures.
- (Figure: people Jack, Kim, Paul linked to entities 101, 102, 103.)

Relational Databases
- Input data: a finite (small) model/interpretation/possible world, consisting of multiple interrelated tables:

  Student: s-id | Intelligence | Ranking
  Jack | 3 | 1
  Kim | 2 | 1
  Paul | 1 | 2

  Professor: p-id | Popularity | Teaching-ability
  Oliver | 3 | 1
  Jim | 2 | 1

  Course: c-id | Rating | Difficulty
  101 | 3 | 1
  102 | 2 | 2

  RA: s-id | p-id | Salary | Capability
  Jack | Oliver | High | 3
  Kim | Oliver | Low | 1
  Paul | Jim | Med | 2

  Registration: s-id | c-id | Grade | Satisfaction
  Jack | 101 | A | 1
  Jack | 102 | B | 2
  Kim | 102 | A | 1
  Paul | 101 | B | 1

Link-Based Classification
- The same database, but with the Difficulty of course 101 unknown. Query: P(diff(101))?

Link Prediction
- The same database. Query: P(Registered(jack, 101))?

Relational Data: What Are the Random Variables (Nodes)?
- A functor is a function symbol with 1st-order variables, e.g. f(X), g(X,Y), R(X,Y).
- Each variable ranges over a population or domain.
- A Parametrized Bayes Net (PBN) is a BN whose nodes are functors (Poole, UAI 2003).
- Single-table data is the special case in which all functors contain the same single free variable X.

Example: Functors and Parametrized Bayes Nets
- Example functor nodes: intelligence(S), diff(C), Registered(S,C); wealth(X), Friend(X,Y), wealth(Y), age(X).
- Parameters: conditional probabilities P(child | parents), e.g. P(wealth(Y) = T | wealth(X) = T, Friend(X,Y) = T). Together these define a joint probability for every conjunction of value assignments.
- Convention: predicates are capitalized, attributes are lowercase.

Domain Semantics of Functors (Halpern 1990, Bacchus 1990)
- Intuitively, P(Flies(X) | Bird(X)) = 90% means that the probability that a randomly chosen bird flies is 90%.
- Think of a variable X as a random variable that selects a member of its associated population with uniform probability.
- Then functors like f(X), g(X,Y) are functions of random variables, and hence themselves random variables.

Domain Semantics: Examples
- P(S = jack) = 1/3.
- P(age(S) = 20) = Σ_{s : age(s) = 20} 1/|S|.
- P(Friend(X,Y) = T) = Σ_{x,y : friend(x,y)} 1/(|X| |Y|).
- In general, the domain frequency is the number of satisfying instantiations or groundings, divided by the total possible number of groundings.
- The database tables define a set of populations with attributes and links, and hence a database distribution over functor values.
- (A small computational sketch of these domain frequencies follows after the next slide.)

Defining Likelihood Functions for Relational Data
- We need a quantitative measure of how well a model fits the data.
- Single-table data consists of identically and independently structured entities (IID); relational data is not IID.
- So the likelihood function is not a simple product of instance likelihoods.
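Referring back to the Domain Semantics examples above, the following is a minimal Python sketch of computing such domain frequencies: the number of satisfying groundings divided by the number of all possible groundings. The populations mirror the toy university tables; the helper name `domain_frequency` is my own.

```python
from itertools import product

# Toy populations and a relation, mirroring the example tables above.
students = ["jack", "kim", "paul"]                    # population of variable S
courses = [101, 102]                                  # population of variable C
registered = {("jack", 101), ("jack", 102),
              ("kim", 102), ("paul", 101)}            # groundings with Registered(S,C) = T

def domain_frequency(populations, satisfies):
    """Number of satisfying groundings divided by the number of all groundings."""
    groundings = list(product(*populations))
    return sum(satisfies(g) for g in groundings) / len(groundings)

# P(S = jack) = 1/3
print(domain_frequency([students], lambda g: g[0] == "jack"))

# P(Registered(S,C) = T) = 4 / (3 * 2)
print(domain_frequency([students, courses], lambda g: g in registered))
```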

Structured Data Is Not IID
- (The same university database tables as above, illustrating that the entities are interrelated rather than independent.)

Knowledge-based Model Construction (Ngo and Haddawy 1997; Koller and Pfeffer 1997; Haddawy 1999)
- 1st-order model = template. Instantiate it with the individuals from the database (which are fixed) to obtain a ground model.
- Isomorphism: the DB facts correspond to an assignment of values, which yields a likelihood measure for the DB.
- Class-level template with 1st-order variables: intelligence(S), diff(C), Registered(S,C).
- Instance-level model with domain(S) = {jack, jane} and domain(C) = {100, 200}: nodes intelligence(jack), intelligence(jane), diff(100), diff(200), Registered(jack,100), Registered(jack,200), Registered(jane,100), Registered(jane,200).
- This translates generic frequencies (type 1 probabilities) into single-case probabilities (type 2 probabilities). Holistic approach: assign a probability to an entire possible world.

The Combining Problem
- How do we combine information from different related entities (e.g. the different courses a student is registered in)?
- Aggregate the properties of related entities (PRMs; Getoor, Koller, Friedman).
- Combine probabilities (BLPs; Poole, De Raedt, Kersting).

The Cyclicity Problem
- Class-level model (template): Rich(X) -> Friend(X,Y) -> Rich(Y).
- Ground model: Rich(a) -> Friend(a,b) -> Rich(b) -> Friend(b,c) -> Rich(c) -> Friend(c,a) -> Rich(a), a directed cycle.
- With recursive relationships we get cycles in the ground model even if there are none in the 1st-order model.
- Jensen and Neville 2007: "The acyclicity constraints of directed models severely constrain their applicability to relational data."
- (A small grounding sketch follows after the Markov network example below.)

Hidden Variables Avoid Cycles
- Template nodes: Rich(X), Friend(X,Y), Rich(Y), plus latent types U(X), U(Y).
- Assign unobserved values u(jack), u(jane); the probability that Jack and Jane are friends depends on their unobserved types.
- In the ground model, rich(jack) and rich(jane) are correlated given that they are friends, but neither is an ancestor of the other.
- Common in social network analysis (Hoff 2001, Hoff and Raftery 2003, Fienberg 2009); compare the $1M prize in the Netflix challenge; also used for multiple types of relationships (Kersting et al. 2009).
- Computationally demanding.
- Hoff gives a justification by applying de Finetti's exchangeability theorem (matrix version).

Undirected Models Avoid Cycles
- Class-level model (template): undirected links Rich(X) - Friend(X,Y) - Rich(Y).
- Ground model: undirected connections among Rich(a), Rich(b), Rich(c) via Friend(a,b), Friend(b,c), Friend(c,a); no directed cycles can arise.

Markov Network Example (from Pedro Domingos)
- An undirected graphical model over Smoking, Cancer, Asthma, Cough.
- Potential functions are defined over cliques, e.g. for the clique {Smoking, Cancer}:

  Smoking | Cancer | φ(S,C)
  False | False | 4.5
  False | True | 4.5
  True | False | 2.7
  True | True | 4.5
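Referring back to the cyclicity problem above, this small sketch grounds the template Rich(X) -> Friend(X,Y) -> Rich(Y) over three individuals and confirms that the ground graph contains a directed cycle even though the template is acyclic; the graph representation and helper names are my own.

```python
from itertools import permutations

people = ["a", "b", "c"]

# Ground the template edges Rich(X) -> Friend(X,Y) and Friend(X,Y) -> Rich(Y)
# for every ordered pair of distinct individuals.
edges = set()
for x, y in permutations(people, 2):
    edges.add((f"Rich({x})", f"Friend({x},{y})"))
    edges.add((f"Friend({x},{y})", f"Rich({y})"))

def has_directed_cycle(edges):
    """Depth-first search for a back edge in the ground graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
    visiting, done = set(), set()

    def dfs(node):
        if node in visiting:
            return True                  # back edge: a cycle exists
        if node in done:
            return False
        visiting.add(node)
        found = any(dfs(nbr) for nbr in graph.get(node, []))
        visiting.discard(node)
        done.add(node)
        return found

    return any(dfs(node) for node in list(graph))

print(has_directed_cycle(edges))  # True: e.g. Rich(a) -> Friend(a,b) -> Rich(b) -> ... -> Rich(a)
```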

Markov Logic Networks (Richardson and Domingos, Machine Learning 2006)
- An MLN is a set of formulas with weights.
- Graphically, it is a Markov network with functor nodes.
- MLNs solve both the combining problem and the cyclicity problem.
- For every functor BN there is a predictively equivalent MLN, namely the moralized BN.
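As a hedged illustration of the last point, the sketch below writes one family of a functor BN as weighted conjunctions in the style of the moralized-BN construction: each joint assignment of the child and its parents becomes a formula whose weight is the log conditional probability. The CPT values are invented placeholders, not numbers from the talk.

```python
import math

# P(wealth(Y) = T | wealth(X), Friend(X,Y)) for a family in which wealth(Y)
# has parents wealth(X) and Friend(X,Y); the probabilities are illustrative only.
cpt = {
    (True, True): 0.8,
    (True, False): 0.5,
    (False, True): 0.3,
    (False, False): 0.5,
}

# Each (parents, child) assignment becomes a conjunction weighted by log probability.
mln_formulas = []
for (wx, fr), p_true in cpt.items():
    for wy, p in ((True, p_true), (False, 1.0 - p_true)):
        formula = f"wealth(X)={wx} ^ Friend(X,Y)={fr} ^ wealth(Y)={wy}"
        mln_formulas.append((math.log(p), formula))

for weight, formula in mln_formulas:
    print(f"{weight:+.3f}  {formula}")
```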

- (Figure: the directed template Rich(X) -> Friend(X,Y) -> Rich(Y) and its undirected, moralized counterpart.)
- Richardson and Domingos: no ad hoc combining rules are needed.

New Proposal
- Causality at the token level (instances) is underdetermined by the type-level model: we cannot distinguish whether wealth(jane) causes wealth(jack), wealth(jack) causes wealth(jane), or both (feedback).
- So focus on type-level causal relations.
- How? Learn a model of Halpern's database distribution.
- For token-level inference/prediction, convert the model to an undirected one.

- (Figure: the template wealth(X) -> Friend(X,Y) -> wealth(Y).)
- Supported by Dretske on universals and by Hausman on causation between variables.

The Learn-and-Join Algorithm (AAAI 2010)
- Required: a single-table BN learner L that takes as input (T, RE, FE): a single data table T and a set of edge constraints (forbidden/required edges).
- Nodes: descriptive attributes (e.g. intelligence(S)) and Boolean relationship nodes (e.g. Registered(S,C)).
- Initialize RequiredEdges, ForbiddenEdges := the empty set.
- For each entity table E_i:
  - Apply L to E_i to obtain a BN G_i.
  - For any two attributes X, Y from E_i: if X -> Y is in G_i, add X -> Y to RequiredEdges; if X -> Y is not in G_i, add X -> Y to ForbiddenEdges.
- For each relationship-table join (= conjunction) of size s = 1, .., k:
  - Compute the join of the relationship tables with the entity tables, J_i.
  - Apply L to (J_i, RE, FE) to obtain a BN G_i, and derive additional edge constraints from G_i.
- Add relationship indicators: if the edge X -> Y was added when analyzing the join of R_1, ..., R_m, add the edges R_i -> Y.
- (A compact Python-style sketch of this procedure appears after the Phase 3 slide below.)
- Speaker notes: to deal with autocorrelation, duplicate the entity tables. The main point is that the algorithm recursively accumulates edge constraints and feeds them back into later stages of the search. Question (Neumann): how do you know which data tables to use? Answer: from the relational schema; without a schema, one might be able to recover this from the functor structure alone, which is an interesting open question.

Phase 1: Entity Tables
- Apply the BN learner L to the Students table (Name, intelligence, ranking) to obtain a BN over intelligence(S) and ranking(S).
- Apply L to the Course table (Number, Prof, rating, difficulty) to obtain a BN over diff(C), rating(C), popularity(p(C)), teach-ability(p(C)).
- Keeping the 1st-order variables in the node names (intelligence(S) rather than intelligence) is important for consistency with Poole's functor semantics and for representing autocorrelations.

Phase 2: Relationship Tables
- Join the Registration table with the Student and Course tables (columns S.Name, C.number, grade, satisfaction, intelligence, ranking, rating, difficulty).
- Apply L to the join, under the edge constraints from Phase 1, to obtain a BN over ranking(S), intelligence(S), grade(S,C), satisfaction(S,C), diff(C), rating(C), popularity(p(C)), teach-ability(p(C)).

Phase 3: Add Boolean Relationship Indicator Variables
- Extend the Phase 2 graph with the indicator node Registered(S,C), which becomes a parent of the attributes whose edges were learned from the Registration join (e.g. grade(S,C) and satisfaction(S,C)).
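The following Python-style sketch summarizes the control structure of the learn-and-join algorithm described above. The single-table learner `learn_bn`, the table objects with an `attributes` field, the relationship tables with an `indicator` name, and the `join` helper are hypothetical stand-ins; this is a sketch of the stated pseudocode, not the authors' implementation.

```python
from itertools import combinations

def learn_and_join(entity_tables, relationship_tables, learn_bn, join, k=2):
    """Sketch of the learn-and-join control structure (hypothetical interfaces)."""
    required, forbidden = set(), set()

    # Phase 1: run the single-table learner on each entity table and turn the
    # presence/absence of attribute edges into required/forbidden constraints.
    for table in entity_tables:
        g = learn_bn(table, required, forbidden)
        for x, y in combinations(table.attributes, 2):
            for edge in ((x, y), (y, x)):
                (required if edge in g.edges else forbidden).add(edge)

    # Phase 2: for each relationship-table join of size 1..k, join it with the
    # entity tables and rerun the learner under the accumulated constraints.
    join_results = []
    for size in range(1, k + 1):
        for rels in combinations(relationship_tables, size):
            g = learn_bn(join(rels, entity_tables), required, forbidden)
            new_edges = set(g.edges) - required       # edges first found in this join
            required |= new_edges
            join_results.append((rels, new_edges))

    # Phase 3: relationship indicators: if X -> Y was added while analyzing the
    # join of R_1, ..., R_m, add each edge R_i -> Y.
    final_edges = set(required)
    for rels, new_edges in join_results:
        for _, y in new_edges:
            for rel in rels:
                final_edges.add((rel.indicator, y))
    return final_edges
```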

Running Time on Benchmarks
- Time in minutes; NT = did not terminate; an entry x + y means structure learning time + parametrization time.
- JBN: our join-based algorithm. MLN, CMLN: standard programs from the University of Washington (Alchemy).
- (Figure: table of running times on the benchmark databases.)

Accuracy
- (Figure: accuracy comparison on the benchmark databases.)

Pseudo-likelihood for Functor Bayes Nets
- What likelihood function P(database, graph) does the learn-and-join algorithm optimize?
- Moralize the BN (the causal graph).
- Use the Markov net likelihood function for the moralized BN, but without the normalization constant: the pseudo-likelihood P*(D, B) = Π_{families} P(child | parents)^{#child-parent instances in D}.
- (Figure: relational causal graph -> Markov Logic Network -> likelihood function.)

Features of the Pseudo-likelihood P*
- Tractability: the maximizing parameter estimates are the empirical conditional database frequencies.
- Similar to the pseudo-likelihood function for Markov nets (Besag 1975, Domingos and Richardson 2007).
- Mathematically equivalent, but with a conceptually different interpretation: the expected log-likelihood for randomly selected individuals.

Halpern Semantics for Functor Bayes Nets (new)
- Randomly select instances X_1 = x_1, ..., X_n = x_n, one for each 1st-order variable in the BN.
- Look up their properties and relationships in the database.
- Compute the log-likelihood of the BN assignment obtained from the instances.
- LH = the average log-likelihood over uniform random selection of the instances.
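A minimal Monte Carlo sketch of the random-selection semantics just described, for a toy family in which Rich(Y) has parents Rich(X) and Friend(X,Y): draw individuals uniformly, look up their properties, and average the log-probability the BN assigns to the resulting assignment. The populations, facts, and conditional probabilities are illustrative placeholders.

```python
import math
import random

people = ["jack", "jane", "paul"]
rich = {"jack": True, "jane": True, "paul": False}       # rich(x) facts
friends = {("jack", "jane"), ("jane", "jack")}            # Friend(x,y) = T facts

def log_likelihood(rich_x, friend_xy, rich_y):
    """Toy factorization P(Rich(X)) * P(Friend(X,Y)) * P(Rich(Y) | Rich(X), Friend(X,Y))."""
    p_rich_x = 0.5
    p_friend = 0.3 if friend_xy else 0.7
    if friend_xy:
        p_rich_y = 0.8 if rich_y == rich_x else 0.2       # friends tend to match
    else:
        p_rich_y = 0.5
    return math.log(p_rich_x) + math.log(p_friend) + math.log(p_rich_y)

# LH = expected log-likelihood over uniformly selected instances of X and Y.
samples = []
for _ in range(10_000):
    x, y = random.choice(people), random.choice(people)
    samples.append(log_likelihood(rich[x], (x, y) in friends, rich[y]))
print(sum(samples) / len(samples))
```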

- (Figure: a random selection X = jack, Y = jane; the values of Rich(jack), Friend(jack,jane), Rich(jane) are looked up and scored under the template Rich(X), Friend(X,Y), Rich(Y).)
- Proposition: LH(D, B) = ln(P*(D, B)) × c, where c is a (meaningful) constant. (A computational sketch of ln P* appears after the review summary below.)
- No independence assumptions!

Summary of Review
- Two key conceptual questions for relational causal modelling: What are the random variables (nodes)? How do we measure the fit of a model to the data?
- Nodes = functors, i.e. open function terms (Poole).
- Instantiate the type-level model with all possible tokens, and use the instantiated model to assign a likelihood to the totality of all token facts.
- Problem: the instantiated model may contain cycles even if the type-level model does not.
- One solution: use undirected models.
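As referenced from the proposition above, ln P*(D, B) can be computed in closed form by summing, over the family configurations, the grounding count times the log conditional probability. The counts and probabilities below are illustrative placeholders, and the final division by the number of groundings is only meant to suggest the per-selection average that LH measures (the precise constant c is the one stated in the proposition, not derived here).

```python
import math

# Illustrative grounding counts and conditional probabilities for the family
# grade(S,C) with parents intelligence(S), diff(C); values are placeholders.
family_counts = {
    ("grade=A", "intelligence=3", "diff=1"): 2,
    ("grade=B", "intelligence=1", "diff=1"): 1,
    ("grade=A", "intelligence=2", "diff=2"): 1,
}
cpt = {
    ("grade=A", "intelligence=3", "diff=1"): 0.7,
    ("grade=B", "intelligence=1", "diff=1"): 0.6,
    ("grade=A", "intelligence=2", "diff=2"): 0.5,
}

# ln P*(D, B) = sum over family configurations of count * ln P(child | parents).
log_p_star = sum(n * math.log(cpt[cfg]) for cfg, n in family_counts.items())
print(log_p_star)

# Per-grounding average, in the spirit of LH being proportional to ln P*.
print(log_p_star / sum(family_counts.values()))
```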

Summary of New Results
- A new algorithm for learning causal graphs with functors: fast and scalable (e.g. 5 min vs. 21 hr), with substantial improvements in accuracy.
- A new pseudo-likelihood function for measuring the fit of a model to the data: tractable parameter estimation; similar to the Markov network (pseudo-)likelihood; new semantics as the expected log-likelihood of the properties of randomly selected individuals.

Open Problems
- Learning: learn-and-join learns dependencies among attributes, but not dependencies among relationships; parameter learning is still a bottleneck.
- Inference/prediction: the Markov logic likelihood does not satisfy Halpern's principle: if P(φ(X)) = p, then P(φ(a)) = p, where a is a constant. (Related to Miller's principle.) Is this a problem?

Miller's principle: conditional on the fact that the class-level probability P(φ(X)) = p, we have P(φ(a)) = p.

Thank you! Any questions?

Choice of Functors
- Functors can be complex, e.g. nested: wealth(father(father(X))), or aggregate: AVG_C{grade(S,C) : Registered(S,C)}. (A sketch of evaluating such an aggregate functor follows after the next slide.)
- In the remainder of this talk we use functors corresponding to attributes (columns), e.g. intelligence(S), grade(S,C), and Boolean relationship indicators, e.g. Friend(X,Y).

Typical Tasks for Statistical-Relational Learning (SRL)
- Link-based classification: given the links of a target entity and the attributes of related entities, predict the class label of the target entity.
- Link prediction: given the attributes of entities and their other links, predict the existence of a link.
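For the aggregate functor AVG_C{grade(S,C) : Registered(S,C)} mentioned under Choice of Functors, here is a small sketch that evaluates it on the toy Registration table; the letter-grade-to-points encoding is an assumption, not part of the talk.

```python
# Evaluate AVG_C{grade(S,C) : Registered(S,C)} for one student on the toy table.
registration = [                 # (student, course, grade)
    ("jack", 101, "A"), ("jack", 102, "B"),
    ("kim", 102, "A"), ("paul", 101, "B"),
]
grade_points = {"A": 4.0, "B": 3.0}   # assumed numeric encoding of letter grades

def avg_grade(student):
    grades = [grade_points[g] for s, _, g in registration if s == student]
    return sum(grades) / len(grades) if grades else None

print(avg_grade("jack"))   # 3.5: average over the courses Jack is registered in
```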