Causal Data Mining
Richard Scheines
Dept. of Philosophy, Machine Learning, &
Human-Computer Interaction
Carnegie Mellon
1. Predictive Data Mining
Finding predictive relationships in data
– What feature of student behavior predicts learning
– Who will default on credit cards
– Who will get an “A” in your course
– Which HS students will do well at CMU
– Do students cluster by “learning style”
Causal Data Mining
Finding causal relationships in data
– What feature of student behavior causes learning
– What will happen when we make everyone take a reading quiz before each class
– What will happen when we program our tutor to intervene to give hints after an error
Predictive Data Mining
Case   X1    X2   X3   …   Xk    Y
1      1.7   28   M    …   2.4   1
2      2.0   11   F    …   1.1   0
3      1.9   17   F    …   1.1   1
…      …     …    …    …   …     …
N      2.8   12   M    …   1.8   0
Data Mining Search → Predictive Model: Y = f(X1, X2, …, Xk)
Model Classes
1. Simple Regression
2. Locally Weighted Regression
3. Logistic Regression
4. Neural Nets
5. Support Vector Machines
6. Decision Trees
7. Bayes Nets
8. Naïve Bayes Classifier
9. Independent Components Analysis
10. Clustering
11. Etc.
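The following is a minimal sketch of the simplest model class above. It fits a simple regression Y = a + b·X1 by ordinary least squares. The function names and the data are hypothetical. A real predictive search would compare many model classes and predictors.

```python
# Sketch: model class 1 ("Simple Regression") fit by ordinary least squares.
# The data below are hypothetical, chosen only to illustrate the search step.


def fit_simple_regression(xs, ys):
    """Return (intercept, slope) minimizing the squared error of y = a + b * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(model, x):
    """Predicted Y for a new case with predictor value x."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical predictor X1 (e.g., hours of practice) and outcome Y (exam score).
x1 = [1.0, 2.0, 3.0, 4.0]
y = [52.0, 61.0, 69.0, 78.0]
model = fit_simple_regression(x1, y)
print(model, predict(model, 5.0))
```

Other model classes in the list differ in the family of functions f they search over. The underlying goal is the same: find the f that best predicts Y from the Xs.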
Predictive Data Mining
Data Mining Search → Predictive Model under Constraints:
Y = f(X1, X2, …, Xk), e.g., with f restricted to additive functions
or
Data Mining Search → Probability Model under Constraints:
P(Y | X1, X2, …, Xk), where P is Gaussian, with mean 0
Predictive Data Mining
Decision Tree Search
[Figure: decision tree for the probability of hospitalization, P(Hosp.), splitting on Age (≤ 57 / > 57), X-Ray (Pos / Neg), and Lab1 and Lab2 (thresholds 1.4, 1.8, 2.3); leaf probabilities .78, .59, .10, .66, .75, and .05.]
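The search behind a tree like this one is typically greedy. At each node, it tries candidate splits and keeps the one that best separates the outcome. The sketch below shows a single such step on one numeric feature. It assumes Gini impurity (as in CART) as the split criterion, which the slide does not name. The ages and outcomes are invented.

```python
# Sketch of one step of greedy decision-tree search: pick the threshold on a
# single numeric feature that best separates a binary outcome. Gini impurity
# is an assumed split criterion; the data are hypothetical.


def gini(labels):
    """Gini impurity 2p(1 - p) of a list of 0/1 labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) for the split `value <= threshold`
    with the lowest size-weighted Gini impurity, or None if every value is equal.
    Candidate thresholds are midpoints between consecutive distinct values."""
    distinct = sorted(set(values))
    n = len(values)
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (t, score)
    return best


# Hypothetical patients: age and whether each was hospitalized (1) or not (0).
ages = [25, 33, 41, 47, 58, 64, 70, 77]
hospitalized = [0, 0, 1, 0, 1, 1, 1, 1]
print(best_threshold(ages, hospitalized))
```

A full tree search would apply this step to every candidate feature at each node. It would then recurse on the two resulting branches until a stopping rule is met.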
Predictive Data Mining ≠ Causal Data Mining
In general, P(Y | X1, X2, …, Xk) ≠ P(Y | X1 set, X2, …, Xk), where "X1 set" means X1 is fixed by intervention rather than passively observed.
Conditioning is not the same as intervening.
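Here is a small worked example of the difference. It uses an assumed model in which a common cause Z drives both X and Y, and X has no effect on Y. Observing X carries information about Z. Setting X by intervention does not, because it cuts the arrow from Z into X. All names and probabilities below are hypothetical.

```python
# Hypothetical three-variable causal model: Z -> X and Z -> Y, with NO arrow
# from X to Y. All probabilities are made up for illustration.
P_Z1 = 0.5                       # P(Z = 1)
P_X1_GIVEN_Z = {0: 0.1, 1: 0.9}  # P(X = 1 | Z = z)
P_Y1_GIVEN_Z = {0: 0.2, 1: 0.8}  # P(Y = 1 | Z = z)


def p_z(z):
    return P_Z1 if z == 1 else 1 - P_Z1


def p_x_given_z(x, z):
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1 - p1


def p_y1_given_x_z(x, z):
    # Y depends only on Z in this model, so x is ignored.
    return P_Y1_GIVEN_Z[z]


def p_y1_conditioning(x):
    """P(Y=1 | X=x): observing X=x updates beliefs about Z via Bayes' rule."""
    weights = {z: p_z(z) * p_x_given_z(x, z) for z in (0, 1)}
    total = sum(weights.values())
    return sum(p_y1_given_x_z(x, z) * weights[z] / total for z in (0, 1))


def p_y1_intervening(x):
    """P(Y=1 | X set to x): setting X cuts the Z -> X arrow, so Z keeps its
    marginal distribution (the adjustment formula for this graph)."""
    return sum(p_y1_given_x_z(x, z) * p_z(z) for z in (0, 1))


for x in (0, 1):
    print(x, round(p_y1_conditioning(x), 3), round(p_y1_intervening(x), 3))
```

Under these assumed numbers, conditioning gives P(Y=1 | X=0) = 0.26 and P(Y=1 | X=1) = 0.74. Setting X gives 0.5 either way, because the X–Y association is produced entirely by Z.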
Causal Discovery
Statistical Data → Causal Structure
– Data → Statistical Inference → Independence Relations (here, X3 ⊥ X1 | X2)
– Independence Relations → Discovery Algorithm (Causal Markov Axiom, D-separation) → Equivalence Class of Causal Graphs over X1, X2, X3
– Equivalence Class + Background Knowledge (X2 before X3; no unmeasured common causes) → Causal Structure
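For the three-variable example, this pipeline can be illustrated by brute force. This approach is not the PC algorithm or TETRAD's implementation. It enumerates every DAG over X1, X2, X3 and keeps those whose d-separation relations match the observed independence exactly, which assumes faithfulness. It then applies the background knowledge. Enumerating DAGs over the measured variables alone relies on the "no unmeasured common causes" assumption. All function names are hypothetical.

```python
from itertools import combinations, product

VARS = ("X1", "X2", "X3")


def is_acyclic(variables, edges):
    """Repeatedly strip nodes that have no parent among the remaining nodes."""
    remaining = set(variables)
    while remaining:
        sources = {v for v in remaining
                   if not any(c == v and p in remaining for p, c in edges)}
        if not sources:
            return False  # every remaining node has a parent: there is a cycle
        remaining -= sources
    return True


def all_dags(variables):
    """Yield every DAG over `variables` as a frozenset of (parent, child) edges."""
    pairs = list(combinations(variables, 2))
    for choice in product((None, "fwd", "back"), repeat=len(pairs)):
        edges = set()
        for (a, b), c in zip(pairs, choice):
            if c == "fwd":
                edges.add((a, b))
            elif c == "back":
                edges.add((b, a))
        if is_acyclic(variables, edges):
            yield frozenset(edges)


def ancestors(nodes, edges):
    """`nodes` together with all of their ancestors."""
    result = set(nodes)
    changed = True
    while changed:
        changed = False
        for p, c in edges:
            if c in result and p not in result:
                result.add(p)
                changed = True
    return result


def d_separated(x, y, given, edges):
    """Moral-ancestral-graph test: are x and y d-separated given `given`?"""
    anc = ancestors({x, y} | set(given), edges)
    sub = [(p, c) for p, c in edges if p in anc and c in anc]
    links = {frozenset(e) for e in sub}            # drop edge directions
    for child in anc:                               # "marry" co-parents
        parents = [p for p, c in sub if c == child]
        links.update(frozenset(pair) for pair in combinations(parents, 2))
    seen, frontier = {x}, [x]                       # search that avoids `given`
    while frontier:
        node = frontier.pop()
        if node == y:
            return False
        for link in links:
            if node in link:
                (other,) = link - {node}
                if other not in seen and other not in given:
                    seen.add(other)
                    frontier.append(other)
    return True


def implied_independencies(edges, variables=VARS):
    """All statements (x, y, conditioning set) the DAG entails by d-separation."""
    facts = set()
    for x, y in combinations(variables, 2):
        rest = [v for v in variables if v not in (x, y)]
        for k in range(len(rest) + 1):
            for given in combinations(rest, k):
                if d_separated(x, y, given, edges):
                    facts.add((x, y, frozenset(given)))
    return facts


# Independence found by statistical inference: X3 _||_ X1 | X2.
OBSERVED = {("X1", "X3", frozenset({"X2"}))}

# Equivalence class: DAGs over the measured variables only (no unmeasured
# common causes) that entail exactly the observed independencies.
EQUIVALENCE_CLASS = [g for g in all_dags(VARS) if implied_independencies(g) == OBSERVED]

# Background knowledge "X2 before X3" rules out the edge X3 -> X2.
CONSISTENT = [g for g in EQUIVALENCE_CLASS if ("X3", "X2") not in g]

for g in CONSISTENT:
    print(sorted(g))
```

Under these assumptions, the equivalence class contains three graphs: the chains X1 → X2 → X3 and X3 → X2 → X1, and the fork X1 ← X2 → X3. The constraint "X2 before X3" removes only the second, leaving two candidate structures.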
Causal Discovery Software TETRAD IV
www.phil.cmu.edu/projects/tetrad
Full Semester Online Course in Causal & Statistical Reasoning
• The course is instrumented to record certain events: logins, page requests, print requests, quiz attempts, quiz scores, voluntary exercises attempted, etc.
• Each event is recorded with three attributes: time, student-id, and session-id.
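Path analyses of course behavior work with per-student variables, such as how often a student printed pages or attempted voluntary questions, rather than with raw events. The sketch below shows that aggregation step. It assumes each event carries the three attributes listed above plus a hypothetical `kind` field. The event-type names and the log itself are invented.

```python
from collections import Counter, namedtuple

# Hypothetical event record: time, student-id, and session-id as listed above,
# plus an assumed `kind` field naming the event type.
Event = namedtuple("Event", ["time", "student_id", "session_id", "kind"])


def per_student_counts(events, kinds):
    """Return {student_id: {kind: count}} for the requested event kinds,
    including zero counts for students who never produced that kind of event."""
    tallies = {}
    for e in events:
        tally = tallies.setdefault(e.student_id, Counter())
        if e.kind in kinds:
            tally[e.kind] += 1
    return {s: {k: tally[k] for k in kinds} for s, tally in tallies.items()}


def sessions_per_student(events):
    """Number of distinct sessions in which each student produced any event."""
    sessions = {}
    for e in events:
        sessions.setdefault(e.student_id, set()).add(e.session_id)
    return {s: len(ids) for s, ids in sessions.items()}


# Hypothetical log fragment (times in seconds; IDs and kinds are invented).
log = [
    Event(10, "s1", "a", "login"),
    Event(25, "s1", "a", "print"),
    Event(40, "s1", "a", "voluntary_question"),
    Event(90, "s1", "b", "print"),
    Event(15, "s2", "c", "login"),
    Event(60, "s2", "c", "quiz_attempt"),
]
print(per_student_counts(log, ("print", "voluntary_question")))
print(sessions_per_student(log))
```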
Printing and Voluntary Comprehension Checks: 2002 --> 2003
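[Figure: path models for 2002 (variables pre, print, voluntary questions, quiz, final; path coefficients .302, -.41, .75, .353, .323) and 2003 (variables pre, print, voluntary questions, final; path coefficients -.08, -.16, .41, .25).]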
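The slide does not say how these path coefficients were estimated. One common approach assumes standardized variables and a recursive (acyclic) path model, then regresses each variable on its direct causes. The resulting standardized regression weights serve as the path coefficients. Below is a minimal sketch for a variable with two hypothetical parents, computed from pairwise correlations only. The correlation values are invented, not the course data.

```python
# Sketch: standardized path coefficients for a variable with two direct causes,
# computed as standardized OLS coefficients from pairwise correlations.
# The correlations used below are hypothetical, not the course data.


def path_coefficients_two_parents(r_y1, r_y2, r_12):
    """Return (beta1, beta2) for Y on standardized parents P1 and P2, where
    r_y1 = corr(Y, P1), r_y2 = corr(Y, P2), and r_12 = corr(P1, P2)."""
    denom = 1 - r_12 ** 2
    if denom == 0:
        raise ValueError("parents are perfectly correlated; coefficients are not identified")
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2


print(path_coefficients_two_parents(0.5, 0.3, 0.2))
```

With a single parent, the standardized path coefficient reduces to the correlation between the two variables. With uncorrelated parents (r_12 = 0), each coefficient likewise equals that parent's correlation with Y.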