46
Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School of EECS Oregon State University Hung Bui SRI International PAIR 2009 1/25/2011 1

Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

  • Upload
    others

  • View
    10

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Activity Recognition in TaskTracer and CALO

Tom Dietterich, Victoria Keiser, Jianqiang ShenSchool of EECS

Oregon State University

Hung BuiSRI International

PAIR 20091/25/2011 1

Page 2: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Outline

• TaskTracer: “Project‐Oriented User Interface”– Semantic Instrumentation– Projects & Provenance

• CALO: Activity Recognition– Workflow Discovery– Activity Recognition

PAIR 20091/25/2011 2

Page 3: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

• Observation: Knowledge workers engage in continuous multi‐tasking and interruption recovery– Median time between interruptions: 18.0 min– Median time to return from interruption: 11.4 min

• Goal: Support interruption recovery

• Observation: Knowledge workers manipulate thousands of documents, email messages, and web pages– Median 3,909 over 4 weeks

• Goal: Support finding and re‐finding relevant information– On average, each item is opened 1.74 times over 4 weeks

1/25/2011 3CALO Final Meeting

Tasktracer Goals

Page 4: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

TaskTracer Hypotheses• Hypotheses: 

– Users find it natural to organize their work in terms of projects– Each document, web page, email message, and person is 

associated with a small number of projects– User time at the desktop can be viewed as multi‐tasking 

among a set of active projects– When working on project P, users will tend to access only 

other items also associated with P

• Implications:– Ask users to define a hierarchy of projects– Track the user’s current project P– Automatically tag new items based on the user’s current 

project P– Support information access to existing items based on the 

current project P

1/25/2011 4CALO Final Meeting

Page 5: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Semantic Instrumentation• Applications:

– Word, Excel, PowerPoint, Outlook, Internet Explorer, Windows Explorer, GSView, Acrobat, Visual Studio, Thunderbird

• Application Events:– Documents: New, Change, Open, Save, Save As, Close– Email: Open, Close, Send, Reply, Forward, Attach File, Save Attachment, Open 

Attachment, Incoming Email, Click on Hyperlink– Web pages: Open, Navigate, Upload File, Download File

• OS Events:– File Create/Delete/Rename– Window Focus– Copy/Paste– Suspend/Resume/Idle

PAIR 20091/25/2011 5

Page 6: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Tracking the User’s Current Project

• Manual input from user: Project Selector• Incoming email classifier: Email Predictor• Project Switch detector: Project Predictor

PAIR 20091/25/2011 6

Page 7: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Manual Input from User

• Project Selector: Windows Task Bar:

• Pop‐up menu of recent projects: Control+tilde

PAIR 20091/25/2011 7

Page 8: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Email Predictor• Predicts most likely project of each incoming email message

PAIR 2009

Predicted project is added as a “Category” in Outlook

1/25/2011 8

Page 9: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Email Predictor:Confidence‐Weighted Classifier

• Drezde, Crammer & Pereira, ICML 2008– Linear classifier– Maintains diagonal Gaussian distribution over the weight vector

– When a mistake is made• Update the Gaussian so that with probability 1 – , a randomly sampled weight vector would not have made the mistake

• Features:– sender– set of all recipients– words in Subject:– words in body

PAIR 20091/25/2011 9

Page 10: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Email Classifier Performance

1/25/2011 PAIR 2009 10

0.45

0.5

0.55

0.6

0.65

0.7

0.75

0.8

0.85

0.9

0.95

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Pre

cisi

on (E

mai

ls C

orre

ct/ E

mai

ls P

redi

cted

)

Coverage (Emails Predicted / Total Emails)

CW BNB PA TWCNB TFIDF MNB

Dietterich’s email 2004 – 2008 almost 21,000 examples 381 classes ranging in size from 1 to more than 2500 messages

Page 11: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Predicting Project Switches

• Extract the following information once per minute:– Current active “resource”

• Window title• Pathname / URL

– Current declared project– Titles of all windows– Resources in all windows

PAIR 20091/25/2011 11

Page 12: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Project Predictor Features

• Project‐Specific Features  FP(y)– Strength of association of active resource with y– % of open resources associated with y– Importance of title word x to project y

• Modified TF‐IDF score

• Switch‐Specific Features  FS(y, y’)– # of resources closed in last minute– % of open resources accessed in last minute– Time since user’s last explicit switch

PAIR 20091/25/2011 12

Page 13: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Project Predictor

• Let– FP(X,y) = project‐specific features– FS(Xt,Xt+1) = switch‐specific features– = learned parameters

• Scoring function:

PAIR 2009

Xt Xt+1

yt yt+1

1/25/2011 13

Page 14: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Regularized Passive‐Aggressive Algorithm(Crammer et al., 2006; Shen, 2008)

• Add a penalty on the size of • Closed‐form solution:

PAIR 20091/25/2011 14

Page 15: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Results on My Data

PAIR 20091/25/2011 15

Page 16: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Results on Another User

PAIR 20091/25/2011 16

Page 17: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Project‐Based Services

• Task explorer• Folder predictor• Outlook search folders• Time tracking• Project‐specific notes

PAIR 20091/25/2011 17

Page 18: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Task Explorer:Supports Interruption Recovery and Information Re‐Finding

PAIR 2009

Project Hierarchy

All folders,documents, web

pages, emails, and contacts

1/25/2011 18

Page 19: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Information Re‐Finding:Folder Predictor: Shortcuts to Relevant Folders

• Maintain statistics on file opens and saves on a per‐project basis– Recency‐weighted count of saves and 

opens• When user initiates open/save 

compute 3 folders to minimize expected number of clicks to get to the desired folder

argmin{f1,f2,f3} f P(f | proj) ∙min {clicks(f1,f), 1+clicks(f2,f), 1+clicks(f3,f)}

PAIR 20091/25/2011 19

Page 20: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Average Cost to Reach Target Folder

PAIR 20091/25/2011 20

Page 21: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Provenance Instrumentation

• File‐to‐File– copy/paste– cut/paste– SaveAs– file system copy– file system 

rename• Email‐to‐File

– save attachment– add attachment

• Email‐to‐Email– reply– forward

• Web‐to‐file– download file– upload file

PAIR 2009

• Email‐to‐Web– click on hyperlink

• Web‐to‐Web– click on hyperlink

1/25/2011 21

document

email web page

upload /download

copy/paste

attach/save attachment

click hyperlink

copy/paste

copy/paste

copy/rename/SaveAs

reply/forwardclick hyperlink

Page 22: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Provenance‐Based Information Access

• Right‐click on object opens Provenance Graph– email header in Outlook– attachment in Outlook– file name in Windows Explorer

PAIR 20091/25/2011 22

Page 23: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Intel Smart Desktop Study

• 6 participants (~4 weeks each)• Beta 3 of SmartDesktop (with additional provenance instrumentation)

1/25/2011 CALO Final Meeting 23

Page 24: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Evaluating the TaskTracer Hypotheses– Hypothesis: When working on project P, users will tend 

to access only other items also associated with P– Identify all cases where user switches between two items 

A and B and classify them:

1/25/2011 CALO Final Meeting 24

AB copy/paste

B changed None

A & B in same project

Within‐Project Information Access Ignore

A & B in different projects

Cross‐Project Information Access

WeakCross‐Project Information Access

Multi‐tasking switch

Page 25: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Cross‐Project Information Access is Rare

1/25/2011 CALO Final Meeting 25

Page 26: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

TaskTracer Summary

• “Project”: very high level and very useful notion of “activity”– Interruption recovery – Information re‐finding

• Mix of user input and automated activity recognition

• Online classifiers that are responsive to user feedback

• Integration with Windows XP and Office• Open Source release this fall

1/25/2011 PAIR 2009 26

Page 27: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Outline

• TaskTracer: “Project‐Oriented User Interface”– Semantic Instrumentation– Projects & Provenance

• CALO: Activity Recognition– Workflow Discovery– Activity Recognition

PAIR 20091/25/2011 27

Page 28: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Discovering Workflows• Build an information flow graph

– Node: each visited object (files, emails, webpages)– Arcs: TaskTracer provenance links + other inferred connections– Collapse certain patterns

• Search for the frequent subgraphs in the information flow graph– Step 1: Ignore edge labels– Step 2: Dynamic program to find most frequent labeled subgraphs

• Some assumptions– A workflow is a (weakly) connected graph– A workflow involves at least k = 3 resources– A workflow involves at least one non‐email resource

PAIR 20091/25/2011 28

Page 29: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Inferred Relationships

• Document conversion– convert to PDF file– zip and unzip

• Email reference– email body refers to a document (e.g., by title)

PAIR 20091/25/2011 29

Page 30: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Collapse Certain Patterns

• File editing– Replace chain of SaveAs links with SaveAs*

• Email conversation– Replace Send/Receive/ReplyTo links with ReplyTo*

PAIR 20091/25/2011 30

Page 31: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Resulting Information Flow Graph

PAIR 20091/25/2011 31

Page 32: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Mining Frequent Subgraphs

• Construct frequent pattern candidates from the existing ones [Nijssen & Kok: KDD04]

• Restrict to “Closed” patterns: Can’t add a node or edge without shrinking the set of covered instances

PAIR 20091/25/2011 32

Page 33: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Assigning Action Type Labels to Each Edge

• Searching For Frequent Action Paths– Choose action types for each edge in the discovered subgraphs

– Simply selecting the most frequent action for each edge does not work

– Efficient dynamic programming algorithm

PAIR 20091/25/2011 33

Page 34: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

AttachSaveAs*SaveAttachment

Example Discovered Workflow

msg1 msg2doc1 doc2

ReplyTo*

PAIR 20091/25/2011 34

• CALO experiment• four participants• preparing and reviewing papers for a conference• filing travel requisitions and reimbursements• web browsing, etc.

• Five workflows discovered

Page 35: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Results on CALO Data

0.40.50.60.70.80.9

1

1 2 3 4 5 6 7

The support threshold

F-1S

core

PAIR 20091/25/2011 35

Recall: the number of correctly discovered resources and actions divided by the number of real resources and actionsPrecision: the number of correctly discovered resources and actions divided by the number of predicted resources and actions F1: 2*precision*recall/(precision+recall)

Page 36: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Real‐Time Recognition of Workflows

• Problem: recognize current user’s workflow(s), track their progress and their parameters

• Challenge: – Workflow instances vary (optional steps, non‐deterministic branching and ordering)

– Workflows have parameters (e.g. filename)– Huge number of objects (emails, files, contacts)– Workflow steps need to be separated from background activities

– Multiple workflows need to be separated from each other

PAIR 20091/25/2011 36

Page 37: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Tracking a Single Workflow:Logical Hidden Markov Model

P Q R

PAIR 20091/25/2011 37

Page 38: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Logical Hidden Markov Model

• Transition probabilities represented as logical probabilistic rules

P(X,Y) Q(Y,Z) R(X,Z)

8x ; y; z P (x ; y) ! Q(y; z); w = 0:5=jzj

PAIR 20091/25/2011 38

Page 39: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Logical Hidden Markov Model

• Transition probabilities represented as logical probabilistic rules

P(X,Y) Q(Y,Z) R(X,Z)

8x ; y; z

Ground HMM

P (a; a) ! Q(a; c); w = 0:5=jzjP (a; b) ! Q(b; c); w = 0:5=jzjP (b; a) ! Q(a; c); w = 0:5=jzjP (b; b) ! Q(b; c); w = 0:5=jzj

: : :

P (x ; y) ! Q(y; z); w = 0:5=jzj

PAIR 20091/25/2011 39

Page 40: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Inference in Logical HMMs

• Naïve method– Convert to ground HMM– Complexity O(S2)

• S = (# objects)arity

– 1000 objects, arity=3• S2 = 1018

– Hopeless!

• Kersting et al 2006– Incremental construction of ground 

HMM– Only generate the set of ground 

states feasible/reachable given the starting state and the observations

– Problematic when too many feasible states remain

P(X,Y) Q(Y,Z) R(X,Z)

PAIR 20091/25/2011 40

Page 41: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Lifted Inference for Logical HMMs

Time t

Move forward Absorb observation Normalize

• Three main operators– Matrix multiplication– Hadamard product– Normalization (L1‐norm)

• These can be implemented for LHMMs without grounding

Time t+1

Belief state

A L  (A)

L = B[ot+1]

L  (A)

kL  (A) k1

PAIR 20091/25/2011 41

Page 42: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Results on Randomly Generated Models

Lifted inference almost independent of domain size Lifted inference is linear in # variables in predicate

Ground inference via HMM is hopeless

PAIR 20091/25/2011 42

Page 43: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Tracking Multiple, Interleaved Workflows:Lifted Rao‐Blackwellized Particle Filter

• Sample only one variable (which workflow is active)– All the other random variables (workflow states, workflow 

parameters) are marginalized out via lifted inference 

Is this a red, green,or a new workflow?

Sample using Particle Filtere.g., sample result  = green

Split into 2 separate modelsApply lifted inference on each model

Logical HMM 1 Logical HMM 2

PAIR 20091/25/2011 43

Ct ~ P(Ct|c1:t‐1, y1:t)

Page 44: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Experimental Test with Learned Workflows

• Leave one user out• At each time t, 

– WARP makes prediction– We score prediction for correctness– We provide corrective feedback

• Results– Precision:  91.3%– Recall:  66.7%– F1: 77.1%

• Major bug: poor accuracy when recognizing the first step in a new workflow

1/25/2011 PAIR 2009 44

Page 45: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Open Problems

• Blackbox Semantic Instrumentation– Can we infer semantic events from easily‐observed events (system calls, file system operations, window manager operations)?

• Deeper Semantic Analysis of Email and Documents– Recognize speech acts, discussions, decisions

• Detect Start of New Workflow– May require multiple events to detect (“smoothing” instead of “filtering”)

• Usable Integration with TaskTracer

PAIR 20091/25/2011 45

Page 46: Activity Recognition in TaskTracer and CALOtgd/talks/ijcai-tasktracer-2009.pdf · Activity Recognition in TaskTracer and CALO Tom Dietterich, Victoria Keiser, Jianqiang Shen School

Acknowledgements

• Funding: – NSF MKIDS program grant IIS‐0133994– DARPA PAL program

• This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. FA8750‐07‐D‐0185/0004.  Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the DARPA, or the Air Force Research Laboratory (AFRL).

– Intel R&D Council

1/25/2011 PAIR 2009 46