DESCRIPTION
Mouse cursor movements can provide valuable information on how users interact and engage with web documents. This interaction data is far richer than traditional click data, and can be used to improve the evaluation and presentation of web information systems. Unfortunately, the diversity and complexity inherent in this interaction data make it more difficult to capture salient behavior characteristics through traditional feature engineering. To address this problem, we introduce a novel approach that automatically discovers frequent subsequences, or motifs, in mouse cursor movement data. To scale our approach to realistic datasets, we introduce novel optimizations for motif discovery, specifically designed for mining cursor movement data. We show that encoding the motifs discovered from thousands of real web search sessions as features enables significant improvements in important web search tasks. These results, complemented with visualization and qualitative analysis, demonstrate that our approach is able to automatically capture key characteristics of mouse cursor movement behavior, providing a valuable new tool for online user behavior analysis. In addition to the application of motifs to web mining, we demonstrate that a similar technique can be successfully applied in the medical domain to predict future decline of memory function and the subsequent development of Alzheimer’s disease.
1
Discovering Common Motifs in Cursor Movement Data
Dmitry Lagun, Emory University, 2014
2
Thank you!
Mikhail Ageev, Qi Guo, Eugene Agichtein
3
The Importance of Online User Attention
• “Attention is focused mental engagement on a particular item of information.” (Davenport & Beck, 2001, p. 20)
Abundance of information
Scarcity of attention
4
The Importance of Online User Attention
• “Eye-mind Hypothesis” [Just and Carpenter, 1980]
• “When a subject looks at a word or object, he or she also thinks about it (processes it cognitively), for exactly as long as the recorded fixation.”
5
The Importance of Online User Attention
• Attention is critical for the science of cognition (vision, language, memory)
• Many industry applications:
  – Web search intent, quality, presentation, satisfaction
  – UI usability testing
  – Display advertising, customer engagement, branding
6
Measurement of Attention
• Eye tracking
  – Based on corneal reflection of infra-red light
Infra-red cameras
Users spend most of the time on top search results
7
Applications
Examination Strategies [Buscher et al.]
Web Page Re-Design [Leiva et al.]
Behavior Biased Summaries [Ageev et al.]
Query-Expansion & Relevance Feedback [Buscher et al.]
Parkinson’s, ADHD, FASD [Tseng et al.]
Prediction of Cognitive Impairment [Zola et al.]
Search Relevance [Guo & Agichtein]
Search Abandonment [Huang et al.]
9
emory math and cs
Search
10
Search Logs
Web Pages
Search Engine Ranking
11
Search Logs
Web Pages
Search Engine Ranking
click
12
Search Logs
Web Pages
Search Engine Ranking
Relevant or Not?
Ranking
13
Prior Work: Cursor Movement on Landing Pages
• Post Click Behavior Model [Guo and Agichtein, WWW 2012]
• Two basic patterns: “Reading” and “Scanning”
Reading Scanning
“Reading”: consuming or verifying when (seemingly) relevant information is found
“Scanning”: not yet found the relevant information, still in the process of visually searching
14
Post-Click Behavior (PCB) Data Improves Ranking
• PCB and PCB_User consistently outperform DTR (baseline)
[Guo & Agichtein, WWW 2012] [Guo, Lagun & Agichtein, CIKM 2012]
DTR = Dwell time + Rank
[Figure: ranking performance measured by NDCG]
15
Post-Click Behavior (PCB) Model Features
• Average cursor position, cursor speed, direction
• Travelled distance, horizontal and vertical ranges
• Max/min cursor positions on the screen
• Scroll speed, frequency and scroll distance
• Cursor position in a region-of-interest
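To make the hand-engineered baseline concrete, here is a minimal sketch of a few of the cursor features listed above (average position, travelled distance, speed, ranges, extremes). The function name `pcb_features` and the 50 ms sampling default are assumptions for illustration; direction, scroll, and region-of-interest features are omitted.

```python
import math

def pcb_features(points, dt=0.05):
    """Compute a few PCB-style summary features of one cursor trajectory.

    points: list of (x, y) cursor positions sampled every `dt` seconds
    (the 0.05 s default is an illustrative assumption, not from the PCB paper).
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Length of each step between consecutive samples
    steps = [math.dist(a, b) for a, b in zip(points, points[1:])]
    distance = sum(steps)
    duration = dt * (len(points) - 1)
    return {
        "avg_x": sum(xs) / len(xs),
        "avg_y": sum(ys) / len(ys),
        "travelled_distance": distance,
        "avg_speed": distance / duration if duration > 0 else 0.0,
        "x_range": max(xs) - min(xs),
        "y_range": max(ys) - min(ys),
        "x_min": min(xs), "x_max": max(xs),
        "y_min": min(ys), "y_max": max(ys),
    }
```

Every feature here is a fixed summary chosen by hand, which is exactly the kind of engineering the motif approach tries to replace.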
Can we automatically discover meaningful features of cursor trajectory?
16
Our Approach: Cursor Motif Mining
Instead of engineering complex features, discover common subsequences (motifs).
A motif is a frequently occurring subsequence of cursor movements.
18
Mouse Cursor Data: Challenges
Different users examine web pages at different speeds, and hence move the mouse slower or faster. [Flexible distance metric: DTW]
Similar types of movements can appear in different parts of a web page (top vs. bottom). [Location invariance: normalize subsequence position]
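Location invariance can be illustrated with a small sketch. The slides only say that subsequence position is normalized; mean-centering each subsequence, as below, is one assumed way to do it, and `normalize_position` is a hypothetical name.

```python
def normalize_position(subseq):
    """Shift a subsequence of (x, y) points so that its mean lies at the origin.

    Mean-centering is an assumed normalization: the same movement made near
    the top or near the bottom of a page ends up with identical coordinates.
    """
    n = len(subseq)
    mx = sum(x for x, _ in subseq) / n
    my = sum(y for _, y in subseq) / n
    return [(x - mx, y - my) for x, y in subseq]
```

For example, a short horizontal sweep at y = 10 and the same sweep at y = 810 both normalize to `[(-10.0, 0.0), (0.0, 0.0), (10.0, 0.0)]`.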
19
Motif Discovery Pipeline
Generate Motif Candidates → Discover Frequent Candidates → De-duplicate / Output Motifs
Distance Measure
20
Candidate Generation
[Figure: a sliding window of fixed window size moves along the cursor trajectory; each window position yields a motif candidate]
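Candidate generation with a sliding window can be sketched as follows. The `step` (stride) parameter and the function name are assumptions; with `step=1` every window position becomes a candidate, which is what makes the candidate set highly redundant.

```python
def sliding_window_candidates(trajectory, window_size, step=1):
    """Return every contiguous subsequence of length `window_size`.

    `step` is an assumed stride parameter; step=1 yields all windows.
    """
    if window_size > len(trajectory):
        return []
    return [trajectory[i:i + window_size]
            for i in range(0, len(trajectory) - window_size + 1, step)]
```

A trajectory of 5 points with `window_size=3` yields 3 overlapping candidates; raising `step` trades coverage for fewer candidates.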
21
Distance Measure
• Which time series are similar?
• Popular choices:
  – Euclidean Distance (ED)
  – Dynamic Time Warping (DTW)
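The difference between the two measures is easiest to see in code. Below is a minimal sketch of both for sequences of (x, y) points; this DTW variant sums Euclidean point distances along the warping path, which is one common formulation (others use squared costs), and the function names are illustrative.

```python
import math

def euclidean_distance(a, b):
    """Lock-step distance: requires equal-length sequences of (x, y) points."""
    if len(a) != len(b):
        raise ValueError("ED needs sequences of equal length")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)))

def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping with Euclidean point cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b waits
                                 cost[i][j - 1],      # b advances, a waits
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

A slow movement `[(0,0),(0,0),(1,0),(1,0),(2,0),(2,0)]` and a fast one `[(0,0),(1,0),(2,0)]` cannot be compared with ED at all (different lengths), while their DTW distance is 0, which is why DTW suits users who move the cursor at different speeds.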
22
Frequent Motif Mining
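• Similarity search
  – How many subsequences in the dataset are similar to the given candidate subsequence?
[Figure: matrix of pairwise distances, motif candidates × motif candidates]
dist(i, j): the distance between the i-th and the j-th motif candidates (smaller means more similar).
Algorithm parameters:
  – max_dist: the largest distance at which two subsequences are still considered “similar”
  – min_count: the minimum frequency a candidate needs to count as a motif
Brute-force search is computationally expensive: every candidate is compared with every other candidate, a quadratic number of distance computations.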
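The brute-force similarity search can be sketched with the two parameters named above. Assumptions: a candidate counts itself as one occurrence, the default distance is Euclidean on equal-length (x, y) sequences (any measure, such as DTW, can be passed in), and `frequent_candidates` is a hypothetical name.

```python
import math

def euclidean(a, b):
    """Distance between equal-length sequences of (x, y) points."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)))

def frequent_candidates(candidates, max_dist, min_count, dist=euclidean):
    """Brute-force similarity search over all candidate pairs.

    Returns (index, count) for every candidate with at least `min_count`
    neighbours within `max_dist` (itself included, an assumption).
    Needs about len(candidates)**2 / 2 distance calls, hence the cost.
    """
    n = len(candidates)
    counts = [1] * n  # each candidate matches itself
    for i in range(n):
        for j in range(i + 1, n):
            if dist(candidates[i], candidates[j]) <= max_dist:
                counts[i] += 1
                counts[j] += 1
    return [(i, c) for i, c in enumerate(counts) if c >= min_count]
```

The nested loop is the quadratic bottleneck that the optimizations later in the talk (early stopping, parallelism, learned distances, indexing) aim to reduce.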
23
De-Duplication (only keep cluster centroids)
• Similarity search can generate many frequent candidates that are similar to each other (due to redundancy in motif candidate generation: overlapping sliding windows yield nearly identical subsequences)
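De-duplication can be approximated with a greedy pass. The slides say only that cluster centroids are kept; the sketch below is an assumed stand-in that visits candidates from most to least frequent and keeps the most frequent member of each group of near-duplicates as its representative, rather than computing a true centroid. The function name is hypothetical.

```python
import math

def euclidean(a, b):
    """Distance between equal-length sequences of (x, y) points."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)))

def deduplicate(frequent, candidates, max_dist, dist=euclidean):
    """Greedy de-duplication of frequent candidates.

    frequent: list of (index, count) pairs. A candidate is kept only if it is
    farther than `max_dist` from every candidate kept so far, so each kept
    candidate stands in for the near-duplicates around it.
    """
    kept = []
    for idx, _count in sorted(frequent, key=lambda ic: -ic[1]):
        if all(dist(candidates[idx], candidates[k]) > max_dist for k in kept):
            kept.append(idx)
    return kept
```

Processing the most frequent candidates first means each surviving representative is the one with the most support in its group.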
25
Optimizations in Similarity Search
• Early stopping [Keogh et al.]
  – in DTW computation (takes O(n^2) time)
  – in lower-bound computation (takes O(n) time)
• Parallel computation
  – Distance computations are independent of each other, so they can use multiple cores
• Distance metric learning
• Spatial indexing
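Both early-stopping ideas can be sketched for a single coordinate (for example, the vertical cursor position), assuming equal-length sequences, a Sakoe-Chiba band of radius `r`, and an absolute-difference point cost; the function names are hypothetical. LB_Keogh builds an upper/lower envelope around the candidate and charges only the parts of the query that fall outside it. That sum never exceeds the band-constrained DTW distance, so a pair whose bound already exceeds `max_dist` can be discarded without running DTW.

```python
def lb_keogh(query, candidate, r, threshold=float("inf")):
    """LB_Keogh lower bound of band-constrained DTW (absolute-difference cost).

    Assumes equal-length 1-D sequences; stops early once the running sum
    exceeds `threshold`, since the pair can then be pruned anyway.
    """
    total = 0.0
    n = len(candidate)
    for i, q in enumerate(query):
        window = candidate[max(0, i - r): min(n, i + r + 1)]
        upper, lower = max(window), min(window)
        if q > upper:
            total += q - upper
        elif q < lower:
            total += lower - q
        if total > threshold:
            break  # early stopping in the O(n) lower bound
    return total


def dtw_band(a, b, r, threshold=float("inf")):
    """DTW with a Sakoe-Chiba band of radius r and early abandoning.

    Returns inf as soon as every cell in a row exceeds `threshold`: costs are
    non-negative, so no warping path can come back under it.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cur[j] = abs(a[i - 1] - b[j - 1]) + min(prev[j], cur[j - 1], prev[j - 1])
        if min(cur) > threshold:
            return inf  # early stopping in the O(n^2) DTW computation
        prev = cur
    return prev[m]


def within(query, candidate, r, max_dist):
    """True if band-DTW(query, candidate) <= max_dist, pruning with LB_Keogh first."""
    if lb_keogh(query, candidate, r, max_dist) > max_dist:
        return False  # the cheap bound rules the pair out
    return dtw_band(query, candidate, r, max_dist) <= max_dist
```

Because each (query, candidate) check is independent, calls to `within` can also be spread across cores, for example with `concurrent.futures.ProcessPoolExecutor`.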
26
Distance Measure Learning
• Goal: fast pruning of unpromising candidates in similarity search
[Figure: each of the two compared subsequences is represented by a feature vector (x_max, y_max, …, feature_k)]
Tune the weights with a gradient-based method (e.g., SGD)
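The slides do not say what the weights are fitted to. One assumed setup, sketched below with hypothetical names: learn non-negative weights so that a weighted L1 distance between the feature vectors of two subsequences approximates their DTW distance on a sample of pairs, using plain SGD on squared error.

```python
import random

def weighted_feature_distance(fa, fb, w):
    """Weighted L1 distance between feature vectors (e.g. x_max, y_max, ...)."""
    return sum(wk * abs(x - y) for wk, x, y in zip(w, fa, fb))

def learn_weights(pairs, targets, lr=0.01, epochs=500, seed=0):
    """SGD on squared error between the weighted distance and a target distance.

    pairs: list of (features_a, features_b); targets: e.g. precomputed DTW
    distances for those pairs (an assumption). Weights are clipped at zero
    so the learned measure stays non-negative.
    """
    rng = random.Random(seed)
    w = [0.0] * len(pairs[0][0])
    order = list(range(len(pairs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            fa, fb = pairs[i]
            diffs = [abs(x - y) for x, y in zip(fa, fb)]
            err = sum(wk * d for wk, d in zip(w, diffs)) - targets[i]
            # gradient of 0.5 * err**2 with respect to w_k is err * d_k
            w = [max(0.0, wk - lr * err * d) for wk, d in zip(w, diffs)]
    return w
```

Unlike LB_Keogh, a learned distance is not guaranteed to be a lower bound, so pruning with it trades a few missed matches for speed.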
27
Spatial Indexing
• Goal: fast pruning of unpromising candidates in similarity search
• Indexes motif candidates in a weighted feature space
• Improves asymptotic time for similarity search
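The slides do not name the index structure. As an assumed stand-in, a uniform grid over the weighted feature vectors already shows the idea: with cell size at least the query radius, every point within the radius lies in the query's cell or an adjacent one, so a radius query scans 3^k cells instead of the whole dataset (practical only for small k; k-d trees or R-trees are common alternatives). `GridIndex` is a hypothetical name.

```python
import itertools
import math
from collections import defaultdict

class GridIndex:
    """Uniform grid over (weighted) feature vectors for radius queries."""

    def __init__(self, cell_size):
        self.cell = cell_size
        self.buckets = defaultdict(list)

    def _key(self, vec):
        # Integer grid coordinates of the cell containing `vec`
        return tuple(math.floor(v / self.cell) for v in vec)

    def add(self, item_id, vec):
        self.buckets[self._key(vec)].append((item_id, vec))

    def query(self, vec, radius):
        """Sorted ids of indexed vectors within Euclidean `radius` of `vec`."""
        if radius > self.cell:
            raise ValueError("radius must not exceed the cell size")
        base = self._key(vec)
        hits = []
        # Only the query's own cell and its immediate neighbours can qualify
        for offset in itertools.product((-1, 0, 1), repeat=len(base)):
            key = tuple(b + o for b, o in zip(base, offset))
            for item_id, other in self.buckets.get(key, ()):
                if math.dist(vec, other) <= radius:
                    hits.append(item_id)
        return sorted(hits)
```

Setting the cell size to `max_dist` (in the weighted feature space) lets each similarity query touch only nearby candidates.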
28
Timing Experiments
29
Example of Discovered Motif
[Figure: a discovered motif and a matching subsequence, shown with mouse cursor and eye gaze traces]
30
Motifs Discovery: Examples
On Search Engine Result Pages (SERPs)
On “Landing” Pages (non-SERPs)
31
Discovered motifs have many uses
• Summarize typical mouse cursor usage
  – e.g., create a dictionary of typical cursor usage patterns
• Compact (task-free) representation
  – Characterize an entire cursor trajectory by which motifs appear in it
• For classification/regression:
  – Compute whether particular motifs appear in a given mouse cursor trajectory
32
Using motifs as features for Classification/Regression
• We can measure how similar a mouse movement trajectory is to each of the discovered motifs
[Figure: a sliding window (window size) moves along the trajectory, and each window is compared with a motif]
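Turning motifs into features can be sketched as below: for each motif, slide a window of the motif's length over the trajectory and record the smallest distance found. Assumptions: Euclidean distance after mean-centering (DTW could be substituted), minimum distance as the feature value, and hypothetical function names; the slides do not fix these choices.

```python
import math

def euclidean(a, b):
    """Distance between equal-length sequences of (x, y) points."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)))

def center(seq):
    """Mean-center a subsequence (assumed location normalization)."""
    n = len(seq)
    mx = sum(x for x, _ in seq) / n
    my = sum(y for _, y in seq) / n
    return [(x - mx, y - my) for x, y in seq]

def motif_features(trajectory, motifs, dist=euclidean):
    """One feature per motif: the smallest distance to any same-length window.

    Returns math.inf for a motif longer than the trajectory.
    """
    features = []
    for motif in motifs:
        w = len(motif)
        m = center(motif)
        best = math.inf
        for i in range(len(trajectory) - w + 1):
            best = min(best, dist(center(trajectory[i:i + w]), m))
        features.append(best)
    return features
```

A binary "motif appears" feature follows by thresholding, for example `[f <= max_dist for f in features]`.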
33
Motifs for Relevance Prediction
• Baselines
  – Cursor Hover (on the search result page) [Huang et al., CHI 2011]
  – Post Click Behavior Model [Guo & Agichtein, WWW 2012]
    • Dwell time
    • Statistics of cursor movements: max, min, range, etc.
    • Statistics of scrolling activity: max, min, range, etc.
Reading Scanning
34
Dataset
• User study (21 users)
  – mostly informational search tasks
  – 566 search queries
  – 1340 page views
  – 854 relevance judgments
35
Motifs are Better than Previous Models (PCB, Hover)
Feature group                | Pearson correlation
Cursor Hover                 | 0.120
Post Click Behavior          | 0.392
Motifs                       | 0.394 (+0.5%)
Post Click Behavior + Motifs | 0.468 (+19.4%)
(Percentages are relative to Post Click Behavior.)
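The relative improvements in the table can be checked with a few lines: they are computed against the Post Click Behavior correlation. The `pearson` helper shows how such a correlation between predicted and judged relevance is computed; both function names are illustrative.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def relative_gain(new, baseline):
    """Relative improvement over a baseline, in percent."""
    return 100.0 * (new - baseline) / baseline
```

`relative_gain(0.468, 0.392)` is about 19.39 and `relative_gain(0.394, 0.392)` about 0.51, matching the +19.4% and +0.5% in the table after rounding.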
36
Motifs are Helpful for Web Search Result Ranking
37
Conclusions
• It is possible to automatically discover meaningful motifs from mouse cursor data
• Motifs are helpful for relevance prediction & ranking
• Cursor motifs provide a compact (task-free) representation of the entire cursor trajectory
38
Applications of Gaze/Mouse Cursor Tracking in Medical Domain
39
Background: Mild Cognitive Impairment (MCI) and Alzheimer’s Disease
• Alzheimer’s disease (AD) affects more than 5 million Americans, a number expected to grow in the coming decade
• Memory impairment (amnestic MCI, aMCI) indicates the onset of AD, which affects the hippocampus first
• Visual Paired Comparison (VPC) task: promising for early diagnosis of both MCI and AD before they are detectable by other means
40
VPC Task: Eye Tracking Equipment
41
Impaired Subjects Spent 50% of Their Viewing Time on the Novel Image after a Long Delay
42
VPC Task: Eye Tracking
43
Exploiting Eye Gaze Movement Data
Novelty Preference + Fixation Duration Distribution
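The two gaze features can be sketched as follows. Assumptions: fixations arrive as (target, duration in ms) pairs labeled "novel" or "familiar", novelty preference is the share of looking time on the novel image (0.5 means no preference, the chance level reported for impaired subjects), and the histogram bin edges are chosen by the analyst. Function names are hypothetical.

```python
def novelty_preference(fixations):
    """Share of looking time spent on the novel image.

    fixations: list of (target, duration_ms), target "novel" or "familiar".
    Returns 0.5 (no preference) when there are no fixations, an assumption.
    """
    novel = sum(d for t, d in fixations if t == "novel")
    familiar = sum(d for t, d in fixations if t == "familiar")
    total = novel + familiar
    return novel / total if total else 0.5

def duration_histogram(fixations, edges):
    """Counts of fixation durations per bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for _, d in fixations:
        for i in range(len(edges) - 1):
            if edges[i] <= d < edges[i + 1]:
                counts[i] += 1
                break
    return counts
```

Concatenating the preference score with the histogram counts gives one fixed-length feature vector per subject.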
45
Shapelets are Helpful for Prediction of Cognitive Decline
• Shapelets – “class specific” motifs
Baseline AUC = 0.892 ± 0.003
Shapelets AUC = 0.916 ± 0.006
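The slides describe shapelets only as “class specific” motifs. A common way to score a candidate shapelet, assumed here rather than taken from the talk, is information gain: compute each subject's distance to the shapelet (minimum over windows, as for motif features), then find the distance threshold that best separates the two classes. The function names and class labels are hypothetical.

```python
import math

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        result -= p * math.log2(p)
    return result

def best_split(distances, labels):
    """Best threshold on shapelet distances by information gain.

    distances[i]: distance from series i to the shapelet.
    Returns (gain, threshold); series with distance <= threshold go left.
    """
    base = entropy(labels)
    pairs = sorted(zip(distances, labels))
    n = len(pairs)
    best = (0.0, None)
    for k in range(1, n):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates equal distances
        left = [lab for _, lab in pairs[:k]]
        right = [lab for _, lab in pairs[k:]]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best[0]:
            best = (gain, (pairs[k - 1][0] + pairs[k][0]) / 2)
    return best
```

A shapelet whose distances perfectly separate two balanced classes reaches the maximum gain of 1 bit; candidates are ranked by this score.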
46
User Attention on Web Pages
47
Cross-Domain User Study
• Research Question– Does web page content affect user attention?
• Domains– Search (Google), Wikipedia, Shopping (Amazon), Social (Twitter),
News (CNN )
• 20 users (4 + 20 tasks per user)
• 400 tasks, 1700 page views
• 500K gaze/cursor measurements (sampled every 50 ms)
?search domain X
48
Web Search Pages
49
News Search Pages
50
Shopping Search Pages
51
Twitter Search Pages
52
Conclusions
• It is possible to automatically discover meaningful motifs from mouse cursor data
• Motifs are helpful for relevance prediction, ranking and prediction of cognitive impairment
• Attention patterns vary significantly across search interfaces
53
Thank You!
• This work was supported by
54
Emory IR Lab: Research Areas
• Modeling collaborative content creation for information organization, indexing, and search
• Mining search behavior data to improve information finding.
• Medical applications of Search, NLP, behavior modeling.
55
UFindIt: Remote Search Behavior Studies
Misha Ageev (MGU & Yandex), Dmitry Lagun (Emory), Denis Savenkov (Emory)
SIGIR 2011 (best paper award), SIGIR 2013, EMNLP 2013
56
Search behavior models for Touch Screens
Ongoing project, looking for students
Guo et al., SIGIR 2013
Dynamics in User-Generated Content
Wikipedia
Major events (e.g., natural disasters, sports) affect the content change in Wikipedia articles.
Use content change for ranking:
• Words used in early revisions of a document are more essential and important to it.
• Words used during a major event may reflect a change in relevance between words and documents.
Topic transitions in Tweet streams:
• What you’ve tweeted before may affect what you will tweet in the near future.
Sentiment change in Twitter during major events:
• People respond differently to the same event since they may hold different prior opinions (e.g., conservatives vs. liberals).
Yu Wang (Ph.D. expected 2014) [CIKM 2010, KDD 2012, CIKM 2013]
Community Question Answering (CQA)
1. What are the factors influencing answer contributions in CQA systems?
  – Analyzing answerer behavior [ECIR 2011]
2. What kind of searches benefit most from CQA services and archives?
  – Understanding how searchers become askers [SIGIR 2011]
3. How can CQA data improve search quality?
  – Predicting searcher satisfaction with CQA data [SIGIR 2012]
Qiaoling Liu, Ph.D. expected: 2014
59
• Emory IR Lab is looking for a few good Ph.D. students to start September 2015
• Information retrieval and web search: search behavior, ranking, user interfaces, content analysis, Question Answering
• Social media and social network mining applications: political science, public health, advertising
• Psychology, Neuroscience, Medicine applications: computational attention, memory, cognition, language
Contact: Eugene Agichtein, Associate Professor
[email protected]/~eugene/
Computer Science Ph.D. program information and application process: http://www.mathcs.emory.edu/programs-grad/
60
Atlanta, GA