Lessons from the Journey: A Query Log Analysis of Within-Session Learning (WSDM'14)

DESCRIPTION

The Internet is the largest source of information in the world. Search engines help people navigate the huge space of available data in order to acquire new skills and knowledge. In this paper, we present an in-depth analysis of sessions in which people explicitly search for new knowledge on the Web, based on the log files of a popular search engine. We investigate within-session and cross-session developments of expertise, focusing on how a user's language and search behavior on a topic evolve over time. In this way, we identify those sessions and page visits that appear to significantly boost the learning process. Our experiments demonstrate a strong connection between clicks and several metrics related to expertise. Based on models of the user and their specific context, we present a method capable of automatically predicting, with good accuracy, which clicks will lead to enhanced learning. Our findings provide insight into how search engines might better help users learn as they search. This work, done together with Jaime Teevan, Ryen White, and Susan Dumais, has been accepted for full oral presentation at the 7th ACM International Conference on Web Search and Data Mining (WSDM). The full version of this paper is available at: http://dl.acm.org/citation.cfm?id=2556195.2556217

Citation preview

  • 1. Lessons from the Journey: A Query-log Analysis of Within-session Learning. Carsten Eickhoff, Jaime Teevan, Ryen White, Susan Dumais

  • 2. Learning by Searching: Domain expertise seems to be generally useful for in-domain searches. Domain expertise can slowly change over time. Here, we measure this effect at finer granularity.

  • 3. Studying Expertise over Time [figure: expertise plotted against time]

  • 4. Explicit Learning Sessions: Learning happens all the time. We look at explicit knowledge acquisition sessions. Two types of informational needs: procedural (learn how to do something, e.g., Ehow.com, YouTube tutorials) and declarative (learn about something, e.g., Wikipedia.com, documentaries).

  • 5. Finding Indicator Terms: Group sessions that end at Ehow vs. Wikipedia. Find query terms that occur more frequently in knowledge acquisition sessions.

  • 6. Selecting Sessions: Based on a set of 26.7 million sessions. Select sessions that contain indicator terms in at least 50% of queries. Resulting sets: Dproc and Ddecl.

  • 7-10. Session Properties: Knowledge acquisition sessions are long, topically diverse, and more exploratory. Extended sets are noisy and mimic the full collection.

  • 11. Within-session Learning: General upward trend for domain count and query complexity. This trend is strongest for learning sessions.

  • 12. Sustained Learning: What happens beyond the session boundary? Domain expertise metrics are more likely to increase further after within-session learning.

  • 13. Page Visits Spark Learning: We study the origin of new query terms. Did added terms occur previously in the same session, and if so, where?

  • 14. The Effect of Page Visits: Condition P+, P=, and P- of expertise metrics on the click status of the previous SERP. Clicks more often result in metric increases. Click duration has no significant effect.

  • 15. Summary: Introduced procedural/declarative needs. Noted evidence of within-session learning. Learning is sustained across the session boundary. Page visits seem to have a strong influence.

  • 16. Future Directions: Ranking to learn: learning potential is spread evenly across the SERP, and predictors of learning potential may serve as ranking criteria. Qualitative study of query reformulation: here, term presence implies causality; better, study what the user really sees (e.g., via eye gaze tracking).

  • 17. Thank You.
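
The session-selection rule from slide 6 (keep sessions in which at least 50% of queries contain an indicator term) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the whitespace tokenization, and the example indicator terms are all assumptions made here, and the actual indicator terms are not listed in the slides.

```python
def contains_indicator(query, indicator_terms):
    """Return True if any whitespace-separated token of the query is an indicator term.

    Tokenizing on whitespace after lowercasing is an assumption of this sketch.
    """
    return any(token in indicator_terms for token in query.lower().split())


def indicator_fraction(queries, indicator_terms):
    """Fraction of queries in one session that contain at least one indicator term."""
    if not queries:
        return 0.0
    hits = sum(contains_indicator(q, indicator_terms) for q in queries)
    return hits / len(queries)


def select_sessions(sessions, indicator_terms, min_fraction=0.5):
    """Keep sessions whose indicator-term coverage reaches min_fraction (50% on slide 6)."""
    return [
        session
        for session in sessions
        if indicator_fraction(session, indicator_terms) >= min_fraction
    ]


# Hypothetical indicator terms and sessions, made up for illustration only.
example_terms = {"how", "tutorial"}
example_sessions = [
    ["how to tie a tie", "tie knot tutorial", "windsor knot"],  # 2 of 3 queries match
    ["weather seattle", "how to bake bread", "news"],           # 1 of 3 queries match
]
selected = select_sessions(example_sessions, example_terms)
```

With these made-up inputs only the first session passes the threshold. Running the same filter once with procedural and once with declarative indicator terms would yield two sets analogous to Dproc and Ddecl.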