TWinner: Understanding News Queries with Geo-content using Twitter
Satyen Abrol, Latifur Khan
University of Texas at Dallas, Department of Computer Science

GIR '10
29 April, 2011
Sengyu Rim

Outline
- Introduction
- Related Work
- Twitter as News-wire
- Determining News Intent
- Assigning Weights to Tweets
- Experiments and Results
- Conclusion

Introduction
- Motivations
  - Users find news through search engines
  - The results returned by common search engines differ from what the user expects
    - Non-critical information
    - Unorganized content
  - Search engines need to understand the intent of the user query
- Example: what event in Korea attracted the most attention in 2002?
  - A naive user searches for news with the keyword "korea"
    - Map: korea
    - Wiki: Korea
    - News: Korea:Italy 2:1
    - Food: Kimchi
- Analyze the content of a popular social networking site, Twitter, to learn the intent of the user query
  - Twitter provides popular news topics
  - Twitter provides keywords that may enhance the user query
- TWinner makes two novel contributions to the field of geographic information retrieval
  - Identifying the intent of the user query
  - Adding additional keywords to the query
- The architecture of the news intent system TWinner

Related Work
- Identifying and disambiguating the locations of users
  - Natural Language Processing
  - Data Mining
- Establishing the relationship between the location of the news and the news content
  - A model using NLP techniques

Twitter as News-wire
- Twitter
  - Free social networking
  - Micro-blogging service
  - Medium for news updates

Determining News Intent
- Identification of Location
  - Geo-tags the query to a location with a certain confidence
- Frequency-Population Ratio (FPR)
  - FPR always remains constant in the absence of a news-making event, irrespective of the location
  - Used to assign a news intent confidence to the query
  - $FPR = (\alpha + \beta) \cdot N_t$
    - $\alpha$: the population density factor
    - $\beta$: the location type constant
    - $N_t$: the number of tweets per minute at that instant
- Experiments on determining the effect of geo-type and population density
- The drawback of FPR
  - Fails to take into account the geographical relatedness of features
- Modified FPR
  - $FPR = \sum_i \mu_i (\alpha_i + \beta_i) \cdot N_t$
    - $\mu_i$: the factor by which each geo-location is related to the primary search query

Assigning Weights to Tweets
- Detecting Spam Messages
  - Spam messages carry little or no relevant information
  - Nature of spam messages
  - A formula tags, to a certain level of confidence, whether a message is spam, in terms of:
    - $N_p$: the number of followers
    - $N_q$: the number of people the user is following
    - an arbitrary constant
    - $N_r$: the ratio of the number of tweets containing a reply to the total number of tweets
- On the basis of user location
  - An experiment was conducted to understand the relation between Twitter messages and the location of the user
- Using Hyperlinks Mentioned in Tweets
  - 30-50% of general Twitter messages contain a hyperlink to an external website
  - For news Twitter messages, this percentage increases to 70-80%
  - This pointer is also used to assign weights to tweets
- Semantic Similarity
  - Summarize the Twitter messages into a couple of keywords
  - A naive approach picks k keywords, ignoring semantic similarity
  - The definition of the semantic similarity uses:
    - $M$: the total number of articles searched in the New York Times Corpus
    - $f(x)$: the number of articles for term x
    - $f(y)$: the number of articles for term y
  - Reassigns the weight of all keywords on the basis of the following formula
    - $W_i^* = W_i + \sum_j S_{ij} \cdot W_j$
    - $W_i^*$: the new weight of keyword i
    - $W_i$: the weight without semantic similarity
    - $S_{ij}$: the semantic similarity derived from the semantic similarity formula
    - $W_j$: the initial weight of the other words being considered
  - Identifies k keywords that are semantically dissimilar but together contribute maximum weight.
S pq