
IRE Project - Entity Linking in Social Media


DESCRIPTION

Abstract: Nowadays, much of our conversation happens through social media. It has become a major medium for sharing information, and with it has emerged the need to understand the meaning of the information shared there. Understanding the context of a conversation requires a semantic understanding of its sentences, which can be achieved by linking named entities to their context. The main aim of our project is therefore to link entities in social media, i.e. to extract the context and meaning of a sentence in a tweet and link it to a Wikipedia page for a better understanding of that context. We use tweets as our base to evaluate and test our method of entity linking by extracting context from tweets.

Introduction: Social media has disrupted the personal and commercial habits of people to a degree not seen since the early days of television. Just as television turned a nation of people who listened to media content into watchers of media content, the emergence of social media has created a nation of media-content creators. According to 2011 Pew Research data, nearly 80% of American adults are online and nearly 60% of them use social networking sites [1]. More Americans get their news via the Internet than from newspapers or radio, and about three-fourths say they hear of news through e-mail or social media updates, according to a report published by CNN. The survey suggests that Facebook and Twitter make news a more participatory experience than before. A great deal of information is therefore transmitted through social media, which has created the need to understand the context hidden in these unstructured pieces of information shared through social interaction and communication.

The main idea is thus to uncover the context hidden in tweets, in order to better understand and categorize the information they carry and gain deeper insight into information sharing and communication. To achieve this, named entities are recognised in the tweets and matched to their corresponding Wikipedia pages, and the context of each tweet is obtained from these pages by selecting the context with the maximum score.


Page 1: IRE Project - Entity Linking in Social Media

Entity Linking in Social Media

Project Number: 10
Group Number: 51

- Abhishek Mittal, 201101192
- Mohit Aggarwal, 201101164
- Vishrut Mehta, 201102128
- Himanshu Ghadiya, 201305620

Page 2

Overview

● The main aim of our project is to link entities in social media, i.e. extracting the context and meaning of a sentence in a tweet and linking it to a Wikipedia page for better understanding.

● In today's world, semantic understanding of a sentence is very important. Most of our conversation now happens through social media, and it is important to understand the meaning of those conversations, which is possible by linking named entities to their context. We have therefore taken tweets as our base to evaluate and test our method of entity linking by extracting context from tweets.

Page 3

Approach

● We extract named entities from the tweets using the CMU-ARK tagger.

● The named entities are then mapped to relevant news feeds within a particular time interval.

● We then extract named entities from these news feeds and obtain a final collection of related entities that contains sufficient information about the tweet.

● Corresponding to each entity, we find the Wikipedia pages.

● We then find the labels for each wiki page in order to determine their context, and finally map the tweet to that context. The classification task is done using an SVM.
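
The steps above can be sketched as a toy pipeline. This is only an illustration of the idea, not our implementation: a naive capitalized-token matcher stands in for the CMU-ARK tagger, and a keyword-overlap scorer stands in for the trained SVM classifier; the function names and keyword sets are invented for the example.

```python
import re

def extract_mentions(text):
    # Naive stand-in for CMU-ARK mention detection:
    # treat runs of capitalized words as candidate entity mentions.
    return re.findall(r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)*", text)

# Hypothetical keyword profiles for two contexts; the real system
# derives context labels from Wikipedia pages with an SVM.
CONTEXT_KEYWORDS = {
    "sports": {"match", "goal", "league", "cup"},
    "politics": {"election", "minister", "vote", "parliament"},
}

def score_context(related_terms):
    # Pick the context whose keyword set overlaps most with the
    # terms collected for the tweet's related entities.
    return max(CONTEXT_KEYWORDS,
               key=lambda label: len(CONTEXT_KEYWORDS[label] & related_terms))
```

A tweet would flow through as: tweet text → `extract_mentions` → related terms gathered from news feeds and Wikipedia → `score_context` picks the winning label.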

Page 4

Design

Page 5

Datasets

We have used the following datasets for the project:

● A dataset of tweets.

● A dataset of news feeds from different news websites; we have used the CBS News dataset.

● A 40 GB Wikipedia dump as the training set for the SVM. Right now, we have trained the SVM on only 5 GB of Wikipedia data.

● A predefined set of about 15 labels that the tweets are mapped to.

Page 6

Tools

We have used the following tools for the project:

● CMU-ARK tagger - to find named entities via mention detection in tweets.

● Stanford parser - to find named entities via mention detection on news feeds (as they are structured text).

● Wikipedia Search API - to find wiki pages for a keyword.

● LIBSVM - to classify the context of a wiki page.
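
For the Wikipedia Search API, requests go to the standard MediaWiki opensearch endpoint. The helper below only builds the request URL (the function name and default limit are our own choices for the sketch); any HTTP client can then fetch it.

```python
from urllib.parse import urlencode

WIKI_API = "https://en.wikipedia.org/w/api.php"

def wiki_search_url(keyword, limit=5):
    # Build a MediaWiki opensearch request URL for a keyword.
    # The request is not sent here; fetch it with any HTTP client.
    params = {"action": "opensearch", "search": keyword,
              "limit": limit, "format": "json"}
    return WIKI_API + "?" + urlencode(params)
```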

Page 7

Results

● We evaluated our system on a small dataset: about 200 tweets dated 10 January 2014, together with news feeds from all 24 hours of that day.

● We then ran our algorithm to find the context of each tweet. Comparing the results with the labels we had assigned manually, we found the accuracy to be around 37 percent.

● The low accuracy is mostly due to the small training and testing datasets used for classification. Once we train the SVM on the full 40 GB Wikipedia dataset, we are confident of achieving good accuracy.
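
The accuracy figure above is simply the fraction of tweets whose predicted context matches the manually assigned label; as a sketch:

```python
def accuracy(predicted, gold):
    # Fraction of tweets whose predicted context matches the manual label.
    if len(predicted) != len(gold):
        raise ValueError("label lists must have equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```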

Page 8

Challenges and Issues

● Feature selection for the SVM was a major challenge: we had to choose feature vectors that would give maximum accuracy during classification.

● Training the SVM on 40 GB of Wikipedia data is a major challenge.

● Right now, we have taken only 15 labels for classifying the tweets. Increasing the number of labels would make the algorithm more computation-intensive, and scaling this system to bigger datasets and more contexts would require further optimization.
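
To make the feature-selection step concrete: LIBSVM consumes sparse "<label> <index>:<value>" lines, so a bag-of-words featurizer over the Wikipedia training text could look like the sketch below. The vocabulary and label ids here are illustrative, not the ones we actually use.

```python
from collections import Counter

def to_libsvm_line(label_id, tokens, vocab):
    # Encode one document as a LIBSVM-format line:
    # "<label> <index>:<count> ..." with 1-based, ascending indices.
    counts = Counter(t for t in tokens if t in vocab)
    feats = sorted((vocab[t] + 1, c) for t, c in counts.items())
    return str(label_id) + " " + " ".join(f"{i}:{c}" for i, c in feats)
```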

Page 9

Thank You