Resources! Resources! Resources! Heiko Ehrig (Head of Research)

DESCRIPTION

Heiko Ehrig (Neofonie) briefly introduced the company. It has shifted from developing search engines to web and mobile application development and consulting, including interaction design, testing and data analytics. Neofonie developed a German text mining API that performs classification, keyword detection, entity detection, date detection, named entity recognition (NER), and quote detection (API key: http://bit.ly/txtwerk). Drawing on their experience with NLP and linked data (LD), they point to the following issues for examination:
- extending the set of entity types
- building more individual customer lexica and sentiment detection
- broadening LD and NLP to languages other than English
- developing a gold standard for German (N)ER
- discussing a standardized text mining API
- supporting open data and open licenses

Citation preview

Page 1: Heiko Ehrig: Resources! Resources! Resources!

Resources! Resources! Resources!

Heiko Ehrig (Head of Research)

Page 2: Heiko Ehrig: Resources! Resources! Resources!

Berlin, 1998, 1st German search engine, 180 people, 2 companies

Page 3: Heiko Ehrig: Resources! Resources! Resources!

What we offer

Page 4: Heiko Ehrig: Resources! Resources! Resources!

Page 5: Heiko Ehrig: Resources! Resources! Resources!

✱ 12 Computer Scientists, Linguists, Mathematicians

✱ Text Mining and Analytics, Search
✱  Text Classification
✱  Named Entities and Concept Tagging
✱  Topic Detection and Tracking
✱  Sentiment Analysis (Customer's Voice)

✱ Data Analytics & Consulting

✱  Individual Projects

Research Department

Page 6: Heiko Ehrig: Resources! Resources! Resources!

✱ Works On German Texts
✱  Department Classification
✱  Keyword Detection
✱  Dates Detection
✱  Entity Detection (person, location, organisation)
✱  Concepts with Links to Freebase
✱  Named Entities with Links to Freebase
✱  Quotes

✱ Get Your API Key: http://bit.ly/txtwerk

txt werk - a Textmining API

Page 7: Heiko Ehrig: Resources! Resources! Resources!

✱ German Resources are rare!

✱ Example: Named Entity Linking
✱  We did not find a Gold Standard
✱  Manual Labeling

✱ ERD Challenge 2014 (SIGIR'14 workshop)
✱  Googlers manually reannotated some hundred texts from ClueWeb (data set not public)

Resources! Resources! Resources!
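
To illustrate what a gold standard for named entity linking is used for, here is a minimal sketch that scores system output against manually labeled annotations with precision, recall and F1. The annotation layout (start offset, end offset, entity ID), the function name `score_linking` and the placeholder entity IDs are assumptions made for this example; they are not taken from the talk or from the txt werk API.

```python
from typing import NamedTuple, Set


class Annotation(NamedTuple):
    """One linked mention: character span plus knowledge-base ID (hypothetical layout)."""
    start: int
    end: int
    entity_id: str


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def score_linking(gold: Set[Annotation], predicted: Set[Annotation]) -> Scores:
    """Strict matching: a prediction counts only if span and entity ID both agree."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Scores(precision, recall, f1)


# Made-up example data: manually labeled gold annotations vs. system output.
# The entity IDs are placeholders, not real Freebase identifiers.
gold = {
    Annotation(0, 6, "/m/placeholder_city"),
    Annotation(20, 28, "/m/placeholder_org"),
}
predicted = {
    Annotation(0, 6, "/m/placeholder_city"),
    Annotation(30, 35, "/m/placeholder_person"),
}

scores = score_linking(gold, predicted)
print(scores)  # Scores(precision=0.5, recall=0.5, f1=0.5)
```

Strict matching is only one possible convention; a real evaluation might also give credit for overlapping spans. Either way the numbers are only as trustworthy as the labeled data behind them, which is why a shared German gold standard matters.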

Page 8: Heiko Ehrig: Resources! Resources! Resources!

✱ More Entity Types (companies, products, brands)

✱  Individual Customer Lexica

✱ Sentiment Detection

✱ English and more languages

Roadmap

Page 9: Heiko Ehrig: Resources! Resources! Resources!

✱ Share your resources!

✱ Corporate-friendly licensing!

✱  If you leave academia, share your resources!

✱ Lobby for resources @EC!

✱ Lobby for maintaining resource servers (like meta-share, datahub.io)

✱ Don't forget the Non-English-Speaking World!

Wishes to the community