Conceptual text mining Pim Huijnen Utrecht University & University of Sheffield Digital Humanities Workshop May 12, 2016

What to do with 11 million newspaper pages?

a) distant reading

In: Het Centrum, 10 October 1919, p. 4.

b) finding the needle in the haystack

In: Het Volk: Dagblad voor de arbeiderspartij, 29 January 1921, p. 3.

1 How to define a concept?

Efficiency ≠ “efficiency”

Eugenetica ≠ “eugenetica” + “eugenetiek” + “eugeniek” + “rassenleer”

2 How to study its changing uses, contexts, and meaning over time?

How to know what to look at?

1) Eugenics

* topic modeling newspaper articles containing “eugenics”

* using meaningful words to look for eugenics without “eugenics”

* in the given example: querying Texcavator with ‘regulation AND health AND race’ (575 results)

Texcavator

plotting the results on a time scale (relative to total number of articles per year)
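The normalisation behind such a plot can be sketched as follows. This is a minimal illustration, not Texcavator's own code: the function name `relative_frequency` and all numbers are invented for the example.

```python
from collections import Counter


def relative_frequency(hit_years, totals_per_year):
    """Return {year: hits / total articles} for every year with a known total."""
    hits = Counter(hit_years)
    return {
        year: hits.get(year, 0) / total
        for year, total in sorted(totals_per_year.items())
        if total > 0
    }


# Invented toy numbers, not taken from the newspaper corpus.
hit_years = [1919, 1919, 1921, 1921, 1921]              # publication year of each hit
totals_per_year = {1919: 1000, 1920: 1200, 1921: 1500}  # all articles per year

rel = relative_frequency(hit_years, totals_per_year)
for year, share in rel.items():
    print(year, f"{share:.4f}")
```

Dividing by the yearly total keeps a growing archive from looking like growing interest in the topic.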

extracting distinctive words from query results per year (tf-idf)
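A rough sketch of what tf-idf per year could look like, assuming all query results from one year are treated as a single document. The weighting used here (relative term frequency times log of inverse document frequency) is one common variant; the slides do not say which variant Texcavator uses, and the function name and toy tokens are invented.

```python
import math
from collections import Counter


def distinctive_words(tokens_per_year, top_n=3):
    """Rank the words of each year by tf-idf, treating each year as one document."""
    n_years = len(tokens_per_year)
    counts = {year: Counter(tokens) for year, tokens in tokens_per_year.items()}

    # Document frequency: in how many years does each word occur at all?
    doc_freq = Counter()
    for year_counts in counts.values():
        doc_freq.update(year_counts.keys())

    result = {}
    for year, year_counts in counts.items():
        total = sum(year_counts.values())
        scores = {
            word: (n / total) * math.log(n_years / doc_freq[word])
            for word, n in year_counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[year] = ranked[:top_n]
    return result


# Invented toy tokens, not taken from the newspaper corpus.
tokens_per_year = {
    1919: "race health regulation hygiene hygiene".split(),
    1921: "race health regulation heredity heredity".split(),
    1923: "race health regulation sterilisation sterilisation".split(),
}

top = distinctive_words(tokens_per_year, top_n=1)
for year, words in top.items():
    print(year, words)
```

Words that occur in every year, such as the query terms themselves, score zero, so only the vocabulary specific to a year rises to the top.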

2) Scientific management

* using close reading to find all significant Dutch equivalents for “scientific management”

* extract results, divide them per year and upload them to Voyant Tools

* study changing vocabulary in the subset over time

Scientific management query

“wetenschappelijke bedrijfsleiding” (233)
“wetenschappelijke bedrijfsorganisatie” (216)
“wetenschappelijke bedrijfsvoering” (32)
“scientific management” (28)

‘taylorstelsel OR taylor-stelsel’ (330)
‘taylorsysteem OR taylor-systeem’ (369)
‘taylorisme’ (42)

Combining these in a single query yields 1175 hits
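The individual counts listed above add up to 1250, while the combined query returns 1175 hits. That is what one would expect if some articles match more than one term, because an OR query returns each article only once. The sketch below illustrates this with invented toy articles; the naive substring matching is an assumption made for brevity and is not how Texcavator's search index works.

```python
def matching_ids(articles, terms):
    """IDs of articles whose lowercased text contains any of the given terms."""
    lowered = [term.lower() for term in terms]
    return {
        article_id
        for article_id, text in articles.items()
        if any(term in text.lower() for term in lowered)
    }


# Invented toy articles, not taken from the newspaper corpus.
articles = {
    "a1": "Het taylorstelsel en de wetenschappelijke bedrijfsleiding",
    "a2": "Over het Taylor-systeem in de fabriek",
    "a3": "Taylorisme en scientific management",
    "a4": "Een bericht over de haven",
}

queries = {
    "bedrijfsleiding": ["wetenschappelijke bedrijfsleiding"],
    "taylorstelsel": ["taylorstelsel", "taylor-stelsel"],
    "taylorsysteem": ["taylorsysteem", "taylor-systeem"],
    "taylorisme": ["taylorisme"],
    "scientific management": ["scientific management"],
}

per_query = {name: matching_ids(articles, terms) for name, terms in queries.items()}
sum_of_hits = sum(len(ids) for ids in per_query.values())  # counts overlaps twice
combined = set().union(*per_query.values())                # each article once

print("sum of separate hits:", sum_of_hits)
print("combined OR query:", len(combined))
```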

The third way: distributional semantics

* Our implementation combines a) creating dictionaries and b) tracing meaning over time in a single workflow

* by finding ‘most similar words’ (i.e. words with similar vectors, which occur in similar contexts and therefore tend to have related meanings)

* Use the cluster of most similar words from one ten-year time period to find the most similar words in the next (partly overlapping) time frame

* Trace the word use of concepts over time without being dependent on single terms or predefined dictionaries
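The workflow in the bullets above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual implementation: the two-dimensional vectors are made up, comparing candidates to the centroid of the seed vectors is only one possible way to find ‘most similar words’, and the names `most_similar` and `trace_concept` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(model, seeds, top_n=2):
    """Words in `model` closest to the centroid of the seed words it contains."""
    known = [seed for seed in seeds if seed in model]
    if not known:
        return []
    dim = len(model[known[0]])
    centroid = [sum(model[s][i] for s in known) / len(known) for i in range(dim)]
    ranked = sorted(
        ((cosine(vec, centroid), word) for word, vec in model.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [word for _, word in ranked[:top_n]]


def trace_concept(models, seeds, top_n=2):
    """Carry a cluster of words from each time window into the next one."""
    trail = {}
    current = list(seeds)
    for period, model in models.items():
        # Keep the previous cluster if none of its words exist in this window.
        current = most_similar(model, current, top_n) or current
        trail[period] = current
    return trail


# Made-up vectors for two overlapping windows (dicts keep insertion order).
models = {
    "1900-1909": {
        "efficiency": [1.0, 0.0],
        "economy": [0.9, 0.1],
        "weather": [0.0, 1.0],
    },
    "1905-1914": {
        "economy": [1.0, 0.0],
        "rationalisation": [0.9, 0.2],
        "efficiency": [0.5, 0.9],
        "weather": [0.0, 1.0],
    },
}

trail = trace_concept(models, seeds=["efficiency"])
for period, words in trail.items():
    print(period, words)
```

Because each window is queried with the previous window's cluster rather than with the original seed, a word that is absent from the seed list can enter the vocabulary in a later window.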

ShiCo
