NLP and Text Mining: an Introduction
Matteo Romanello (DAI/KCL)
Histore Workshop – IHR – June 21, 2012
Introduction
Basic Concepts
Section 1
Introduction
me
- BA Classics (Greek Literature and Philology)
- MA Digital Humanities (Univ. of Venice)
- e-journals in Classics
- Currently:
  - PhD in Digital Humanities, King's College London
    - information extraction from secondary sources
  - Research Associate at the German Archaeological Institute (Berlin)
  - Digital Infrastructure for Research in the Arts and Humanities (DARIAH)
What and Why?
NLP Methods
< 1990s
- rely heavily on hand-coded rules
- extract named entities with regexps
- grammars, parsing, etc.
- top-down
- hardly scalable

>= 1990s
- emphasis on statistically based approaches
- machine learning
- bottom-up
- scalable
NLP in DH
I increasing need for mediation of NLP knowledgeI adoption and appropriation of technology need
I understanding of technologyI familiarising with
I JargonI to code or not to code?I basic concepts
I understanding a fieldI evolving quicklyI with a growing body of literatureI highly specialised
(some of the main) NLP Tasks

Speech Processing
- Machine Translation
- Speech Synthesis

Information Extraction
- Named Entity Extraction
- Named Entity [Classification | Resolution]
- Relationship Extraction
- Co-reference Resolution

Text Classification
- Sentiment Analysis
- Topic Modelling
My playlist of NLP frameworks
- Voyeur/Voyant Tools [web-based]
  - reading, text analysis
  - text visualisation
- Natural Language Toolkit (NLTK) [Python]
- General Architecture for Text Engineering (GATE, Uni Sheffield) [Java]
- LingPipe [Java]
- OpenNLP (Apache Foundation) [Java]
Challenges for NLP in DH
- tools do not always work straight out of the box
- issues with:
  - character encoding (despite Unicode)
  - output of OCR on historical documents
  - normalisation and pre-processing
- lack of ad-hoc resources:
  - datasets for training, testing and evaluation
  - dictionaries and gazetteers
  - previous results for comparison
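The "character encoding (despite Unicode)" point can be made concrete with a small normalisation sketch using only the standard library: the "same" accented word can be encoded as one precomposed code point or as a base letter plus a combining accent, and OCR output and legacy corpora often mix the two. The example word is an illustrative assumption.

```python
import unicodedata

# Two Unicode encodings of the same visible string 'café':
precomposed = "caf\u00e9"   # é as a single code point (U+00E9)
decomposed = "cafe\u0301"   # e + combining acute accent (U+0301)

print(precomposed == decomposed)  # False: the raw strings differ

def normalise(text):
    """Normalise to NFC so equivalent strings compare equal."""
    return unicodedata.normalize("NFC", text)

print(normalise(precomposed) == normalise(decomposed))  # True after NFC
```

Normalising every input to one form (here NFC) before tokenisation is a typical first pre-processing step, precisely because downstream tools silently treat the two encodings as different words.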
Section 2
Basic Concepts
Machine Learning

Supervised
- a model is learned from training data

Models
- Hidden Markov Model
- Support Vector Machine
- Conditional Random Fields

Applications
- sequence labelling

Unsupervised
- data are fitted to a model

Models
- Clustering
- Latent Dirichlet Allocation
- Latent Semantic Indexing

Applications
- document clustering
- topic modelling
Machine Learning Cycle (Sequence Labelling)
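The supervised cycle for sequence labelling (train on labelled data, predict on unseen data, evaluate) can be sketched with a deliberately trivial model: a most-frequent-tag baseline rather than an HMM or CRF. The tiny corpus and tag set are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Toy labelled corpus: each sentence is a list of (word, tag) pairs.
training_data = [
    [("Paris", "PLACE"), ("is", "O"), ("lovely", "O")],
    [("Caesar", "PERSON"), ("entered", "O"), ("Rome", "PLACE")],
    [("Paris", "PLACE"), ("fell", "O")],
]

def train(sentences):
    """Learn each word's most frequent tag from the training data."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def predict(model, words):
    """Tag each word with its most frequent training tag; unknown words get 'O'."""
    return [model.get(word, "O") for word in words]

model = train(training_data)
test_sentence = [("Caesar", "PERSON"), ("left", "O"), ("Paris", "PLACE")]
predicted = predict(model, [w for w, _ in test_sentence])
gold = [t for _, t in test_sentence]
accuracy = sum(p == g for p, g in zip(predicted, gold)) / len(gold)
print(predicted, accuracy)
```

Real sequence labellers (HMMs, CRFs) improve on this baseline by using context, but the cycle — annotate, train, predict, evaluate — is exactly the same.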
Evaluation
- TP, FP, TN, FN are defined in relation to a specific task
- applicable to tasks where what we are looking for is known (quantifiable)
- Information Retrieval: retrieving information relevant to a given search query
  - TP (True Positives): relevant docs we expected to show up and that did show up (relevant, present)
  - FP (False Positives): docs we did not expect to show up but that showed up (not relevant, present)
  - TN (True Negatives): non-relevant docs we did not expect to show up and that did not show up (not relevant, missing)
  - FN (False Negatives): relevant docs we expected to show up but that did not show up (relevant, missing)
Evaluation Metrics
- precision = TP / (TP + FP)
- recall = TP / (TP + FN)
- accuracy = (TP + TN) / (TP + TN + FP + FN)
- f-score = 2 · (precision · recall) / (precision + recall)
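The four metrics can be computed directly from the raw counts. The counts below are an illustrative assumption: a search that returned 8 documents, 6 of them relevant, out of 10 relevant documents in a 100-document collection.

```python
# Confusion-matrix counts for the hypothetical search above.
tp, fp = 6, 2   # returned: relevant / not relevant
fn = 4          # relevant but not returned
tn = 88         # not relevant and not returned

precision = tp / (tp + fp)                    # 0.75
recall = tp / (tp + fn)                       # 0.6
accuracy = (tp + tn) / (tp + tn + fp + fn)    # 0.94
f_score = 2 * precision * recall / (precision + recall)

print(precision, recall, accuracy, f_score)
```

Note how accuracy (0.94) is flattered by the many true negatives, while the f-score (about 0.67) reflects the balance of precision and recall — which is why IR evaluation usually reports precision/recall/f rather than accuracy.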
Topic Modelling
M. Jockers, "The LDA Buffet is Now Open; or, Latent Dirichlet Allocation for English Majors"
Key concepts
- the algorithm extracts topics and their representative words
- the human interpreter eventually assigns a name/label to each topic
- the number of topics is decided a priori
- each doc has a different percentage of each of the topics
- diachronic/synchronic exploration of topics
TM frameworks
I Mallet (Java)
I Gensim (Python)
I Stanford Topic Modelling Toolbox
Topic Modelling (cont'd)
https://dhs.stanford.edu/algorithmic-literacy/my-definition-of-topic-modeling/

Martha Ballard's Diary
http://historying.org/2010/04/01/topic-modeling-martha-ballards-diary/

Thematic Index of Classics in JSTOR
http://catalog.perseus.tufts.edu/jstor/

Comprehending the Digital Humanities
https://dhs.stanford.edu/comprehending-the-digital-humanities/
Recommended