ComPair: compare and visualise the usage of language, David Beavan, University of Glasgow, DH2011


DESCRIPTION

Talk given at Digital Humanities 2011 (DH2011) in Stanford, USA, on 21 June 2011.

Web site: http://www.scottishcorpus.ac.uk/corpus/bnc/compair.php

Abstract: https://dh2011.stanford.edu/wp-content/uploads/2011/05/DH2011_BookOfAbs.pdf

This paper will demonstrate ComPair, a new tool to investigate and compare word usage, encouraging new ways to explore language variation. While remaining focussed on usability and the promotion of navigation, this tool represents an evolutionary step forward from the author’s previous award-winning visualisation applications. This paper will introduce the methods and technologies at its core, perform a demonstration of the tool and discuss opportunities for further collaboration.


ComPair: Compare and Visualise the Usage of Language

David Beavan University of Glasgow David.Beavan@glasgow.ac.uk @DavidBeavan

‘You shall know a word by the company it keeps’

Firth, John R., 1957. Modes of meaning. Oxford: Oxford University Press.

Collocation

•  Words which go together
•  More than by chance, they show an association

•  Take a corpus
•  Search for a term (node word)
•  Examine words in a window (e.g. 5) either side of node
•  Aggregate these co-occurring words
•  Rank (e.g. by frequency or collocational strength)
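The steps above can be sketched in a few lines of Python. This is a minimal illustration only, not the code behind any of the tools shown in this talk: the function name `collocates`, the toy sentence and the whitespace tokenisation are all assumptions, and ranking here is by raw frequency rather than collocational strength.

```python
from collections import Counter


def collocates(tokens, node, window=5):
    """Count the words found within `window` tokens either side of `node`.

    Hypothetical helper for illustration; a real corpus tool would also
    handle case, punctuation, lemmas and sentence boundaries.
    """
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the node occurrence itself
                counts[tokens[j]] += 1
    return counts


# Toy "corpus" (invented for this example), split on whitespace.
text = ("stanford university is near palo alto and "
        "stanford university hosts the conference")
tokens = text.split()

# Aggregate the co-occurring words, then rank by frequency.
counts = collocates(tokens, "stanford", window=2)
ranked = counts.most_common()
print(ranked)
```

With a window of 2, `university` is the only word that falls near both occurrences of the node, so it ranks first.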

‘Stanford’ collocate search via Davies, Mark. (2004-) BYU-BNC: The British National Corpus. Available online at http://corpus.byu.edu/bnc.

Collocates

Collocate Cloud

‘Stanford’ search via Beavan, David. (2008-) BNC Collocate Cloud. Available online at http://www.scottishcorpus.ac.uk/corpus/bnc/collocatecloud.php

Collocate Cloud properties

•  100 most frequent collocates listed alphabetically
•  Font size shows frequency of word
•  Brightness shows collocational strength of word
•  Interactively create new clouds

•  Best New Idea for Improving a Current Web-Based Tool, 2008 TADA Research Evaluation eXchange (T-REX)

Comparison

•  Investigate and compare word usage
   –  Expose attitudes and cultures
   –  Investigate degrees of synonymy

•  Semantic prosody
   –  How synonymous words can actually take on positive or negative connotations

•  Applications for language learning
   –  Examine real-world usage of words

ComPair properties

•  Visualise usage of two node words
•  Distribute 150+ collocates on a continuum
•  Colour shows attraction to node
•  Brightness shows degree of collocational attraction

•  Currently uses British National Corpus
•  Can be applied to any corpus or dataset (in progress)

ComPair how-to

•  Take two collocate word lists
   –  Same corpus, different node words
   –  Different corpora, same node word

•  Calculate collocational strength towards each node
   –  Mutual Information etc.

•  Place collocates on continuum between node words
   –  Those with attraction to a single node appear near that node
   –  Those with little attraction to either node appear central and dim
   –  Those with attraction to both nodes appear central and bright
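One way to picture the how-to above is the sketch below. It is an illustration under stated assumptions, not ComPair's actual implementation: the slides do not give the formulas, so the pointwise Mutual Information variant, the function names `mutual_information` and `place`, and the position and brightness rules are all hypothetical choices made to match the three behaviours listed.

```python
import math


def mutual_information(cooc, node_freq, coll_freq, corpus_size):
    """Pointwise MI: log2 of observed over expected co-occurrence.

    Assumed variant; corpus tools differ in details such as adjusting
    the expected count for the window span.
    """
    if cooc == 0:
        return 0.0
    return math.log2(cooc * corpus_size / (node_freq * coll_freq))


def place(strength_a, strength_b, max_strength):
    """Map two strengths to (position, brightness), both in [0, 1].

    Hypothetical rule: position 0.0 sits at node A and 1.0 at node B;
    brightness follows the stronger of the two attractions.
    """
    a = max(strength_a, 0.0)  # treat negative association as none
    b = max(strength_b, 0.0)
    total = a + b
    position = 0.5 if total == 0 else b / total
    if max_strength <= 0:
        brightness = 0.0
    else:
        brightness = min(max(a, b) / max_strength, 1.0)
    return position, brightness


# Invented strengths, scaled against a maximum of 4.0.
only_a = place(4.0, 0.0, 4.0)    # attracted to node A alone
neither = place(0.2, 0.2, 4.0)   # little attraction to either node
both = place(3.0, 3.0, 4.0)      # attracted to both nodes
print(only_a, neither, both)
```

Under these assumptions a collocate attracted to one node lands at that end of the continuum, while the last two cases both land in the centre and are told apart only by brightness.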

ComPair: http://www.scottishcorpus.ac.uk/corpus/bnc/compair.php
