Interactive Visualizations for Linguistic Analysis

Verena Lyding and Henrik Dittmann

{verena.lyding/henrik.dittmann}@eurac.edu

Institute for Specialised Communication and Multilingualism, EURAC, Bozen-Bolzano

06.02.2012

Information visualization

“The use of computer-supported, interactive, visual representations of abstract data to amplify cognition.” (Card et al., 1999)

Aim: providing a cognitive aid for

• the illustration of data and their structure/organization

• the analysis and manipulation of data

“Good visualizations use graphics to organize information, highlight important information, allow for visual comparisons, and reveal patterns, trends, and outliers in the data.” (Hearst, 2009)

Some examples

Classical ways of visualizing data include graphs, networks, charts, diagrams, and maps, but also text.

Text is visual.

More examples

Recently, some visualizations of language data have been introduced, including word clouds, concept galaxies, and sparklines.

Taken from: http://www.dwds.de

How do visualizations work?

Information is transformed into graphics using ‘visual variables’:

Taken from Carpendale (2003).

Constructing meaningful visualizations

Follow visualization principles.

“Sameness of a visual element implies sameness of what the visual element represents.” (Tufte, 2006)

Follow Gestalt psychology principles of perception, like proximity and similarity.

“Clutter and confusion are failures of design, not attributes of information.” (Tufte, 1999)

Don’t hide information without indicating what is left out.

Present information in context.

“Overview first, zoom and filter, then details-on-demand” (Shneiderman, 1996)

LInfoVis – Linguistic Information Visualization

The application of information visualization principles to display any kind of information concerning language and its use.

LInfoVis is a specialization of InfoVis.

It poses a particular challenge due to:

• structure and complexity of linguistic data

• textual elements

“The categorical nature of text, and its very high dimensionality, make it very challenging to display graphically.” (Hearst, 2009)

LInfoVis at EURAC

Motivation: Development and implementation of visualizations for language data, with a focus on the representation and analysis of language resources, in particular corpora.

Project running since the end of 2008, by the language technologies group at EURAC; initiated by Chris Culy (now University of Tübingen).

So far: development of several visualization prototypes, investigation of application contexts, and visualization tools for linguistic projects.

www.eurac.edu/linfovis

We are generally interested in collaborations!

Visualizing language data

We can distinguish between visualizations for

a) the presentation of data

b) the analysis of data

which can be targeted to

1) language data in context, e.g. KWIC

2) information derived from language data, e.g. frequency lists
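
The two targets named above can be illustrated with a minimal Python sketch. It is not taken from any EURAC tool: the function names `kwic` and `frequency_list`, the whitespace tokenization, and the toy sentence are assumptions made for illustration only.

```python
from collections import Counter

def kwic(tokens, keyword, window=3):
    """Keyword in context: (left context, keyword, right context) per hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

def frequency_list(tokens):
    """Derived information: (word, count) pairs, most frequent first."""
    return Counter(t.lower() for t in tokens).most_common()

# Toy data, tokenized naively on whitespace (an assumption of this sketch).
tokens = "the cat sat on the mat and the dog sat on the rug".split()

for left, kw, right in kwic(tokens, "sat", window=2):
    print(f"{left:>10} | {kw} | {right}")

print(frequency_list(tokens)[:3])
```

The KWIC lines keep each hit in its textual context, while the frequency list discards context in favour of a derived count; the visualizations on the following slides start from one or the other.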

LInfoVis for data presentation

Data displays that visually highlight relevant aspects of the data.

Examples are given for:

• text

• collocations

• occurrences of words over time

Visualization indicating frequencies of words in their textual context; frequencies are encoded by the character size of the words.
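
One way to encode frequency as character size is sketched below. The slides do not say how the size was computed, so the logarithmic scale, the point-size range, and the names `font_sizes` and `render_html` are all assumptions of this sketch.

```python
import html
import math
from collections import Counter

def font_sizes(tokens, min_pt=10.0, max_pt=36.0):
    """Map each word to a font size; log scaling is an assumed design choice."""
    counts = Counter(t.lower() for t in tokens)  # tokens must be non-empty
    lo = math.log(min(counts.values()))
    hi = math.log(max(counts.values()))
    span = hi - lo
    sizes = {}
    for word, n in counts.items():
        # If all words are equally frequent, fall back to the minimum size.
        scale = (math.log(n) - lo) / span if span else 0.0
        sizes[word] = min_pt + scale * (max_pt - min_pt)
    return sizes

def render_html(tokens, sizes):
    """Keep the words in their original order so the textual context survives."""
    return " ".join(
        f'<span style="font-size:{sizes[t.lower()]:.0f}pt">{html.escape(t)}</span>'
        for t in tokens
    )

tokens = "a rose is a rose is a rose".split()
sizes = font_sizes(tokens)
print(render_html(tokens, sizes))
```

Unlike a word cloud, the output keeps the running text intact and only varies the size of each word, which matches the idea of showing frequencies in context.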

Graphic from Wikimedia, by Kai Zimmer, www.dwds.de

Graph visualization of the word „Ziel“ and its collocations, calculated from the DWDS core corpus.

Chart displaying frequencies of the indicated words in newspaper text over a crucial period of time.

LInfoVis for analysis

Visualizations that highlight data characteristics and allow interaction with the display to explore the data.

Techniques from information visualization:

• search and filter, for focus and context

• overview, zoom-in, details on demand

• multiple views, and brushing and linking

Interactive visualizations

Corpus Clouds (Culy/Lyding, 2009)

• visualization of corpus query results

• multiple panels for different types of information

• interactive features for data exploration

Interactive visualizations

Double Tree (Culy/Lyding, 2010)

Double Tree shows a concordance in a compressed form that allows for interactive exploration.
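
The compression idea can be sketched in Python by merging concordance lines that share a context prefix into a tree, once for the left and once for the right side of the keyword. This is an illustrative reconstruction, not the published Double Tree implementation; the nested-dictionary layout and the names `build_context_tree` and `double_tree` are assumptions.

```python
def build_context_tree(contexts):
    """Merge token sequences that share a prefix into one nested dict (a trie)."""
    root = {"count": 0, "children": {}}
    for seq in contexts:
        node = root
        node["count"] += 1
        for tok in seq:
            node = node["children"].setdefault(tok, {"count": 0, "children": {}})
            node["count"] += 1
    return root

def double_tree(tokens, keyword, window=2):
    """Return (left tree, right tree) for all occurrences of the keyword."""
    lefts, rights = [], []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            # Left context is reversed so the word nearest the keyword comes first.
            lefts.append(tokens[max(0, i - window):i][::-1])
            rights.append(tokens[i + 1:i + 1 + window])
    return build_context_tree(lefts), build_context_tree(rights)

tokens = "the cat sat on the mat and the dog sat on the rug".split()
left, right = double_tree(tokens, "sat")

# Both hits continue with "on the", so the right side collapses to one branch.
print(right["children"]["on"]["count"])
print(sorted(left["children"]))
```

Each node's count says how many concordance lines pass through it, which is the kind of information an interactive display could use for sizing or for expanding and collapsing branches.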

Interactive visualizations

xLDDs – Extended Linguistic Dependency Diagrams (Culy/Lyding/Dittmann, 2011b)

xLDDs support the analysis of dependency structures by providing

• a set of visual features (such as color, size and shape) for the presentation of relations

• user controls for focusing on specific information

Visualizing derived information

Linguistic analyses are not only concerned with the linguistic data itself, but also with information about this data.

e.g. quantitative analysis

Parallel Coordinates (Inselberg, 2009) is a common visualization for high-dimensional data.

For linguistic analyses, we have developed Structured Parallel Coordinates (SPC) (Culy/Lyding/Dittmann, 2011a), which add:

• inherent ordering of the axes

• advanced methods for filtering, selection and highlighting

Parallel Coordinates

Visualization originally invented by d’Ocagne (1885).

In modern Information Visualization, developed and popularised by Inselberg in 1959 (cf. Inselberg, 2009).

Taken from Heer et al. (2010).
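
The basic geometry of plain parallel coordinates can be sketched as follows: every dimension becomes a vertical axis, and every data record becomes a polyline that crosses each axis at its (normalized) value. The function name `to_polylines`, the min-max normalization, and the toy frequencies are assumptions of this sketch and are not taken from the cited works.

```python
def to_polylines(rows, dims):
    """rows: list of dicts; dims: ordered axis names. One polyline per row."""
    lo = {d: min(r[d] for r in rows) for d in dims}
    hi = {d: max(r[d] for r in rows) for d in dims}
    polylines = []
    for r in rows:
        points = []
        for x, d in enumerate(dims):
            span = hi[d] - lo[d]
            # Scale each axis to [0, 1]; a constant axis is drawn at mid-height.
            y = (r[d] - lo[d]) / span if span else 0.5
            points.append((x, y))
        polylines.append(points)
    return polylines

# Hypothetical frequencies of two words in three sub-corpora.
dims = ["1990s", "2000s", "2010s"]
rows = [
    {"1990s": 10, "2000s": 20, "2010s": 40},
    {"1990s": 30, "2000s": 20, "2010s": 0},
]

for line in to_polylines(rows, dims):
    print(line)
```

Because each axis is scaled independently, crossing lines between two neighbouring axes indicate an inverse relationship between those two dimensions, which is why the order of the axes matters for what a reader can see.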

Interactive visualizations

SPC – Structured Parallel Coordinates (Culy/Lyding/Dittmann, 2011a)

Application: n-grams and frequencies
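
The kind of data behind such a view can be sketched in Python: each n-gram becomes one record with one field per word position plus a frequency, so that word positions and frequency can serve as axes. The field names (`w1`, `w2`, `freq`) and the function `ngram_rows` are hypothetical and do not describe the actual SPC input format.

```python
from collections import Counter

def ngram_rows(tokens, n):
    """One record per distinct n-gram: a field per word position plus a count."""
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    rows = []
    for gram, freq in counts.most_common():
        row = {f"w{k + 1}": word for k, word in enumerate(gram)}
        row["freq"] = freq
        rows.append(row)
    return rows

tokens = "the cat sat on the mat and the dog sat on the rug".split()
rows = ngram_rows(tokens, 2)

for row in rows[:3]:
    print(row)
```

The word-position fields have a natural left-to-right order, which is one plausible reading of the "inherent ordering of the axes" mentioned for SPC.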

Interactive visualizations

SPC – corpus comparisons

Evolution of verb constructions over time and register

Interactive visualizations

SPC – ranking comparisons

Live demo:

• using ranking comparisons for sub-corpus analysis

• automatic re-ordering of the axes by similarity
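
The slides do not state which similarity measure or ordering strategy the demo uses. As one possible illustration, the sketch below compares axes by Spearman's rank correlation and chains them greedily, so that each axis is followed by the most similar remaining one. The measure, the greedy strategy, the sub-corpus names, and all function names are assumptions.

```python
def ranks(values):
    """Rank positions (1 = largest); ties are broken by original order for simplicity."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(a, b):
    """Spearman's rho via the no-ties formula; needs at least two items."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def order_axes(columns):
    """Greedy chain: start at the first axis, then append the most similar one left."""
    names = list(columns)
    ordered = [names.pop(0)]
    while names:
        last = columns[ordered[-1]]
        best = max(names, key=lambda name: spearman(last, columns[name]))
        names.remove(best)
        ordered.append(best)
    return ordered

# Hypothetical frequencies of the same four words in four sub-corpora.
columns = {
    "news":    [50, 30, 20, 10],
    "fiction": [10, 20, 30, 50],
    "blogs":   [45, 35, 15, 12],
    "poetry":  [12, 18, 40, 35],
}
print(order_axes(columns))
```

Placing similar rankings next to each other reduces line crossings between neighbouring axes, so the remaining crossings point to the sub-corpora whose rankings actually differ.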

Summing it up

Information Visualization has a lot to offer for the description and exploration of complex data.

Our LInfoVis work aims at combining insights and methods from InfoVis and linguistic research settings.

Visualizations need to be put to the test in different areas of linguistic analysis. The users’ needs and experiences are an important basis for:

– improving existing software and

– pursuing new directions in LInfoVis

Thank you!

{verena.lyding/henrik.dittmann}@eurac.edu

www.eurac.edu/linfovis

Bibliography

Bertin, J. (1982): Graphische Darstellungen. Graphische Verarbeitung von Informationen. Berlin/New York: de Gruyter.

Card, S. K. / Mackinlay, J. D. / Shneiderman, B. (1999): Information Visualization: Using Vision to Think. San Francisco: Morgan Kaufmann Publishers.

Carpendale, M. (2003): Considering visual variables as a basis for information visualisation. Dept. of Computer Science, University of Calgary, Canada, Tech. Rep. 2001-693-16.

Collins, C. / Penn, G. / Carpendale, S. (2008): Interactive visualization for computational linguistics. ACL-08: HLT Tutorials. http://www.cs.utoronto.ca/~ccollins/acl2008-vis.pdf. Access date: February 9, 2012.

Culy, C. / Lyding, V. (2009): Corpus Clouds - facilitating text analysis by means of visualizations. In: Proc. of the 4th Language & Technology Conference (LTC ’09), Poznan, Poland, 521-525.

Culy, C. / Lyding, V. (2010): Double Tree: An Advanced KWIC Visualization for Expert Users. In: Proc. of the 14th International Conference on Information Visualization (IV 2010), London, United Kingdom, 98-103.

Culy, C. / Lyding, V. / Dittmann, H. (2011a): Structured Parallel Coordinates: a visualization for analyzing structured language data. In: Proc. of the 3rd International Conference on Corpus Linguistics (CILC 2011), Valencia, Spain.

Culy, C. / Lyding, V. / Dittmann, H. (2011b): Visualizing Dependency Structures. In: Proc. of the Conference of the German Society for Computational Linguistics and Language Technology (GSCL) 2011, Hamburg, Germany, 81-86.

Hearst, M. (2009): Search User Interfaces. Cambridge: Cambridge University Press.

Heer, J. / Bostock, M. / Ogievetsky, V. (2010): A Tour through the Visualization Zoo. ACM Queue 8(5).

Inselberg, A. (2009): Parallel Coordinates: VISUAL Multidimensional Geometry and its Applications. New York: Springer.

d’Ocagne, M. (1885): Coordonnées Parallèles et Axiales: Méthode de transformation géométrique et procédé nouveau de calcul graphique déduits de la considération des coordonnées parallèles. Paris: Gauthier-Villars.

Richter, M. (2005): Analysis and visualization for daily newspaper corpora. In: Proc. of Recent Advances in Natural Language Processing (RANLP – 2005), Borovets, Bulgaria, 424–428.

Shneiderman, B. (1996): The eyes have it: A task by data type taxonomy for information visualizations. In: Proc. of the IEEE Symposium on Visual Languages, 336-343.

Tufte, E. (1999): Envisioning Information. Cheshire, Connecticut: Graphics Press LLC.

Tufte, E. (2006): Beautiful Evidence. Cheshire, Connecticut: Graphics Press LLC.

Further links and references

DWDS Wortprofil 2010 for the word „Sprache“, http://www.dwds.de, Access date: February 9, 2012.

IN-SPIRE™ visual analysis tools, http://infoviz.pnnl.gov/tech_inspire.stm, Access date: February 9, 2012.

jQuery Sparklines by Gareth Watts for Splunk Inc., http://omnipotent.net/jquery.sparkline/, Access date: February 9, 2012.

Graph visualization of collocations for the word „Ziel“ by Kai Zimmer, DWDS, http://upload.wikimedia.org/wikipedia/commons/c/c2/DWDS_Kollok_Ziel_big.png, Access date: February 9, 2012.

Tutorial on the “Visualization of Linguistic Information” by Culy, C. / Lyding, V., presented at the 37. Österreichische Linguistiktagung in Salzburg, Austria, December 7, 2009.

“Visualization as Part of the Linguistic Processing Pipeline” by Culy, C. / Lyding, V., presented at the Linguistic Processing Pipelines workshop at the GSCL conference, 29 September 2009 in Potsdam, Germany.

Word Clouds by Chris Culy, EURAC, http://search.korpus-suedtirol.it:8089/wordcloud/word_cloud_eurac.html, Access date: February 9, 2012.