Defining User Access to the Romanian Online Dialect Atlas

Sheila M. Embleton, Dorin Uritescu & Eric S. Wheeler

York University, Toronto, Canada

Context

Romania

Source: http://en.wikipedia.org/wiki/Romanian_language#Geographic_distribution

Romanian

• 22+ million speakers
• Critical exemplar of the eastern Romance language family

Noul Atlas lingvistic român. Crisana

• Crisana region in north-west Romania
• Hard-copy atlas by Stan and Uritescu (1996, 2003, etc.)
• Digitize to make it more accessible

RODA: Romanian Online Dialect Atlas

Digitize and present the hard-copy atlas:

• Mostly graduate students in Canada and Romania
• Enter data from maps into text files
• When complete, it will be posted to the Internet for general use

Objective

Use Information Technology to

• permit a broad range of scholars to access the data, select the data appropriately, and present the data clearly;
• and so gain greater understanding of its significance.

Example from RODA

Crisana, Romania

Seeing Words Change

Word-final /u/ from Latin

Latin             Romanian (standard     Dialectal variation    Dialectal variation
                  and most dialects)     (vowel present)        (nonsyllabic)
canto ‘I sing’    cânt                   cântu                  cântu
oculum ‘eye’      ochi                   ochiu                  ochiu

Is word-final /u/ random?

• Look for a geographic pattern over all potential occurrences
• The maps for single examples, such as /ochi/ and others, are in the dialect atlas
• But the total data for all examples is spread widely over many maps
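Gathering evidence that is spread over many maps amounts to a tally per survey point. The following is a minimal Python sketch of that idea, assuming a hypothetical record layout of (map id, locality id, transcribed form). The sample records, the locality numbers, and the function name are illustrative only; they are not RODA's actual data or file format, and real transcriptions may mark a final /u/ differently.

```python
from collections import defaultdict

# Hypothetical records: (map_id, locality_id, transcribed form).
# Illustrative values only; not actual atlas data.
RECORDS = [
    ("map_eye", 101, "ochiu"),
    ("map_eye", 102, "ochi"),
    ("map_sing", 101, "cântu"),
    ("map_sing", 102, "cânt"),
    ("map_sing", 103, "cântu"),
]


def tally_final_u(records):
    """Return {locality: (forms ending in 'u', all forms seen)}."""
    counts = defaultdict(lambda: [0, 0])  # locality -> [with final u, total]
    for _map_id, locality, form in records:
        counts[locality][1] += 1
        if form.endswith("u"):
            counts[locality][0] += 1
    return {loc: tuple(pair) for loc, pair in counts.items()}


if __name__ == "__main__":
    # One line per locality: id, forms with final u, total forms.
    for loc, (with_u, total) in sorted(tally_final_u(RECORDS).items()):
        print(loc, with_u, total)
```

Keeping the total alongside the match count makes it possible to tell a locality with no final /u/ from one with no data.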

/u/ Pattern

There is a pattern:

• Word-final /u/ is retained in central and north-eastern areas
• It is syllabic only in parts of the central area
• Latin noun vs Latin verb: no difference
• Non-Latin: less data, but consistent with the Latin pattern

Note:

• Horizontal values include all word-final /u/
• Vertical values are non-syllabic word-final /u/

RODA as linguistic technology

The technology allows one to:

• Ask a user-defined question
• Compare one query to another
• See the correlation (vertical vs horizontal)
• See the strength of the data (short vs long bars)
• Save the results for further processing or presentation
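Comparing one query to another can be pictured as pairing two sets of per-locality counts, so that each locality carries a horizontal and a vertical value. The sketch below assumes each query result is a plain dictionary from locality id to count; the counts, the names, and the text rendering are hypothetical stand-ins for the map display, not RODA's implementation.

```python
# Hypothetical query results: locality id -> number of matching forms.
ALL_FINAL_U = {101: 5, 102: 1, 103: 3}       # "horizontal" query
NONSYLLABIC_FINAL_U = {101: 2, 103: 3}       # "vertical" query


def pair_queries(horizontal, vertical):
    """Pair two per-locality count dicts as (horizontal, vertical) tuples."""
    localities = sorted(set(horizontal) | set(vertical))
    return {loc: (horizontal.get(loc, 0), vertical.get(loc, 0))
            for loc in localities}


def render(pairs):
    """Render each locality as two text bars; bar length shows data strength."""
    return [f"{loc}: H {'#' * h} ({h})  V {'#' * v} ({v})"
            for loc, (h, v) in pairs.items()]


if __name__ == "__main__":
    for line in render(pair_queries(ALL_FINAL_U, NONSYLLABIC_FINAL_U)):
        print(line)
```

A locality missing from one query is given a count of zero, so a short or absent bar signals weak evidence and is not silently dropped.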

Requirements

Multiple comparisons, using:

• Shapes
• Colours
• Symbols

Reference to original data:

• See numeric counts
• Locate raw data (especially when there are few examples)

RODA: function

Custom-defined maps

• You select the data
• You see the result as a map

Programmable access to the whole set of digitized data

• You ask about data spread over many maps
• You can customize what you search for (not just the editor’s choice)

RODA: selection of data

Context of search becomes important

• Word-final vs non-final vs either
• Plain character vs accented character
• Character vs (superposed) alternate

Choice of fields to search

• E.g. with nouns: sg. vs pl. entries
• Variations heard by field workers
• Flags to mark special situations (e.g. hesitation)
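The first two search contexts above (position in the word, and plain vs accented character) can be illustrated with a small matching function. This is a sketch under stated assumptions: the function names and the `position` and `ignore_accents` parameters are hypothetical, forms are treated as ordinary Unicode strings, and superposed alternates, field selection, and flags are not modelled.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining diacritics removed (e.g. 'â' -> 'a')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(form, char, position="either", ignore_accents=False):
    """Test whether `char` occurs in `form` at the requested position.

    position: "final", "non-final", or "either".
    ignore_accents: if True, an accented character matches its plain base.
    """
    if ignore_accents:
        form, char = strip_accents(form), strip_accents(char)
    else:
        # Compare in one normalization form so equal text compares equal.
        form = unicodedata.normalize("NFC", form)
        char = unicodedata.normalize("NFC", char)
    if position == "final":
        return form.endswith(char)
    if position == "non-final":
        # An occurrence that ends before the last character of the form.
        return char in form[:-1]
    if position == "either":
        return char in form
    raise ValueError(f"unknown position: {position!r}")


if __name__ == "__main__":
    print(matches("cântu", "u", position="final"))
    print(matches("cânt", "a", ignore_accents=True))
```

Making the context an explicit parameter keeps the same search reusable across word-final, non-final, and unrestricted queries.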

Bigger challenge

Access to Data

In the humanities:

• Large amounts of data
• Diverse ways of selecting it

Information Technology:

• Has the technology
• May not understand the needs

Need to learn how to apply IT to our discipline effectively

Development Process

Requirements gathering

• Prototypes
• Cycles of propose-and-revise

User testing

• Test versions on web
• User feedback is important

Explore technology

• Changes fast
• Much to learn

Summary

• Data will soon be available
• You are invited to apply your techniques to the data

Digital data and IT methods permit:

• Widely accessible data
• Flexible searching and custom presentation
• Repeatable processing

Contacts

Sheila Embleton: embleton@yorku.ca
Dorin Uritescu: dorinu@yorku.ca
Eric Wheeler: wheeler@ericwheeler.ca

Test sites:

ericwheeler.ca/test
aml.yorku.ca/~ewheeler/test