MDF and its Applications Sebastian Drude & Irina Nevskaya Goethe-Universität Frankfurt RELISH / Lexicon Meeting Nijmegen July 2010

Page 1: MDF  and  its Applications

MDF and its Applications

Sebastian Drude & Irina Nevskaya, Goethe-Universität Frankfurt

RELISH / Lexicon Meeting Nijmegen July 2010

Page 2: MDF  and  its Applications

MDF and its Applications

1. MDF: what is it?

2. Organization of the MDF-format

3. Advantages, problems with MDF

4. Applications and conversions

5. MDF in the RELISH project: Udi

Page 3: MDF  and  its Applications

MDF and its Applications

1. MDF: what is it?

2. Organization of the MDF-format

3. Advantages, problems with MDF

4. Applications and conversions

5. MDF in the RELISH project: Udi

Page 4: MDF  and  its Applications

1. MDF: what is it?

• Originally, the Multi-Dictionary Formatter (MDF) was an independent computer program

• It converted certain files in Standard Format into RTF (to be further processed and printed with office software)

• Today it is part of the Toolbox (formerly Shoebox) program, in the form of Consistent Changes tables (*.cct, complex scripts for search-and-replace routines) and MS Word template files (*.dot)

Page 5: MDF  and  its Applications
Page 6: MDF  and  its Applications

1. MDF: what is it?

Standard Format (SF) is a very old text format developed by SIL with minimal mark-up:

• The content is organized in “fields”
• Each field consists of a “marker” (a newline followed by a backslash and a sequence of letters, hyphens, digits etc.) and the “field content” (free text), separated from the marker by a space character

• This is a simple feature–value structure
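The field structure just described is simple enough to read with a few lines of code. A minimal sketch in Python, assuming that a field ends where the next line beginning with a backslash starts and that any other non-empty line continues the previous field; `parse_sf` is a hypothetical helper, not part of Toolbox or MDF, and the sample entry is invented.

```python
import re

# A field starts with a backslash at the beginning of a line, followed by
# the marker name and, optionally, a space and the field content.
MARKER_RE = re.compile(r"^\\(\S+)(?: (.*))?$")


def parse_sf(text: str) -> list[tuple[str, str]]:
    """Split Standard Format text into (marker, content) pairs.

    Lines that do not start with a backslash are folded into the content
    of the preceding field (a simplifying assumption of this sketch).
    """
    fields: list[tuple[str, str]] = []
    for line in text.splitlines():
        match = MARKER_RE.match(line)
        if match:
            fields.append((match.group(1), (match.group(2) or "").strip()))
        elif fields and line.strip():
            marker, content = fields[-1]
            fields[-1] = (marker, (content + " " + line.strip()).strip())
    return fields


# Invented toy entry: four fields, the gloss continued on a second line.
record = "\\lx kala\n\\ps n\n\\ge house;\n  dwelling\n\\dt 12/Jul/2010"
print(parse_sf(record))
```

The result is exactly the feature–value structure of the slide: an ordered list of (marker, content) pairs, with no nesting yet.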

Page 7: MDF  and  its Applications

[Figure: a “Standard Format” data file, annotated with entry, field, field marker and field content]

Page 8: MDF  and  its Applications

1. MDF: what is it?

• The MDF program uses a specific set of markers representing the data categories typical of traditional lexicography

• Properties of the fields (language etc.) and a minimal hierarchical structure, defined through an “is-below” relation, are kept in a separate “.typ” (type) file, which is also in SF

• In this sense, a file in MDF format is a (SF) text file which uses the MDF set of markers (in the MDF hierarchical organization)

Page 9: MDF  and  its Applications

[Figure: MDF.typ (config file), annotated with marker definition, description, language and position in hierarchy]

Page 10: MDF  and  its Applications

MDF and its Applications

1. MDF: what is it?

2. Organization of the MDF-format

3. Advantages, problems with MDF

4. Applications and conversions

5. MDF in the RELISH project: Udi

Page 11: MDF  and  its Applications

2. Organization of the MDF-format

• There are currently about 100 markers directly supported by MDF (“MDF-fields”)

• The basic hierarchy is:
  \lx (lexeme)
  └˃ \se (sub-entry)
     └˃ \ps (part of speech)
        └˃ \sn (sense number)

• Other hierarchies may be, or used to be, supported: (\lx > \se > \sn > \ps or \lx > \sn > \ps > \se)
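The configured hierarchy decides how one and the same flat sequence of fields is grouped. A sketch, assuming markers are written without the backslash and that a hierarchy marker closes every open marker on its own level or below (so optional levels such as \se can be skipped); `nest` is a hypothetical helper and the record is invented.

```python
def nest(markers: list[str], hierarchy: list[str]) -> list[str]:
    """For each hierarchy marker in a record, return its path of open ancestors.

    `hierarchy` lists hierarchy markers from top to bottom. A marker closes
    every open marker on the same or a lower level; intermediate levels
    may be skipped. Non-hierarchy markers are ignored in this sketch.
    """
    level = {m: i for i, m in enumerate(hierarchy)}
    stack: list[str] = []
    paths: list[str] = []
    for m in markers:
        if m not in level:
            continue
        while stack and level[stack[-1]] >= level[m]:
            stack.pop()
        stack.append(m)
        paths.append(" > ".join(stack))
    return paths


# The same (invented) sequence of markers, read under two hierarchies.
record = ["lx", "ps", "sn", "ge", "sn", "se", "ps", "sn"]
print(nest(record, ["lx", "se", "ps", "sn"]))  # basic hierarchy
print(nest(record, ["lx", "sn", "ps", "se"]))  # one of the alternatives
```

Under the basic hierarchy the last \sn belongs to the \ps inside the sub-entry; under \lx > \sn > \ps > \se it opens a new sense directly below \lx. Switching hierarchies in the configuration without touching the data therefore silently changes the structure of existing records.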

Page 12: MDF  and  its Applications

1. MDF: what is it?

MDF is documented by the book: Coward, David F. & Grimes, Charles E. (2000). Making Dictionaries: A guide to lexicography and the Multi-Dictionary Formatter. Waxhaw, North Carolina: SIL International (1st ed. 1995)

URLs: http://www.sil.org/computing/shoebox/MDF_2000.pdf and http://www.sil.org/computing/shoebox/MDF_Updates.html

Page 13: MDF  and  its Applications
Page 14: MDF  and  its Applications

2. Organization of the MDF-format

Several fields can be repeated for up to four different languages, where “..” stands for v = vernacular, e = English, n = national, r = regional:
• \ps, \pn – part of speech for main entry word (English, national)
• \g.. – gloss for main entry word
• \d.. – definition for main entry word
• \re, \rn, \rr – reverse (for indexes)
• \we, \wn, \wr – word-level gloss
• \x.. – example (sentence and translations)
• \e.. – encyclopedic information
• \u.. – usage information
• \o.. – only (restriction) information
• (\va), \ve, \vn, \vr – variant form comment
• (\cf), \ce, \cn, \cr – cross-reference gloss
• (\lf), \le, \ln, \lr – “lexical function” (gloss for related word)
• \pd.. – “paradigm” (gloss for an irregular form)
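The “..” convention can be resolved mechanically in both directions. A sketch, assuming the four suffixes given above and treating a family such as \g.., \d.. or \x.. as a stem plus a one-letter language code; the irregular pair \ps / \pn does not follow this pattern and is not covered. Function names are hypothetical.

```python
from typing import Optional

# Language suffixes as listed on the slide.
LANGUAGES = {"v": "vernacular", "e": "English", "n": "national", "r": "regional"}


def expand_family(stem: str, suffixes: str = "venr") -> list[str]:
    """Expand a family written as e.g. \\g.. into concrete marker names.

    Not every family has all four members (the reverse fields, for
    instance, are listed only as \\re, \\rn, \\rr), so the allowed
    suffixes are passed in explicitly.
    """
    return [stem + s for s in suffixes]


def language_of(marker: str, families: set[str]) -> Optional[str]:
    """Return the language of a marker such as 'dn', or None if the
    marker does not belong to one of the given families."""
    stem, suffix = marker[:-1], marker[-1:]
    if stem in families and suffix in LANGUAGES:
        return LANGUAGES[suffix]
    return None


print(expand_family("g"))                   # ['gv', 'ge', 'gn', 'gr']
print(expand_family("r", "enr"))            # ['re', 'rn', 'rr']
print(language_of("dn", {"g", "d", "x"}))   # national
print(language_of("ps", {"g", "d", "x"}))   # None
```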

Page 15: MDF  and  its Applications

2. Organization of the MDF-format

Some 20 fields are discouraged:
• \an (antonym) and \sy (synonym) are to be replaced by the \lf (lexical function), \lfv (lexical function vernacular) and \lf.. (lexical function gloss) fields
• \sg (singular), \pl (plural), \1s (first person singular) etc. are to be replaced by the \pdl (paradigm form label), \pdv (paradigm form vernacular) and \pd.. (paradigm form gloss) fields (not yet in the documentation)

Two fields (\dt, \st) are administrative fields

So there are only about 50 genuinely different MDF fields
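Replacing the discouraged fields amounts to a mechanical rewrite of each record. A sketch, assuming the new fields are written as a label field followed by a form field (\lf + \lfv, \pdl + \pdv) as listed above; the label strings (Syn, Ant, sg, pl, 1s) and the helper name are illustrative assumptions, not values taken from the MDF documentation.

```python
# Hypothetical mapping: discouraged marker -> (target block, label).
# The label strings are assumptions made for this sketch.
DISCOURAGED = {
    "sy": ("lf", "Syn"),
    "an": ("lf", "Ant"),
    "sg": ("pd", "sg"),
    "pl": ("pd", "pl"),
    "1s": ("pd", "1s"),
}


def modernize(fields: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Rewrite discouraged fields as a label field plus a form field.

    \\sy and \\an become \\lf <label> + \\lfv <form>;
    \\sg, \\pl and \\1s become \\pdl <label> + \\pdv <form>.
    All other fields are passed through unchanged.
    """
    out: list[tuple[str, str]] = []
    for marker, content in fields:
        if marker not in DISCOURAGED:
            out.append((marker, content))
            continue
        block, label = DISCOURAGED[marker]
        if block == "lf":
            out += [("lf", label), ("lfv", content)]
        else:
            out += [("pdl", label), ("pdv", content)]
    return out


# Invented toy entry.
entry = [("lx", "kala"), ("sy", "kalo"), ("pl", "kalur"), ("ge", "house")]
print(modernize(entry))
```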

Page 16: MDF  and  its Applications

2. Organization of the MDF-format

Some of the fields form blocks (groups) via the hierarchy, for instance:
• \lf (lexical function, relations to other entries)
  └˃ \lfv related form, \lf.. gloss of related form (English, national, regional)
• \pd (paradigm information and irregular forms)
  └˃ \pdl paradigm label, \pdv paradigm form, \pd.. paradigm gloss (English, national, regional)
• \rf (reference to an example)
  └˃ \xv example form in the vernacular
     └˃ \x.. translation of the example (English, national, regional)
• \cf (cross-reference form)
  └˃ \c.. cross-reference gloss (English, national, regional)
• \va (variant form)
  └˃ \v.. comment on variant form (English, national, regional)

Page 17: MDF  and  its Applications

MDF and its Applications

1. MDF: what is it?

2. Organization of the MDF-format

3. Advantages, problems with MDF

4. Applications and conversions

5. MDF in the RELISH project: Udi

Page 18: MDF  and  its Applications

3. Advantages, problems with MDF

Advantages:
• Very flexible SF database format (optional fields, repeated fields etc.)
• Quite exhaustive for standard lexicography in field research on minority languages
• A de facto standard, although Toolbox is no longer officially supported by SIL (now replaced by FIELD / FLEX)

Page 19: MDF  and  its Applications

3. Advantages, problems with MDF

General problems:
• The flexibility of SF allows for inconsistencies
• There is only a recommended order for sister fields
• MDF is almost always extended and adjusted arbitrarily by individual users (MDF-derived / MDF-based formats)

• Changes in the hierarchy in the configuration are not reflected in the data file and vice versa

• Missing closing tags in SF impair conversions
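Because SF has no closing tags, a converter has to infer where each group ends from the configured hierarchy. A sketch using Python's standard `xml.etree.ElementTree`, assuming a small, hard-coded “is-below” table (hypothetical and much simpler than the real MDF.typ): each field is attached below the innermost open element that carries its expected parent marker, which implicitly closes everything opened after that element. Fields whose parent is unknown fall back to the record root, exactly the kind of decision a missing closing tag forces on a converter. The record is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified "is-below" table: child marker -> parent marker.
PARENT = {
    "ps": "lx",
    "sn": "ps",
    "ge": "sn",
    "xv": "sn",
    "xe": "xv",
    "dt": "lx",
}


def to_xml(fields: list[tuple[str, str]]) -> ET.Element:
    """Nest the (marker, content) pairs of one record into XML elements."""
    marker, content = fields[0]
    root = ET.Element(marker)
    root.text = content
    stack = [root]  # currently open elements, outermost first
    for marker, content in fields[1:]:
        parent = PARENT.get(marker)
        # Find the innermost open element with the expected parent marker;
        # if there is none, fall back to the record root (index 0).
        depth = next(
            (i for i in range(len(stack) - 1, -1, -1) if stack[i].tag == parent),
            0,
        )
        del stack[depth + 1:]  # implicitly "close" everything below it
        elem = ET.SubElement(stack[-1], marker)
        elem.text = content
        stack.append(elem)
    return root


rec = [("lx", "kala"), ("ps", "n"), ("sn", "1"), ("ge", "house"),
       ("xv", "example sentence"), ("xe", "its translation"),
       ("sn", "2"), ("ge", "family"), ("dt", "12/Jul/2010")]
print(ET.tostring(to_xml(rec), encoding="unicode"))
```

With a different PARENT table the same record produces a different tree, which is why a hierarchy change in the configuration and the data file have to be kept in step.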

Page 20: MDF  and  its Applications

3. Advantages, problems with MDF

Specific problems in the RELISH project:
• \ph (phonetic form) is too generic; phonetic forms would be needed in several different contexts (\cf, \va, \pdv, \lfv…)

• \lt (literal meaning) exists only for the head word; it would also be needed for borrowed words etc.

• Even the three languages are not sufficient
• It should be possible to set a “language” property for arbitrary fields

Page 21: MDF  and  its Applications

3. Advantages, problems with MDF

Specific problems in the RELISH project:
• No clear solution for covering several dialects
• In particular if no dialect is “standard”
• Different solutions:
– \ue (usage information)
– \oe (only / restriction)
– \ns (notes on sociolinguistics, varieties)
– \lf SynD = … (lexical function “Dialectal Synonym”)
– \va & \ve (variant form and English comment)

• Most of these solutions hold only for the head word; we would need dialect marking for \lx, \xv, \va, …

Page 22: MDF  and  its Applications

3. Advantages, problems with MDF

Comment on the dialect problem in the MDF book: “We intend future enhancements of MDF to have fields dedicated to dialectal information, but at present the programming limitations do not allow us any more field bundles. For the present, use \va and \lf SynD =.” (footnote, p. 23)
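Following this advice puts dialect information into the free-text content of \lf, so any tool that wants to use it must rely on a string convention. A sketch, assuming the content follows the “Label = form” pattern suggested by the quotation (as in \lf SynD = …); the helper names and sample forms are hypothetical.

```python
from typing import Optional


def parse_lf(content: str) -> Optional[tuple[str, str]]:
    """Split \\lf content such as 'SynD = form' into (label, form).

    Returns None if there is no '=' separator.
    """
    label, sep, form = content.partition("=")
    if not sep:
        return None
    return label.strip(), form.strip()


def dialect_synonyms(fields: list[tuple[str, str]]) -> list[str]:
    """Collect the forms recorded as dialectal synonyms (label 'SynD')."""
    result = []
    for marker, content in fields:
        if marker != "lf":
            continue
        parsed = parse_lf(content)
        if parsed is not None and parsed[0] == "SynD":
            result.append(parsed[1])
    return result


# Invented toy entry; the last \lf field lacks the '=' separator.
entry = [("lx", "kala"), ("lf", "SynD = kalo"), ("lf", "Ant = tuk"), ("lf", "SynD")]
print(dialect_synonyms(entry))
```

The malformed last field is silently skipped here; a real conversion would have to report it, which illustrates how fragile such in-text conventions are compared with dedicated dialect fields.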

Page 23: MDF  and  its Applications

MDF and its Applications

1. MDF: what is it?

2. Organization of the MDF-format

3. Advantages, problems with MDF

4. Applications and conversions

5. MDF in the RELISH project: Udi

Page 24: MDF  and  its Applications

4. Applications and conversions

“Applications” (of the format) may have different meanings:

• For different languages / dictionary projects
• For transformations / conversions:
– print dictionaries (via Toolbox, MDF, Word / RTF)
– HTML (Lexique Pro)
– XML (Toolbox export)
– LMF XML (Lexus import)
– FLEX database

Page 25: MDF  and  its Applications

4. Applications and conversions

Problems with all conversions:
• What happens with inconsistencies?
• What happens with different orders of same-level fields?
• What happens with additional (non-MDF) fields?
• What happens with sub-entries?
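These questions can at least be made visible per record before a conversion is run. A sketch that reports non-MDF markers, fields out of a recommended sister order, and sub-entries; the marker subset, its order and the helper name are hypothetical and would in practice come from the MDF documentation or the project's .typ file. The entry is invented.

```python
# Hypothetical subset of MDF markers, in an assumed recommended order.
RECOMMENDED = ["lx", "hm", "se", "ps", "sn", "ge", "de", "xv", "xe", "dt"]
RANK = {m: i for i, m in enumerate(RECOMMENDED)}
HIERARCHY = {"lx", "se", "ps", "sn"}  # markers that open a new block


def precheck(fields: list[tuple[str, str]]) -> list[str]:
    """List issues a converter has to decide about for one record."""
    issues = []
    last = -1  # rank of the last field seen in the current block
    for marker, content in fields:
        if marker not in RANK:
            issues.append(f"non-MDF field \\{marker}")
            continue
        if marker in HIERARCHY:
            if marker == "se":
                issues.append(f"sub-entry \\se {content}")
            last = RANK[marker]  # a new block restarts the order check
            continue
        if RANK[marker] < last:
            issues.append(f"\\{marker} out of recommended order")
        last = max(last, RANK[marker])
    return issues


entry = [("lx", "kala"), ("ps", "n"), ("ge", "house"), ("xv", "example"),
         ("de", "a dwelling"), ("cmt", "check"), ("se", "kala-ba"),
         ("ge", "small house")]
for issue in precheck(entry):
    print(issue)
```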

Page 26: MDF  and  its Applications

4. Applications and conversions

Page 27: MDF  and  its Applications
Page 28: MDF  and  its Applications

4. Applications and conversions

Page 29: MDF  and  its Applications

4. Applications and conversions

Page 30: MDF  and  its Applications

MDF and its Applications

1. MDF: what is it?

2. Organization of the MDF-format

3. Advantages, problems with MDF

4. Applications and conversions

5. MDF in the RELISH project: Udi

Page 31: MDF  and  its Applications

5. MDF in the RELISH project: Udi

Page 32: MDF  and  its Applications

5. MDF in the RELISH project: Udi

Page 33: MDF  and  its Applications

5. MDF in the RELISH project: Udi

• Digital representation of a print dictionary, with additions

• Main problem: several languages:
– Udi (v)
– Azerbaijani (Cyrillic) (n1)
– Azerbaijani (Latin) (n1lat) (addition)
– Georgian (n2)
– Russian (r)
– English (e) (addition)
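With six languages the one-letter suffixes no longer suffice, and the project uses longer codes such as n1lat. A sketch of resolving a marker to its stem and language, assuming that codes are appended to a known set of stems (here g, d, lt, na and x, as in field names of the Udi database such as \gn1, \dn1, \ltn1, \nan1, \xn1) and that the longest matching code wins; `split_marker` is hypothetical.

```python
from typing import Optional

# Language codes of the Udi dictionary, as listed on the slide.
UDI_LANGUAGES = {
    "v": "Udi",
    "n1": "Azerbaijani (Cyrillic)",
    "n1lat": "Azerbaijani (Latin)",
    "n2": "Georgian",
    "r": "Russian",
    "e": "English",
}
# Stems that take a language code in this sketch (an assumption).
STEMS = frozenset({"g", "d", "lt", "na", "x"})


def split_marker(marker: str) -> Optional[tuple[str, str]]:
    """Split e.g. 'gn1lat' into ('g', 'Azerbaijani (Latin)').

    Codes are tried longest first; the remaining stem must be known,
    so that markers like 'pr' are not misread as stem 'p' + Russian.
    """
    for code in sorted(UDI_LANGUAGES, key=len, reverse=True):
        stem = marker[: -len(code)]
        if marker.endswith(code) and stem in STEMS:
            return stem, UDI_LANGUAGES[code]
    return None


for m in ["gn1lat", "xn1", "ltr", "nan2", "xv", "pr"]:
    print(m, split_marker(m))
```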

Page 34: MDF  and  its Applications

5. MDF in the RELISH project: Udi

• The Udi Toolbox database uses 53 fields:
• 14 of these are standard MDF fields
• 11 are MDF fields with a slightly different position in the hierarchy
• 28 are additional fields:
– most of these (19) accommodate the additional “languages” (and scripts)
– 5 are for additional phonetic representations

Page 35: MDF  and  its Applications

\lx
. \hm
. \se
. \mn
. . \mn-ph
. . \ph
. . \a
. . . \a-ph
. . \bw
. . \ns
. . \ng
. . \va
. . . \va-ph
. . . \va-ns
. . \pl
. . . \pl-ph
. . \ee
. . . \er
. . \lt
. . . \lte
. . \ps
. . . \pr
. . . \sn
. . . . \nt
. . . . \gn1

. . . . \gn1
. . . . . \dn1
. . . . . \ltn1
. . . . . \nan1
. . . . . \gn1lat
. . . . . . \dn1lat
. . . . . . \ltn1lat
. . . . . . \nan1lat
. . . . \gr
. . . . . \dr
. . . . . \ltr
. . . . . \nar
. . . . \gn2
. . . . . \dn2
. . . . . \ltn2
. . . . . \nan2
. . . . \ge
. . . . . \de
. . . . . \oe
. . . . \xv
. . . . . \xv-ph
. . . . . \x-ns
. . . . . \xn1
. . . . . . \xn1lat
. . . . . \xr
. . . . . \xn2
. . . . . \xe
. \dt
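In a listing of this kind the number of leading dots encodes the depth of a marker, so the “is-below” relation can be recovered with a simple stack walk. A sketch, assuming one marker per line preceded by space-separated dots; the helper name and the short sample are invented, and the result is a list rather than a dictionary so that a marker listed twice is kept twice.

```python
from typing import Optional


def parse_outline(lines: list[str]) -> list[tuple[str, Optional[str]]]:
    """Return (marker, parent) pairs from a dot-indented marker listing."""
    pairs: list[tuple[str, Optional[str]]] = []
    stack: list[tuple[int, str]] = []  # (depth, marker) of open ancestors
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        depth = tokens[:-1].count(".")
        marker = tokens[-1].lstrip("\\")
        # Leave every open marker at the same or a greater depth.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        pairs.append((marker, stack[-1][1] if stack else None))
        stack.append((depth, marker))
    return pairs


sample = ["\\lx", ". \\se", ". . \\ps", ". . . \\sn", ". . . . \\ge",
          ". . . . \\xv", ". . . . . \\xe", ". \\dt"]
for marker, parent in parse_outline(sample):
    print(marker, "below", parent)
```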

Page 36: MDF  and  its Applications

MDF-LEXUS conversion

1. From a printed dictionary to a markup text file
2. From a markup text file to the MDF structure in the Toolbox environment
3. From the MDF structure to the LEXUS structure

Page 37: MDF  and  its Applications

Step 1. From a printed dictionary to a markup text file - 1

Page 38: MDF  and  its Applications

Step 1. From a printed dictionary to a markup text file - 2

Page 39: MDF  and  its Applications

Step 2. From a markup text file to the MDF structure in the Toolbox environment - 1

• Establishing correlations between different sign combinations and their linguistic counterparts

• Establishing the structure of the MDF markers and their hierarchies
• Consistency checks:
– cross-reference failures: absence of the head word; absence of the variant
– numerous spelling mistakes
– numerous mistakes in the Russian and English translations
– inconsistencies in contrasting subentries and examples
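The cross-reference checks in this list lend themselves to automation. A sketch, assuming records are lists of (marker, content) pairs and that every form given in \cf or \va is expected to exist as a head word (\lx) of its own; the helper name and the sample records are hypothetical.

```python
def crossref_failures(records: list[list[tuple[str, str]]],
                      ref_markers: tuple[str, ...] = ("cf", "va")) -> list[tuple[str, str, str]]:
    """Return (head word, marker, target) for references without a matching \\lx."""
    headwords = {c for rec in records for m, c in rec if m == "lx"}
    failures = []
    for rec in records:
        head = next((c for m, c in rec if m == "lx"), "?")
        for marker, content in rec:
            if marker in ref_markers and content not in headwords:
                failures.append((head, marker, content))
    return failures


# Invented toy records: one valid reference, two failures.
records = [
    [("lx", "kala"), ("cf", "kalo")],
    [("lx", "kalo"), ("va", "kalu")],
    [("lx", "tuk"), ("cf", "tuka")],
]
print(crossref_failures(records))
```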

Page 40: MDF  and  its Applications

Step 2. From a markup text file to the MDF structure in the Toolbox environment - 2

Page 41: MDF  and  its Applications

Step 3. From the MDF structure to the LEXUS structure - 1

Page 42: MDF  and  its Applications

Step 3. From the MDF structure to the LEXUS structure - 2

Page 43: MDF  and  its Applications

Step 3. From the MDF structure to the LEXUS structure - 3

Page 44: MDF  and  its Applications

Step 3. From the MDF structure to the LEXUS structure - 4

Page 45: MDF  and  its Applications

5. MDF in RELISH: Udi into Lexique Pro

Page 46: MDF  and  its Applications
Page 47: MDF  and  its Applications
Page 48: MDF  and  its Applications
Page 49: MDF  and  its Applications

5. MDF in RELISH: Udi into Lexique Pro

Page 50: MDF  and  its Applications

From the MDF to the FLEX structure

• Defining writing systems
– problems with introducing digraphs and the corresponding sort orders
• Defining import properties
– problems with matching markers, due to different markers and their hierarchies
– import failures

• Two attempts:
– project Udi 1
– project Udi 2
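Digraphs matter for sorting because a two-letter sequence has to be treated as a single letter of the alphabet. A sketch of a sort key with longest-match tokenization, assuming a small invented alphabet (not the actual Udi alphabet) in which ch follows c and kh follows k; names are hypothetical, and the code only illustrates the principle behind such a sort order.

```python
# Invented toy alphabet in sort order; "ch" and "kh" are digraphs.
ALPHABET = ["a", "b", "c", "ch", "d", "e", "k", "kh", "l", "o", "t", "u"]
ORDER = {letter: i for i, letter in enumerate(ALPHABET)}
MAX_LEN = max(len(letter) for letter in ALPHABET)


def tokenize(word: str) -> list[str]:
    """Split a word into alphabet units, preferring the longest match."""
    units: list[str] = []
    i = 0
    while i < len(word):
        for size in range(MAX_LEN, 0, -1):
            unit = word[i:i + size]
            if unit in ORDER:
                units.append(unit)
                i += len(unit)
                break
        else:
            raise ValueError(f"character {word[i]!r} not in alphabet")
    return units


def sort_key(word: str) -> list[int]:
    """Map a word to the positions of its units in the alphabet."""
    return [ORDER[u] for u in tokenize(word)]


words = ["kha", "ku", "cha", "ca", "da"]
print(sorted(words))                # ['ca', 'cha', 'da', 'kha', 'ku']
print(sorted(words, key=sort_key))  # ['ca', 'cha', 'da', 'ku', 'kha']
```

Plain code-point order puts kha before ku; with kh as its own letter after k, ku comes first.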

Page 51: MDF  and  its Applications

Attempt 1: project Udi 1 (import residue) - 1: defining writing systems

Page 52: MDF  and  its Applications

Attempt 1: project Udi 1 (import residue) - 2: defining the file format

Page 53: MDF  and  its Applications

Attempt 1: project Udi 1 (import residue) - 3: language mapping

Page 54: MDF  and  its Applications

Attempt 1: project Udi 1 (import residue) - 4: content mapping

Page 55: MDF  and  its Applications

Attempt 1: project Udi 1 (import residue) - 5: content mapping

Page 56: MDF  and  its Applications

Attempt 1: project Udi 1 (import residue) - 6: key markers

Page 57: MDF  and  its Applications

Attempt 1: project Udi 1 (import residue) - 7: readiness check

Page 58: MDF  and  its Applications

Attempt 1: project Udi 1 (import residue) - 8

Page 59: MDF  and  its Applications

Attempt 1: project Udi 1 (import residue) - 9

Page 60: MDF  and  its Applications

Attempt 1: project Udi 1 (import residue) - 10

Page 61: MDF  and  its Applications

Attempt 2: project Udi 2 - 1: encoding writing systems

Page 62: MDF  and  its Applications

Attempt 2: project Udi 2 - 2: defining the file format

Page 63: MDF  and  its Applications

Attempt 2: project Udi 2 - 3: language mapping

Page 64: MDF  and  its Applications

Attempt 2: project Udi 2 - 4: content mapping

Page 65: MDF  and  its Applications

Attempt 2: project Udi 2 - 5: content mapping

Page 66: MDF  and  its Applications

Attempt 2: project Udi 2 - 6: defining custom fields

Page 67: MDF  and  its Applications

Attempt 2: project Udi 2 - 7: modifying mapping

Page 68: MDF  and  its Applications

Attempt 2: project Udi 2 - 8: defining key markers

Page 69: MDF  and  its Applications

Attempt 2: project Udi 2 - 9: readiness check

Page 70: MDF  and  its Applications

Attempt 2: project Udi 2 - 10: import preview results

Page 71: MDF  and  its Applications

Attempt 2: project Udi 2 - 11: import preview results

Page 72: MDF  and  its Applications

Attempt 2: project Udi 2 - 12: ready to import

Page 73: MDF  and  its Applications

Attempt 2: project Udi 2 - 13: import failures

Page 74: MDF  and  its Applications

MDF and its Applications

Sebastian Drude & Irina Nevskaya, Goethe-Universität Frankfurt

RELISH / Lexicon Meeting Nijmegen July 2010