Multilingual, Multi-script Catalog Requirements (An Arcadia Project)
January 29, 2010



Jan 2010

Outline

• Background about the Arcadia non-Roman script project

• Introductions

• Orbis vs. YUFind and systems like YUFind

• Requirements discussion

• Wrap-up

Project Goals

• Gap analysis of multilingual, multi-script functionality in Lucene/Solr/SolrMarc discovery applications (e.g., YUFind)

• Identification of desirable functionality

• Collaboration opportunities, community interest

• Recommendations with level-of-effort analysis

Orbis vs. YUFind


Chinese example:

“中日韩经济合作的新起点” (“A new starting point for China-Japan-Korea economic cooperation”)

N-gram tokens, where N = 2: <中日> <日韩> <韩经> <经济> <济合> <合作> <作的> <的新> <新起> <起点>
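The bigram tokenization above can be sketched in a few lines of Python. This is a minimal illustration, not Lucene's implementation: the function names `char_bigrams` and `query_matches` are hypothetical, and matching is simplified to "every query bigram occurs somewhere in the field," whereas a real search engine would also use token positions for phrase queries.

```python
def char_bigrams(text: str) -> list[str]:
    """Split a run of CJK characters into overlapping bigrams (N = 2).

    A run of n characters yields n - 1 tokens; a single character is kept
    as a unigram so that one-character queries can still match.
    """
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def query_matches(query: str, field: str) -> bool:
    """Simplified matching: every bigram of the query must occur in the field."""
    field_tokens = set(char_bigrams(field))
    return all(token in field_tokens for token in char_bigrams(query))


if __name__ == "__main__":
    title = "中日韩经济合作的新起点"
    print(char_bigrams(title))
    print(query_matches("经济合作", title))  # True
```

Because the tokens overlap, a query such as 经济合作 matches without any word segmentation, while a reordered query (合作经济) produces the bigram 作经, which does not occur in the title.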

Background: Non-Roman (NR) Scripts in Catalog Records

JACKPHY (Japanese, Arabic, Chinese, Korean, Persian, Hebrew, Yiddish)

One-to-Many (CJK)

Example: “Mao Zedong”

毛泽东 Simplified

毛澤東 Traditional

毛沢東 Kanji (Modern)

“Mao Zedong” in simplified Chinese characters retrieves 527 results

The same search in traditional Chinese characters yields 154 hits.

Also note the paired fields.
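The gap between 527 and 154 hits suggests that variant forms of the same name are not being folded together. One common remedy, sketched here under stated assumptions, is to map variant characters onto a single form at both index time and query time. The three-entry table and the name `fold_variants` are hypothetical; a usable table would be built from a comprehensive source such as the Unihan variant data (for example the `kSimplifiedVariant` and `kTraditionalVariant` fields).

```python
# Hypothetical, deliberately tiny variant table covering only the
# characters in the "Mao Zedong" example.
VARIANT_TO_SIMPLIFIED = str.maketrans({
    "澤": "泽",  # traditional Chinese form
    "沢": "泽",  # modern Japanese kanji form
    "東": "东",  # traditional Chinese / Japanese form
})


def fold_variants(text: str) -> str:
    """Map known traditional and Japanese variants onto one simplified form."""
    return text.translate(VARIANT_TO_SIMPLIFIED)


# Applying the same folding to indexed text and to queries lets all
# three spellings of "Mao Zedong" retrieve the same records.
for spelling in ("毛泽东", "毛澤東", "毛沢東"):
    print(spelling, "->", fold_variants(spelling))
```

Folding in one direction keeps the index simple; the trade-off is that distinct traditional characters sharing one simplified form become indistinguishable in search, so the mapping is not always one-to-one.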

One-to-Many (Digraphs)

ווירטשאפט

The Yiddish word “Virtshaft” is entered here with two separate vavs (i.e., the ‘u’ keystroke twice in Microsoft’s Hebrew IME): U+05D5 + U+05D5.

N = 49 results

װירטשאפט

This time the same word is entered with the double-vav digraph, U+05F0 (via the Microsoft Hebrew IME key combination Right Alt+U).

N = 11 results
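The difference between 49 and 11 results suggests that the two encodings are indexed as different strings. A minimal sketch of a fix, assuming documents and queries pass through the same normalization step, is to expand the Yiddish ligature code points into their component letters. The function name `fold_yiddish` is hypothetical, and the table is spelled out explicitly rather than relying on Unicode normalization to do this.

```python
# Yiddish ligatures in the Hebrew block and the letter pairs they stand for.
YIDDISH_LIGATURES = str.maketrans({
    "\u05F0": "\u05D5\u05D5",  # HEBREW LIGATURE YIDDISH DOUBLE VAV -> vav + vav
    "\u05F1": "\u05D5\u05D9",  # HEBREW LIGATURE YIDDISH VAV YOD    -> vav + yod
    "\u05F2": "\u05D9\u05D9",  # HEBREW LIGATURE YIDDISH DOUBLE YOD -> yod + yod
})


def fold_yiddish(text: str) -> str:
    """Expand Yiddish digraph ligatures into their component letters."""
    return text.translate(YIDDISH_LIGATURES)


# "Virtshaft" typed with two separate vavs vs. with the U+05F0 ligature.
two_vavs = "\u05D5\u05D5\u05D9\u05E8\u05D8\u05E9\u05D0\u05E4\u05D8"
ligature = "\u05F0\u05D9\u05E8\u05D8\u05E9\u05D0\u05E4\u05D8"
print(fold_yiddish(two_vavs) == fold_yiddish(ligature))  # True
```

Folding toward the decomposed letters (rather than composing pairs into ligatures) avoids having to decide whether a given vav-vav sequence was meant as the digraph.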

NR Spelling Suggestions

Unhelpful suggestion?

Labels and Facets

Should script/language of query determine script/language of facets?

Better would be:

杉本つとむ, 1927- (11)
高橋幹夫, 1935- (11)
野口武彦. (8)
渡辺信一郎, 1934- (7)

OR:

Sugimoto, Tsutomu, 1927- (11)
Takahashi, Mikio, 1935- (11)
Noguchi, Takehiko. (8)
Watanabe, Shin’ichirō, 1934- (7)

But not both mixed together.

Let end user decide?
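One way to honor "not both mixed together" is to choose a single script for the whole facet list: an explicit user preference wins, and otherwise the script of the query decides. The sketch below assumes each facet value is available in both forms; `facet_labels`, `is_non_roman`, and the list of Unicode name prefixes are illustrative choices rather than an existing API, and the script check is deliberately coarse.

```python
import unicodedata

# Facet entries from the example above: (original script, romanized, count).
AUTHOR_FACETS = [
    ("杉本つとむ, 1927-", "Sugimoto, Tsutomu, 1927-", 11),
    ("高橋幹夫, 1935-", "Takahashi, Mikio, 1935-", 11),
    ("野口武彦.", "Noguchi, Takehiko.", 8),
    ("渡辺信一郎, 1934-", "Watanabe, Shin’ichirō, 1934-", 7),
]

# Unicode character-name prefixes treated as non-Roman in this sketch.
NON_ROMAN_PREFIXES = ("CJK UNIFIED IDEOGRAPH", "HIRAGANA", "KATAKANA",
                      "HANGUL", "HEBREW", "ARABIC")


def is_non_roman(text: str) -> bool:
    """True if any character's Unicode name starts with a non-Roman prefix."""
    return any(unicodedata.name(ch, "").startswith(NON_ROMAN_PREFIXES)
               for ch in text)


def facet_labels(facets, query, preference=None):
    """Return every label in ONE script: 'original' or 'romanized'.

    An explicit preference wins; otherwise follow the script of the query.
    """
    if preference is None:
        preference = "original" if is_non_roman(query) else "romanized"
    index = 0 if preference == "original" else 1
    return [f"{facet[index]} ({facet[2]})" for facet in facets]


print(facet_labels(AUTHOR_FACETS, "江戸"))
print(facet_labels(AUTHOR_FACETS, "Edo"))
```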

We would like to be able to choose our preferred display script here. For example:

<Original scripts>
江戸
By: 野村兼太郎, 1896-1960.
Published: 1942
Format: Book, Electronic Resource

江戶 の 翻訳家たち
By: 杉本 つとむ, 1927-
Published: 1995
Format: Book, Electronic Resource

We would like to ask library users which option is best for displaying parallel field data:

<Original scripts>
江戶 / 田中優子編.
Contributors: 田中優子, 1952-
Format: Book
Language: Japanese
Published: 東京 : 作品社, 1998.
Series: 日本の名随筆. 03 别卷 ; 94

<Paired w/OS first>
江戶 / 田中優子編.
Edo / Tanaka Yūko hen.
Contributors: 田中優子, 1952-
Tanaka, Yūko, 1952-
Format: Book
Language: Japanese
Published: 東京 : 作品社, 1998.
Tōkyō : Sakuhinsha, 1998.
Series: 日本の名随筆. 03 别卷 ; 94
Nihon no meizuihitsu. 03 Bekkan ; 94
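The display options above amount to choosing a rendering mode for each paired field. The sketch below assumes a hypothetical `PairedField` structure holding both forms (in MARC 21 records, such pairs are usually carried as a regular field plus a linked 880 field); `render` and its mode names are illustrative, and alignment ignores the double width of CJK characters.

```python
from dataclasses import dataclass


@dataclass
class PairedField:
    """One field available in original script and in romanization (hypothetical)."""
    label: str
    original: str
    romanized: str


def render(fields, mode="paired"):
    """mode: 'original', 'romanized', or 'paired' (original script first)."""
    lines = []
    for f in fields:
        if mode == "original":
            lines.append(f"{f.label}: {f.original}")
        elif mode == "romanized":
            lines.append(f"{f.label}: {f.romanized}")
        elif mode == "paired":
            lines.append(f"{f.label}: {f.original}")
            # Indent the romanized line under the original-script value.
            lines.append(" " * (len(f.label) + 2) + f.romanized)
        else:
            raise ValueError(f"unknown display mode: {mode}")
    return "\n".join(lines)


record = [
    PairedField("Title", "江戶 / 田中優子編.", "Edo / Tanaka Yūko hen."),
    PairedField("Published", "東京 : 作品社, 1998.", "Tōkyō : Sakuhinsha, 1998."),
]
print(render(record, "paired"))
```

Keeping the mode as a single parameter makes it straightforward to expose as a user setting, which is one way to put the "ask library users" question into practice.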

Language/Script of Interface

OCLC’s brief record display

The interface can easily be switched to one of several languages.

OCLC’s detailed record display with Japanese language interface

OCLC WorldCat.org localizes labels and instructions as well as mapped facet values. The examples here are in Chinese.
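Localizing labels and localizing mapped facet values can both be treated as lookups in a per-locale message catalog. The sketch below is not how WorldCat.org is implemented; the `MESSAGES` table, its keys, and `localize` are hypothetical, and the fallback chain (requested locale, then English, then the key itself) is one common design choice.

```python
# Hypothetical message catalog. Interface labels and mapped facet values
# (e.g., a format code such as "format.book") are looked up the same way.
MESSAGES = {
    "en": {"format": "Format", "language": "Language", "format.book": "Book"},
    "zh": {"format": "格式", "language": "语言", "format.book": "图书"},
}


def localize(key, locale, default_locale="en"):
    """Look up a label or facet value, falling back to the default locale
    and finally to the key itself when no translation exists."""
    for loc in (locale, default_locale):
        text = MESSAGES.get(loc, {}).get(key)
        if text is not None:
            return text
    return key


print(localize("format", "zh"), localize("format.book", "zh"))  # 格式 图书
print(localize("format", "fr"))  # Format (falls back to English)
```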

Language/Script of Interface & Text Directionality
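In a catalog that mixes Hebrew, Arabic, and Latin-script data, each displayed field needs a base direction. A minimal sketch, assuming Python's `unicodedata` bidirectional classes and a simplified form of the first-strong-character rule of the Unicode Bidirectional Algorithm (UAX #9, rules P2 and P3), is shown below. `base_direction` is a hypothetical helper whose result could, for example, populate an HTML `dir` attribute on each field.

```python
import unicodedata


def base_direction(text: str) -> str:
    """Return 'rtl' or 'ltr' from the first strongly directional character.

    Simplified from UAX #9 rules P2-P3: text inside directional isolates
    is not skipped here.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):  # Hebrew letters are R, Arabic letters AL
            return "rtl"
    return "ltr"  # no strong character (e.g., only digits): default to LTR


print(base_direction("Edo bungaku jiten"))    # ltr
print(base_direction("\u05D0\u05D1 1942"))  # rtl
```

Digits and punctuation are weak or neutral, so a field such as a date followed by a Hebrew title still resolves to right-to-left.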

Sorting of Results

江戸文学俗信辞典 Edo bungaku zokushin jiten

江戸文学地名辞典 Edo bungaku chimei jiten

江戸文学辞典 Edo bungaku jiten

江戸文様辞典 Edo mon’yo jiten
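The four titles as listed are in the same order as a plain code-point sort of the original script, which differs from an order based on the romanization. The sketch below contrasts the two using only the data on the slide; the variable names are illustrative.

```python
# (original script, romanization) pairs from the list above.
TITLES = [
    ("江戸文学俗信辞典", "Edo bungaku zokushin jiten"),
    ("江戸文学地名辞典", "Edo bungaku chimei jiten"),
    ("江戸文学辞典", "Edo bungaku jiten"),
    ("江戸文様辞典", "Edo mon’yo jiten"),
]

# Python compares strings by code point, so this is a "raw Unicode" sort
# of the original-script titles.
by_code_point = sorted(TITLES, key=lambda t: t[0])

# Sorting on the paired romanized form gives a reading-based order instead.
by_romanization = sorted(TITLES, key=lambda t: t[1].casefold())

for original, roman in by_romanization:
    print(original, roman)
```

The romanized order places "chimei" and "jiten" before "zokushin", which a reader of the Japanese would expect, whereas the code-point order depends on where 俗, 地, and 辞 happen to fall in the Unicode tables.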

Also note the bidirectional text.

Sorting within result sets: Options to Consider

For multiple languages that share a script (e.g., Chinese ideographs, Arabic, Hebrew, or Latin), how would users prefer result sets to be sorted?

Here we consider the Chinese and Arabic cases.

Sorting of results returned in Chinese script:

Three possible sort strategies: (a) by romanized equivalents; (b) by pronunciation; or (c) by radical-stroke order?
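Strategies (b) and (c) can be expressed as different sort keys over the same strings. In this sketch the per-character keys come from a hand-made, hypothetical five-character table using toneless pinyin; a real system would draw them from a complete data source such as the Unihan `kMandarin` and `kRSUnicode` fields. Strategy (a) would instead sort on each record's paired romanized field.

```python
# Hypothetical per-character sort keys for five characters only.
PINYIN = {"一": "yi", "人": "ren", "大": "da", "山": "shan", "水": "shui"}
RADICAL_STROKE = {  # (Kangxi radical number, strokes beyond the radical)
    "一": (1, 0), "人": (9, 0), "大": (37, 0), "山": (46, 0), "水": (85, 0),
}


def sort_by_pronunciation(strings):
    """Strategy (b): compare strings character by character on pinyin."""
    return sorted(strings, key=lambda s: [PINYIN[c] for c in s])


def sort_by_radical_stroke(strings):
    """Strategy (c): compare strings character by character on radical, then strokes."""
    return sorted(strings, key=lambda s: [RADICAL_STROKE[c] for c in s])


sample = ["水", "一", "山", "大", "人"]
print(sort_by_pronunciation(sample))   # ['大', '人', '山', '水', '一']
print(sort_by_radical_stroke(sample))  # ['一', '人', '大', '山', '水']
```

The same five characters come out in quite different orders, which is why the choice of strategy is a user-facing question rather than a purely technical one.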

Sorting within Result Sets: Arabic Script

How should the additional Arabic-script characters used for languages such as Persian, Kurdish, and Urdu be handled?

• ڤ (vah, derived from ف fah)

• پ (pah)

• چ (chah, derived from ج gim)

• گ (gaf)

• ژ (zāī, derived from ز zayin)
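For Persian, one answer is a tailored collation that files the extra letters next to their base letters (پ after ب, چ after ج, ژ after ز, گ after ک) instead of after the whole basic Arabic range, as raw code-point order does. The sketch below hard-codes the Persian alphabet as an assumption; the names `PERSIAN_ORDER` and `persian_key` are hypothetical, and a production system would more likely use a locale-tailored collation (for example, ICU's). Kurdish and Urdu would each need their own ordering, including letters such as ڤ.

```python
# Persian alphabetical order, written as code points so that the Persian
# forms of kaf (U+06A9) and yeh (U+06CC) are unambiguous.
PERSIAN_ORDER = "".join(chr(cp) for cp in (
    0x0627, 0x0628, 0x067E, 0x062A, 0x062B, 0x062C, 0x0686, 0x062D,  # alef beh peh teh theh jeem tcheh hah
    0x062E, 0x062F, 0x0630, 0x0631, 0x0632, 0x0698, 0x0633, 0x0634,  # khah dal thal reh zain jeh seen sheen
    0x0635, 0x0636, 0x0637, 0x0638, 0x0639, 0x063A, 0x0641, 0x0642,  # sad dad tah zah ain ghain feh qaf
    0x06A9, 0x06AF, 0x0644, 0x0645, 0x0646, 0x0648, 0x0647, 0x06CC,  # keheh gaf lam meem noon waw heh yeh
))
RANK = {ch: i for i, ch in enumerate(PERSIAN_ORDER)}


def persian_key(word):
    """Sort key: Persian alphabet rank; other characters sort after, by code point."""
    return [RANK.get(ch, len(PERSIAN_ORDER) + ord(ch)) for ch in word]


# pedar, taj, bad (father, crown, wind)
words = ["\u067E\u062F\u0631", "\u062A\u0627\u062C", "\u0628\u0627\u062F"]
print(sorted(words))                   # code-point order: bad, taj, pedar
print(sorted(words, key=persian_key))  # Persian order:    bad, pedar, taj
```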

Discussion

User Needs and Expectations