Multilingual Interfaces for Biodiversity Information Guy Baillargeon Agriculture and Agri-Food Canada, Ottawa Key Innovations in Biodiversity Informatics Indaiatuba, SP, Brazil, 21-22 Oct. 2002


Multilingual Interfaces for Biodiversity Information

Guy Baillargeon
Agriculture and Agri-Food Canada, Ottawa

Key Innovations in Biodiversity Informatics
Indaiatuba, SP, Brazil, 21-22 Oct. 2002

Abstract

The Global Biodiversity Information Facility (GBIF) intends to promote standards and software tools designed to facilitate their adaptation into multiple languages. Countries, economies and organizations participating in GBIF are invited to develop novel user interface designs that incorporate features to support their functionality in a multi‑lingual global context and to develop standards and protocols for indexing, validation, documentation and quality control in multiple human languages, character sets and computer encodings. Ensuring that GBIF applications perform well in any language and that biodiversity data can be put to good use independently of the language of the primary records is a formidable but achievable challenge. It is very much a requirement if we want GBIF to fulfill its potential. How some of this can be done will be demonstrated by outlining the steps required to add a new language to the Integrated Taxonomic Information System (ITIS) and to the Biological Observations, Specimens and Collections (BiOSC) Gateway. A new Portuguese version developed in cooperation with Brazil will be shown for the first time.

Think globally

• 92% of the world population speaks little or no English

• 20 main Asian languages
• 15 main European languages

Source: http://www.alis.com/pdf/GlobalisationEN.pdf

An enormous diversity

Continent Spoken languages

The Americas: 1,013

Africa: 2,058

Europe: 230

Asia: 2,197

The Pacific: 1,311

Total: 6,809

Source: http://www.ethnologue.com/ethno_docs/distribution.asp

Human languages hit parade

Rank  Language            Script              Speakers (millions)
1     Mandarin            Chinese             885
2     Hindi               Devanagari          375
3     Spanish             Latin               358
4     English             Latin               347
5     Arabic              Arabic              211
6     Bengali             Bengali             210
7     Portuguese          Latin               178
8     Russian             Cyrillic            165
9     Japanese            Chinese / Japanese  125
10    German              Latin               100
11    French              Latin               77
12    Malay / Indonesian  Latin               58

Source: http://www.krysstal.com/spoken.html combined with http://www.multilingualplanet.com/most_spoken_languages.htm

World on line population

Total: 619 million

Source: http://www.glreach.com/globstats/

As of Sept. 2002

Non-English growing faster

Source: http://global-reach.biz/globstats/evol.htm

Multiple languages

• English Internet users
– 2000: 58%
– 2005: < 35%

• Non-English online traffic
– 2000: 40%
– 2005: 70%

Source: http://www.alis.com/pdf/GlobalisationEN.pdf

GBIF MOU Goals

• 2. “It is the intention of the Participants that GBIF:
– […]
– (d) promote standards and software tools designed to facilitate their adaptation into multiple languages, character sets and computer encodings”.
– […]

Source: http://www.gbif.net/moufinal.doc

GBIF MOU Scope of activities

• 4 (a) (iii) “Developing suitable tools and standards for accessing, linking and analysing new and existing databases, including standards and protocols for indexing, validation, documentation and quality control in multiple human languages, character sets and computer encodings;”

Source: http://www.gbif.net/moufinal.doc

Canadian context

[Diagram: Client Browser, ITIS, Translation module, Other multilingual applications]

• Requirement for a bilingual version of ITIS in Canada

• Not changing the underlying data model

• Wanted reusable components

• Capability to handle other languages as well


Introducing ITIS*Brazil

• A new version of ITIS in Portuguese
• SIIT*Brasil - Versão em português
• Developed in cooperation with CRIA
• August - Sept. 2002

• www.itis.cria.org.br

ITIS/BiOSC Data flow diagram

[Diagram components: Client Browser; ITIS; Translation module; Other multilingual applications; BiOSC Gateway; data providers ENHSIN, TSA, REMIB, AVH and DIGIR with their databases (DB); WMS map server (DEMIS map) with WMS layers]

Numbered steps in the diagram:

1 : Query ITIS
2 : Click Map it! button
3 : Get record index data from BiOSC
4 : Get full record from data owner

How is it done

• Selective translation
• Semantic partitioning
• Automated rendering
• Localisation
– Cultural conventions (date format, decimal separators, number format)
• Alternate spellings

Architecture

• Single multilingual application server
• Each stored procedure
– Handles all languages
– Locale sensitive
– Single character set on a single encoding
– Linguistic sorting

General issues

• Treat look and feel independently from language issues

• Determine user preference
• Handle non-ASCII form input and query strings
• Enable procedure for content translation
• Tag HTML output with encoding information
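As a minimal sketch of the last point, the encoding can be declared both in the HTTP header and in the page itself. The function name `render_page` and its parameters are hypothetical, and Python is used here only for illustration; the application described in these slides is built on stored procedures.

```python
from html import escape

def render_page(body_text: str, lang: str = "en", charset: str = "utf-8") -> tuple:
    """Return (headers, body_bytes) with the encoding declared in both places."""
    html = (
        "<!DOCTYPE html>\n"
        f'<html lang="{escape(lang)}">\n'
        "<head>"
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">'
        "</head>\n"
        f"<body><p>{escape(body_text)}</p></body>\n"
        "</html>\n"
    )
    headers = {
        "Content-Type": f"text/html; charset={charset}",
        "Content-Language": lang,
    }
    # The bytes sent to the browser must use the same encoding that was declared.
    return headers, html.encode(charset)

headers, body = render_page("Posição atual: aceito", lang="pt")
```

Declaring the charset and then encoding the output with that same charset keeps accented characters from being misread by the browser.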

Look and feel

• Stored as blocks of static HTML components
– Header
– Footer
– Background
– Images, buttons, logos

User preference

• Language and locale defined via passable parameter

• User selectable
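One way to read a language preference from a passable parameter is sketched below. The URL parameter name `p_lang`, the example URLs, the supported-language list and the English default are all assumptions made for illustration.

```python
from urllib.parse import urlsplit, parse_qs

SUPPORTED = ("en", "fr", "es", "pt")  # assumed set of available languages
DEFAULT = "en"                        # assumed fallback language

def language_from_url(url: str) -> str:
    """Pick the interface language from the query string, with a safe fallback."""
    query = parse_qs(urlsplit(url).query)
    requested = query.get("p_lang", [DEFAULT])[0].lower()
    return requested if requested in SUPPORTED else DEFAULT

print(language_from_url("http://example.org/itis?p_lang=pt&id=1"))
print(language_from_url("http://example.org/itis?p_lang=xx"))
```

Because the parameter travels with every link, the user can switch language at any point simply by following a link that carries a different value.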

Query string handling

• URLs can only be encoded in 7-bit ASCII

• 8-bit bytes are transformed into their hexadecimal representation prefixed by a percent sign

• Requires decoding by the application
– German word “Schloß” is converted to
   • Schlo%c3%9f (Unicode, UTF-8)
   • Schlo%DF (Latin-1)
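The two encodings of “Schloß” can be reproduced with Python's standard `urllib.parse` module. The helper `decode_query_value`, which tries UTF-8 first and falls back to Latin-1, is a hypothetical illustration of the decoding step, not the method used in ITIS.

```python
from urllib.parse import quote, unquote_to_bytes

word = "Schloß"
utf8_form = quote(word, encoding="utf-8")      # percent-encodes the bytes C3 9F
latin1_form = quote(word, encoding="latin-1")  # percent-encodes the byte DF

def decode_query_value(raw: str) -> str:
    """Decode a percent-encoded value whose original encoding is unknown."""
    data = unquote_to_bytes(raw)
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the browser sent Latin-1 bytes.
        return data.decode("latin-1")

print(utf8_form, latin1_form)
print(decode_query_value("Schlo%c3%9f"), decode_query_value("Schlo%DF"))
```

Trying UTF-8 first works because a lone Latin-1 byte such as DF is not a valid UTF-8 sequence. The sketch ignores the form-encoding convention of `+` for spaces.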

Base letter conversion

• Convert accented characters to unaccented ones for easier querying
- éèêë to e
- òóôõö to o
- àáâãäå to a
- ùúûü to u
- ç to c
- ñ to n
- Output in correct (accented) form
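A minimal sketch of base letter conversion, assuming Unicode decomposition is an acceptable substitute for an explicit mapping table: decompose each character, then drop the combining marks. The function names `base_letters` and `search` are hypothetical.

```python
import unicodedata

def base_letters(text: str) -> str:
    """Strip diacritics by decomposing (NFD) and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def search(query: str, names: list) -> list:
    """Match on unaccented forms, but return the stored (accented) spelling."""
    key = base_letters(query).lower()
    return [name for name in names if key in base_letters(name).lower()]

print(base_letters("éèêë àáâãäå ç ñ"))
print(search("versao", ["Versão em português", "Statut"]))
```

Letters that have no canonical decomposition, such as ø or æ, pass through unchanged under this approach and would still need an explicit mapping.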

Alternate spellings

- German, Danish and Swedish
- ä to ae
- ö to oe
- ü to ue
- å to aa
- ø to oe
- æ to ae
- Output in standard format
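The mapping above can be sketched as a lookup table applied to both the query and the stored value before comparison. The function names and the sample words “Müller” and “Århus” are illustrative assumptions, and the input is assumed to use precomposed (NFC) characters.

```python
# Mapping taken from the list above.
ALTERNATE = {"ä": "ae", "ö": "oe", "ü": "ue", "å": "aa", "ø": "oe", "æ": "ae"}

def expand_alternates(text: str) -> str:
    """Replace each special letter with its two-letter alternate spelling."""
    out = []
    for ch in text:
        repl = ALTERNATE.get(ch.lower())
        if repl is None:
            out.append(ch)
        elif ch.isupper():
            out.append(repl.capitalize())  # "Å" becomes "Aa"
        else:
            out.append(repl)
    return "".join(out)

def same_name(query: str, stored: str) -> bool:
    """Compare two spellings after expanding both to the alternate form."""
    return expand_alternates(query).lower() == expand_alternates(stored).lower()

print(expand_alternates("Müller"), expand_alternates("Århus"))
print(same_name("Mueller", "Müller"))
```

Expanding both sides to the two-letter form avoids having to guess whether an “oe” in the query originally stood for ö or ø; the stored spelling is what gets displayed.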

Translation table

• All translation strings are externalized to a database table
– String_id, Language_id, Translation

• Primary key on String_id and Language_id

• Translations are retrieved via SQL
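A table with this shape can be sketched with Python's built-in `sqlite3` module. The column types, the table name `translation`, the `lookup` helper and its fallback to English are assumptions; only the three column names, the composite primary key, and the sample strings for string id 177 come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE translation (
        string_id   INTEGER NOT NULL,
        language_id TEXT    NOT NULL,
        translation TEXT    NOT NULL,
        PRIMARY KEY (string_id, language_id)
    )""")
conn.executemany(
    "INSERT INTO translation VALUES (?, ?, ?)",
    [(177, "en", "Current Standing"), (177, "fr", "Statut"),
     (177, "es", "Estado actual"), (177, "pt", "Posição atual")],
)

def lookup(string_id: int, language_id: str, fallback: str = "en"):
    """Fetch one translation; fall back to another language if it is missing."""
    row = conn.execute(
        "SELECT translation FROM translation"
        " WHERE string_id = ? AND language_id = ?",
        (string_id, language_id),
    ).fetchone()
    if row is None and language_id != fallback:
        return lookup(string_id, fallback)
    return row[0] if row else None

print(lookup(177, "pt"))
```

With the composite key, adding a language amounts to inserting one new row per string id; no application code has to change.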

Code snippet

htp.prn(ctislib.multitext(177, p_lang) || ': ' ||
        ctislib.multidata(p_lang, 90, v_info_cursor.currency_rating));

Output:

en: Current Standing: accepted
fr: Statut: accepté
es: Estado actual: aceptado
pt: Posição atual: aceito

Multitext function accepts (string id number, language parameter) to translate application text

Multidata function accepts (language parameter,table id number, text to be translated) to translate data
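The way the two calls combine can be mimicked in a few lines of Python. These are hypothetical stand-ins for the `ctislib` functions, keeping only the argument order described above; the in-memory dictionaries, the assumption that the stored value is the English word “accepted”, and the pass-through of untranslated data are all illustrative.

```python
# Application text, keyed by (string id, language).
APP_TEXT = {
    (177, "en"): "Current Standing", (177, "fr"): "Statut",
    (177, "es"): "Estado actual",    (177, "pt"): "Posição atual",
}
# Data values, keyed by (table id, language, stored value).
DATA_TEXT = {
    (90, "fr", "accepted"): "accepté",
    (90, "es", "accepted"): "aceptado",
    (90, "pt", "accepted"): "aceito",
}

def multitext(string_id: int, lang: str) -> str:
    """Translate a fixed piece of application text."""
    return APP_TEXT[(string_id, lang)]

def multidata(lang: str, table_id: int, value: str) -> str:
    """Translate a data value; untranslated values pass through unchanged."""
    return DATA_TEXT.get((table_id, lang, value), value)

def standing_line(lang: str, currency_rating: str) -> str:
    return multitext(177, lang) + ": " + multidata(lang, 90, currency_rating)

for lang in ("en", "fr", "es", "pt"):
    print(lang + ": " + standing_line(lang, "accepted"))
```

Keeping application text and data in separate lookups lets interface labels and controlled-vocabulary values be translated independently.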

Conclusion

• Translation module works well for Western European languages

• Could probably easily handle other languages using Latin script

• Could probably expand to other alphabets such as Greek and Cyrillic

• The big challenge: pictorial scripts
– Japanese, Chinese, Korean …

Credits

• Canada
– Guy Baillargeon
– Derek Munro

• Brazil
– Vanderlei Perez Canhos
– Dora Ann Lange Canhos
– Sidnei de Souza