Localizing OpenClinica

Hiroaki Honshuku: SQA

What is Character Encoding?

Morse Code (1840) → Latin alphabet

ASCII (1963): the American Standard Code for Information Interchange
  Letters, numerals, symbols, control characters
  7-bit: 0-127
  0x41 = letter ‘A’, 0x61 = letter ‘a’

ISO-8859-n
  8-bit: 0-255
  iso-8859-1: Latin-1, covers most Western European languages
  iso-8859-5: Cyrillic alphabet
  No CJK (Chinese, Japanese, Korean) support
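
The ASCII values quoted on this slide can be checked in a few lines. This is a minimal sketch in Python, chosen here only for illustration; the sample strings are arbitrary.

```python
# ASCII assigns each character a 7-bit number in the range 0-127.
code_upper_a = hex(ord("A"))
code_lower_a = hex(ord("a"))
print(code_upper_a, code_lower_a)

# Pure ASCII text encodes to exactly one byte per character, all below 128.
ascii_bytes = "OpenClinica".encode("ascii")
print(len(ascii_bytes), max(ascii_bytes) < 128)

# Characters outside the 7-bit range cannot be represented in ASCII at all.
try:
    "é".encode("ascii")
    ascii_rejected = False
except UnicodeEncodeError:
    ascii_rejected = True
print("non-ASCII rejected:", ascii_rejected)
```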

What is Character Encoding? (cont.)

iso-8859-1 versus iso-8859-5

iso-8859-1          iso-8859-5
A  0x41 (65)        А  0xB0 (176)
B  0x42 (66)        В  0xB2 (178)

(The iso-8859-5 letters are Cyrillic look-alikes of Latin A and B.)
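
To make the comparison concrete: Latin "A" is byte 0x41 (decimal 65), while the look-alike Cyrillic "А" is byte 0xB0 (decimal 176) in iso-8859-5. The sketch below decodes the same two bytes under both charsets, using Python's built-in codecs.

```python
# One pair of byte values, two different results depending on the declared charset.
raw = bytes([0xB0, 0xB2])

as_latin1 = raw.decode("iso-8859-1")    # degree sign and superscript two
as_cyrillic = raw.decode("iso-8859-5")  # Cyrillic capital A and capital Ve

print(as_latin1, as_cyrillic)

# The Cyrillic letters look like Latin "AB" but are different characters,
# and the real Latin letters keep their ASCII byte values.
latin_ab = "AB".encode("iso-8859-1")
print(as_cyrillic == "AB", latin_ab.hex())
```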

CJK Encoding Mess
  Chinese: Big5 (Traditional), GB18030 (Simplified)
  Japanese: iso-2022-JP, EUC-JP, Shift-JIS
  Korean: EUC-KR, KS C 5861

Windows proprietary encodings: CP1252, CP932, etc.
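
A short sketch of why these legacy encodings cause trouble: the same Japanese text produces different bytes in each one, and CP1252 assigns printable characters to bytes that iso-8859-1 treats as control codes. The sample text is arbitrary; the codec names are Python's standard ones.

```python
text = "日本語"
encodings = ["iso2022_jp", "euc_jp", "shift_jis", "utf-8"]

# Same characters, four different byte sequences.
encoded = {name: text.encode(name) for name in encodings}
for name, data in encoded.items():
    print(f"{name:12} {data.hex()}")

# Decoding with the wrong legacy charset does not give the original text back.
wrong = encoded["shift_jis"].decode("euc_jp", errors="replace")
print(wrong == text)

# CP1252 maps byte 0x80 to the euro sign; iso-8859-1 maps it to a control code.
euro_cp1252 = b"\x80".decode("cp1252")
ctrl_latin1 = b"\x80".decode("iso-8859-1")
print(repr(euro_cp1252), repr(ctrl_latin1))
```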

Unicode

1987: Apple + Xerox

1991: Unicode Consortium

UTF-8
  1,112,064 code points
  Compatible with standard ASCII
  Unix, Linux, Mac OS
  Byte-oriented, so there is no byte-order (endianness) issue

UTF-16 (successor to UCS-2)
  1,112,064 code points
  Mainly Windows
  Little-endian on Windows; a BOM (Byte Order Mark) signals the byte order
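
The figures on this slide can be reproduced with a small sketch. The sample string is arbitrary; `codecs.BOM_UTF16_LE` is the little-endian BOM constant from the Python standard library.

```python
import codecs

# 0x110000 code points in total, minus 2,048 surrogate code points.
valid_code_points = 0x110000 - 0x800
print(valid_code_points)

text = "A試"
utf8 = text.encode("utf-8")
utf16_le = text.encode("utf-16-le")
utf16_be = text.encode("utf-16-be")
print(utf8.hex(), utf16_le.hex(), utf16_be.hex())

# UTF-8 keeps ASCII characters as single, unchanged bytes.
first_utf8_byte = utf8[0]

# UTF-16 exists in two byte orders, so a leading BOM tells the reader which one is used.
with_bom = codecs.BOM_UTF16_LE + utf16_le
decoded = with_bom.decode("utf-16")
print(decoded == text)
```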

OpenClinica and i18n

i18n support since 3.1.3

OpenClinica i18n work in progress
  Data Mart
  Response Option Text
  CRF Name
  Discrepancy Note data passing
  Escaping control characters and MS proprietary characters
    Should be detected at CRF upload
  Hard-coded strings
  Missing encoding declaration in some export formats
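
As an illustration of what detection at upload time could look like, the sketch below flags control characters and the punctuation that MS Office autocorrect commonly inserts. It is not OpenClinica's actual check: the function name `find_suspect_chars`, the allowed-control set, and the punctuation list are all assumptions made for this example.

```python
import unicodedata

# Illustrative list only: punctuation commonly inserted by MS Office autocorrect.
OFFICE_PUNCTUATION = {"\u2018", "\u2019", "\u201c", "\u201d", "\u2013", "\u2014", "\u2026"}
# Assumption: tab, newline and carriage return are acceptable in uploaded text.
ALLOWED_CONTROLS = {"\t", "\n", "\r"}

def find_suspect_chars(text):
    """Return (index, code point label) pairs for characters worth flagging."""
    hits = []
    for index, ch in enumerate(text):
        is_control = unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS
        if is_control or ch in OFFICE_PUNCTUATION:
            hits.append((index, f"U+{ord(ch):04X}"))
    return hits

# Hypothetical CRF text containing a curly apostrophe and a vertical-tab control character.
sample = "Patient\u2019s weight\x0b(kg)"
print(find_suspect_chars(sample))
```

A check like this reports positions rather than silently rewriting the text, so the person uploading the CRF can decide how to correct each character.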

Microsoft-Specific Issues

Display issues on Windows
  Pre-Win7, the GUI was not fully UTF-8 compatible
  Characters displayed as corrupted after saving data

Viewing extracted data
  Use a UTF-8 compatible text editor

Never copy/paste from MS Office
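
The corruption seen when extracted data is opened in a viewer that does not read UTF-8 can be reproduced with a short sketch. The sample word is arbitrary, and CP1252 stands in for whatever encoding the viewer assumes.

```python
original = "試験"
utf8_bytes = original.encode("utf-8")

# A viewer that assumes CP1252 shows each UTF-8 byte as a separate character.
garbled = utf8_bytes.decode("cp1252")
print(garbled)

# Nothing is lost while the bytes stay untouched: reading them as UTF-8 recovers the text.
recovered = garbled.encode("cp1252").decode("utf-8")
print(recovered == original)

# When reading extracted files programmatically, name the encoding explicitly, e.g.
#     open(path, encoding="utf-8")
```

The damage becomes permanent only if the garbled view is saved back to disk, which is why the editor used for extracted data matters.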

Demonstration

Search Subjects and Tables

CRF and Data Entry

Discrepancy Notes

Rules

Data Import

Data Extract

How to Localize

Documentation
  https://docs.openclinica.com/3.1/technical-documents/openclinica-and-internationalization

UTF-8 Converter
  i18n strings need to be entered as hex (\uXXXX) values
  http://www.branah.com/unicode-converter
  The Calendar Widget can take UTF-8 strings

Pseudo Translation
  Insert one distinctive non-ASCII character
  Duplicate the English properties files first
  Search for “ = ” and replace with “ = \u8a66”
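
Both the hex conversion and the pseudo-translation step can be scripted instead of done by hand. The sketch below is one possible approach, not an OpenClinica tool: the function names and the sample property keys are hypothetical.

```python
def to_unicode_escapes(text):
    """Replace each non-ASCII character with a Java-style \\uXXXX escape."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            # Go through UTF-16 so characters beyond U+FFFF become surrogate pairs.
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                out.append("\\u{:02x}{:02x}".format(units[i], units[i + 1]))
    return "".join(out)

def pseudo_translate(properties_text, marker="\\u8a66"):
    """Insert the marker after the first ' = ' on every line."""
    lines = properties_text.splitlines(keepends=True)
    return "".join(line.replace(" = ", " = " + marker, 1) for line in lines)

print(to_unicode_escapes("試験"))
print(pseudo_translate("submit = Submit\ncancel = Cancel\n"))
```

Replacing only the first " = " on each line protects values that themselves contain " = ", which a plain editor search-and-replace would also modify.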

How to Localize (cont.)

1. Duplicate English properties files
   Exclude licensing.properties

2. Rename duplicated files to your Locale
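
Steps 1 and 2 can be sketched as a small script. It assumes the source directory holds only the English files and that localized copies follow the Java ResourceBundle naming pattern `<name>_<locale>.properties`. The function name, the directory layout, and the demo file contents are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

EXCLUDED = {"licensing.properties"}

def duplicate_for_locale(english_dir, target_dir, locale):
    """Copy each English properties file to <name>_<locale>.properties."""
    target_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for source in sorted(english_dir.glob("*.properties")):
        if source.name in EXCLUDED:
            continue  # licensing.properties is not translated
        target = target_dir / f"{source.stem}_{locale}.properties"
        shutil.copyfile(source, target)
        created.append(target.name)
    return created

# Demo in a throwaway directory with made-up file contents.
with tempfile.TemporaryDirectory() as tmp:
    english = Path(tmp) / "en"
    english.mkdir()
    for name in ("words.properties", "format.properties", "licensing.properties"):
        (english / name).write_text("key = value\n", encoding="ascii")
    demo_result = duplicate_for_locale(english, Path(tmp) / "ja", "ja")
    print(demo_result)
```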

3. Date Format
   Edit the format.properties file

4. Translate per GUI page
   Avoids possible legacy strings
   Use a text editor that supports global search
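
Where an editor with global search is not at hand, the same lookup can be approximated with a short script that finds which properties file holds a string seen on a GUI page. The function name and the demo keys are hypothetical; files are read as ISO-8859-1, the traditional encoding of Java properties files, so no byte sequence causes a decoding error.

```python
import tempfile
from pathlib import Path

def find_in_properties(directory, phrase):
    """Return (file name, line number, line) for each line containing the phrase."""
    matches = []
    for path in sorted(Path(directory).glob("*.properties")):
        lines = path.read_text(encoding="iso-8859-1").splitlines()
        for number, line in enumerate(lines, start=1):
            if phrase.lower() in line.lower():
                matches.append((path.name, number, line))
    return matches

# Demo with two made-up files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "words.properties").write_text("subject = Subject\nsave = Save\n", encoding="ascii")
    Path(tmp, "notes.properties").write_text("note_subject = Note Subject\n", encoding="ascii")
    demo_matches = find_in_properties(tmp, "subject")
    for match in demo_matches:
        print(match)
```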

Thank You!
