Agenda: Guidelines for Supporting Complex Scripts* on Windows 2000
  Key Concepts
  Overview of Unicode
  Migrating existing applications
  Using Unicode text in resources
  *Such as Devanagari and Tamil
Definitions
  Enabling for a script: Adding support for input, display, and output of the script
  Localization: Translating user interface elements
  Globalization: Developing software such that feature design and code design are not limited to a single locale or script
Requirements for Enabling Indian Scripts in Applications on Windows 2000:
  Use Unicode to encode text
  Enable for complex scripts
  Note: Many Microsoft products do not yet meet these requirements. However, we're working on it!
Overview of Unicode
Character Set Evolution
  MS-DOS: OEM character sets
  Windows 3.x: ANSI character sets
  Windows 9x: ANSI character sets
  Windows NT: Unicode
    Supported for compatibility: OEM (console) character sets, ANSI character sets
Why do character set differences matter?
  Historically, they fragmented code bases for both Windows and applications
    Single byte: European editions
    Double byte: Far East editions
    Bi-directional: Middle East editions
  They make it difficult to share data
  They make it difficult to develop multilingual applications
What is Unicode?
  A 16-bit character encoding
    A mapping of characters to numbers
    Syntax rules for display of complex scripts
    Not a font or glyph encoding!
    Not a sort algorithm!
  Includes all characters in common use in modern scripts (and others)
  Basis for the ISO 10646 character encoding standard
  Native text encoding for Windows NT
Unicode™ / ISO 10646
  16-bit international character encoding
  Windows 2000 uses Unicode version 2.0
  [Figure: the 16-bit code space from 0x0000 to 0xFFFF, with regions labeled ASCII, Latin, Greek, Arabic and Hebrew, Indian, Thai, Punctuation, Symbols, Kana, Hangul, Ideographs (Hanzi, Kanji, Hanja), Future use, Private use, and Compatibility. A sample string is shown as the 16-bit values 0041 ("A"), 9662, FF96, 4F85, 0000 (null).]
Relatives of Unicode
  ISO/IEC 10646
    32-bit ISO standard of 64K x 64K "planes"
    Unicode repertoire is plane 0
  UTF-7
    7-bit transformation format
    Not widely used
  UTF-8
    8-bit transformation format
    Used in web pages and some email
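To make the "8-bit transformation format" concrete, here is a minimal portable C sketch of how one 16-bit Unicode value maps to one, two, or three UTF-8 bytes. The function name utf8_encode_bmp is hypothetical, and the sketch assumes the input is a single 16-bit value that is not half of a surrogate pair.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one 16-bit Unicode value as UTF-8.
 * Writes 1 to 3 bytes into out and returns the byte count.
 * Assumption: cp is not a surrogate (0xD800..0xDFFF); those are not handled.
 * utf8_encode_bmp is a hypothetical helper name used for illustration. */
size_t utf8_encode_bmp(uint16_t cp, unsigned char out[3])
{
    if (cp < 0x80) {                 /* 0xxxxxxx: ASCII stays one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx: includes the Devanagari and Tamil blocks */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

For example, U+0915 (Devanagari letter KA) becomes the three bytes E0 A4 95, while U+0041 ("A") stays the single byte 41, which is why ASCII-only tools can usually pass UTF-8 text through unchanged.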
Why Should I Use Unicode and Win32 for Indian Text?
  "My application works fine now!"
Benefits of Using Unicode on Windows 2000
  Share data (e.g., cut and paste) with other Win32 applications
  Make use of full Win32 API for text processing
  Support multilingual documents, including multiple Indian scripts
  Use industry standard encoding
Summary: Use Unicode. It is the ultimate character encoding
  Represent all text with one unambiguous encoding
  Support multilingual text easily
  Avoid special processing for variable byte-length characters
  Use standard encoding recognized throughout the industry and the world
  Support new scripts that are only supported through Unicode
Migrating Existing Applications to Support Indian Text on Windows 2000…
Three Migration Scenarios:
  1. ANSI application to Unicode
  2. Standard Win32 application to complex script enabled
  3. Existing Indian language application to Unicode and Win32
Migrating ANSI Applications to Unicode
  Overview of "A" and "W" entry points
  How to build a Unicode Win32 application
  Unicode applications on Windows 98
Review of the W and A APIs
  Two kinds of window classes: Unicode, ANSI
  Win32 API has two versions of most functions:
    "W" (wide) version handles Unicode
    "A" (ANSI) version assumes the system default code page (character encoding)
  Macros resolve to W or A entry point
  Example: Macro for RegisterClassEx
    #ifdef UNICODE
    #define RegisterClassEx RegisterClassExW
    #else
    #define RegisterClassEx RegisterClassExA
    #endif
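The same compile-time selection can be tried outside Windows with a small portable sketch. TextLengthA, TextLengthW, TextLength, and TEXT_LIT are hypothetical names invented for this illustration; they are not part of the Win32 API, although the pattern mirrors the RegisterClassEx macro above.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical pair of entry points, named in the Win32 "A"/"W" style. */
size_t TextLengthA(const char *s)    { return strlen(s); }  /* counts bytes */
size_t TextLengthW(const wchar_t *s) { return wcslen(s); }  /* counts wide characters */

/* The generic name and the literal prefix are both chosen at compile time,
 * so the same source line builds as either an ANSI or a Unicode program. */
#ifdef UNICODE
#define TextLength  TextLengthW
#define TEXT_LIT(s) L##s
#else
#define TextLength  TextLengthA
#define TEXT_LIT(s) s
#endif
```

A call such as TextLength(TEXT_LIT("abc")) compiles against the wide version when UNICODE is defined and against the byte version otherwise, which is how one code base can serve both builds.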
To Build a Unicode-enabled Application:
  Automatic in Visual Studio:
    Compile with options -DUNICODE and -D_UNICODE
    Specify wWinMainCRTStartup (for a wWinMain entry function) in Project Settings/Link/Output/Entry-point symbol
  Or, use only the "W" routines from Win32 API
  Metafiles:
    Use Enhanced Metafiles (EMF)
    Windows Metafiles (WMF) don't support Unicode
For Applications that Must Also Run on Windows 98…
  Use Unicode everywhere with single binary, two code paths:
    On Windows NT, use W entry points
    On Windows 98, convert Unicode to ANSI, then use A entry points
    See sample GLOBALDV for example
  See the April 1999 Microsoft Systems Journal for details and other options
Migrating Standard Win32 Application to Support Complex Scripts
  Good news: In a Unicode application, it basically just works!
Simple, Plain-text Applications
  Use standard edit control in Visual C/C++
  Use standard Win32 API functions
    Win32 APIs: ExtTextOutW or DrawTextW
    ScriptString API in Uniscribe
Pitfalls in Enabling for Complex Scripts
  When displaying typed text:
    Do not output characters one by one!
    Do save text in a buffer and display the whole string with Uniscribe or Win32 API
  To measure line lengths:
    Do not sum cached character widths
    Do use a GetTextExtent function or Uniscribe
Simple Applications With Formatted Text
  Use rich edit control in Visual C/C++
  Internet Explorer 5.0: Use Document Object Model (more later)
Applications With Advanced Formatting and Layout
  Use script APIs ("Uniscribe")
  See MSJ article of November 1998
What about Visual Basic, Visual J++?
  Visual Basic 6.0
    Standard controls are ANSI, not Unicode
    Use "MS Forms 2.0" controls to use Unicode in controls
    Resource editor does support Unicode
  Visual J++
    Resource editor supports Unicode
    Text output is ANSI only
  Future plans: Make Unicode work everywhere in Visual Studio
Migrating Existing Indian Language Applications to Win32 and Unicode
Step 1 in Migrating Existing Indian Applications
  Follow guidelines for Unicode enabling and complex script enabling
Step 2 in Migrating Existing Indian Applications …
  Provide conversion facility to migrate documents
    From your format to ISCII
    From ISCII to Unicode
      MultiByteToWideChar(<codepage>, …
      Devanagari is codepage 57002
      Tamil is codepage 57004
  See UCONVERT sample
    Included on your CD
    Modified from UCONVERT in Win32 SDK
Using Unicode Text in Resources
  Getting Unicode into Win32 resources
  Multilingual Visual C/C++ applications
Getting Unicode into Win32 Resources
  Create Unicode RC file
    Resource editor in Visual Studio does not support Unicode yet, so
    Generate rc file for English using IDE
    Translate to target language with Unicode editor (e.g., Notepad or Word)
    Save as Unicode
  Compile with resource compiler RC.EXE
    RC.EXE does support Unicode
    Compile within Visual Studio IDE
Implementing Multilanguage User Interface in Applications
  Use satellite resource DLLs
  Default to user settings, but
  Allow user to change
  For details, see:
    April 1999 Microsoft Systems Journal
    GLOBALDV sample code
Multilanguage User Interface
  Initialize to current UI language
    Windows 2000: GetUserDefaultUILanguage()
    Others: Use the language of the O/S
  Allow user to select UI language
  Put language-dependent resources in resource DLLs
    Use naming convention, e.g., res<LANGID>.dll
    Find all resource DLLs, put up list box of choices
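As an illustration of the res<LANGID>.dll naming convention, the sketch below builds a satellite DLL file name from a 16-bit language identifier. The helper name make_resource_dll_name is hypothetical, and writing the LANGID as four uppercase hexadecimal digits is an assumption; the slide does not say how the identifier is formatted.

```c
#include <stdio.h>

/* Build a satellite DLL file name of the form res<LANGID>.dll.
 * Assumption: the LANGID is written as four uppercase hex digits.
 * Returns the number of characters written, or -1 if buf is too small.
 * make_resource_dll_name is a hypothetical helper, not a Win32 function. */
int make_resource_dll_name(unsigned short langid, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "res%04X.dll", (unsigned)langid);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```

On Windows 2000 the value returned by GetUserDefaultUILanguage() could be passed in as langid. With the hex assumption above, Hindi (LANGID 0x0439) yields res0439.dll and Tamil (0x0449) yields res0449.dll.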
Agenda: Using Unicode and Complex Scripts in Enterprise Applications
  Intranet/internet applications
  Unicode support in SQL Server 7.0
  Other considerations
Intranet/Internet Applications
  Internet Explorer 5.01 on Win32 platforms
    Displays multilingual text including complex scripts
    Supports complex scripts in Document Object Model
    Supports Indian text through Unicode
Encodings for Multi-lingual Text in Web Pages
  Raw Unicode
    OK for intranet on Windows NT networks
    Not good for internet pages
  Numeric entities, e.g., &#2325; for क
    OK for occasional use, e.g., inserting characters not in the main script of page
    Not good for large documents
  UTF-8: Recommended encoding
    Works just about everywhere
    Supported by IE 4.0+, Netscape 4.0+
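The numeric-entity option can be sketched in a few lines of portable C: a code point is written as its decimal value between "&#" and ";". The helper name format_numeric_entity is hypothetical and is used only for illustration.

```c
#include <stdio.h>

/* Format a Unicode code point as a decimal HTML numeric character
 * reference, e.g. U+0915 -> "&#2325;".
 * Returns the number of characters written, or -1 if buf is too small.
 * format_numeric_entity is a hypothetical helper name. */
int format_numeric_entity(unsigned cp, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "&#%u;", cp);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```

Each Devanagari or Tamil character costs seven ASCII characters in this form, compared with three bytes in UTF-8, which is consistent with the advice that entities suit occasional characters rather than large documents.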
Creating UTF-8 Webpages
  Use charset=UTF-8 in META tag
  Save HTML page as UTF-8 using Notepad, Word, etc.
  Saving as UTF-8 in Word:
    Select File/Save As Web Page/Tools
    Select Web Options/Encoding
    Change charset designation to UTF-8
Embedded Fonts in Web Pages
  Downloadable fonts used only in web pages
    Deleted when page is closed
  WEFT tool
    Creates embedded font from TTF file
    Saves download time/space by using only those glyphs required for the page
    On Microsoft website, see workshop/author/fontembed/font_embed.asp
Introduction to DHTML
  Based on Document Object Model
  Objects in HTML document
    Text in objects including titles, headers, etc.
    Attributes such as font, color, etc.
  Are accessible via scripts, e.g., JScript or VBScript
  Supported in IE 4.0+
  See various documents under www.microsoft.com/workshop/author for overview
Examples of DHTML
  Heading tag:
    <H1 id=Head1 style="font-weight: normal"
        onmouseover="makeItalic();"
        onmouseout="makeNormal();">
      Sample Dynamic HTML </H1>
  JScript functions that change style of heading text:
    <script language=JavaScript>
    function makeItalic() {
      Head1.style.fontStyle = "italic";
    }
    function makeNormal() {
      Head1.style.fontStyle = "normal";
    }
    </script>
Using Indian Scripts in DHTML
  Use same design rules as static HTML
    Encode in UTF-8
    Use embedded fonts if needed
  Consider multilingual pages
    Display initial page in English
    Offer option to change to other languages
Unicode Support in SQL Server 7.0
  Unicode datatypes in SQL Server 7.0
    NCHAR
    NVARCHAR
    NTEXT
  Indicate Unicode text by N'text' in SQL queries:
    create table myTable (col1 CHAR(8), col2 NCHAR(8))
    insert into myTable (col1, col2) values ('Japan', N'日本')
  Utilities for entering/retrieving Unicode data:
    Query Analyzer
    Data Transformation Services
    Client application using ODBC
Accessing Data Through ODBC
  ODBC supports Unicode data access
  Use Visual C/C++ for read/write
    Use SQL "W" routines, e.g.,
      SQLExecDirectW(SQLHSTMT, LPWSTR, int);
    Specify data type SQL_C_WCHAR as needed:
      SQLBindCol(hstmt, nColumn, SQL_C_WCHAR, szCol, nMaxCol, &cbName);
    See GLOBALDV sample
  Use Visual Basic to retrieve and display
Accessing SQL Server 7.0 Unicode Data through ASP Webpages
  Use standard encodings:
    UTF-8 in web pages
    Unicode in SQL Server 7.0
  Access data through JScript/ODBC
    JScript automatically translates Unicode to current codepage in web page
    Defaults to system codepage
    Specify UTF-8 "codepage" using:
      <%Session.CodePage=65001%>   // Scope=session
      <%@CODEPAGE=65001%>          // Scope=page
Summary of SQL Server 7.0 Unicode Access
  Tool | Storage | Retrieval | Notes and Restrictions
  Enterprise Manager | No | No | Uses the Tabular viewer tool, which is ANSI based
  Query Analyzer | Yes | Yes |
  Data Transformation Services | Yes | Yes | Import/export file format must support Unicode
  ODBC | Yes | Yes | SQL queries through ODBC process Unicode correctly. Must use Unicode APIs and datatypes.
  Visual Basic | Limited | Yes | Must use MS Forms 2.0 controls to display properly in Visual Basic. Cannot enter Indian text in text box.
  JScript in Web page | ? | Yes | Can retrieve and display Indian text in UTF-8 web page using JScript. Storage not yet tested.
Other Considerations …
  Handling Indian text in network applications
    Indic Language Group must be installed on clients
    Only necessary on server if display and input is required locally
  Sharing documents
    Word 2000 documents: Must have Indic language group installed on local machine
    HTML: Can use embedded fonts
Break!
OpenType Layout
  David C. Brown, Development Lead, and
  David Meltzer, Program Manager
  Microsoft Corporation
OpenType Layout
  File Format
  Benefits of OpenType
  Layout Features
  Indic Features
OpenType File Format
  sfnt table structure
    Extension of the current TrueType file format
  A single font file may contain
    TrueType outline data
    PostScript (CFF) outline data
Benefits of OpenType
  Support for large character sets
  Multi-script character sets
  Unicode support
  Glyph alternates supported
  Advanced typography supported
  Better protection of font data
  Font embedding controls
Layout Features
  Glyph substitution
  Glyph positioning
  Script and Language information
Glyph Substitution
  Single glyph substitution
  One-to-many substitution
  Multiple glyph substitution
  Aesthetic alternatives
  Contextual glyph substitution
Glyph Positioning
  Two-dimensional positioning
  Single glyph adjustment
  Adjustment of paired glyphs
  Cursive attachment
  Mark attachment
  Contextual positioning
Script and Language Information
  Layout features encoded by
    Scripts
    Languages within scripts
Indic Features
  Language Forms
  Conjuncts and Typographical Forms
  Glyph Positioning
Language Forms
  Nukta
  Akhand
  Reph
  Below-base Form
  Half Form
  Post-base Form
  Vattu Variants
Example: Below-base form
  Vattu (Below-base form of Ra)
Conjuncts and Typographical Forms
  Pre-base substitutions
  Below-base substitutions
  Above-base substitutions
  Post-base substitutions
  Halant Forms
Example: Pre-base consonant conjunct
Glyph Positioning
  Below-base marks
  Above-base marks
  Distance control
Coming Tools for Developing OpenType Fonts
  VTT (Visual TrueType)
  VOLT (Visual OpenType Layout Tool)
Installing Sample Fonts …
  copy …\cssamp\fonts.exe c:\temp
  cd c:\temp
  fonts /T:c:\temp /C
  Use Explorer to drag mangal.ttf and latha.ttf into your winnt\fonts directory.
Resources
  OpenType Specification
    http://www.microsoft.com/typography/otspec
  Indic Encoding Specification
    Early draft available on your CD
    Contact: [address not recoverable]@microsoft.com