ICANN IDN TLD Variant Issues Project

Presentation to the Unicode Technical Committee

Andrew Sullivan (consultant) [email protected]



L2/11-426

I’m a consultant. Blame me for mistakes here, not staff or ICANN.

2  

Background

•  DNS labels were always in (a subset of) ASCII

•  Lots of people don’t normally use ASCII

•  Internationalized Domain Names for Applications (IDNA) was invented to help

3  

Reminder: two flavours

IDNA2003

IDNA2008

4  

Basic problem

•  IDNA (2003 & 2008) expands DNS label repertoire

•  The LDH pattern does not fit perfectly in other languages, scripts, or both

•  People want DNS labels to work like parts of natural language

5  

What makes a DNS label?

•  DNS labels are octets

•  Preferred syntax (RFC 1035) is Letters, Digits, and Hyphen (“LDH”)

•  Special DNS rule for ASCII letters only: case-insensitive but case-preserving

6  
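As a rough illustration of the two rules on this slide, here is a minimal Python sketch. It assumes RFC 1035's preferred syntax (a letter first, a letter or digit last, hyphens only inside, at most 63 characters) and treats labels as raw octets in which only ASCII A to Z fold case. The helper names `is_ldh_label` and `labels_equal` are hypothetical, not from any library.

```python
import re

# RFC 1035 preferred syntax for a single label: a letter first, a letter or
# digit last, letters/digits/hyphens in between, at most 63 characters.
# (RFC 1123 later also allows a leading digit; not modelled here.)
_LDH_LABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """True if label matches the LDH preferred syntax (hypothetical helper)."""
    return _LDH_LABEL.fullmatch(label) is not None


def _fold_octet(o: int) -> int:
    # Only octets 0x41-0x5A ('A'-'Z') are case-folded; every other octet,
    # including non-ASCII ones, must match exactly.
    return o + 0x20 if 0x41 <= o <= 0x5A else o


def labels_equal(a: bytes, b: bytes) -> bool:
    """Case-insensitive, octet-wise DNS label comparison (sketch)."""
    return len(a) == len(b) and all(
        _fold_octet(x) == _fold_octet(y) for x, y in zip(a, b)
    )


print(is_ldh_label("example"), is_ldh_label("-example"))  # True False
print(labels_equal(b"Example", b"eXAMPLE"))               # True
print(labels_equal(b"\xc9", b"\xe9"))                     # False: not ASCII
```

The last line is the crux: octets outside ASCII get no case folding in the DNS, so any case mapping for non-ASCII text has to happen before the label reaches the DNS.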

IDNA

•  Permit non-LDH characters in labels

•  Be as compatible as practical with deployed software

•  No changes to deployed DNS software or protocol

7  
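Python's standard library ships an `idna` codec implementing IDNA2003 (RFC 3490), which is enough to show the design goal on this slide: the label that actually goes into the DNS is a plain LDH "xn--" label, so deployed DNS software needs no changes. A minimal sketch; the input word is an arbitrary example:

```python
# The built-in "idna" codec implements IDNA2003 ToASCII/ToUnicode.
unicode_label = "b\u00fccher"                 # "bücher"

ace = unicode_label.encode("idna")           # ToASCII: Nameprep + Punycode
print(ace)                                   # b'xn--bcher-kva'
print(ace.isascii())                         # True: safe for existing DNS

print(ace.decode("idna") == unicode_label)   # True: ToUnicode round-trips
```

The standard codec stays at IDNA2003; for IDNA2008 the Python documentation points to the third-party `idna` package.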

IDNA2003

•  Provide a list of code points that are allowed

•  Map cases that are troublesome (e.g. ZWNJ, upper-to-lowercase) using Nameprep

•  To the extent there’s an installed base, this is it

8  
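Python's standard library exposes the IDNA2003 Nameprep step (RFC 3491) as `encodings.idna.nameprep`, so the mappings on this slide can be observed directly. A minimal sketch; the input strings are arbitrary examples:

```python
from encodings.idna import nameprep

# Nameprep maps ZWNJ (U+200C) to nothing and case-folds uppercase letters.
print(nameprep("Bu\u200cch"))        # buch

# The same mapping runs inside ToASCII before Punycode, so a capitalised
# spelling produces the same ACE label as the lowercase one.
print("B\u00fccher".encode("idna"))  # b'xn--bcher-kva'
```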

IDNA2008

•  Attempt to address some perceived limitations of IDNA2003

•  Permits or disallows code points based on code point properties

•  Certain incompatibilities with IDNA2003

9  
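One concrete set of incompatibilities is the four characters that UTS #46 calls "deviations". IDNA2003 maps them to something else, while IDNA2008 (RFC 5892) allows them as-is: two as PVALID, the joiners as CONTEXTJ. A sketch using the standard library's IDNA2003 Nameprep; the `IDNA2008_STATUS` table is written out by hand, not computed:

```python
from encodings.idna import nameprep

# Hand-written IDNA2008 status for the UTS #46 deviation characters.
IDNA2008_STATUS = {
    "\u00df": "PVALID",    # LATIN SMALL LETTER SHARP S
    "\u03c2": "PVALID",    # GREEK SMALL LETTER FINAL SIGMA
    "\u200c": "CONTEXTJ",  # ZERO WIDTH NON-JOINER
    "\u200d": "CONTEXTJ",  # ZERO WIDTH JOINER
}

for ch, status in IDNA2008_STATUS.items():
    # Embed the character between two letters so the label is never empty.
    mapped = nameprep("a" + ch + "b")
    print(f"U+{ord(ch):04X}: IDNA2003 gives {mapped!r}; IDNA2008: {status}")
```

In practice the same Unicode input can therefore produce different ACE labels under the two protocols.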

What’s a variant?

Exactly

10  

Origins of variants

•  Started with the Simplified Chinese/Traditional Chinese issue

•  JET Guidelines (RFC 3743)

•  Became model for other issues, not always related

11  

Things people have claimed

•  Characters that are substitutable

•  “Same words” or “same meaning”

•  Sometimes a constraint on child names, sometimes not

12  

Why now?

•  ccTLD IDN “Fast Track” process delegated some variant TLDs

•  Not uncontroversial

•  New gTLDs under development

•  If we’re going to create “variants”, we should be able to say what they are.

13  

IDN Variant Issues Project

14

IDN Variant Issues Project

15  

[Figure: IDN Variant Issues Project overview; current stage marked “We are here”]

Comment period to 14 Nov http://www.icann.org/en/announcements/announcement-4-03oct11-en.htm

and

http://www.icann.org/en/public-comment/

16  

Reports are only about the root

While some of the conclusions may apply to other types of zones, the reports discuss variants for TLDs only

17  

A planned constraint for TLDs

Current rule is “only letters” (strictly, General Category {Ll, Lo, Lm, Mn})

•  No numerals

•  No HYPHEN-MINUS

•  No ZWNJ/ZWJ

18  

From the guidebook
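The letters-only rule maps directly onto Unicode General Category values. A minimal Python sketch, assuming exactly the category set quoted on the slide; `meets_letter_rule` is a hypothetical name, and results depend on the Unicode version bundled with the interpreter:

```python
import unicodedata

# General Categories permitted by the rule quoted on the slide.
ALLOWED_CATEGORIES = {"Ll", "Lo", "Lm", "Mn"}


def meets_letter_rule(label: str) -> bool:
    """True if every code point in a non-empty label has an allowed category."""
    return bool(label) and all(
        unicodedata.category(ch) in ALLOWED_CATEGORIES for ch in label
    )


print(meets_letter_rule("\u4e2d\u56fd"))  # True: both ideographs are Lo
print(meets_letter_rule("abc1"))          # False: '1' is Nd
print(meets_letter_rule("a-b"))           # False: HYPHEN-MINUS is Pd
print(meets_letter_rule("a\u200cb"))      # False: ZWNJ is Cf
```

An uppercase letter (category Lu) also fails, which fits IDNA2008, where uppercase code points are DISALLOWED.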

Restrictions suggested in report

•  No combining marks

•  No digits

•  No archaic characters

•  No Quranic marks

19  

Arabic team

ZWNJ

•  Arguments for and against

•  Refinement of IDNA2008 context rule

•  Issue is lack of shape change

•  Questions about resulting variants

20  

Arabic team
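For reference, the IDNA2008 context rule for ZWNJ (RFC 5892, Appendix A.1) permits it after a virama, or between a left- or dual-joining character and a right- or dual-joining one, skipping transparent marks. The sketch below assumes a small hand-copied excerpt of Unicode Joining_Type values (`JOINING_TYPE`), since Python's `unicodedata` does not expose that property; a real check would load the full ArabicShaping.txt data. It also shows the "lack of shape change" issue: ALEF never joins to the following letter, so a ZWNJ after it changes nothing visible, and the rule rejects it.

```python
import unicodedata

ZWNJ = "\u200c"

# Hand-copied excerpt of Joining_Type (ArabicShaping.txt). Assumed, not
# complete: anything missing defaults to U (non-joining) below.
JOINING_TYPE = {
    "\u0627": "R",  # ALEF
    "\u062f": "R",  # DAL
    "\u0631": "R",  # REH
    "\u0648": "R",  # WAW
    "\u0628": "D",  # BEH
    "\u062e": "D",  # KHAH
    "\u0633": "D",  # SEEN
    "\u0645": "D",  # MEEM
    "\u06cc": "D",  # FARSI YEH
    "\u064e": "T",  # FATHA (transparent mark)
    "\u064f": "T",  # DAMMA (transparent mark)
}


def joining_type(ch: str) -> str:
    return JOINING_TYPE.get(ch, "U")


def zwnj_allowed(label: str, i: int) -> bool:
    """Sketch of the RFC 5892 CONTEXTJ rule for the ZWNJ at label[i]."""
    if label[i] != ZWNJ:
        raise ValueError("label[i] is not ZWNJ")
    # Rule 1: the preceding character is a virama (combining class 9).
    if i > 0 and unicodedata.combining(label[i - 1]) == 9:
        return True
    # Rule 2: (L|D) T* ZWNJ T* (R|D), skipping transparent marks.
    j = i - 1
    while j >= 0 and joining_type(label[j]) == "T":
        j -= 1
    k = i + 1
    while k < len(label) and joining_type(label[k]) == "T":
        k += 1
    joins_before = j >= 0 and joining_type(label[j]) in ("L", "D")
    joins_after = k < len(label) and joining_type(label[k]) in ("R", "D")
    return joins_before and joins_after


print(zwnj_allowed("\u0645\u06cc\u200c\u062e", 2))  # True: YEH (D), KHAH (D)
print(zwnj_allowed("\u0627\u200c\u0628", 1))        # False: ALEF is R only
```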

Groups of characters

•  Identical shape at some position (e.g. YEH)

•  Similar shape at some position (e.g. ALEF WITH HAMZA ABOVE)

•  Interchangeable use (e.g. KAF vs SWASH KAF)

21  

Arabic team

“NFC” issues

•  Not exactly an issue with NFC

•  Example: U+06C7 vs. U+0648 U+064F

•  Perhaps could be caught by “confusables” algorithms?

22  

Arabic team
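Why this is "not exactly an issue with NFC" can be checked directly: U+06C7 has no canonical decomposition, so normalization never relates it to U+0648 U+064F, whereas ALEF WITH HAMZA ABOVE does decompose. A sketch; `canonically_equivalent` is a hypothetical helper name:

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """True if a and b have the same NFC form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# U+06C7 ARABIC LETTER U vs. WAW + DAMMA: no decomposition links them.
print(canonically_equivalent("\u06c7", "\u0648\u064f"))  # False

# Contrast: U+0623 ALEF WITH HAMZA ABOVE decomposes to U+0627 U+0654.
print(canonically_equivalent("\u0623", "\u0627\u0654"))  # True
```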

Recommendations

•  Whenever there is a variant, all resulting labels are available to the applicant

•  It is up to the applicant which ones to activate

23  

Arabic team
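Taken literally, "all resulting labels are available" means the full cross product of per-character alternatives. A sketch, assuming a hypothetical variant table built only from character pairs mentioned earlier (ARABIC LETTER YEH with FARSI YEH; KAF with SWASH KAF); it is not the Arabic team's table:

```python
from itertools import product

# Hypothetical variant table built from the slide's examples only.
_YEH = {"\u064a", "\u06cc"}   # ARABIC LETTER YEH, ARABIC LETTER FARSI YEH
_KAF = {"\u0643", "\u06aa"}   # ARABIC LETTER KAF, ARABIC LETTER SWASH KAF
VARIANTS = {ch: group for group in (_YEH, _KAF) for ch in group}


def variant_labels(label: str) -> set[str]:
    """Every label reachable by swapping each character within its group."""
    choices = [sorted(VARIANTS.get(ch, {ch})) for ch in label]
    return {"".join(combo) for combo in product(*choices)}


labels = variant_labels("\u0643\u064a")  # KAF + YEH
print(len(labels))  # 4
```

The set grows multiplicatively: n characters with two alternatives each yield 2 to the power n labels, which is why leaving the choice of which ones to activate to the applicant matters.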

Focus on Chinese Language

•  Reports are in principle about a “script”, but this report is primarily about Chinese

•  Some consideration of effects on Japanese and Korean

24  

Chinese team

RFC 3743, experience

•  Experience at other levels of DNS

•  RFC 3743 a good fit for CJK use

25  

Chinese team

Two fundamental cases

•  Traditional vs Simplified

•  Variation due to Source Separation Rule (e.g. U+6237 versus U+6236)

26  

Chinese team

Focus on reducing confusion

•  Mainly interested in confusion of strings between languages

•  Unlike Chinese and Arabic, no strong recommendation that “everything works”

27  

Cyrillic team

Different from other cases

•  Used by many more languages than some other scripts

•  Extremely fraught political environment:
    •  Cyrillic vs. Latin
    •  Cyrillic vs. Arabic
    •  Many spelling & character reforms

28  

Cyrillic team

One language can cause issues

•  Substitutions in one language obliterate differences in others

•  E.g. U+0435 vs U+0451, U+0433 vs U+0491

•  Some characters not on keyboards

29  

Cyrillic team
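To see how one language's substitutions erase another's distinctions, treat the slide's pairs as foldable. A sketch with a hypothetical folding table (not taken from the Cyrillic team's report); the two-letter strings are arbitrary:

```python
# Hypothetical folding table: treat the slide's pairs as substitutable.
FOLD = str.maketrans({
    "\u0451": "\u0435",  # CYRILLIC SMALL LETTER IO -> IE
    "\u0491": "\u0433",  # CYRILLIC SMALL LETTER GHE WITH UPTURN -> GHE
})


def fold(label: str) -> str:
    return label.translate(FOLD)


a, b = "\u0491\u0430", "\u0433\u0430"  # two distinct strings
print(a == b)              # False: different code points
print(fold(a) == fold(b))  # True: the folding merges them
```

A folding that looks harmless to users who rarely type these letters makes strings indistinguishable in a language where the letter is distinct (ґ, for instance, is a separate letter of the Ukrainian alphabet).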

Interaction with other scripts

•  Issue of relation to Greek and Latin raised

•  Declared out of scope, but problematic

30  

Cyrillic team

Very different issues

•  Confusing similarity a high priority issue

•  Especially worried about URL bar display

•  Concern about ill-formed akshars

31  

Devanagari team

Environment issues

•  Display of Devanagari script can be problematic:
    •  Rendering engines
    •  Fonts

32  

Devanagari team

ZWJ and ZWNJ

•  Some Devanagari-using languages rely on ZWJ
    •  Even if there is a precomposed version that will do

•  ZWNJ needed for noun paradigms
    •  Use in TLDs not clear

33  

Devanagari team
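Under IDNA2008 (RFC 5892, Appendix A.2), ZWJ is valid only immediately after a virama, which is the Devanagari case. A minimal sketch; as in the RFC, canonical combining class 9 stands in for "Virama", and `zwj_allowed` is a hypothetical name:

```python
import unicodedata

ZWJ = "\u200d"


def zwj_allowed(label: str, i: int) -> bool:
    """Sketch of the RFC 5892 CONTEXTJ rule for the ZWJ at label[i]."""
    if label[i] != ZWJ:
        raise ValueError("label[i] is not ZWJ")
    # Valid only when the preceding character has combining class 9 (virama).
    return i > 0 and unicodedata.combining(label[i - 1]) == 9


print(zwj_allowed("\u0930\u094d\u200d", 2))  # True: RA + VIRAMA + ZWJ
print(zwj_allowed("\u0930\u200d", 1))        # False: no virama before ZWJ
```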

Inter-script issues

•  Relationship between Devanagari and other Brahmi-derived scripts?

•  Ruled out of scope, but may be important

34  

Devanagari team

Unusual case

•  Greek is alone among the studied scripts in being used for only one language

35  

Greek team

Additional restrictions

•  Team recommends excluding ancient characters

•  Team recommends sticking to Monotonic characters

36  

Greek team

Sigma and Tonos

•  IDNA2003 maps upper case to lower case: Tonos can be lost

•  IDNA2003 maps away final form sigma

•  In IDNA2008, such transformations are left to applications

37  

Greek team
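The upper-case problem can be reproduced with the standard library's IDNA2003 Nameprep. A sketch, assuming the usual convention that Greek written in capitals omits tonos; λόγος is just an example word:

```python
from encodings.idna import nameprep

lower = "\u03bb\u03cc\u03b3\u03bf\u03c2"  # λόγος, typed in lower case
upper = "\u039b\u039f\u0393\u039f\u03a3"  # ΛΟΓΟΣ, capitals without tonos

# Nameprep case-folds both and maps final sigma to small sigma,
# but it cannot restore a tonos that was never typed.
print(nameprep(lower))                     # λόγοσ
print(nameprep(upper))                     # λογοσ
print(nameprep(lower) == nameprep(upper))  # False
```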

Final sigma

•  Recommend registering final form sigmas wherever requested

•  Also register without the final sigma (i.e. with small sigma in place of final sigma)

38  

Greek team

Tonos

•  Recommend registering with Tonos where requested

•  Also register with Tonos stripped

39  

Greek team
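Applying both the final-sigma and the Tonos recommendations gives up to four labels per request. A minimal sketch; `registration_set` is a hypothetical helper, and stripping tonos relies on Greek letters with tonos decomposing canonically to the base letter plus U+0301 COMBINING ACUTE ACCENT:

```python
import unicodedata

FINAL_SIGMA, SIGMA = "\u03c2", "\u03c3"
ACUTE = "\u0301"  # what tonos decomposes to under NFD


def strip_tonos(label: str) -> str:
    """Remove tonos: decompose, drop U+0301, recompose anything left."""
    decomposed = unicodedata.normalize("NFD", label)
    return unicodedata.normalize("NFC", decomposed.replace(ACUTE, ""))


def registration_set(label: str) -> set[str]:
    """Requested label plus small-sigma and tonos-stripped forms (sketch)."""
    forms = {label, label.replace(FINAL_SIGMA, SIGMA)}
    return forms | {strip_tonos(f) for f in forms}


for form in sorted(registration_set("\u03bb\u03cc\u03b3\u03bf\u03c2")):  # λόγος
    print(form)
```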

Dimotiki and Katharevousa

•  Recommendation that, if Katharevousa string is requested, the “same” Dimotiki “word” is blocked

•  The only report that requests variant behaviour because of whole-string meaning

40  

Greek team

The impossible dream

•  There are too many relationships among characters in Latin-using languages

•  There’s no way to decide

•  Therefore, no variants

41  

Latin team

Remember, please comment

Open until 14 November

http://www.icann.org/en/public-comment/

42  

Questions

43