Issues with SignWriting in Unicode 8

  1. Issues with SignWriting in Unicode 8
     Prepared for UTC #144 / L2 #241 (July 27-31, 2015), a Unicode Technical Committee meeting in Redmond, WA, by Stephen E Slevinski Jr in association with the Center for Sutton Movement Writing.
  2. My Background
     Bachelor of Science in Mathematics. Raised two kids with sign language. Collaborating with Valerie Sutton from 2004 until today. Complete symbol encoding model on PUA Plane 16 (37,811 characters). Complete script encoding model on PUA Plane 15 (1,179 characters). Argued with Unicode in 2011 and then walked away. Released the ISWA 2010 symbol set in 2010. Finalized Formal SignWriting in ASCII on Jan 12, 2012. 5 years of stability with the symbol set and font design. 3 1/2 years of stability with the character encoding models. Involved with dozens of sign languages around the world. Foundation for all online use and modern publishing efforts.
  3. SignWriting in Software
     All major SignWriting editors and viewers are compatible.
     SignPuddle Online: the primary source of written sign language.
     Delegs Editor: educational software from Germany for bilingual education.
     SignWriter Studio: general-purpose SignWriting editor with an integrated dictionary and printing.
     SWift (SignWriting improved fast transcriber): aims to simplify the editing process.
     JSPad: SignWriting editor for Japanese Sign Language, based at Gifu University.
     Tunisigner: lets users interact with SignWriting notation through a 3D virtual signer that can reproduce the exact gestures represented in the sign language transcription.
     SignTyp: a linguistic coding system developed by Rachel Channon through an NSF grant, which is being integrated with SignWriting.
  4. SignWriting in Software: SignMaker 2015
     Cross-browser, drag-and-drop sign editor with a dictionary and advanced sign searching (http://www.signbank.org/signmaker.html).
     [Chart: SignMaker 2015 code breakdown, 0 KB to 70 KB, split into configuration, support libraries, and custom HTML, JS, and CSS.]
  5. SignWriting in Software: JavaScript-based SignWriting Keyboard
     Keyboard editing has returned to SignWriting (http://www.signwriting.org/symposium/presentation0041.html).
     Wikimedia Incubator: the keyboard editor is enabled on Wikimedia Incubator for the American Sign Language Wikipedia and every other sign language project.
     Bookmarklet: store the JavaScript in a bookmark and you can use SignWriting in any text field on any web page.
     Any website: add a few KB of JavaScript and the keyboard editor can be enabled on any website using standard edit boxes and visual presentation.
  6. What about Unicode?
     PUA Plane 15 design (1,179 characters).
     The symbol-only design removed 2-D layout by dropping 5 structural markers and 500 number characters: N4015 Preliminary Unicode (674 characters), N4090 Revised Unicode (672 characters), N4342 Unicode Proposal (672 characters).
     A new inherent design removes 2 characters (F1 and R1) and breaks collation, as stated in the proposal.
     A new facial diacritic design is proposed that is unsupported and untested.
     The original design is still compatible with the community efforts.
  7. Issues with SignWriting in Unicode 8
     The Unicode 8 specification will not be used for any SignWriting project around the world. The Unicode 8 specification for SignWriting is politically valuable, but unhelpful for developers.
  8. Issues with SignWriting in Unicode 8
     The issue of the moment is sorting, but there are three main issues. If we address all of them, the existing international community of SignWriters is ready, able, and willing to embrace the standard.
  9. Issue 1: Unicode 8 is incomplete
     Unicode 8 encodes only the symbols and ignores the issue of layout. It is missing the structural markers and number characters required for 2-D layout. As a result, Unicode 8 requires SVG for visual presentation and additional characters or markup to write a sign (http://signbank.org/SignWriting_Character_Viewer.html).
  10. Issue 2: Unicode 8 is flawed
      The idea of inherent characters breaks from community use, both today and historically. Because of inherent modifiers, sorting is broken, searching is ambiguous, and replacements can be destructive.
      [Diagram: symbol base tokens (w, s, P) and symbol modifier tokens (i, o). Each symbol is identified with a string of 3 tokens, a triadic symbol of base, fill, and rotation: w i o for a writing symbol, P i o for a punctuation symbol.]
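In the Unicode 8 spelling, the triadic symbol can only be recovered by filling in the inherent F1 and R1 wherever a modifier is absent. The sketch below is a minimal illustration and is not taken from any SignWriting tool. It assumes these Unicode 8 ranges:

- U+1D800 to U+1DA8B for symbol bases,
- U+1DA9B to U+1DA9F for fill modifiers 2 to 6,
- U+1DAA1 to U+1DAAF for rotation modifiers 2 to 16.

The helper name `to_triads` is hypothetical. The sketch does not check modifier order or reject repeated modifiers.

```python
# Hypothetical helper: expand Unicode 8 SignWriting text into explicit
# (base, fill, rotation) triads, restoring the inherent F1 and R1.
BASES = range(0x1D800, 0x1DA8C)      # symbol bases, including punctuation
FILLS = range(0x1DA9B, 0x1DAA0)      # FILL MODIFIER-2 .. FILL MODIFIER-6
ROTATIONS = range(0x1DAA1, 0x1DAB0)  # ROTATION MODIFIER-2 .. ROTATION MODIFIER-16


def to_triads(text: str) -> list[tuple[int, int, int]]:
    triads: list[list[int]] = []
    for ch in text:
        cp = ord(ch)
        if cp in BASES:
            # A bare base carries the inherent fill 1 and rotation 1.
            triads.append([cp, 1, 1])
        elif cp in FILLS or cp in ROTATIONS:
            if not triads:
                raise ValueError(f"modifier U+{cp:X} has no preceding symbol base")
            if cp in FILLS:
                triads[-1][1] = cp - FILLS.start + 2
            else:
                triads[-1][2] = cp - ROTATIONS.start + 2
        else:
            raise ValueError(f"U+{cp:X} is outside the ranges handled here")
    return [tuple(t) for t in triads]


HFI, F2, R2 = "\U0001D800", "\U0001DA9B", "\U0001DAA1"
print(to_triads(HFI + R2 + HFI))  # [(120832, 1, 2), (120832, 1, 1)]
```

The two HFI characters in this input encode different symbols. Only the absence of a following modifier tells the reader that the second one has fill 1 and rotation 1.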
  11. Issue 2: Unicode 8 is flawed (sorting is broken)
      Characters: 1D800 SIGNWRITING HAND-FIST INDEX (HFI), 1DAA1 SIGNWRITING ROTATION MODIFIER-2 (R2), 1DA9B SIGNWRITING FILL MODIFIER-2 (F2).
      Correct sorting with F1 & R1:
        (1) HFI F1 R1
        (5) HFI F1 R1 HFI F1 R1
        (2) HFI F1 R2
        (6) HFI F1 R2 HFI F1 R1
        (3) HFI F2 R1
        (7) HFI F2 R1 HFI F1 R1
        (4) HFI F2 R2
      Incorrect sorting without F1 & R1:
        (1) HFI
        (5) HFI HFI
        (3) HFI F2
        (7) HFI F2 HFI
        (4) HFI F2 R2
        (2) HFI R2
        (6) HFI R2 HFI
      Sources: http://www.unicode.org/L2/L2015/15184-signwriting-ducet.txt, http://signpuddle.net/15184-signwriting-ducet-response.txt, http://www.unicode.org/L2/L2015/15202-signwriting-ducet-aux.txt
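The two orders above can be reproduced with a plain code-point sort, which is what Python's string comparison performs. In this minimal sketch, HFI, F2 and R2 use the code points from the slide. The values given to F1 and R1 are hypothetical placeholders, because Unicode 8 does not encode these characters. They were chosen only so that F1 sorts below F2 and R1 sorts below R2.

```python
# Code points from the slide (Unicode 8 SignWriting block).
HFI = "\U0001D800"  # SIGNWRITING HAND-FIST INDEX
F2 = "\U0001DA9B"   # SIGNWRITING FILL MODIFIER-2
R2 = "\U0001DAA1"   # SIGNWRITING ROTATION MODIFIER-2
# Hypothetical placeholders: Unicode 8 has no F1 or R1 characters.
F1 = "\U0001DA9A"
R1 = "\U0001DAA0"

# The seven numbered sequences from the slide, with F1 and R1 written out.
EXPLICIT = {
    1: HFI + F1 + R1,
    2: HFI + F1 + R2,
    3: HFI + F2 + R1,
    4: HFI + F2 + R2,
    5: HFI + F1 + R1 + HFI + F1 + R1,
    6: HFI + F1 + R2 + HFI + F1 + R1,
    7: HFI + F2 + R1 + HFI + F1 + R1,
}

# The Unicode 8 spelling of the same sequences: F1 and R1 are inherent (omitted).
IMPLICIT = {n: s.replace(F1, "").replace(R1, "") for n, s in EXPLICIT.items()}


def codepoint_order(sequences: dict[int, str]) -> list[int]:
    """Return the sequence numbers sorted by plain code-point comparison."""
    return sorted(sequences, key=sequences.__getitem__)


print(codepoint_order(EXPLICIT))  # [1, 5, 2, 6, 3, 7, 4]
print(codepoint_order(IMPLICIT))  # [1, 5, 3, 7, 4, 2, 6]
```

Only the explicit spelling yields the correct order. In the implicit spelling, sequences 2 and 6 fall to the end because R2 has a higher code point than F2.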
  12. Issue 2: Unicode 8 is flawed (sorting is broken): the DUCET fix
      Characters: 1D800 SIGNWRITING HAND-FIST INDEX (HFI), 1DAA1 SIGNWRITING ROTATION MODIFIER-2 (R2), 1DA9B SIGNWRITING FILL MODIFIER-2 (F2).
      Weights: HFI = 100, F2 = 420, R2 = 410. With these DUCET weights, the sequences without F1 & R1 sort correctly:
        (1) HFI: 100
        (5) HFI HFI: 100 100
        (2) HFI R2: 100 410
        (6) HFI R2 HFI: 100 410 100
        (3) HFI F2: 100 420
        (7) HFI F2 HFI: 100 420 100
        (4) HFI F2 R2: 100 420 410
      Correct sort order: 1, 5, 2, 6, 3, 7, 4. Incorrect sort order: 1, 2, 3, 4, 5, 6, 7.
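The weight assignment above can be checked with a simplified, primary-level comparison. Each character is mapped to its weight, and the resulting weight lists are compared element by element, with a shorter prefix sorting first. This is a sketch of that single idea only. It is not an implementation of the Unicode Collation Algorithm, which also has secondary and tertiary levels, variable weighting and other features. The weights are the ones on the slide.

```python
# Primary weights from the slide; the rest of the collation table is omitted.
WEIGHTS = {"HFI": 100, "R2": 410, "F2": 420}

# The seven sequences without F1 and R1, as they are spelled in Unicode 8.
SEQUENCES = {
    1: ["HFI"],
    2: ["HFI", "R2"],
    3: ["HFI", "F2"],
    4: ["HFI", "F2", "R2"],
    5: ["HFI", "HFI"],
    6: ["HFI", "R2", "HFI"],
    7: ["HFI", "F2", "HFI"],
}


def primary_key(tokens: list[str]) -> list[int]:
    """Map tokens to weights; Python compares the lists element by element."""
    return [WEIGHTS[t] for t in tokens]


order = sorted(SEQUENCES, key=lambda n: primary_key(SEQUENCES[n]))
print(order)  # [1, 5, 2, 6, 3, 7, 4]
```

The key design choice is giving R2 a lower weight than F2. This reproduces the effect of the missing F1: a symbol with the default fill and a rotation modifier sorts before any symbol with fill 2.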
  13. Issue 2: Unicode 8 is flawed (searching is ambiguous)
      Characters and sequences are the same as on slide 11.
      With F1 & R1, searching for the symbol HFI F1 R1 correctly finds 4 matches (sequences 1, 5, 6, and 7).
      Without F1 & R1, searching for the symbol HFI incorrectly finds 10 matches (every HFI in all 7 sequences) unless negative lookaheads are used.
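A negative lookahead restores the exact-symbol search when F1 and R1 are not encoded: HFI matches only when the next character is not a fill or rotation modifier. This is a minimal sketch. It assumes the Unicode 8 modifier ranges U+1DA9B to U+1DA9F (fill 2 to 6) and U+1DAA1 to U+1DAAF (rotation 2 to 16), and it uses the seven Unicode 8 sequences from slide 11.

```python
import re

HFI = "\U0001D800"  # SIGNWRITING HAND-FIST INDEX
F2 = "\U0001DA9B"   # SIGNWRITING FILL MODIFIER-2
R2 = "\U0001DAA1"   # SIGNWRITING ROTATION MODIFIER-2
MODIFIER = "[\U0001DA9B-\U0001DA9F\U0001DAA1-\U0001DAAF]"

# The seven sequences from slide 11 as spelled in Unicode 8 (no F1 or R1).
SIGNS = {
    1: HFI,
    2: HFI + R2,
    3: HFI + F2,
    4: HFI + F2 + R2,
    5: HFI + HFI,
    6: HFI + R2 + HFI,
    7: HFI + F2 + HFI,
}

naive = re.compile(HFI)                           # any HFI, whatever follows
exact = re.compile(HFI + "(?!" + MODIFIER + ")")  # HFI with inherent F1 and R1


def occurrences(pattern: re.Pattern) -> int:
    return sum(len(pattern.findall(s)) for s in SIGNS.values())


def matching_signs(pattern: re.Pattern) -> list[int]:
    return [n for n, s in SIGNS.items() if pattern.search(s)]


print(occurrences(naive))     # 10
print(matching_signs(exact))  # [1, 5, 6, 7]
```

The naive pattern needs no lookahead only because every sequence happens to contain HFI. The exact pattern depends on knowing the full modifier ranges, which is the extra bookkeeping the inherent design pushes onto every search routine.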
  14. Issue 2: Unicode 8 is flawed (searching is ambiguous)
      Query string: QS10000S20500. Searching for signs that include these 2 exact symbols returns the following results from the ASL dictionary.
      [Screenshot: search results.]
  15. Issue 2: Unicode 8 is flawed (searching is ambiguous)
      Query string: QS100uuS205uu. In Unicode 8, searching for a symbol base without fill or rotation modifiers returns 6 times as much noise as signal.
      [Screenshot: search results, plus 6 more pages of signs.]
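Query strings like the two above can be turned into regular expressions over Formal SignWriting. The translator below is hypothetical and simplified, and it handles only what these slides show:

- a leading `Q` followed by symbol keys;
- `u` standing for any hex digit in the fill or rotation position.

It assumes each queried symbol must appear in the spatial part of the sign, that is, a symbol key followed by an `NNNxNNN` coordinate. The full query-string syntax may support more than this.

```python
import re

KEY = re.compile(r"S[0-9a-fu]{5}")  # symbol key, with u as a wildcard digit


def query_patterns(query: str) -> list[re.Pattern]:
    """Hypothetical translator: one spatial-symbol regex per queried key."""
    if not query.startswith("Q"):
        raise ValueError("query strings start with Q")
    return [
        re.compile(key.replace("u", "[0-9a-f]") + r"\d{3}x\d{3}")
        for key in KEY.findall(query[1:])
    ]


def sign_matches(query: str, fsw: str) -> bool:
    """True when every queried symbol appears as a spatial symbol in the sign."""
    return all(p.search(fsw) for p in query_patterns(query))


SIGN = "AS18711S20500M514x517S18711490x483S20500486x506"
print(sign_matches("QS18711S20500", SIGN))  # True
print(sign_matches("QS187uuS205uu", SIGN))  # True
print(sign_matches("QS18710S20500", SIGN))  # False
```

Because fill and rotation are always spelled out in the ASCII format, a wildcard query is simply a wider character class. No lookahead is needed.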
  16. Issue 2: Unicode 8 is flawed (replacements can be destructive)
      The TrueType fonts use ligatures to support multiple character sets (https://github.com/Slevinski/signwriting_2010_tools).
      Plane 15 characters (increasing or decreasing symbol keys works without issue):
        sub uFD830 uFD810 uFD820 by S10000;
        sub uFD830 uFD810 uFD821 by S10001;
        sub uFD830 uFD810 uFD822 by S10002;
        sub uFD830 uFD810 uFD823 by S10003;
        sub uFD830 uFD810 uFD824 by S10004;
        sub uFD830 uFD810 uFD825 by S10005;
        sub uFD830 uFD810 uFD826 by S10006;
        sub uFD830 uFD810 uFD827 by S10007;
      Unicode 8 characters (symbol keys must be listed in decreasing order to avoid destruction):
        sub u1DA8B u1DAA7 by S38b07;
        sub u1DA8B u1DAA6 by S38b06;
        sub u1DA8B u1DAA5 by S38b05;
        sub u1DA8B u1DAA4 by S38b04;
        sub u1DA8B u1DAA3 by S38b03;
        sub u1DA8B u1DAA2 by S38b02;
        sub u1DA8B u1DAA1 by S38b01;
        sub u1DA8B by S38b00;
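A simplified model of ligature substitution shows why rule order matters for Unicode 8 but not for Plane 15. In this model, at each position the first listed rule whose input matches is applied. This first-match behaviour is an assumption made for illustration, since how a given font compiler orders ligatures is tool-specific. The glyph names follow the slide, and `substitute` is a hypothetical helper.

```python
Rule = tuple[tuple[str, ...], str]


def substitute(glyphs: list[str], rules: list[Rule]) -> list[str]:
    """Apply the first matching rule at each position, left to right."""
    out: list[str] = []
    i = 0
    while i < len(glyphs):
        for inputs, ligature in rules:
            if tuple(glyphs[i:i + len(inputs)]) == inputs:
                out.append(ligature)
                i += len(inputs)
                break
        else:  # no rule matched: pass the glyph through unchanged
            out.append(glyphs[i])
            i += 1
    return out


# Unicode 8: the bare parenthesis u1DA8B is a prefix of every modified form.
U8_INCREASING: list[Rule] = [(("u1DA8B",), "S38b00")] + [
    (("u1DA8B", f"u{0x1DAA0 + n:X}"), f"S38b0{n}") for n in range(1, 8)
]
U8_DECREASING = list(reversed(U8_INCREASING))

# Plane 15: every symbol is exactly three characters, so no rule is a prefix of another.
P15_INCREASING: list[Rule] = [
    (("uFD830", "uFD810", f"uFD82{n}"), f"S1000{n}") for n in range(8)
]

text = ["u1DA8B", "u1DAA7"]  # parenthesis + ROTATION MODIFIER-8
# Increasing order: the bare rule fires first and orphans the modifier.
print(substitute(text, U8_INCREASING))  # ['S38b00', 'u1DAA7']
print(substitute(text, U8_DECREASING))  # ['S38b07']
```

Reversing the Plane 15 rule list leaves the result unchanged. This is consistent with the slide's observation that either order works for Plane 15 characters.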
  17. Issue 3: Unicode 8 is fictional
      Facial diacritics do not exist: there is no font support, no software support, and no data. Facial diacritics are described in one document, using 177 words, and have never been tested on any individual, let alone an international group. Facial expressions are created by overlapping and overlaying many symbols, each placed with its own Cartesian coordinates. Facial diacritics should be handled in software rather than in the character encoding. Development of facial diacritics was quietly abandoned at the end of 2012.
  18. Formal SignWriting
      [Diagram: Formal SignWriting (ASCII, lite markup) and its relationship to community use, regular expressions, query strings, SVG, PUA Plane 15, a Graphite font, Unicode 8, and PUA Plane 16 TTF fonts. Annotations include: preferred; unused; prototype; 10% to 50% reduction; 15 to 50 times expansion; processes millions of characters per second; search results; 15 times expansion; single character per symbol; ligatures of 1 to 3 characters; twice the size; Cartesian coordinates with GPOS; CSS-style text; isomorphic JS; 6 KB zipped.]
  19. Community Use: Formal SignWriting
      The standard ASCII format is isomorphic to PUA Plane 15. Example sign:
        AS18711S20500M514x517S18711490x483S20500486x506
      Breakdown:
        A: time sequence marker, followed by the symbols S18711 and S20500
        M: middle lane signbox marker, followed by the max coordinate 514x517, i.e. (514,517)
        S18711490x483: spatial symbol S18711 at (490,483)
        S20500486x506: spatial symbol S20500 at (486,506)
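The breakdown above maps directly onto a regular expression. This parser is a minimal sketch that covers only the pieces on this slide: an optional `A` time-sequence prefix and an `M` middle-lane signbox. The function name `parse_fsw` and the shape of its result are hypothetical. The symbol-key pattern is deliberately loose: `S`, a base from 100 to 3ff, a fill from 0 to 5, and a rotation from 0 to f.

```python
import re

SYMBOL = r"S[123][0-9a-f]{2}[0-5][0-9a-f]"  # base, fill, rotation
COORD = r"\d{3}x\d{3}"                      # x and y, three digits each
SIGN = re.compile(rf"(?:A((?:{SYMBOL})+))?M({COORD})((?:{SYMBOL}{COORD})*)")


def coord(text: str) -> tuple[int, int]:
    x, y = text.split("x")
    return int(x), int(y)


def parse_fsw(fsw: str) -> dict:
    """Split one Formal SignWriting sign into its sequence, box, and symbols."""
    m = SIGN.fullmatch(fsw)
    if m is None:
        raise ValueError(f"not a sign this sketch can parse: {fsw!r}")
    sequence, maximum, spatial = m.groups()
    return {
        "sequence": re.findall(SYMBOL, sequence or ""),
        "max": coord(maximum),
        "symbols": [
            (key, coord(xy)) for key, xy in re.findall(rf"({SYMBOL})({COORD})", spatial)
        ],
    }


print(parse_fsw("AS18711S20500M514x517S18711490x483S20500486x506"))
# {'sequence': ['S18711', 'S20500'], 'max': (514, 517),
#  'symbols': [('S18711', (490, 483)), ('S20500', (486, 506))]}
```

Every symbol key has a fixed width and every coordinate is explicit. A single pattern therefore recovers both the symbols and their 2-D layout, which is the information the Unicode 8 encoding leaves out.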
  20. Unicode 9: Ideal Solution
      The prototype font uses Cartesian coordinates for 2-D layout with Graphite (http://signpuddle.net/iswa/#smartfont).
      [Diagram: the ideal solution links regular expressions, query strings, a Graphite font, and TTF fonts. Annotations include: 10% to 50% reduction; 15 to 50 times expansion; processes millions of characters per second; search results; Cartesian coordinates with GPOS; CSS-style text; JS; 6 KB zipped.]
  21. Too Late?
      SignWriting is spreading around the world and exploding online. All of the SignWriting projects use an ASCII solution and have no plans to switch to the Unicode 8 design for the symbols. Without a full script solution for SignWriting, Unicode will not be used for SignWriting, especially not the Unicode 8 design, which complicates otherwise simple routines. Using Unicode for SignWriting is a great idea in theory, but there are few advantages and too many disadvantages to seriously consider adopting the Unicode 8 design, even if sorting is fixed.
      I left the Unicode effort at the end of 2011. In 2012, I was shown the latest proposal (N4342). I objected privately and asked that they produce a working font before contacting me again. In 2014, I was informed that SignWriting would be in Unicode 8. I reiterated my objections, pointing out the issues, and was told it was too late to change the design in any way.
  22. Discussion Ideas
      2-color fonts: SignWriting relies on a 2-color font. Currently, SignWriting mimics a 2-color font by using 2 TrueType fonts: one for the line and another for the fill. If you have any experience with 2-color fonts, let's discuss the possibilities.
      2-dimensional layout with Graphite and Cartesian coordinates: SignWriting has a prototype font that uses Cartesian coordinates to control 2-dimensional layout with Graphite and PUA Plane 15 characters. If you have any experience with 2-dimensional layout using Cartesian coordinates, let's discuss the possibilities.
      Alternate designs for a 2-dimensional script: this type of discussion is interesting, but it will not affect the SignWriting community. The standards are stable and widely used. It would make for an interesting project, but it is not work that I will be doing myself.
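The two-font workaround amounts to drawing each symbol twice at the same position, with the fill layer first and the line layer on top. Below is a minimal SVG sketch of that idea. It assumes both fonts map the same characters to matching line and fill glyphs. The font-family names and the `two_color_svg` helper are hypothetical placeholders, not the names used by the SignWriting 2010 fonts.

```python
from xml.sax.saxutils import escape

# Hypothetical font-family names; substitute the installed line and fill fonts.
LINE_FONT = "ExampleSignWritingLine"
FILL_FONT = "ExampleSignWritingFill"


def two_color_svg(glyph: str, x: int, y: int,
                  line_color: str = "black", fill_color: str = "white") -> str:
    """Layer one symbol: SVG paints later elements on top of earlier ones."""
    text = escape(glyph)
    fill = (f'<text x="{x}" y="{y}" font-family="{FILL_FONT}" '
            f'fill="{fill_color}">{text}</text>')
    line = (f'<text x="{x}" y="{y}" font-family="{LINE_FONT}" '
            f'fill="{line_color}">{text}</text>')
    return f'<svg xmlns="http://www.w3.org/2000/svg">{fill}{line}</svg>'


# The Plane 15 character triple for S10000 (uFD830 uFD810 uFD820 on slide 16).
svg = two_color_svg("\U000FD830\U000FD810\U000FD820", 490, 483)
```

A true 2-color font would carry both layers in a single glyph and remove the need to keep two fonts and two draw calls in sync.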
  23. Discussion Ideas
      Unicode 9 or 10: Can we deprecate Unicode 8? The community design has been stable for 3 1/2 years. There is an interested community, and there are many possibilities for 2-color fonts and 2-dimensional layout.
      Unicode 8: I will not be using Unicode 8. I partially support Unicode 8 with the SignWriting 2010 fonts, but not the facial diacritics. I suggested that people avoid using SignWriting in Unicode 8. I'm willing to discuss any of the 3 issues that I have outlined, but I'm not invested in any tweaks to the Unicode 8 design.
      Symbol encoding model: PUA Plane 16 (37,811 characters). Script encoding model: PUA Plane 15 (1,179 characters). Both designs are productive and used today.
  24. Issues with SignWriting in Unicode 8
      by Stephen E Slevinski Jr
      http://slevinski.github.io
      http://www.slideshare.net/StephenSlevinski/presentations