Electronic Document Preparation
Pocket Primer
Vít Novotný
September 20, 2016
Contents
Introduction
1 Writing
  1.1 Text Processing
    1.1.1 Character Encoding
    1.1.2 Text Input
    1.1.3 Text Editors
    1.1.4 Interactive Document Preparation Systems
    1.1.5 Regular Expressions
  1.2 Version Control
2 Markup
  2.1 Meta Markup Languages
    2.1.1 The General Markup Language
    2.1.2 The Extensible Markup Language
  2.2 Markup on the World Wide Web
    2.2.1 The Hypertext Markup Language
    2.2.2 The Extensible Hypertext Markup Language
    2.2.3 The Semantic Web and Linked Data
  2.3 Document Preparation Systems
    2.3.1 Batch-oriented Systems
    2.3.2 Interactive Systems
  2.4 Lightweight Markup Languages
3 Design
  3.1 Fonts
  3.2 Structural Elements
    3.2.1 Paragraphs and Stanzas
    3.2.2 Headings
    3.2.3 Tables and Lists
    3.2.4 Notes
    3.2.5 Quotations
  3.3 Page Layout
  3.4 Color
Bibliography
Acronyms
Index
Introduction
With the advent of the digital age, typesetting has become available to virtually anyone equipped with a personal computer. Beautiful text documents can now be crafted using free and consumer-grade software, which often obviates the need for the involvement of a professional designer and typesetter. The level playing field of the Internet, coupled with the rising popularity of digital-only documents, then allows the author to bypass the publisher as well, if they so wish, without jeopardizing their chance of recognition.
The aim of this book is to provide a general overview of the tools and techniques associated with writing, designing, typesetting, and distributing text documents, one of the principal means of knowledge preservation and transfer known to man. Each chapter describes one discrete step of document preparation along with practical examples and references to literature for those interested in further study.
The chapters are filled with examples that illustrate the subject matter. These should be consulted whenever the concepts described in the text are unclear to the reader. Although care was taken not to favor any computing environment, some examples feature utilities for Unix and Unix-like operating systems. These utilities may or may not have a suitable counterpart in operating systems such as Windows. To try the corresponding examples out, the reader is advised to install a free Unix-like environment (such as Cygwin for Windows) on their computer.
Margin note: This document was prepared in accordance with William Strunk's Elements of Style, an American English style guide for general use.
Chapter 1
Writing
The essence of a document is the idea it represents. In the case of a text document, this idea is articulated through speech, which is transcribed using text, optionally accompanied by figures, and then laid out on a sheet of paper according to a design. Since the text is typically independent of the design, whose task is to support and elicit the internal structure of the text, it is writing that is the logical first step in the creation of a text document.
The essentials of writing in any given natural language include grammar rules, which specify the structure of spoken language, and orthographic rules, which impose additional requirements on written text. The complexity of either set of rules depends entirely on the language in question. Some writing systems, such as those that incorporate Chinese characters, are not phonographic, and the correspondence between the spoken words and the written symbols needs to be memorized by the writer on a word-by-word basis. Other languages may use vastly different grammar rules for speaking and for writing, which means that a spoken sentence needs to be translated before it is written down. A writer needs to recognize these specifics.
On top of grammar and orthographic rules stand style guides, which, in order to improve consistency, codify how common language patterns are encoded. More comprehensive style guides, such as the Chicago Manual of Style or the Oxford Style Manual, often go beyond writing and provide guidelines on design and typesetting as well, making them an indispensable reference on the editorial tradition.

Zwei Trichter wandeln durch die Nacht.
Durch ihres Rumpfs verengten Schacht
fließt weißes Mondlicht
still und heiter
auf ihren
Waldweg
u. s.
w.

Figure 1.1: Exceptions that prove the rule about the separation of text and design can sometimes be encountered in poetry. Above is Christian Morgenstern's Trichter, where the text and its form are intimately intertwined.
Above all stand the typographic rules, which specify how the resulting document should be typeset so that it doesn't disturb the eye of the reader. These, as well as the orthographic rules on hyphenation, can be left out of consideration during writing, as it is the page that should be formed around the writing and not the other way around.
1.1 Text Processing
Originally the domain of the pen, the quill, the stylus, and the more recent typewriter, manuscripts of today are produced mainly using the personal computer and stored in text files. The discipline of creating and manipulating digital text is called text processing and will be the focus of this section.
1.1.1 Character Encoding
Although computing at its most primal has no use for anything but numbers, it has nevertheless been accompanied by text from the very outset. Even the earliest computers from the 1950s were programmed with both raw machine code and the textual programming language of the FORmula TRANslator (Fortran). The digital representation of letters, digits, and other characters was initially closely tied to each specific application and processor architecture, but with the advent of computer networking in the 1960s, mutual intelligibility became a point of concern. "We had over sixty different ways to represent characters in computers. It was a real Tower of Babel," explains Bob Bemer [1], an American computer scientist who worked at IBM during 1956–1962 and who drafted the American Standard Code for Information Interchange (ASCII) [2], a character encoding from 1963 that unified the digital representation of text across the computer industry and enabled computer networking on a large scale.
Margin note: EBCDIC by IBM was the default encoding on IBM's System/360 mainframes and was in active use until the introduction of the PC in 1981. In writing systems using Chinese characters, special encodings such as Big5, JIS, and EUC are used to this day. For brevity, the text focuses on the mainstream of international encodings.
ASCII
In ASCII, every character is represented by a number from zero to 127, which is transformed to a seven-bit integer called a character code. These 128 codes are used to encode printable characters (spanning the letters of the English alphabet, digits, punctuation, and other symbols) and control codes, as depicted in Table 1.1. Unlike printable characters, control codes have no fixed visual representation, and they were used to implement application-specific communication protocols and text formatting; their precise semantics were defined in a much later standard from 1972 [3]. Unconstrained by the bandwidth and storage limitations of the 1960s and 1970s, today's communication protocols and text formats gravitate towards markup constructed from printable characters, which, unlike control codes, are easy for humans to read and write.
The following properties make it easy to manipulate and reason about character strings encoded in ASCII:
• Each character is represented by exactly seven bits. This makes it easy to allocate space for character strings of fixed length, to measure the number of characters stored in a memory region, and to perform basic operations such as adjacent character retrieval or text truncation.
• Characters are alphabetically ordered. Character strings can therefore be collated by comparing the binary values of character codes.
• Lowercase and uppercase letters, digits, and control codes form contiguous ranges of character codes. This simplifies classification.
Bits 7 6 5:    000   001   010   011   100   101   110   111
Bits 4 3 2 1   Ctrl codes  Symbols     Upper case  Lower case
0 0 0 0        NUL   DLE   SP    0     @     P     `     p
0 0 0 1        SOH   DC1   !     1     A     Q     a     q
0 0 1 0        STX   DC2   "     2     B     R     b     r
0 0 1 1        ETX   DC3   #     3     C     S     c     s
0 1 0 0        EOT   DC4   $     4     D     T     d     t
0 1 0 1        ENQ   NAK   %     5     E     U     e     u
0 1 1 0        ACK   SYN   &     6     F     V     f     v
0 1 1 1        BEL   ETB   '     7     G     W     g     w
1 0 0 0        BS    CAN   (     8     H     X     h     x
1 0 0 1        HT    EM    )     9     I     Y     i     y
1 0 1 0        LF    SUB   *     :     J     Z     j     z
1 0 1 1        VT    ESC   +     ;     K     [     k     {
1 1 0 0        FF    FS    ,     <     L     \     l     |
1 1 0 1        CR    GS    -     =     M     ]     m     }
1 1 1 0        SO    RS    .     >     N     ^     n     ~
1 1 1 1        SI    US    /     ?     O     _     o     DEL

Table 1.1: The ASCII encoding as specified in the 1986 revision of the standard [4]. SP denotes the space character.
Code point range    Encoding
0–127               0xxxxxxx
128–2047            110xxxxx 10xxxxxx
2048–65535          1110xxxx 10xxxxxx 10xxxxxx
65536–1114111       11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Table 1.2: The UTF-8 encoding. Each x represents one bit of the UCS code point in binary.
Character   Code point   Code point (binary)   UTF-8 encoding
Ř           344          101011000             11000101 10011000
e           101          1100101               01100101
č           269          100001101             11000100 10001101

Table 1.3: An example of the UTF-8 encoding.
• There is precisely one way to encode any printable character. The conversion between lowercase and uppercase letters is a matter of inverting one bit.
This comes at the expense of support for non-English writing systems. As a temporary workaround, a set of ASCII derivatives that replaced the less-needed characters of #, $, @, [, \, ], ^, `, {, |, }, and ~ with international characters was specified in the ISO 646 standard from 1972 [3].
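The ASCII properties listed above can be illustrated with a short Python sketch. The helper names (`is_ascii_upper`, `toggle_case`, and so on) are hypothetical and chosen only for this example; the numeric ranges are read directly off Table 1.1.

```python
# A minimal sketch of ASCII classification and case conversion.
# All helper names are illustrative assumptions, not part of any library.

def is_ascii_upper(ch: str) -> bool:
    # Uppercase letters occupy the contiguous range 0x41-0x5A.
    return 0x41 <= ord(ch) <= 0x5A

def is_ascii_lower(ch: str) -> bool:
    # Lowercase letters occupy the contiguous range 0x61-0x7A.
    return 0x61 <= ord(ch) <= 0x7A

def is_ascii_digit(ch: str) -> bool:
    # Digits occupy the contiguous range 0x30-0x39.
    return 0x30 <= ord(ch) <= 0x39

def is_ascii_control(ch: str) -> bool:
    # Control codes occupy 0x00-0x1F, plus DEL at 0x7F.
    return ord(ch) < 0x20 or ord(ch) == 0x7F

def toggle_case(ch: str) -> str:
    # Upper- and lowercase letters differ only in bit 6 (value 0x20),
    # so inverting that single bit converts between the two cases.
    if is_ascii_upper(ch) or is_ascii_lower(ch):
        return chr(ord(ch) ^ 0x20)
    return ch

# Collation by plain character code comparison.
words = ["pear", "apple", "fig"]
collated = sorted(words, key=lambda w: [ord(c) for c in w])
```

Comparing raw character codes places every uppercase letter before every lowercase one, which is the kind of limitation that later encodings address with dedicated collation tables.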
Eight-bit Encodings
With the byte size stabilizing at eight bits, new character encodings emerged that were based on ASCII and used the additional bit to encode characters of non-English writing systems while retaining complete backwards compatibility with ASCII. Besides the numerous vendor-specific encodings (called code pages), a set of fifteen eight-bit encodings, covering all major modern writing systems whose characters fit within the space of 128 additional combinations, was standardized in the ISO/IEC 8859 series released during 1986–2001.
Compared to ASCII, eight-bit encodings introduced an additional level of complexity to text processing:
• Each character is exactly eight bits wide. String manipulation is therefore as straightforward as with ASCII.
• Character strings can no longer be collated by character code comparison. Each encoding requires separate collation tables.
• Classes of characters, such as uppercase and lowercase letters or punctuation, no longer form contiguous ranges, and their position varies among encodings. This impedes character classification.
• Idiosyncrasies, such as the ligature æ and invisible hyphenation hints, are included in several encodings, which makes it more difficult to determine character string equivalence. Algorithms for case conversion vary among encodings.
• There exists no standard mechanism to detect which encoding is being used. The distinction needs to be made at the application level using heuristics, additional metadata, or human intervention. Consequently, no standard mechanism exists to use different character encodings within a single text document.
A portion of this complexity is inherent in the task of encoding the characters of all modern writing systems, but the overhead caused by the fragmentation of character encodings proved to be unnecessary.
Margin note: Also notable are the seven-bit encodings of UTF-7 and Punycode, which bring Unicode support to protocols that were designed with the seven-bit ASCII in mind, such as e-mail.
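The ambiguity of eight-bit encodings can be demonstrated with a small Python sketch. It relies only on the standard codecs `iso8859_2`, `iso8859_1`, and `utf-8`; the sample bytes are an assumption chosen for illustration.

```python
# The same three bytes, with no metadata saying which encoding was used.
raw = bytes([0xD8, 0x65, 0xE8])

# Decoded as ISO/IEC 8859-2 (Latin-2), the bytes spell a Czech word.
as_latin2 = raw.decode("iso8859_2")

# Decoded as ISO/IEC 8859-1 (Latin-1), the very same bytes yield
# different characters; nothing in the data reveals which is intended.
as_latin1 = raw.decode("iso8859_1")

# Once the source encoding is known, the text can be re-encoded to UTF-8.
as_utf8 = as_latin2.encode("utf-8")
```

Both decodings succeed without an error, which is why the choice of encoding has to come from heuristics, metadata, or the user.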
The Universal Character Set and Unicode
In the early 1990s, the continual increase in the available bandwidth and storage led to the creation of the standards of Unicode [5, 6] and the Universal multiple-octet coded Character Set (UCS) [7] in an attempt to create a text encoding that would contain the characters of all the world's languages and succeed ASCII as the lingua franca of text interchange.
UCS is an ever-expanding catalogue of characters from writing systems both modern and ancient, and of symbols ranging from diacritical marks, punctuation, and ideograms to mahjong tiles, alchemical symbols, and the ancient Greek musical notation. Each of these characters is assigned a number called a code point, ranging from 0 to 2147483647 (7F FF FF FF in hexadecimal notation), with the numbers of the most common characters in the range from 0 to 65535 (FF FF), called the Basic Multilingual Plane (BMP). The smallest units of division in UCS are blocks, which contain 256 thematically related characters. UCS encodings map code points to binary character codes and vice versa.
Three major encodings are specified in the UCS standard and its amendments [8, 9]:
1. UTF-32 directly encodes UCS characters by transforming their code points to four-byte integers. UTF-32 is also known as UCS-4.
2. UTF-16 directly encodes characters within the BMP by transforming their code points to two-byte integers. Code points in the range from 65536 to 1114111 (01 00 00–10 FF FF) are transformed into pairs of two-byte integers, called surrogate pairs, ranging from 55296 to 57343 (D8 00–DF FF). To enable the UTF-16 encoding, the code points in this range will never be assigned to characters [10, sec. 3.4, D15]. The same is true of code points above 1114111 (10 FF FF), which allows UTF-16 to encode any UCS character.
3. UTF-8 directly transforms code points ranging from 0 to 127 (7F) to one-byte integers. Since the first UCS block of the BMP matches ASCII, any text encoded in eight-bit ASCII is also encoded in UTF-8. Code points in the range from 128 to 1114111 (00 00 80–10 FF FF) are transformed into two to four one-byte integers ranging from 128 to 244 (80–F4). The encoding is illustrated in Tables 1.2 and 1.3.
Margin note: One of the design goals of UCS was to avoid assigning code points to different glyphs that carry the same meaning. As a result, the visually distinctive Han characters used in the East Asian countries of China, Japan, Korea, and Vietnam were merged into a set of 75960 ideograms in a process referred to as the Han Unification [10, sec. 18.1]. This simplifies text processing, but also makes it impossible to encode a text in multiple East Asian languages without having to rely on external markup to select appropriate regional fonts. As a result, a derivative of UCS that doesn't implement the Han Unification was developed for use in operating systems based on The Real-time Operating system Nucleus (TRON) and is used in East Asia alongside UCS and region-specific encodings.

餐甑逞扉牙慨

Figure 1.2: Several Han characters in the traditional Chinese, Japanese, Korean, and Vietnamese variants.
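The bit layouts of Table 1.2 and the surrogate-pair scheme of UTF-16 can be sketched in Python as follows. The function names `utf8_encode` and `utf16_code_units` are hypothetical; in practice Python's built-in `str.encode` does the same work, and the sketch is written out only to make the bit manipulation visible.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point to UTF-8 following the ranges of Table 1.2."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable code point")
    if cp < 0x80:        # 0-127: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 128-2047: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 2048-65535: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 65536-1114111: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

def utf16_code_units(cp: int) -> list:
    """Return the one or two 16-bit integers that encode a code point."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable code point")
    if cp < 0x10000:     # inside the BMP: a single two-byte integer
        return [cp]
    offset = cp - 0x10000            # 20 bits remain
    high = 0xD800 | (offset >> 10)   # upper 10 bits
    low = 0xDC00 | (offset & 0x3FF)  # lower 10 bits
    return [high, low]
```

For instance, `utf8_encode(344)` reproduces the first row of Table 1.3, and `utf16_code_units(0x1F600)` yields a surrogate pair because the code point lies outside the BMP.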
UTF-32 is primarily used for the fixed-width internal representation of individual UCS characters inside programs, UTF-16 fulfills a similar role in programs that only work with the BMP, and UTF-8 is used for text storage and interchange. Since 2010, the majority of text content on the Web has been encoded in ASCII and UTF-8 [11].
Unicode was a competing standard for universal text encoding that underwent a merger with UCS in version 1.1, and since then the standards have been kept closely synchronized. Unicode is a superset of UCS, which defines additional information about UCS characters (such as their general category, directionality, case, or numeric value [10, sec. 3.5 and ch. 4]), various text processing algorithms, and implementation guidelines.
Regarding text processing, Unicode and UCS represent a compromise between the simplicity of the seven-bit ASCII and the heterogeneity of eight-bit encodings:
Ǻ = Å + ◌́ = A + ◌̊ + ◌́

Figure 1.3: Some UCS characters can be either input as a single entity or composed from several combining characters. With regard to Unicode normalization forms, all of the above representations are canonically equivalent.
iconv -f latin2 -t utf8 -- old.txt > new.txt

Figure 1.4: Text files can be converted between encodings using the iconv command-line tool. The sample code shows the file old.txt being converted from the ISO/IEC 8859-2 encoding to UTF-8. The result of the conversion is stored in the file new.txt.
• If simple text manipulation is preferred over space efficiency, each character can be made exactly two or four bytes wide using the UTF-16 and UTF-32 encodings.
• Although character strings cannot be collated by a simple character code comparison, a collation algorithm is defined in the Unicode specification [12], and collation tables for major locales [13] are maintained by the Unicode Consortium.
• Classes of characters (such as uppercase letters, lowercase letters, numbers, and punctuation) do not form contiguous ranges, but their position is directly specified in the standard [10, sec. 4.5].
• Although idiosyncrasies (such as ligatures, invisible hyphenation hints, and combining characters) are present in UCS, explicit normalization algorithms for character string equivalence testing are specified by the standard [10, sec. 2.12]. An algorithm for case conversion is also specified [10, sec. 3.13].
• The byte order mark (FE FF) character can be inserted at the beginning of a text as a signature of Unicode encodings. As the name suggests, the order in which the FE and FF bytes arrive also indicates the order of bytes (called endianness) that was used to encode integers. In UTF-32 and UTF-16, endianness can be chosen arbitrarily by the encoding application. In UTF-8, one-byte integers are used, and the notion of endianness is therefore meaningless.
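Several of the points above can be tried out with Python's standard `unicodedata` and `codecs` modules. The following is a minimal sketch; the variable names are illustrative, and the three spellings of Ǻ follow Figure 1.3.

```python
import codecs
import unicodedata

# Three canonically equivalent spellings of the same character.
precomposed = "\u01FA"           # single code point
partial = "\u00C5\u0301"         # A with ring above + combining acute
decomposed = "A\u030A\u0301"     # A + combining ring + combining acute

# As raw code point sequences the strings differ ...
raw_equal = precomposed == partial == decomposed

# ... but after normalization to a common form they compare equal.
nfc = {unicodedata.normalize("NFC", s)
       for s in (precomposed, partial, decomposed)}
nfd = {unicodedata.normalize("NFD", s)
       for s in (precomposed, partial, decomposed)}

# Character classes are looked up in the standard, not inferred from ranges.
category = unicodedata.category("\u0158")   # general category of R with caron

# Case-insensitive comparison using Unicode case folding.
folded_equal = "Stra\u00DFe".casefold() == "STRASSE".casefold()

# The byte order mark signals endianness in UTF-16.
big = codecs.BOM_UTF16_BE + "A".encode("utf-16-be")
little = codecs.BOM_UTF16_LE + "A".encode("utf-16-le")
decoded = (big.decode("utf-16"), little.decode("utf-16"))
```

The generic `utf-16` decoder reads the byte order mark, picks the matching byte order, and drops the mark from the decoded text.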
Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. Above is the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.

Compose + O + R = ®
Compose + 3 + 4 = ¾
Compose + s + s = ß
Compose + ~ + ' + a = ấ

Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.
Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á

Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces a character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the corresponding UCS code point in hexadecimal.
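The lookup described in Figure 1.7 can be approximated with Python's `cp437` and `cp1252` codecs. The function `alt_code` is a hypothetical helper written for this sketch; it assumes an English locale and does not reproduce the graphical symbols that Windows shows for values below 32.

```python
def alt_code(digits: str) -> str:
    """Approximate the character produced by Alt + digits (English locale)."""
    value = int(digits, 10)
    if not 0 <= value <= 255:
        raise ValueError("only single-byte values are handled in this sketch")
    # A leading zero selects the Windows code page, otherwise the IBM one.
    codec = "cp1252" if digits.startswith("0") else "cp437"
    return bytes([value]).decode(codec)

# The three rows of Figure 1.7.
from_ibm = alt_code("160")
from_windows = alt_code("0225")
from_hex = chr(int("E1", 16))   # UCS code point typed in hexadecimal
```

All three routes arrive at the same character even though the typed numbers differ, because each number is interpreted against a different table.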
1.1.2 Text Input
To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On handheld devices, the use of either a numeric keypad or a touchscreen is more typical.
An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind, and, as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn't apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.
1.1.3 Text Editors
A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free joe, GNU nano, and pico.
More advanced text editors come with support for regular expressions and version control (which will be covered in sections 1.1.5 and 1.2) and for user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and Vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower users to perform complex text editing.
1.1.4 Interactive Document Preparation Systems
Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].
The primary difference between text editors and DPSes is that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be on a par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.
1.1.5 Regular Expressions
The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.
Margin note: Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.
Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) syntaxes [18, part 1, ch. 9], described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.
More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text (such as the ZERO WIDTH SPACE (20 0B), ZERO WIDTH NON-JOINER (20 0C), ZERO WIDTH JOINER (20 0D), and ZERO WIDTH NO-BREAK SPACE (FE FF)) is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at least partially implement the Unicode standard for regular expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.

BRE regex: we\{1,2\}p
Description: The interval (repetition) expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
Matches: weeps, wept

BRE regex: e*ne
Description: Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
Matches: never, enemy, Kleene

BRE regex: \(⟨regex⟩\)
Description: A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
Matches: ⟨regex⟩

BRE regex: ^ar
Description: At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
Matches: argument, arrow keys

BRE regex: ore$
Description: At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
Matches: iron ore, dumbledore

BRE regex: .be
Description: A period (.) matches any single character.
Matches: or not to be

BRE regex: be[ea]
Description: A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities, omitted here for brevity.
Matches: beehive, grizzly bear, glass beads, obeah

BRE regex: be[^ea]
Description: A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
Matches: bend, libela

BRE regex: \^\$
Description: Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
Matches: ^$

BRE regex: \(..\).*\1
Description: A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
Matches: ara ararauna, dardanelles, nationality

ERE regex: we{1,2}p
Description: Unlike in BREs, braces aren't escaped.
Matches: weeps, wept

ERE regex: pe+r?l?
Description: The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
Matches: persona, peer, speech, perl

ERE regex: (⟨regex⟩)
Description: Unlike in BREs, parentheses aren't escaped.
Matches: ⟨regex⟩

ERE regex: (on|t)
Description: Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
Matches: one, two, trophy, truth

ERE regex: (..).*\1
Description: EREs do not support backreferences.
Matches: ⟨undefined⟩

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below).

Regex: \x{⟨n⟩}
Description: Matches the UCS character with code point ⟨n⟩ in hexadecimal.

Regex: \N{⟨n⟩}
Description: Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.

Regex: \p{⟨p⟩}
Description: Matches any UCS character with property ⟨p⟩.

Regex: \P{⟨p⟩}
Description: Matches any UCS character without property ⟨p⟩.

Property: Letter
Description: This property is satisfied by any letter.

Property: Punctuation
Description: This property is satisfied by any punctuation.

Property: Symbol
Description: This property is satisfied by any symbol.

Property: Mark
Description: This property is satisfied by any mark.

Property: Number
Description: This property is satisfied by any number.

Property: Separator
Description: This property is satisfied by any separator.

Property: Other
Description: This property is satisfied by any UCS character that doesn't belong to any of the above-listed categories.

Property: Block=⟨b⟩
Description: This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.

Property: Script=⟨s⟩
Description: This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.

Property: Numeric Value=⟨n⟩
Description: This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.

Margin note: The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.
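The following Python sketch tries out a few ERE-style patterns and some of the Unicode pitfalls described above. Python's standard `re` module uses a Perl-derived syntax, which coincides with ERE for the patterns chosen here; it does not support the `\p{…}` property syntax, so `unicodedata.category` is used as a stand-in. The helper `strip_format_chars` is a hypothetical name, and removing every character of general category Cf is a simplifying assumption that also drops, for example, soft hyphens.

```python
import re
import unicodedata

# ERE-style patterns: intervals, repetition, anchors, lists, alternation.
interval = re.search(r"we{1,2}p", "wept").group()
star = re.search(r"e*ne", "Kleene").group()
optional = re.search(r"pe+r?l?", "speech").group()
non_matching_list = re.search(r"be[^ea]", "obeah")   # "bea" is excluded
alternation = re.search(r"(on|t)", "two").group()
backreference = re.search(r"(..).*\1", "nationality").group()

def strip_format_chars(text: str) -> str:
    """Drop invisible format characters (general category Cf)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

# An invisible ZERO WIDTH SPACE causes a false negative match.
hidden = "we\u200bep"
naive = re.search("weep", hidden)
cleaned = re.search("weep", strip_format_chars(hidden))

# A decomposed spelling is missed unless both sides are normalized.
text = "A\u030A\u0301"
missed = re.search("\u01FA", text)
found = re.search("\u01FA", unicodedata.normalize("NFC", text))

# Simple case-insensitive matching does not apply full case folding.
simple = re.fullmatch("strasse", "Stra\u00DFe", re.IGNORECASE)
folded = "Stra\u00DFe".casefold() == "strasse"
```

Normalizing and cleaning the input before matching is one possible workaround; a regex engine with full Unicode support would handle some of these cases on its own.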
The most elementary text processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool will present lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.
Margin note: The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
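The core behavior of grep and of a sed substitution can be imitated in a few lines of Python. This is only a rough sketch: the functions `grep` and `substitute` are hypothetical names, they use Python's regex syntax rather than BRE or ERE, and they ignore the many options of the real tools.

```python
import re
from typing import Iterable, List

def grep(pattern: str, lines: Iterable[str]) -> List[str]:
    """Keep the lines that contain at least one match of the pattern."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

def substitute(pattern: str, replacement: str,
               lines: Iterable[str]) -> List[str]:
    """Replace every match on every line, as a sed 's' command with 'g'."""
    regex = re.compile(pattern)
    return [regex.sub(replacement, line) for line in lines]

sample = ["iron ore", "argument", "dumbledore", "never"]
matching = grep(r"ore$", sample)
rewritten = substitute(r"^ar", "AR", sample)
```

The input lines are kept whole in the output of `grep`, mirroring the default of presenting matching lines rather than the matches themselves.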
1.2 Version Control
When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCSs) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCSs are a convenient alternative to manual version archival. With several contributors, VCSs become an essential tool.
VCSs can be divided by their architecture, which is either centralized or decentralized. Centralized VCSs store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally, and its operation is fully dependent on the availability of the server. An example of a centralized VCS is Subversion (SVN).
By comparison, there is no designated server in decentralized VCSs, and users can upload and download new versions directly to and from one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include a more complex workflow, greater storage requirements, and an increased opportunity for users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCSs include Git, Mercurial, and Bazaar.
Although VCSs can be used to keep track of any kind of file, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCSs to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Docs, form a category of their own.
Margin note: An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

svnadmin create
svn checkout
svn update
svn commit

Figure 1.8: The basic SVN workflow. After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.
Figure 1.9: The basic Git workflow (git init, git clone, git pull, git push, git reset, git commit), followed by the use of the Git program with an SVN repository (svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit); the latter bears all the advantages and disadvantages associated with decentralized VCS. After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.
Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).
Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.
Chapter 2
Markup
A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.
It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.
Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document, but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.
2.1 Meta Markup Languages
2.1.1 The Generalized Markup Language
The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.
This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].
Margin note: More information about the project can be found in The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].
SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.
Margin note: The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.
Although the described structure is shared by all SGML documents, the actual syntax as well as the restrictions regarding the contents and the attributes of individual elements are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.
2.1.2 The Extensible Markup Language
Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.
This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
Margin note: A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE recipe SYSTEM "recipe.dtd">
    <recipe>
      <name>Palatschinken</name>
      <description>A Slavic crêpe-like dish.</description>
      <ingredientList serves="8">
        <ingredient amount="120g">Plain flour</ingredient>
        <ingredient amount="2">Egg</ingredient>
        <ingredient amount="300ml">Milk</ingredient>
        <ingredient amount="1 tblspn">Oil</ingredient>
        <ingredient amount="1 pinch">Salt</ingredient>
      </ingredientList>
      <stepList>
        <step>Combine the ingredients and whisk until
          you have a smooth batter.</step>
        <step>Heat oil on a pan, pour in a tablespoonful
          of the batter, fry until golden brown.</step>
        <step>Repeat until there is no batter left.</step>
        <step>Serve rolled and filled with jam.</step>
      </stepList>
    </recipe>
Figure 2.1: An example XML document (recipe.xml).
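As a rough illustration of how a program might read such a document, the following Python sketch uses the standard xml.etree.ElementTree module to walk a shortened copy of the recipe. The function name list_ingredients is hypothetical, and the DOCTYPE line is left out because this parser does not validate against DTDs.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the recipe document, without the DOCTYPE line.
RECIPE = """\
<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>
"""


def list_ingredients(xml_text):
    """Return (amount, name) pairs for every ingredient element."""
    root = ET.fromstring(xml_text)
    return [
        (item.get("amount"), item.text)
        for item in root.iter("ingredient")
    ]


root = ET.fromstring(RECIPE)
serves = root.find("ingredientList").get("serves")
print(root.findtext("name"), "serves", serves)
for amount, name in list_ingredients(RECIPE):
    print(f"{amount:>6}  {name}")
```

Elements are reached by their names, attributes by the get method, and the text between the tags through the text property; the meaning of the amounts is left to the program, as the text above notes for DTDs.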
Margin note: DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd">
    <!DOCTYPE recipe SYSTEM "recipe.dtd">

    <!DOCTYPE recipe [
      <!ELEMENT recipe (name, description, ingredientList,
                        stepList)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
      <!ELEMENT ingredientList (ingredient+)>
      <!ATTLIST ingredientList serves CDATA #REQUIRED>
      <!ELEMENT ingredient (#PCDATA)>
      <!ATTLIST ingredient amount CDATA #REQUIRED>
      <!ELEMENT stepList (step+)>
      <!ELEMENT step (#PCDATA)> ]>

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd" [
      <!-- Omitted for brevity --> ]>
    <!DOCTYPE recipe SYSTEM "recipe.dtd" [
      <!-- Omitted for brevity --> ]>
Figure 2.2: An example DTD.
    element recipe {
      element name { text },
      element description { text },
      element ingredientList {
        attribute serves { xsd:positiveInteger },
        element ingredient {
          attribute amount { text }, text
        }+
      },
      element stepList {
        element step { text }+
      }
    }
Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
      <element name="recipe"><complexType><all>
        <element name="name" type="string" minOccurs="1"/>
        <element name="description" type="string"
                 minOccurs="1"/>
        <element
          name="ingredientList"><complexType><sequence>
          <element name="ingredient" minOccurs="1"
                   maxOccurs="unbounded">
            <complexType><simpleContent>
              <extension base="string">
                <attribute name="amount" type="string"/>
              </extension>
            </simpleContent></complexType>
          </element></sequence>
          <attribute name="serves" type="positiveInteger"
                     use="required"/>
        </complexType></element>
        <element name="stepList"><complexType><sequence>
          <element name="step" type="string" minOccurs="1"
                   maxOccurs="unbounded"/>
        </sequence></complexType></element>
      </all></complexType></element>
    </schema>
Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).
    xmllint --noout --dtdvalid recipe.dtd recipe.xml
    xmllint --noout --schema recipe.xsd recipe.xml
    trang recipe.rnc recipe.rng   # Compact -> Full Relax NG
    xmllint --noout --relaxng recipe.rng recipe.xml
Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
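The difference between a well-formed and a valid document can also be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: the standard xml.etree.ElementTree module checks well-formedness, while the hypothetical function recipe_problems mimics a handful of the constraints from the example DTD by hand. It is not a DTD validator and does not replace tools such as xmllint.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True when the text can be parsed as XML at all."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def recipe_problems(xml_text):
    """List the ways a well-formed document breaks the recipe rules.

    Only a few constraints of the example DTD are imitated here.
    """
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "recipe":
        problems.append("root element is not <recipe>")
    expected = ["name", "description", "ingredientList", "stepList"]
    if [child.tag for child in root] != expected:
        problems.append("children of <recipe> are not " + ", ".join(expected))
    ingredient_list = root.find("ingredientList")
    if ingredient_list is not None:
        if "serves" not in ingredient_list.attrib:
            problems.append("<ingredientList> lacks the serves attribute")
        if ingredient_list.find("ingredient") is None:
            problems.append("<ingredientList> has no <ingredient>")
    return problems


MALFORMED = "<recipe><name>Palatschinken</recipe>"        # unclosed <name>
INVALID = "<recipe><name>Palatschinken</name></recipe>"   # parts missing

print(is_well_formed(MALFORMED))
print(is_well_formed(INVALID))
print(recipe_problems(INVALID))
```

The first sample cannot be parsed at all, whereas the second one parses without complaint and only fails the additional, application-specific rules.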
A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of CONCUR, a more expressive SGML feature that makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.

Speech
    AASE: Peer, you're lying!
    PEER: No, I'm not!
    AASE: Well then, swear to me it's true!
    PEER: Swear? Why should I?
    AASE: See, you dare not! Every word of it's a lie.
Verse
    Peer, you're lying! No, I'm not!
    Well then, swear to me it's true!
    Swear? Why should I? See, you dare not!
    Every word of it's a lie.

    <(V)line>
    <(S)speech who="Aase">Peer, you're lying!</(S)speech>
    <(S)speech who="Peer">No, I'm not!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Aase">Well then,
      swear to me it's true!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Peer">Swear? Why should I?</(S)speech>
    <(S)speech who="Aase">See, you dare not!
    </(V)line><(V)line>
    Every word of it's a lie.</(S)speech>
    </(V)line>
Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].
Margin note: The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.
Margin note: Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
2.2 Markup on the World Wide Web
2.2.1 The Hypertext Markup Language
In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first widely used graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers, faced with a malformed document, would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.
Margin note: JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language made it possible to specify the visual properties of any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.
During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
    <b>Bold <i>bold and italic</b> italic</i>
    <b>Bold </b><i><b>bold and italic</b> italic</i>
Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

    <font face="Verdana" size="4">
    <font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
    <br><br>There is a continuing need to show the power of
    <i>CSS</i>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The <i>HTML
    </i> remains the same, the only thing that has changed
    is the external <i>CSS</i> file. Yes, really.
    </font>
Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.
    <style>
      body {
        font: large Verdana;
        font-size: large;
      }
      h1 {
        font-size: x-large;
        text-transform: uppercase;
      }
      abbr {
        font-style: italic;
      }
    </style>
    <h1>So what is this about?</h1>
    <p>There is a continuing need to show the power of
    <abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The
    <abbr>HTML</abbr> remains the same, the only thing that
    has changed is the external <abbr>CSS</abbr> file. Yes,
    really.</p>
Margin note: The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
2.2.2 The Extensible Hypertext Markup Language
Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications (such as MathML and SVG) directly in web documents through XML namespaces.
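The difference in strictness between HTML and XML parsers can be observed with the Python standard library. In the sketch below, the overlapping markup of Figure 2.7 is fed to html.parser.HTMLParser, which is only a lenient tokenizer and does not implement the full HTML5 tree construction, and to xml.etree.ElementTree, which refuses input that is not well-formed. The names TagLogger, html_events, and xml_accepts are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

OVERLAPPING = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Record the tags in the order in which the tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.events.append("</" + tag + ">")


def html_events(markup):
    """Tokenize the markup leniently and return the reported tags."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


def xml_accepts(markup):
    """Return True when a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(html_events(OVERLAPPING))  # the tags are reported despite the overlap
print(xml_accepts(OVERLAPPING))  # the XML parser refuses the same input
```

The HTML tokenizer leaves the recovery from the overlap to later processing stages, whereas the XML parser stops at the first mismatched end tag.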
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
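The way XML namespaces let one document mix several XML applications can be sketched with the Python standard library. The sample below embeds a small SVG drawing in an XHTML document; the document itself and the function name elements_by_namespace are assumptions made for this illustration, while the two namespace IRIs are the ones assigned to XHTML and SVG by W3C.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

DOCUMENT = """\
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>A circle:</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="20" height="20">
      <circle cx="10" cy="10" r="8"/>
    </svg>
  </body>
</html>
"""


def elements_by_namespace(xml_text):
    """Map each namespace IRI to the local names of its elements."""
    result = {}
    for element in ET.fromstring(xml_text).iter():
        tag = element.tag
        # ElementTree spells a namespaced tag as "{IRI}local-name".
        if tag.startswith("{"):
            namespace, _, local_name = tag[1:].partition("}")
        else:
            namespace, local_name = "", tag
        result.setdefault(namespace, []).append(local_name)
    return result


for namespace, names in elements_by_namespace(DOCUMENT).items():
    print(namespace, names)
```

The parser only reports which IRI each element belongs to. As noted earlier in the chapter, nothing in the IRI itself tells the parser where to find a schema for that application.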
2.2.3 The Semantic Web and Linked Data
The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
Margin note: A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
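The triplet model and one of its serializations can be sketched in Python. The classes Iri, Literal, and Triple and the functions describe and to_ntriples below are hypothetical names introduced for this illustration; they are not part of any RDF library. Note that the fields follow the subject, predicate, object order used by N-Triples, and that only backslashes and double quotes in literals are escaped.

```python
from typing import NamedTuple


class Iri(str):
    """A resource identified by an IRI."""


class Literal(NamedTuple):
    value: str
    language: str = ""


class Triple(NamedTuple):
    subject: Iri
    predicate: Iri
    obj: object  # either an Iri or a Literal


DC = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
DOCUMENT = Iri("http://example.org/document.html")
JOHN = Iri("http://example.org/john-smith")

TRIPLES = [
    Triple(DOCUMENT, Iri(DC + "title"), Literal("John's Web page", "en")),
    Triple(DOCUMENT, Iri(DC + "creator"), JOHN),
    Triple(JOHN, Iri(FOAF + "name"), Literal("John Smith")),
]


def describe(triple):
    """Read a triple as a relation (IRI object) or a property (literal)."""
    return "relation" if isinstance(triple.obj, Iri) else "property"


def to_ntriples(triple):
    """Serialize one triple as a single line of N-Triples."""
    def term(node):
        if isinstance(node, Iri):
            return "<" + node + ">"
        escaped = node.value.replace("\\", "\\\\").replace('"', '\\"')
        tag = "@" + node.language if node.language else ""
        return '"' + escaped + '"' + tag

    return " ".join(term(node) for node in triple) + " ."


for triple in TRIPLES:
    print(describe(triple) + ":", to_ntriples(triple))
```

A triple whose object is an IRI links two resources, whereas a triple with a literal object merely attaches a value to its subject, which mirrors the two interpretations given above.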
    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/terms/"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description
          rdf:about="http://example.org/document.html">
        <dc:title xml:lang="en">John's Web page</dc:title>
        <dc:creator
            rdf:resource="http://example.org/john-smith"/>
      </rdf:Description>
      <rdf:Description
          rdf:about="http://example.org/john-smith">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
        <foaf:name>John Smith</foaf:name>
      </rdf:Description>
    </rdf:RDF>

    <http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
    <http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
    <http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
    <http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dc: <http://purl.org/dc/terms/> .
    <http://example.org/document.html>
        dc:title "John's Web page"@en ;
        dc:creator <http://example.org/john-smith> .
    <http://example.org/john-smith>
        a foaf:Person ;
        foaf:name "John Smith" .
Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).

    <!DOCTYPE html>
    <html lang="en">
      <head>
        <link rel="meta" type="application/rdf+xml"
              href="john.rdf">
        <link rel="meta" type="text/turtle" href="john.ttl">
        <link rel="meta" type="application/n-triples"
              href="john.nt">
        <title>John's Web page</title>
      </head>
      <body>
        Hi, I'm John Smith.
      </body>
    </html>
Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.
    <!DOCTYPE html>
    <html lang="en">
      <head vocab="http://purl.org/dc/terms/"
            about="http://example.org/document.html">
        <title property="title" lang="en">John's Web
          page</title>
        <meta property="creator"
              href="http://example.org/john-smith">
      </head>
      <body vocab="http://xmlns.com/foaf/0.1/"
            about="http://example.org/john-smith"
            typeof="Person">
        Hi, I'm <span property="name">John Smith</span>.
      </body>
    </html>

Figure 2.11: A graph of the RDF document in Figure 2.9. The node http://example.org/document.html has the property dc:title with the value "John's Web page"@en and is linked by dc:creator to the node http://example.org/john-smith, which has the rdf:type foaf:Person and the property foaf:name with the value "John Smith".
2.3 Document Preparation Systems
Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems
Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.
Margin note: The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph, The Art of Computer Programming, and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
[Output page: the title "The Cask of Amontillado" and the byline "by Edgar Allan Poe" centered above the opening paragraph, which is set with a three-line drop cap "T"; page number 1.]

    .TITLE "The Cask of Amontillado"
    .AUTHOR "Edgar Allan Poe"
    .PRINTSTYLE TYPESET
    .PAGE 6i 9i .75i .75i .75i .75i
    .START
    .PP
    .DROPCAP T 3
    he thousand injuries of Fortunato I had borne as I best
    could, but when he ventured upon insult, I vowed revenge.
    You, who so well know the nature of my soul, will not
    suppose, however, that I gave utterance to a threat.
    \*[IT]At length\*[PREV] I would be avenged; this was a
    point definitely settled\[em]but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].
    % Page geometry
    \pdfpagewidth=6in \pdfpageheight=9in
    % Page dimensions
    \hsize=\dimexpr\pdfpagewidth-1.5in
    \vsize=\dimexpr\pdfpageheight-1.5in
    \baselineskip=16.8pt
    \hoffset=-.25in \voffset=-.25in
    % Fonts
    \font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
    \font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
    % Logical markup definition
    \def\title#1{{\bigbf\centerline{#1}}}
    \def\author#1{{\it\centerline{by}}\centerline{#1}
      \vskip 3.9em}
    \def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
      \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
      \parshape=4 3em\dimexpr\hsize-3em 3.28em
        \dimexpr\hsize-3.28em 3.28em
        \dimexpr\hsize-3.28em 0em\hsize}
    % The document
    \title{The Cask of Amontillado}
    \author{Edgar Allan Poe}
    \chapter The thousand injuries of Fortunato I had borne
    as I best could, but when he ventured upon insult, I vowed
    revenge. You, who so well know the nature of my soul,
    will not suppose, however, that I gave utterance to a
    threat. {\it At length} I would be avenged; this was a
    point definitely settled---but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that convert them to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors tailored to different use cases.
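The conversion step can be illustrated with a deliberately tiny sketch. The markup rules below are hypothetical and only loosely modeled on Markdown: a block starting with "# " becomes a heading, blank lines separate paragraphs, and *text* marks emphasis. The function name toy_markup_to_html is an assumption made for this illustration; real converters handle far more cases.

```python
import html
import re


def toy_markup_to_html(source: str) -> str:
    """Convert a tiny, made-up lightweight markup subset to HTML."""
    # Blank lines separate blocks (headings or paragraphs).
    blocks = [b.strip() for b in re.split(r"\n\s*\n", source) if b.strip()]
    out = []
    for block in blocks:
        # Collapse line breaks inside a block and escape HTML metacharacters.
        text = html.escape(" ".join(block.split()), quote=False)
        # *text* becomes emphasized text.
        text = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)
        if text.startswith("# "):
            out.append(f"<h1>{text[2:]}</h1>")
        else:
            out.append(f"<p>{text}</p>")
    return "\n".join(out)
```

Even this toy shows the trade-off described above: the source stays readable as plain text, while the expressive power is limited to what the punctuation conventions can encode.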
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
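Whether a typeface covers a manuscript can be checked mechanically. The following is a minimal sketch that assumes the coverage of a font is known as a list of inclusive code point ranges; the ranges in LATIN_FONT_RANGES are illustrative only, and the function names are hypothetical. The second function shows how an accented letter decomposes into a base letter and a combining mark, which is the information needed to construct a missing accented character from separate glyphs.

```python
import unicodedata

# Assumed coverage of a Latin body text typeface (inclusive code point ranges).
LATIN_FONT_RANGES = [(0x0000, 0x024F), (0x1E00, 0x1EFF), (0x2000, 0x206F)]


def missing_characters(text, ranges):
    """Return the sorted non-whitespace characters that fall outside all ranges."""
    def covered(ch):
        code_point = ord(ch)
        return any(low <= code_point <= high for low, high in ranges)

    return sorted({ch for ch in text if not ch.isspace() and not covered(ch)})


def decompose(ch):
    """Name the base letter and combining marks of a character (NFD form)."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]
```

Running missing_characters over the whole manuscript yields the characters for which a second typeface, or a constructed glyph, would be needed.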
Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
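The thresholds above can be written down as a small decision rule. This is only a sketch: the function name assess_measure and the wording of the verdicts are assumptions, while the numeric limits are the ones quoted in the paragraph.

```python
def assess_measure(chars_per_line, columns=1):
    """Classify a line length (in characters) using the limits quoted above."""
    if chars_per_line < 40:
        # Justification would stretch the word spaces too much.
        return "too narrow to justify: set ragged"
    if chars_per_line > 80:
        return "too wide: strains the eye"
    low, high = (45, 75) if columns == 1 else (40, 50)
    if low <= chars_per_line <= high:
        return "comfortable"
    return "acceptable, but outside the recommended range"
```

For example, a 66-character line on a single-column page is classified as comfortable, whereas the same line on a multi-column page falls outside the recommended range.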
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
   Ligatures=TeX]}{}
newunicodecharraisebox8ex
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford’s From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang="en"] {
    font-size: 11pt;
  }
  [lang="gr"] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with a leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
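The leading rule is plain arithmetic and can be sketched as follows. The function names are assumptions; the percentages (twenty to forty-five percent on top of the typeface size) and the 10 pt example are the ones used in the text.

```python
def leading_range(type_size_pt):
    """Return (minimum, maximum) line height: type size plus 20 % to 45 %."""
    return (type_size_pt * 120 / 100, type_size_pt * 145 / 100)


def lead_per_side(type_size_pt, line_height_pt):
    """Return the lead added above and below each line."""
    return (line_height_pt - type_size_pt) / 2
```

For a 10 pt typeface, leading_range gives 12 pt and 14.5 pt, and lead_per_side gives 1 pt and 2.25 pt for those two line heights, matching the figures quoted above.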
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
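The whole-multiple rule can be turned into a small calculation: given the line height of the heading and of the body text, find the smallest number of body lines the heading can occupy and the extra vertical space needed to fill them. This is a sketch under assumed names (heading_spacing, min_space_pt); how the extra space is split above and below the heading is left to the designer.

```python
import math


def heading_spacing(heading_line_height_pt, body_line_height_pt, min_space_pt=0):
    """Return (total_height, extra_space) so that the heading fills whole body lines."""
    needed = heading_line_height_pt + min_space_pt
    # The small epsilon guards against floating-point noise on exact multiples.
    lines = max(1, math.ceil(needed / body_line_height_pt - 1e-9))
    total = lines * body_line_height_pt
    return total, total - heading_line_height_pt
```

With a body line height of 12 pt, a heading set on an 18 pt line occupies two body lines (24 pt), which leaves 6 pt of extra space to distribute around it.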
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks, in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer’s viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
– William Shakespeare, King Lear

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
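Deriving the text height from the text width can be sketched in a few lines. The function name text_block and the choice of rounding to the nearest whole line are assumptions; the proportion is the desired height-to-width ratio of the text block (1.5 would match a 6 in × 9 in page such as the one in Figure 2.12).

```python
def text_block(text_width_pt, proportion, line_height_pt):
    """Return (lines, text_height_pt) for a text block of roughly the given proportion.

    The ideal height is width × proportion; it is rounded to a whole number
    of body text lines so that the block can be filled completely.
    """
    lines = round(text_width_pt * proportion / line_height_pt)
    return lines, lines * line_height_pt
```

For a text width of 310 pt, a proportion of 1.5, and a line height of 12 pt, the ideal height of 465 pt is adjusted to 39 lines, or 468 pt.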
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

The general guidelines are to use colored typefaces only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
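The text does not name a measure of “sufficient contrast”. As an assumption, the sketch below borrows the contrast ratio defined in the W3C Web Content Accessibility Guidelines (WCAG) 2.x, which is one commonly used measure for screen colors; it compares the relative luminance of two sRGB colors given as 8-bit (red, green, blue) triples. The function names are chosen for this illustration.

```python
def _linear(channel_8bit):
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (r, g, b) color with 8-bit channels."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb_a, rgb_b):
    """Contrast ratio between two colors, from 1 (identical) to 21 (black on white)."""
    la, lb = relative_luminance(rgb_a), relative_luminance(rgb_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)
```

Under this measure, black on white reaches the maximum ratio of 21, whereas pure red on white comes out at roughly 4, below the 4.5 that WCAG asks of normal-sized text at its AA level. This is consistent with the advice to reserve color for emphasis rather than for the body text.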
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII.” In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).
[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).
[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016) (cit. on p. 10).
[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).
[17] Noam Chomsky. “Three models for the description of language.” In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).
[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O’Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O’Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection.” In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint.” In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup.” In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies.” In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028/ (visited on 07/31/2015) (cit. on p. 29).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: httpwwwsazbacztypoglosytypo101pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts and Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standard Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
Emacs: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe’s Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
Pico: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
vi: The Visual Interactive editor
Vim: vi IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language
This document was prepared in accordance with William Strunk's Elements of Style, an American English style guide for general use.
Chapter 1
Writing
The essence of a document is the idea it represents. In the case of a text document, this idea is articulated through speech, which is transcribed using text, optionally accompanied by figures, and then laid out on a sheet of paper according to a design. Since the text is typically independent of the design, whose task is to support and bring out the internal structure of the text, writing is the logical first step in the creation of a text document.

The essentials of writing in any given natural language include grammar rules, which specify the structure of spoken language, and orthographic rules, which impose additional requirements on written text. The complexity of either set of rules depends entirely on the language in question. Some writing systems, such as those that incorporate Chinese characters, are not phonographic, and the correspondence between the spoken words and the written symbols needs to be memorized by the writer on a word-to-word basis. Other languages may use vastly different grammar rules for speaking and for writing, which means that a spoken sentence needs to be translated before it is written down. A writer needs to recognize these specifics.
On top of grammar and orthographic rules stand style guides, which, in order to improve consistency, codify how common language patterns are encoded. More comprehensive style guides, such as the Chicago Manual of Style or the Oxford Style Manual, often go beyond writing and provide guidelines on design and typesetting as well, making them an indispensable reference on the editorial tradition.

Zwei Trichter wandeln durch die Nacht.
Durch ihres Rumpfs verengten Schacht
fließt weißes Mondlicht
still und heiter
auf ihren
Waldweg
usw.

Figure 1.1: Exceptions that prove the rule about the separation of text and design can sometimes be encountered in poetry. Above is Christian Morgenstern's Trichter, where the text and its form are intimately intertwined.
Above all stand the typographic rules, which specify how the resulting document should be typeset so that it doesn't disturb the eye of the reader. These, as well as the orthographic rules on hyphenation, can be left out of consideration during writing, as it is the page that should be formed around the writing and not the other way around.
1.1 Text Processing

Originally the domain of the pen, the quill, the stylus, and the more recent typewriter, manuscripts of today are produced mainly using the personal computer and stored in text files. The discipline of creating and manipulating digital text is called text processing and will be the focus of this section.
1.1.1 Character Encoding

Although computing at its most primal has no use for anything but numbers, it has nevertheless been accompanied by text from the very outset. Even the earliest computers from the 1950s were programmed with both raw machine code and the text programming language of the FORmula TRANslator (FORTRAN). The digital representation of letters, digits, and other characters was initially closely tied to each specific application and processor architecture, but with the advent of computer networking in the 1960s, mutual intelligibility became a point of concern. "We had over sixty different ways to represent characters in computers. It was a real Tower of Babel," explains Bob Bemer [1], an American computer scientist who worked at IBM during 1956–1962 and who drafted the American Standard Code for Information Interchange (ASCII) [2], a character encoding from 1963 that unified the digital representation of text across the computer industry and enabled computer networking on a large scale.

EBCDIC by IBM was the default encoding on IBM's System/360 mainframes and was in active use until the introduction of the PC in 1981. In writing systems using Chinese characters, special encodings such as Big5, JIS, and EUC are used to this day. For brevity, the text focuses on the main stream of international encodings.
ASCII
In ASCII, every character is represented by a number from zero to 127, which is transformed to a seven-bit integer called a character code. These 128 codes are used to encode printable characters (spanning the letters of the English alphabet, digits, punctuation, and other symbols) and control codes, as depicted in Table 1.1. Unlike printable characters, control codes have no fixed visual representation, and they were used to implement application-specific communication protocols and text formatting; their precise semantics were defined in a much later standard from 1972 [3]. Unconstrained by the bandwidth and storage limitations of the 1960s and 1970s, today's communication protocols and text formats gravitate towards markup constructed from printable characters, which, unlike control codes, are easy for humans to read and write.

The following properties make it easy to manipulate and reason about character strings encoded in ASCII:

• Each character is represented by exactly seven bits. This makes it easy to allocate space for character strings of fixed length, to measure the number of characters stored in a memory region, and to perform basic operations such as adjacent character retrieval or text truncation.

• Characters are alphabetically ordered. Character strings can therefore be collated by comparing character code binary values.

• Lowercase and uppercase letters, digits, and control codes form contiguous ranges of character codes. This simplifies classification.
Bits 7 6 5     000   001   010   011   100   101   110   111
Bits 4 3 2 1   Ctrl codes  Symbols     Upper case  Lower case
0000           NUL   DLE   SP    0     @     P     `     p
0001           SOH   DC1   !     1     A     Q     a     q
0010           STX   DC2   "     2     B     R     b     r
0011           ETX   DC3   #     3     C     S     c     s
0100           EOT   DC4   $     4     D     T     d     t
0101           ENQ   NAK   %     5     E     U     e     u
0110           ACK   SYN   &     6     F     V     f     v
0111           BEL   ETB   '     7     G     W     g     w
1000           BS    CAN   (     8     H     X     h     x
1001           HT    EM    )     9     I     Y     i     y
1010           LF    SUB   *     :     J     Z     j     z
1011           VT    ESC   +     ;     K     [     k     {
1100           FF    FS    ,     <     L     \     l     |
1101           CR    GS    -     =     M     ]     m     }
1110           SO    RS    .     >     N     ^     n     ~
1111           SI    US    /     ?     O     _     o     DEL

Table 1.1: The ASCII encoding as specified in the 1986 revision of the standard [4]. The column gives bits 7 to 5 and the row gives bits 4 to 1 of each character code.
Code point range    Encoding
0–127               0xxxxxxx
128–2047            110xxxxx 10xxxxxx
2048–65535          1110xxxx 10xxxxxx 10xxxxxx
65536–1114111       11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Table 1.2: The UTF-8 encoding. Each x represents one bit of the UCS code point in binary.
Character   Code point   Code point in binary   UTF-8 encoding
Ř           344          101011000              11000101 10011000
e           101          1100101                01100101
č           269          100001101              11000100 10001101

Table 1.3: An example of the UTF-8 encoding.
• There is precisely one way to encode any printable character. The conversion between lowercase and uppercase letters is a matter of inverting one bit.

This comes at the expense of support for non-English writing systems. As a temporary workaround, a set of ASCII derivatives that replaced the less-needed characters of #, $, @, [, \, ], ^, `, {, |, }, and ~ with international characters was specified in the ISO 646 standard from 1972 [3].
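The ASCII properties listed above can be made concrete with a minimal Python sketch. The helper names (`is_ascii_upper`, `is_ascii_lower`, `toggle_case`) are illustrative only and are not taken from any standard; the sketch assumes that each function receives a single ASCII character.

```python
def is_ascii_upper(ch: str) -> bool:
    """Uppercase letters occupy the contiguous code range 65-90."""
    return 65 <= ord(ch) <= 90


def is_ascii_lower(ch: str) -> bool:
    """Lowercase letters occupy the contiguous code range 97-122."""
    return 97 <= ord(ch) <= 122


def toggle_case(ch: str) -> str:
    """Switch the case of an ASCII letter by inverting bit 6 (value 32)."""
    if is_ascii_upper(ch) or is_ascii_lower(ch):
        return chr(ord(ch) ^ 0b0100000)
    return ch  # digits, punctuation, and control codes are left untouched


# Collation reduces to comparing the character codes position by position.
words = ["pear", "Apple", "apple"]
collated = sorted(words, key=lambda word: [ord(c) for c in word])
```

Because all uppercase letters have lower codes than all lowercase letters, this kind of collation places "Apple" before "apple", which is one reason why language-aware collation tables are needed outside the simplest cases.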
Eight-bit Encodings
With the byte size stabilizing at eight bits, new character encodings emerged that were based on ASCII and used the additional bit to encode characters of non-English writing systems while retaining complete backwards compatibility with ASCII. Besides the numerous vendor-specific encodings (called code pages), a set of fifteen eight-bit encodings covering all major modern writing systems whose characters fit within the space of 128 additional combinations was standardized in the ISO/IEC 8859 series released during 1986–2001.

Compared to ASCII, eight-bit encodings introduced an additional level of complexity to text processing:

• Each character is exactly eight bits wide. The manipulation of strings is therefore as straightforward as with ASCII.

• Character strings can no longer be collated by character code comparison. Each encoding requires separate collation tables.

• Classes of characters, such as uppercase and lowercase letters or punctuation, no longer form contiguous ranges, and their position varies among encodings. This impedes character classification.

• Idiosyncrasies, such as the ligature of æ and invisible hyphenation hints, are included in several encodings, which makes it more difficult to determine character string equivalence. Algorithms for case conversion vary among encodings.

• There exists no standard mechanism to detect which encoding is being used. The distinction needs to be made at the application level using either heuristics, additional metadata, or human intervention. Consequently, no standard mechanism exists to use different character encodings within a single text document.
Notable are also the seven-bit encodings of UTF-7 and Punycode, which bring Unicode support to protocols that were designed with the seven-bit ASCII in mind, such as e-mail.

A portion of this complexity is inherent in the task of encoding the characters of all modern writing systems, but the overhead caused by the character encoding fragmentation proved to be unnecessary.
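The lack of a detection mechanism can be illustrated with a short Python sketch. It assumes only the codecs that ship with Python's standard library (`iso8859_1` and `iso8859_2`); the variable names are illustrative. The same byte yields a different character depending on which eight-bit encoding the reader guesses, while the seven-bit range stays compatible.

```python
raw = bytes([0xE8])  # a single byte with the eighth bit set

# Nothing in the byte itself says which encoding produced it.
as_latin1 = raw.decode("iso8859_1")  # Western European reading
as_latin2 = raw.decode("iso8859_2")  # Central European reading

# The lower 128 codes are shared, so plain ASCII text decodes identically.
ascii_text = b"plain text"
same_ascii = ascii_text.decode("iso8859_1") == ascii_text.decode("iso8859_2")
```

Here `as_latin1` is è and `as_latin2` is č, which is why the choice of encoding has to be communicated out of band.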
The Universal Character Set and Unicode
In the early 1990s, the continual increase in the available bandwidth and storage led to the creation of the standards of Unicode [5, 6] and the Universal multiple-octet coded Character Set (UCS) [7] in an attempt to create a text encoding that would contain the characters of all the world's languages and succeed ASCII as the lingua franca of text interchange.

UCS is an ever-expanding catalogue of characters from writing systems both modern and ancient and of symbols ranging from diacritical marks, punctuation, and ideograms to mahjong tiles, alchemical symbols, and the ancient Greek musical notation. Each of these characters is assigned a number called a code point, ranging from 0 to 2147483647 (7F FF FF FF in the hexadecimal notation), with the numbers of the most common characters in the range from 0 to 65535 (FF FF), called the Basic Multilingual Plane (BMP). The smallest unit of division in UCS are blocks, which contain 256 thematically related characters. UCS encodings map code points to binary character codes and vice versa.

Three major encodings are specified in the UCS standard and its amendments [8, 9]:

1. UTF-32 directly encodes UCS characters by transforming their code points to four-byte integers. UTF-32 is also known as UCS-4.

2. UTF-16 directly encodes characters within the BMP by transforming their code points to two-byte integers. Code points in the range from 65536 to 1114111 (01 00 00–10 FF FF) are transformed into pairs of two-byte integers, called surrogate pairs, ranging from 55296 to 57343 (D8 00–DF FF). To enable the UTF-16 encoding, the code points in this range will never be assigned to characters [10, sec. 3.4, D15]. The same is true of code points above 1114111 (10 FF FF), which allows UTF-16 to encode any UCS character.
3. UTF-8 directly transforms code points ranging from 0 to 127 (7F) to one-byte integers. Since the first UCS block of the BMP matches ASCII, any text encoded in eight-bit ASCII is also encoded in UTF-8. Code points in the range from 128 to 1114111 (00 00 80–10 FF FF) are transformed into two to four one-byte integers ranging from 128 to 244 (80–F4). The encoding is illustrated in tables 1.2 and 1.3.

One of the design goals of UCS was to avoid assigning code points to different glyphs that carry the same meaning. As a result, the visually distinctive Han characters used in the East Asian countries of China, Japan, Korea, and Vietnam were merged into a set of 75960 ideograms in a process referred to as the Han Unification [10, sec. 18.1]. This simplifies text processing, but it also makes it impossible to encode a text in multiple East Asian languages without having to rely on external markup to select appropriate regional fonts. As a result, a derivative of UCS that doesn't implement the Han Unification was developed for use in operating systems based on The Real-time Operating system Nucleus (TRON) and is used in East Asia alongside UCS and region-specific encodings.

餐甑逞扉牙慨

Figure 1.2: Several Han characters in the traditional Chinese, Japanese, Korean, and Vietnamese variants.
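The bit layout of UTF-8 can be sketched in a few lines of Python. The function name `utf8_encode` is illustrative, and the sketch assumes the four ranges of the encoding table (one to four bytes); in practice Python's built-in `str.encode("utf-8")` does the same job.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode a single UCS code point into UTF-8 by hand."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("code point cannot be encoded")
    if code_point < 0x80:  # 0-127: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:  # 128-2047: 110xxxxx 10xxxxxx
        return bytes([
            0xC0 | (code_point >> 6),
            0x80 | (code_point & 0x3F),
        ])
    if code_point < 0x10000:  # 2048-65535: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (code_point >> 12),
            0x80 | ((code_point >> 6) & 0x3F),
            0x80 | (code_point & 0x3F),
        ])
    return bytes([  # 65536-1114111: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        0xF0 | (code_point >> 18),
        0x80 | ((code_point >> 12) & 0x3F),
        0x80 | ((code_point >> 6) & 0x3F),
        0x80 | (code_point & 0x3F),
    ])
```

For instance, `utf8_encode(344)` yields the two bytes C5 98 for the character Ř, and every ASCII code point is returned unchanged as a single byte.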
UTF-32 is primarily used for the fixed-space internal representation of individual UCS characters inside programs, UTF-16 fulfills a similar role in programs that only work with the BMP, and UTF-8 is used for text storage and interchange. Since 2010, the majority of text content on the Web has been encoded in ASCII and UTF-8 [11].
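The surrogate pair mechanism that lets UTF-16 reach beyond the BMP can be sketched as follows. The function name `to_surrogate_pair` is illustrative; the arithmetic follows the usual definition, in which the code point is reduced by 65536 and its upper and lower ten bits are added to the bases D8 00 and DC 00.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point outside the BMP into two two-byte integers."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP need surrogates")
    offset = code_point - 0x10000      # a twenty-bit number
    high = 0xD800 + (offset >> 10)     # upper ten bits: D8 00-DB FF
    low = 0xDC00 + (offset & 0x3FF)    # lower ten bits: DC 00-DF FF
    return high, low


# U+1F600 lies outside the BMP and needs two two-byte integers.
high, low = to_surrogate_pair(0x1F600)
```

A program that assumes one two-byte integer per character would count such a character twice, which is the pitfall of treating UTF-16 as a fixed-width encoding.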
Unicode was a competing standard for universal text encoding that underwent a merger with UCS in version 1.1, and since then the standards have been kept closely synchronized. Unicode is a superset of UCS, which defines additional information about UCS characters (such as their general category, directionality, case, or numeric value [10, sec. 3.5 and ch. 4]), various text processing algorithms, and implementation guidelines.

Regarding text processing, Unicode and UCS represent a compromise between the simplicity of the seven-bit ASCII and the heterogeneity of eight-bit encodings:
Ǻ = Å + ◌́ = A + ◌̊ + ◌́

Figure 1.3: Some UCS characters can be either input as a single entity or composed from several combining characters. Regarding Unicode normalization forms, all of the above representations are canonically equivalent.
iconv -f latin2 -t utf8 -- old.txt > new.txt

Figure 1.4: Text files can be converted between encodings using the iconv command-line tool. The sample code shows the file old.txt being converted from the ISO/IEC 8859-2 encoding to UTF-8. The result of the conversion is stored in the file new.txt.
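The same conversion can be sketched in Python for environments where iconv is not at hand. The function name `convert_encoding` and its parameter names are illustrative; the sketch assumes that the whole file fits in memory and that the source encoding is known in advance.

```python
def convert_encoding(src_path: str, dst_path: str,
                     src_encoding: str = "iso8859_2",
                     dst_encoding: str = "utf_8") -> None:
    """Re-encode a text file, leaving its line endings untouched."""
    # newline="" disables newline translation in both directions.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)
```

Calling `convert_encoding("old.txt", "new.txt")` mirrors the command shown in the figure. A byte sequence that is invalid in the source encoding raises a `UnicodeDecodeError` instead of being silently converted.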
• If simple text manipulation is preferred over space efficiency, each character can be made exactly two or four bytes wide using the UTF-16 and UTF-32 encodings.

• Although character strings cannot be collated by a simple character code comparison, a collation algorithm is defined in the Unicode specification [12], and collation tables for major locales [13] are maintained by the Unicode Consortium.

• Classes of characters, such as uppercase letters, lowercase letters, numbers, and punctuation, do not form contiguous ranges, but their position is directly specified in the standard [10, sec. 4.5].

• Although idiosyncrasies, such as ligatures, invisible hyphenation hints, and combining characters, are present in UCS, explicit normalization algorithms for character string equivalence testing are specified by the standard [10, sec. 2.12]. An algorithm for case conversion is also specified [10, sec. 3.13].

• The byte order mark (FE FF) character can be inserted at the beginning of a text as a signature of Unicode encodings. As the name suggests, the order in which the FE and FF bytes arrive also indicates the order of bytes (called endianness) that was used to encode integers. In UTF-32 and UTF-16, endianness can be chosen arbitrarily by the encoding application. In UTF-8, one-byte integers are used, and the notion of endianness is therefore meaningless.
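Several of the mechanisms listed above are exposed by Python's standard library, which allows a brief sketch. The helper names are illustrative. `unicodedata.normalize` and `str.casefold` are standard library calls; comparing case-folded strings is a simplification of full caseless matching, which would also apply normalization.

```python
import codecs
import unicodedata

precomposed = "\u01FA"        # Ǻ stored as a single code point
partly = "\u00C5\u0301"       # Å followed by a combining acute accent
decomposed = "A\u030A\u0301"  # A, combining ring above, combining acute


def canonically_equal(first: str, second: str) -> bool:
    """Compare two strings after bringing both into the NFC form."""
    return (unicodedata.normalize("NFC", first)
            == unicodedata.normalize("NFC", second))


def caseless_equal(first: str, second: str) -> bool:
    """Compare two strings while ignoring case (simplified)."""
    return first.casefold() == second.casefold()


def utf16_byte_order(data: bytes) -> str:
    """Read the byte order mark at the beginning of UTF-16 data."""
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "little-endian"
    return "unknown"
```

The three strings defined at the top compare as different with the `==` operator, yet `canonically_equal` treats them as the same text.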
Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. The figure shows the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.
Compose + O + R = ®
Compose + 3 + 4 = ¾
Compose + s + s = ß
Compose + ~ + ' + a = ấ

Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.
Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á

Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces a character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the corresponding UCS code point in hexadecimal.
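The three key sequences from the figure can be checked with a short Python sketch. It assumes the English-locale code pages named in the caption, which Python exposes as the `cp437` and `cp1252` codecs; the variable names are illustrative.

```python
# Alt + 160: the number has no leading zero, so it indexes the IBM code page.
from_ibm_code_page = bytes([160]).decode("cp437")

# Alt + 0225: the leading zero selects the Windows code page.
from_windows_code_page = bytes([225]).decode("cp1252")

# Alt + "+" + E1: the hexadecimal number is a UCS code point.
from_code_point = chr(0xE1)
```

All three expressions evaluate to the same character, á, even though the numbers typed differ.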
1.1.2 Text Input

To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On handheld devices, the use of either a numeric keypad or a touch-screen is more typical.

An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind, and, as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn't apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.
1.1.3 Text Editors

A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free joe, GNU nano, and pico.

More advanced text editors come with support for regular expressions and version control, which will be covered in sections 1.1.5 and 1.2, and with user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower the users to perform complex text editing.
1.1.4 Interactive Document Preparation Systems

Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].
The primary difference between text editors and DPSes is the fact that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.

Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.
1.1.5 Regular Expressions

The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.
Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) [18, part 1, ch. 9] described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.

More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (20 0B), zero width non-joiner (20 0C), zero width joiner (20 0D), and zero width no-break space (FE FF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at
BRE regex: we\{1,2\}p
Description: The interval expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
Matches: weeps, wept

BRE regex: e*ne
Description: Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
Matches: never, enemy, Kleene

BRE regex: \(⟨regex⟩\)
Description: A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
Matches: ⟨regex⟩

BRE regex: ^ar
Description: At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
Matches: argument, arrow keys

BRE regex: ore$
Description: At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
Matches: iron ore, dumbledore

BRE regex: .be
Description: A period (.) matches any single character.
Matches: or not to be

BRE regex: be[ea]
Description: A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity.
Matches: beehive, grizzly bear, glass beads

BRE regex: be[^ea]
Description: A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
Matches: obeah bend, libela

BRE regex: \^\$
Description: Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
Matches: ^$
BRE regex: \(..\).*\1
Description: A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
Matches: ara ararauna, dardanelles, nationality
Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below).
ERE regex: we{1,2}p
Description: Unlike in BREs, braces aren't escaped.
Matches: weeps, wept

ERE regex: pe+rl?
Description: The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
Matches: persona, peer speech, perl

ERE regex: (⟨regex⟩)
Description: Unlike in BREs, parentheses aren't escaped.
Matches: ⟨regex⟩

ERE regex: (on|t)
Description: Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
Matches: one two, trophy truth
ERE regex: (..).*\1
Description: EREs do not support backreferences.
Matches: ⟨undefined⟩
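A few of the ERE-style patterns can be tried out with a short Python sketch. Note the assumption: Python's `re` module implements a Perl-derived syntax rather than POSIX BRE or ERE, but for these simple patterns (unescaped braces and parentheses, `+`, `?`, `|`, bracket expressions, and anchors) it is expected to behave the same way. The helper name `first_match` is illustrative.

```python
import re
from typing import Optional


def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the leftmost substring matched by the pattern, if any."""
    found = re.search(pattern, text)
    return found.group(0) if found else None


# Each pattern is paired with strings in which it finds a match.
samples = {
    r"we{1,2}p": ["weeps", "wept"],
    r"e*ne": ["never", "enemy", "Kleene"],
    r"pe+rl?": ["persona", "peer speech", "perl"],
    r"be[ea]": ["beehive", "grizzly bear", "glass beads"],
    r"^ar": ["argument", "arrow keys"],
    r"ore$": ["iron ore", "dumbledore"],
}

matched = {
    pattern: [first_match(pattern, text) for text in texts]
    for pattern, texts in samples.items()
}
```

Inspecting `matched` shows which part of each string was consumed; for example, `e*ne` consumes `ne`, `ene`, and `eene` in the three sample words, which corresponds to zero, one, and two repetitions of the letter e.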
Regex        Description
\x{⟨n⟩}      Matches the UCS character with code point ⟨n⟩ in hexadecimal.
\N{⟨n⟩}      Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.
\p{⟨p⟩}      Matches any UCS character with property ⟨p⟩.
\P{⟨p⟩}      Matches any UCS character without property ⟨p⟩.

Property             Description
Letter               This property is satisfied by any letter.
Punctuation          This property is satisfied by any punctuation.
Symbol               This property is satisfied by any symbol.
Mark                 This property is satisfied by any mark.
Number               This property is satisfied by any number.
Separator            This property is satisfied by any separator.
Other                This property is satisfied by any UCS character that doesn't belong to any of the categories listed above.
Block=⟨b⟩            This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.
Script=⟨s⟩           This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.
Numeric Value=⟨n⟩    This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.
The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.
least partially implement the Unicode standard for Regular Expressions [20] (such as those of Perl 5.2 or Java 7) are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
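The pitfalls described for UCS text can be reproduced with a hedged Python sketch. Python's standard `re` module is assumed here; it does not implement the `\p{…}` property syntax, so the general category is looked up through `unicodedata` instead. The helper names are illustrative, and normalizing the pattern is only safe for the plain literal patterns used in this sketch.

```python
import re
import unicodedata

# A byte-oriented pattern sees two bytes where a reader sees one character.
matches_as_text = re.fullmatch(r".", "\u010d") is not None
matches_as_bytes = re.fullmatch(rb".", "\u010d".encode("utf-8")) is not None


def is_letter(ch: str) -> bool:
    """Emulate the Letter property through the general category."""
    return unicodedata.category(ch).startswith("L")


def normalized_search(literal: str, text: str) -> bool:
    """Search for a literal string after normalizing both sides to NFC."""
    pattern = re.escape(unicodedata.normalize("NFC", literal))
    return re.search(pattern, unicodedata.normalize("NFC", text)) is not None


# Zero width space, non-joiner, joiner, and no-break space.
INVISIBLE = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))


def strip_invisible(text: str) -> str:
    """Remove invisible characters that would cause false negatives."""
    return text.translate(INVISIBLE)
```

Without normalization, a search for the precomposed Ǻ does not find its decomposed spelling, and a search for "foobar" does not find "foo" and "bar" separated by a zero width space; the two helpers above are one way to work around both issues.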
The most elementary text-processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool will present lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for the Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
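The core behavior of grep and of a sed substitution can be sketched in Python for readers without access to the tools themselves. The function names `grep` and `substitute` are illustrative, and the sketch uses Python's `re` syntax rather than BRE or ERE; it is not a reimplementation of either program.

```python
import re
from typing import Iterable, Iterator


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield the lines that contain at least one match of the pattern."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line


def substitute(pattern: str, replacement: str,
               lines: Iterable[str]) -> Iterator[str]:
    """Replace every match on every line, like a global sed substitution."""
    regex = re.compile(pattern)
    for line in lines:
        yield regex.sub(replacement, line)


found = list(grep(r"ore$", ["iron ore", "ore deposit", "dumbledore"]))
fixed = list(substitute(r"colou?r", "color", ["colour wheel", "color"]))
```

Both functions work on any iterable of lines, so an open text file can be passed in directly.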
1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCSs) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCSs are a convenient alternative to manual version archival. With several contributors, VCSs become an essential tool.
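The rudimentary record keeping described above can be sketched as a toy in-memory history in Python. The class and method names (`Revision`, `History`, `commit`, `diff`, `revert`) are hypothetical and do not mirror the interface of any real VCS; the sketch stores whole versions rather than changes to stay short.

```python
import difflib
from dataclasses import dataclass, field
from typing import List


@dataclass
class Revision:
    author: str
    description: str
    text: str


@dataclass
class History:
    revisions: List[Revision] = field(default_factory=list)

    def commit(self, author: str, description: str, text: str) -> int:
        """Record a new version and return its revision number."""
        self.revisions.append(Revision(author, description, text))
        return len(self.revisions) - 1

    def diff(self, old: int, new: int) -> str:
        """View the change between two recorded versions."""
        before = self.revisions[old].text.splitlines(keepends=True)
        after = self.revisions[new].text.splitlines(keepends=True)
        return "".join(
            difflib.unified_diff(before, after, f"r{old}", f"r{new}"))

    def revert(self, number: int, author: str) -> int:
        """Restore an earlier version by recording it as a new revision."""
        restored = self.revisions[number].text
        return self.commit(author, f"Revert to r{number}", restored)
```

Reverting by adding a new revision, rather than by deleting history, keeps the undesirable change on record together with its author and description.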
vcs can be dichotomized based on their architecture which iseither centralized or decentralized Centralized vcs store all versionsin a repository located on a remote server Users send new versionsto the server and retrieve existing versions using a client softwareThe client software is thin in the sense that it does not store morethan one version locally and its operation is fully dependent onthe availability of the server An example of centralized vcs isSubVersioN (svn)
By comparison, there is no designated server in decentralized VCS, and the users can exchange new versions directly with one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage size requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.
Figure 1.8: The basic SVN workflow (svnadmin create, svn checkout, then repeated svn update and svn commit).
[Margin note] After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.

Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

[Margin note] An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.
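The reason VCS handle text files so well is that the difference between two versions can be computed and shown line by line. The sketch below uses the difflib module from the Python standard library to produce such a listing in the unified format; the two versions and the file names draft-v1.txt and draft-v2.txt are invented for the example, and real VCS use their own implementations.

```python
import difflib

# Two hypothetical versions of the same short text file, one line per list item.
old_version = ["Chapter 1", "Writing is hard.", "The end."]
new_version = ["Chapter 1", "Writing is rewarding.", "The end."]

# unified_diff yields header lines followed by context, removed (-), and added (+) lines.
diff = list(difflib.unified_diff(
    old_version, new_version,
    fromfile="draft-v1.txt", tofile="draft-v2.txt",
    lineterm="",
))

for line in diff:
    print(line)
```

A binary output document offers no comparable line structure, which is why it needs either built-in change tracking or a dedicated comparison interface.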
Figure 1.9: The basic Git workflow (top: git init, git clone, git pull, git push, git reset, git commit) and the use of the Git program with an SVN repository (bottom: svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit); the latter bears all the advantages and disadvantages associated with decentralized VCS.
[Margin note] After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.
Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).
Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.
Chapter 2
Markup
A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.
It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.
Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result,
[Margin note] More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].
[Margin note] The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.
the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.
2.1 Meta Markup Languages
2.1.1 The Generalized Markup Language
The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.
This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].
SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.
[Margin note] A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.
2.1.2 The Extensible Markup Language
Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.
This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
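The first of the two levels, well-formedness, can be checked with any XML parser. The sketch below uses the xml.etree.ElementTree module from the Python standard library, which raises ParseError for documents that are not well-formed. It does not check validity, since this module performs no DTD or schema validation. The helper name is_well_formed and both sample documents are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(document):
    """Return True when the string parses as XML, False when the parser rejects it."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# Properly nested elements.
nested = "<recipe><name>Palatschinken</name></recipe>"
# The closing tags are swapped, so the elements overlap.
overlapping = "<recipe><name>Palatschinken</recipe></name>"

nested_ok = is_well_formed(nested)
overlapping_ok = is_well_formed(overlapping)
```

A document that passes this check may still be invalid against a particular DTD or schema; a validating tool is needed to test the second level.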
Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
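As a small illustration of data retrieval with XPath, the following Python sketch queries a shortened recipe document. It relies on xml.etree.ElementTree, which implements only a limited subset of XPath; a full XPath or XQuery processor supports many more expressions. The inline document is an abbreviated stand-in written for this example.

```python
import xml.etree.ElementTree as ET

# An abbreviated recipe document used only for this illustration.
recipe = ET.fromstring("""
<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>
""")

# A child path: every ingredient element inside the ingredient list.
ingredients = [e.text for e in recipe.findall("./ingredientList/ingredient")]

# A descendant search with an attribute predicate.
measured_in_ml = recipe.find(".//ingredient[@amount='300ml']").text

# Attributes are read from the selected element.
serves = recipe.find("ingredientList").get("serves")
```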
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>
Figure 2.1: An example XML document (recipe.xml)
[Margin note] DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>
Figure 2.2: An example DTD
element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}
Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
      name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>
Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)
xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng # Compact -> Full Relax NG
xmllint --noout --relaxng recipe.rng recipe.xml
Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
A notable feature of XML that is unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature, CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML
Speech
AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!

Verse
Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? Why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>
Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].
[Margin note] The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook, and its source code is publicly available at http://docbook.org.
[Margin note] Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
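The following Python sketch shows how elements from two XML applications can coexist in one document and how a namespace-aware parser tells them apart. The two namespace IRIs, the element names, and the energy value are all hypothetical and were made up for this example. Note that parsing succeeds without any schema; only validation would require the schemata to be known.

```python
import xml.etree.ElementTree as ET

# Both IRIs below are hypothetical identifiers invented for this example.
document = """<recipe xmlns="http://example.com/ns/recipe"
        xmlns:n="http://example.com/ns/nutrition">
  <name>Palatschinken</name>
  <n:energy unit="kcal">210</n:energy>
</recipe>"""

root = ET.fromstring(document)

# The prefixes used in queries are local to this mapping;
# only the IRIs have to agree with the document.
namespaces = {
    "r": "http://example.com/ns/recipe",
    "nut": "http://example.com/ns/nutrition",
}

name = root.find("r:name", namespaces).text
energy = root.find("nut:energy", namespaces)
```

Internally, the parser stores each element name together with its IRI, so root.tag reads {http://example.com/ns/recipe}recipe regardless of the prefix chosen in the document.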
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
2.2 Markup on the World Wide Web
2.2.1 The Hypertext Markup Language
In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In
[Margin note] JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.
Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for the presentation markup.
During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the contemporary leading web browsers and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
2.2.2 The Extensible Hypertext Markup Language
Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as
<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>
Figure 2.7: The first line contains overlapping elements and, as such, can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.
<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>
Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.
<style>
  body {
    font: large Verdana;
    font-size: large; }
  h1 {
    font-size: x-large;
    text-transform: uppercase; }
  abbr {
    font-style: italic; }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>
[Margin note] The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2 Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly into web documents through XML namespaces.
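The contrast between the two parsing philosophies can be observed with the Python standard library. In the sketch below, the html.parser module tokenizes markup with overlapping elements without complaint, whereas xml.etree.ElementTree refuses the same markup outright. This is only an approximation of the difference: html.parser is a lenient tokenizer and does not implement the error recovery prescribed by the HTML5 specification. The class name TagLogger is an assumption made for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

class TagLogger(HTMLParser):
    """Record start and end tags in the order in which they are encountered."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

# Markup with overlapping elements.
malformed = "<b>Bold <i>bold and italic</b> italic</i>"

# The lenient HTML tokenizer reports the tags as they come.
logger = TagLogger()
logger.feed(malformed)
logger.close()

# The strict XML parser rejects the fragment, even inside a proper root element.
try:
    ET.fromstring("<p>" + malformed + "</p>")
    xml_accepts = True
except ET.ParseError:
    xml_accepts = False
```

The strict parser can stop at the first mismatched tag, which is the source of the architectural simplicity mentioned above.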
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
2.2.3 The Semantic Web and Linked Data
The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.
[Margin note] A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
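The two interpretations of a triplet can be sketched in a few lines of Python. The classes Resource and Triple and the function describe are hypothetical names chosen for this illustration and do not correspond to any particular RDF library; the IRIs are example identifiers.

```python
from typing import NamedTuple, Union

class Resource(NamedTuple):
    """A resource identified by an IRI."""
    iri: str

class Triple(NamedTuple):
    """A triplet (p, s, o); the object is either a resource or a literal string."""
    predicate: Resource
    subject: Resource
    obj: Union[Resource, str]

def describe(triple):
    """Render a triplet as a relation or as a property, depending on its object."""
    if isinstance(triple.obj, Resource):
        return (f"{triple.subject.iri} is in relation "
                f"{triple.predicate.iri} with {triple.obj.iri}")
    return (f"{triple.subject.iri} has property "
            f"{triple.predicate.iri} with value {triple.obj!r}")

page = Resource("http://example.org/document.html")
john = Resource("http://example.org/john-smith")

# The object is a resource, so the triplet expresses a relation.
relation = Triple(Resource("http://purl.org/dc/terms/creator"), page, john)
# The object is a literal, so the triplet expresses a property.
attribute = Triple(Resource("http://purl.org/dc/terms/title"), page, "John's Web page")

relation_text = describe(relation)
attribute_text = describe(attribute)
```

A set of such triplets forms a directed graph in which subjects and objects are nodes and predicates are labeled edges.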
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in Attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
2.3 Document Preparation Systems
Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .
Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom)
<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>
Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.
<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>
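Figure 2.11: A graph of the RDF document in Figure 2.9. The node http://example.org/document.html is connected to the literal "John's Web page"@en through dc:title and to the node http://example.org/john-smith through dc:creator; the node http://example.org/john-smith is connected to foaf:Person through rdf:type and to the literal "John Smith" through foaf:name.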
categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also known as What You See Is What You Get, WYSIWYG), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems
Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU is Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The
[Margin note] The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
The Cask of Amontillado
by
Edgar Allen Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that gave utterance to a threat. At length I would be avenged; this was a point definitely settled -- but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allen Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allen Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
    \dimexpr\hsize-3.28em 3.28em
    \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You who so well know the nature of my soul
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce markup that is comparatively weak and domain-specific, but also humane, highly intuitive, often profoundly beautiful, and easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that convert them to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that reflect their use cases.
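To make the conversion step concrete, the following is a minimal Python sketch of such a tool. It handles only a tiny, Markdown-like subset (single-line headings introduced by number signs and emphasis marked with asterisks). The function names render_inline and to_html are hypothetical, chosen for this illustration, and real converters implement far more complete grammars.

```python
import html
import re


def render_inline(text: str) -> str:
    """Escape HTML special characters, then turn *emphasis* into <em>."""
    escaped = html.escape(text, quote=False)
    return re.sub(r"\*([^*]+)\*", r"<em>\1</em>", escaped)


def to_html(source: str) -> str:
    """Convert a tiny, Markdown-like subset of markup to HTML.

    Blocks are separated by blank lines. A single-line block that starts
    with one to six '#' characters becomes a heading; every other block
    becomes a paragraph with its line breaks collapsed to spaces.
    """
    blocks = [b.strip() for b in re.split(r"\n\s*\n", source) if b.strip()]
    out = []
    for block in blocks:
        heading = re.fullmatch(r"(#{1,6}) +(.*)", block)
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{render_inline(heading.group(2))}</h{level}>")
        else:
            out.append(f"<p>{render_inline(' '.join(block.split()))}</p>")
    return "\n".join(out)
```

Calling to_html on a heading line followed by a blank line and a short paragraph yields one heading element and one paragraph element. The sketch escapes the text before applying the emphasis rule, so that angle brackets in the input cannot turn into markup in the output.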
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
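The line-length guideline can be expressed as a small check. The following Python sketch encodes the thresholds quoted above (45–75 characters for a single column, 40–50 for multiple columns, ragged setting below 40, eye strain above 80). The function name assess_measure and the wording of the returned labels are assumptions made for this illustration.

```python
def assess_measure(chars_per_line: int, columns: int = 1) -> str:
    """Classify a line length (in characters) against the guideline."""
    # Recommended range depends on whether the page has one or more columns.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "too narrow to justify; set ragged"
    if chars_per_line > 80:
        return "too wide; strains the eye"
    if low <= chars_per_line <= high:
        return "recommended; justify"
    return "tolerable, but outside the recommended range"
```

For instance, a single-column page with 66 characters per line falls in the recommended range, while a 35-character column is better set ragged.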
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

1

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
\newunicodechar{·}{\raisebox{.8ex}{.}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation, shown as text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and as the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.

<style>
@font-face {
  font-family: "Alegreya Sans";
  src: url("AlegreyaSans-Regular.ttf")
       format("truetype");
  unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                 U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
}
@font-face {
  font-family: "GFS Neohellenic";
  src: url("GFSNeohellenic.otf") format("opentype");
  unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                 U+102E0-102FF;
}
p {
  font-family: "Alegreya Sans", "GFS Neohellenic",
               sans-serif;
  line-height: 14pt;
}
[lang=en] {
  font-size: 11pt;
}
[lang=gr] {
  font-size: 13.2pt;
}
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with a leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
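The arithmetic behind the leading guideline is simple enough to sketch in Python. The code below assumes the reading given above, namely that the line height equals the type size plus 20 to 45 percent, with the extra space split evenly above and below the line. The function names leading_range and lead_per_side are hypothetical.

```python
def leading_range(type_size_pt: float) -> tuple[float, float]:
    """Return the (minimum, maximum) line height in points.

    Assumes the guideline of adding 20 to 45 percent of the type size.
    """
    return (type_size_pt * 1.20, type_size_pt * 1.45)


def lead_per_side(type_size_pt: float, leading_pt: float) -> float:
    """Return the extra space above and below each line, split evenly."""
    return (leading_pt - type_size_pt) / 2
```

For a 10 pt typeface, leading_range gives 12 to 14.5 pt, and lead_per_side gives 1 pt for the 12 pt leading used for the body text of the book.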
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and set ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
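One way to satisfy the whole-multiple rule is to pad each heading with vertical space until its total height reaches the next multiple of the body leading. The Python sketch below illustrates the arithmetic under that assumption. The function name heading_block and its parameters are hypothetical, and how the extra space is divided above and below the heading is left to the designer.

```python
import math


def heading_block(heading_height_pt: float, body_leading_pt: float,
                  min_space_pt: float = 0.0) -> tuple[float, float]:
    """Return (total height, extra space) for a heading, in points.

    The total height is the smallest whole multiple of the body leading
    that fits the heading plus the requested minimum surrounding space.
    """
    needed = heading_height_pt + min_space_pt
    # The small tolerance guards against floating-point noise when the
    # needed height is already an exact multiple of the leading.
    lines = max(1, math.ceil(needed / body_leading_pt - 1e-9))
    total = lines * body_leading_pt
    return total, total - heading_height_pt
```

With a body leading of 12 pt, a heading that is 16 pt tall occupies two body lines (24 pt) and needs 8 pt of extra space to keep the facing pages aligned.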
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks, in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs, and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3]:
This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
William Shakespeare, King Lear

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing. A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
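Both steps, snapping the text height to a whole number of lines and deriving it from the text width, can be sketched in a few lines of Python. The function names fit_text_height and text_height_from_width are hypothetical, and the proportion passed to the second function is an assumption that the designer would choose to suit the page.

```python
def fit_text_height(available_height_pt: float,
                    leading_pt: float) -> tuple[int, float]:
    """Return (line count, text height) that fits the available height.

    The text height is the largest whole multiple of the leading that
    does not exceed the available height. With fractional leadings,
    floating-point rounding can matter at exact multiples.
    """
    lines = int(available_height_pt // leading_pt)
    return lines, lines * leading_pt


def text_height_from_width(text_width_pt: float, proportion: float,
                           leading_pt: float) -> tuple[int, float]:
    """Return (line count, text height) closest to width * proportion."""
    lines = max(1, round(text_width_pt * proportion / leading_pt))
    return lines, lines * leading_pt
```

As an example, 540 pt of available height and a 12 pt leading give 45 lines and a text height of exactly 540 pt, whereas a 16.8 pt leading gives 32 lines and a text height of 537.6 pt.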
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to use colored typefaces only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015).
[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972.
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1986.
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1.
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6.
[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993.
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996.
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996.
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016).
[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008.
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012.
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006.
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124.
[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993.
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6.
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986.
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3.
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12.
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4.
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5.
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014.
Acronyms
ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standard Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
EMACS: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
PICO: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
VI: The Visual Interactive editor
VIM: VI IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language
This document was prepared in accordance with William Strunk's Elements of Style, an American English style guide for general use.
Chapter 1
Writing
The essence of a document is the idea it represents. In the case of a text document, this idea is articulated through speech, which is transcribed using text, optionally accompanied by figures, and then laid out on a sheet of paper according to a design. Since the text is typically independent of the design, whose task is to support and elicit the internal structure of the text, writing is the logical first step in creating a text document.
The essentials of writing in any given natural language include grammar rules, which specify the structure of spoken language, and orthographic rules, which impose additional requirements on written text. The complexity of either set of rules depends entirely on the language in question. Some writing systems, such as those that incorporate Chinese characters, are not phonographic, and the correspondence between the spoken words and the written symbols needs to be memorized by the writer word by word. Other languages may use vastly different grammar rules for speaking and for writing, which means that a spoken sentence needs to be translated before it is written down. A writer needs to recognize these specifics.
On top of grammar and orthographic rules stand style guides, which, in order to improve consistency, codify how common language patterns are encoded. More comprehensive style guides, such as the Chicago Manual of Style or the Oxford Style Manual, often go beyond writing and provide guidelines on design and typesetting as well, making them an indispensable reference on the editorial tradition.

Zwei Trichter wandeln durch die Nacht.
Durch ihres Rumpfs verengten Schacht
fließt weißes Mondlicht
still und heiter
auf ihren
Waldweg
u. s.
w.

Figure 1.1: Exceptions that prove the rule about the separation of text and design can sometimes be encountered in poetry. Above is Christian Morgenstern's Trichter, where the text and its form are intimately intertwined.

Above all stand the typographic rules, which specify how the resulting document should be typeset so that it doesn't disturb the eye of the reader. These, as well as the orthographic rules on hyphenation, can be left out of consideration during writing, as it is the page that should be formed around the writing and not the other way around.
1.1 Text Processing
Originally the domain of the pen, the quill, the stylus, and the more recent typewriter, manuscripts of today are produced mainly using the personal computer and stored in text files. The discipline of creating and manipulating digital text is called text processing and will be the focus of this section.
1.1.1 Character Encoding
Although computing at its most primal has no use for anything but numbers, it has nevertheless been accompanied by text from the very outset. Even the earliest computers from the 1950s were programmed with both raw machine code and the text programming language of the FORmula TRANslator (FORTRAN). The digital representation of letters, digits, and other characters was initially closely tied to each specific application and processor architecture, but with the advent of computer networking in the 1960s, mutual intelligibility became a point of concern. "We had over sixty different ways to represent characters in computers. It was a real Tower of Babel," explains Bob Bemer [1], an American computer scientist who worked at IBM during 1956–1962 and who drafted the American Standard Code for Information Interchange (ASCII) [2], a character encoding from 1963 that unified the digital representation of text across the computer industry and enabled computer networking on a large scale.
EBCDIC by IBM was the default encoding on IBM's System/360 mainframes and was in active use until the introduction of the PC in 1981. In writing systems using Chinese characters, special encodings such as Big5, JIS, and EUC are used to this day. For brevity, the text focuses on the mainstream of international encodings.
ASCII
In ASCII, every character is represented by a number from zero to 127, which is stored as a seven-bit integer called a character code. These 128 codes are used to encode printable characters (spanning the letters of the English alphabet, digits, punctuation, and other symbols) and control codes, as depicted in Table 1.1. Unlike printable characters, control codes have no fixed visual representation, and they were used to implement application-specific communication protocols and text formatting; their precise semantics were defined in a much later standard from 1972 [3]. Unconstrained by the bandwidth and the storage limitations of the 1960s and 1970s, today's communication protocols and text formats gravitate towards markup constructed from printable characters, which, unlike control codes, are easy for humans to read and write.
The following properties make it easy to manipulate and reason about character strings encoded in ASCII:
• Each character is represented by exactly seven bits. This makes it easy to allocate space for character strings of fixed length, to measure the number of characters stored in a memory region, and to perform basic operations such as adjacent character retrieval or text truncation.
• Characters are alphabetically ordered. Character strings can therefore be collated by comparing character code binary values.
• Lowercase and uppercase letters, digits, and control codes form contiguous ranges of character codes. This simplifies classification.
Bit 7           0     0     0     0     1     1     1     1
Bit 6           0     0     1     1     0     0     1     1
Bit 5           0     1     0     1     0     1     0     1
Bits 4 3 2 1    Ctrl codes  Symbols     Upper case  Lower case
     0 0 0 0    NUL   DLE   SP    0     @     P     `     p
     0 0 0 1    SOH   DC1   !     1     A     Q     a     q
     0 0 1 0    STX   DC2   "     2     B     R     b     r
     0 0 1 1    ETX   DC3   #     3     C     S     c     s
     0 1 0 0    EOT   DC4   $     4     D     T     d     t
     0 1 0 1    ENQ   NAK   %     5     E     U     e     u
     0 1 1 0    ACK   SYN   &     6     F     V     f     v
     0 1 1 1    BEL   ETB   '     7     G     W     g     w
     1 0 0 0    BS    CAN   (     8     H     X     h     x
     1 0 0 1    HT    EM    )     9     I     Y     i     y
     1 0 1 0    LF    SUB   *     :     J     Z     j     z
     1 0 1 1    VT    ESC   +     ;     K     [     k     {
     1 1 0 0    FF    FS    ,     <     L     \     l     |
     1 1 0 1    CR    GS    -     =     M     ]     m     }
     1 1 1 0    SO    RS    .     >     N     ^     n     ~
     1 1 1 1    SI    US    /     ?     O     _     o     DEL

Table 1.1: The ASCII encoding as specified in the 1986 revision of the standard [4]. SP denotes the space character.
Code point range    Encoding
0–127               0xxxxxxx
128–2047            110xxxxx 10xxxxxx
2048–65535          1110xxxx 10xxxxxx 10xxxxxx
65536–1114111       11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Table 1.2: The UTF-8 encoding. Each x represents one bit of the UCS code point in binary.
Character    Code point (decimal)    Code point (binary)    UTF-8 encoding
Ř            344                     101011000              11000101 10011000
e            101                     1100101                01100101
č            269                     100001101              11000100 10001101

Table 1.3: An example of the UTF-8 encoding.
• There is precisely one way to encode any printable character. The conversion between lowercase and uppercase letters is a matter of inverting one bit.
This comes at the expense of support for non-English writing systems. As a temporary workaround, a set of ASCII derivatives that replaced the less-needed characters #, $, @, [, \, ], ^, `, {, |, }, and ~ with international characters was specified in the ISO 646 standard from 1972 [3].
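The ASCII properties listed above can be tried out with a few lines of Python. The following is a minimal sketch, not a reference implementation; the helper names `is_ascii_letter`, `toggle_case`, and `collate` are made up for this illustration.

```python
# A sketch of the ASCII properties; all helper names are illustrative only.

def is_ascii_letter(ch):
    """Letters form two contiguous ranges: 65-90 (A-Z) and 97-122 (a-z)."""
    return 0x41 <= ord(ch) <= 0x5A or 0x61 <= ord(ch) <= 0x7A


def toggle_case(ch):
    """Switch the case of an ASCII letter by inverting the bit worth 32."""
    return chr(ord(ch) ^ 0x20) if is_ascii_letter(ch) else ch


def collate(words):
    """Order strings by comparing their raw character codes."""
    return sorted(words, key=lambda word: word.encode("ascii"))


print(toggle_case("a"), toggle_case("Q"))
print(collate(["pear", "Zebra", "apple"]))
```

Note that collation by character code places all uppercase letters before all lowercase ones, so "Zebra" sorts ahead of "apple"; alphabetical order holds only within one case.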
Eight-bit Encodings
With the byte size stabilizing at eight bits, new character encodings emerged that were based on ASCII and used the additional bit to encode characters of non-English writing systems while retaining complete backwards compatibility with ASCII. Besides the numerous vendor-specific encodings (called code pages), a set of fifteen eight-bit encodings, covering all major modern writing systems whose characters fit within the space of 128 additional combinations, was standardized in the ISO/IEC 8859 series released during 1986–2001.
Compared to ASCII, eight-bit encodings introduced an additional level of complexity to text processing:
• Each character is exactly eight bits wide. String manipulation is therefore as straightforward as with ASCII.
• Character strings can no longer be collated by character code comparison. Each encoding requires separate collation tables.
• Classes of characters, such as uppercase and lowercase letters or punctuation, no longer form contiguous ranges, and their position varies among encodings. This impedes character classification.
• Idiosyncrasies such as the ligature æ and invisible hyphenation hints are included in several encodings, which makes it more difficult to determine character string equivalence. Algorithms for case conversion vary among encodings.
• There exists no standard mechanism to detect which encoding is being used. The distinction needs to be made at the application level using heuristics, additional metadata, or human intervention. Consequently, no standard mechanism exists to use different character encodings within a single text document.
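The last point can be illustrated with a short Python sketch that decodes one and the same byte under three parts of ISO/IEC 8859. The function name `decode_all` is an assumption made for this example; the codec names are those of Python's standard library.

```python
# One byte, three meanings: nothing in the data says which reading is intended.

CODECS = ("iso8859_1", "iso8859_2", "iso8859_5")


def decode_all(raw):
    """Decode the same bytes under several eight-bit encodings."""
    return {codec: raw.decode(codec) for codec in CODECS}


for codec, text in decode_all(bytes([0xE8])).items():
    print(codec, text)

# Bytes below 128 are plain ASCII and decode identically in every part.
print(decode_all(b"text"))
```

Only the upper half of the code space differs between the encodings, which is why ASCII-only text survives a wrong guess while accented or non-Latin text does not.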
A portion of this complexity is inherent in the task of encoding the characters of all modern writing systems, but the overhead caused by the fragmentation of character encodings proved to be unnecessary.
The Universal Character Set and Unicode
In the early 1990s, the continual increase in available bandwidth and storage led to the creation of the standards of Unicode [5, 6] and the Universal multiple-octet coded Character Set (UCS) [7], in an attempt to create a text encoding that would contain the characters of all the world's languages and succeed ASCII as the lingua franca of text interchange.
Notable are also the seven-bit encodings of UTF-7 and Punycode, which bring Unicode support to protocols that were designed with seven-bit ASCII in mind, such as e-mail.
UCS is an ever-expanding catalogue of characters from writing systems both modern and ancient, and of symbols ranging from diacritical marks, punctuation, and ideograms to mahjong tiles, alchemical symbols, and the ancient Greek musical notation. Each of these characters is assigned a number called a code point, ranging from 0 to 2147483647 (7F FF FF FF in hexadecimal notation), with the numbers of the most common characters in the range from 0 to 65535 (FF FF), called the Basic Multilingual Plane (BMP). The smallest units of division in UCS are blocks, which contain 256 thematically related characters. UCS encodings map code points to binary character codes and vice versa.
One of the design goals of UCS was to avoid assigning code points to different glyphs that carry the same meaning. As a result, the visually distinctive Han characters used in the East Asian countries of China, Japan, Korea, and Vietnam were merged into a set of 75960 ideograms in a process referred to as the Han Unification [10, sec. 18.1]. This simplifies text processing but also makes it impossible to encode a text in multiple East Asian languages without having to rely on external markup to select appropriate regional fonts. As a result, a derivative of UCS that doesn't implement the Han Unification was developed for use in operating systems based on the Real-time Operating system Nucleus (TRON) and is used in East Asia alongside UCS and region-specific encodings.

餐甑逞扉牙慨

Figure 1.2: Several Han characters in the traditional Chinese, Japanese, Korean, and Vietnamese variants.

Three major encodings are specified in the UCS standard and its amendments [8, 9]:
1. UTF-32 directly encodes UCS characters by transforming their code points to four-byte integers. UTF-32 is also known as UCS-4.
2. UTF-16 directly encodes characters within the BMP by transforming their code points to two-byte integers. Code points in the range from 65536 to 1114111 (01 00 00–10 FF FF) are transformed into pairs of two-byte integers, called surrogate pairs, ranging from 55296 to 57343 (D8 00–DF FF). To enable the UTF-16 encoding, the code points in this range will never be assigned to characters [10, sec. 3.4, D15]. The same is true of code points above 1114111 (10 FF FF), which allows UTF-16 to encode any UCS character.
3. UTF-8 directly transforms code points ranging from 0 to 127 (7F) to one-byte integers. Since the first UCS block of the BMP matches ASCII, any text encoded in eight-bit ASCII is also encoded in UTF-8. Code points in the range from 128 to 1114111 (00 00 80–10 FF FF) are transformed into two to four one-byte integers ranging from 128 to 244 (80–F4). The encoding is illustrated in Tables 1.2 and 1.3.
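The bit layout of Table 1.2 translates almost directly into code. The sketch below is a hand-written encoder intended only to make the layout tangible; the function name `utf8_encode` is an assumption of this example, and real programs would normally rely on the built-in `str.encode("utf-8")`.

```python
def utf8_encode(code_point):
    """Encode a single UCS code point following the layout of Table 1.2."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates are reserved for UTF-16")
    if code_point <= 0x7F:            # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:           # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:          # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    return bytes([0xF0 | (code_point >> 18),   # 11110xxx and three 10xxxxxx
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


for ch in "Ře":
    encoded = utf8_encode(ord(ch))
    print(ch, ord(ch), " ".join(f"{byte:08b}" for byte in encoded))
```

Each continuation byte carries six bits of the code point, which is why the range boundaries in Table 1.2 fall at 2^7, 2^11, and 2^16.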
UTF-32 is primarily used for the fixed-width internal representation of individual UCS characters inside programs, UTF-16 fulfills a similar role in programs that only work with the BMP, and UTF-8 is used for text storage and interchange. Since 2010, the majority of text content on the Web has been encoded in ASCII and UTF-8 [11].
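For code points outside the BMP, the surrogate-pair arithmetic of UTF-16 can be sketched as follows. The text above only states the ranges involved; the split into a high and a low surrogate carrying ten bits each is taken from the Unicode standard, and the function name `to_surrogate_pair` is made up for this example.

```python
def to_surrogate_pair(code_point):
    """Split a code point above the BMP into two 16-bit surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP need surrogates")
    offset = code_point - 0x10000          # a 20-bit number
    high = 0xD800 + (offset >> 10)         # upper ten bits
    low = 0xDC00 + (offset & 0x3FF)        # lower ten bits
    return high, low


high, low = to_surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")
```

The result can be cross-checked against Python's own big-endian UTF-16 codec, which produces the same two integers as four bytes.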
Unicode was a competing standard for universal text encoding that underwent a merger with UCS in version 1.1, and since then the standards have been kept closely synchronised. Unicode is a superset of UCS that defines additional information about UCS characters (such as their general category, directionality, case, or numeric value [10, sec. 3.5 and ch. 4]), various text processing algorithms, and implementation guidelines.
Regarding text processing, Unicode and UCS represent a compromise between the simplicity of seven-bit ASCII and the heterogeneity of eight-bit encodings:
Ǻ (U+01FA) = Å (U+00C5) + combining acute accent (U+0301) = A (U+0041) + combining ring above (U+030A) + combining acute accent (U+0301)

Figure 1.3: Some UCS characters can be either input as a single entity or composed from several combining characters. With regard to Unicode normalization forms, all of the above representations are canonically equivalent.

    iconv -f latin2 -t utf8 -- old.txt > new.txt

Figure 1.4: Text files can be converted between encodings using the iconv command-line tool. The sample code shows the file old.txt being converted from the ISO/IEC 8859-2 encoding to UTF-8. The result of the conversion is stored in the file new.txt.
• If simple text manipulation is preferred over space efficiency, each character can be made exactly two or four bytes wide using the UTF-16 and UTF-32 encodings.
• Although character strings cannot be collated by a simple character code comparison, a collation algorithm is defined in the Unicode specification [12], and collation tables for major locales [13] are maintained by the Unicode Consortium.
• Classes of characters, such as uppercase letters, lowercase letters, numbers, and punctuation, do not form contiguous ranges, but their position is directly specified in the standard [10, sec. 4.5].
• Although idiosyncrasies such as ligatures, invisible hyphenation hints, and combining characters are present in UCS, explicit normalization algorithms for character string equivalence testing are specified by the standard [10, sec. 2.12]. An algorithm for case conversion is also specified [10, sec. 3.13].
• The byte order mark (FE FF) character can be inserted at the beginning of a text as a signature of Unicode encodings. As the name suggests, the order in which the FE and FF bytes arrive also indicates the order of bytes (called endianity) that was used to encode integers. In UTF-32 and UTF-16, endianity can be chosen arbitrarily by the encoding application. In UTF-8, one-byte integers are used, and the notion of endianity is therefore meaningless.
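Normalization, case conversion, and the byte order mark from the list above can all be explored with Python's standard library. The sketch below is illustrative only: `detect_bom` is a hypothetical helper, and its guess is a heuristic (a UTF-16 text that starts with a byte order mark followed by a NUL character would be mistaken for UTF-32).

```python
import codecs
import unicodedata

# Three canonically equivalent spellings of the same letter.
composed = "\u01FA"
partly_composed = "\u00C5\u0301"
decomposed = "A\u030A\u0301"

print(composed == decomposed)  # code-by-code comparison fails
print(unicodedata.normalize("NFC", decomposed) == composed)
print(unicodedata.normalize("NFD", composed) == decomposed)

# Case conversion is not always one character to one character.
print("straße".upper(), "STRASSE".casefold() == "straße".casefold())

# Longer signatures are tested first because the UTF-32 little-endian
# mark begins with the same two bytes as the UTF-16 little-endian one.
SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data):
    """Guess the encoding of a byte string from its byte order mark."""
    for signature, name in SIGNATURES:
        if data.startswith(signature):
            return name
    return None


print(detect_bom(codecs.BOM_UTF16_BE + "A".encode("utf-16-be")))
```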
Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. Above is the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.

Compose + O + R = ®
Compose + 3 + 4 = ¾
Compose + s + s = ß
Compose + ~ + ' + a = ấ

Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.

Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á

Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces a character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the corresponding UCS code point in hexadecimal.
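The lookup described in Figure 1.7 can be approximated in Python for an English locale. This is a rough sketch under stated assumptions: `alt_code` and `alt_hex` are hypothetical helpers, not Windows APIs; Python's cp437 codec maps the numbers 1–31 to control codes rather than to the graphic symbols Windows produces; and a few cp1252 positions are undefined and raise an error.

```python
def alt_code(digits):
    """Mimic a decimal Alt code: a leading zero selects the Windows code page."""
    number = int(digits)
    if not 0 <= number <= 255:
        raise ValueError("Alt codes address a single byte")
    codec = "cp1252" if digits.startswith("0") else "cp437"
    return bytes([number]).decode(codec)


def alt_hex(digits):
    """Mimic the hexadecimal variant, which addresses UCS code points."""
    return chr(int(digits, 16))


print(alt_code("160"), alt_code("0225"), alt_hex("E1"))
```

All three calls print the same letter, which shows that one character can carry three different numbers depending on the code page consulted.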
1.1.2 Text Input
To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On hand-held devices, the use of either a numeric keypad or a touch-screen is more typical.
An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind, and as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn't apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.
1.1.3 Text Editors
A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free JOE, GNU nano, and Pico.
More advanced text editors come with support for regular expressions and version control, which will be covered in sections 1.1.5 and 1.2, and for user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and Vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower users to perform complex text editing.
1.1.4 Interactive Document Preparation Systems
Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].
The primary difference between text editors and DPSes is that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.
1.1.5 Regular Expressions
The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.
Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.
Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) syntaxes [18, part 1, ch. 9] described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.
More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the ZERO WIDTH SPACE (20 0B), ZERO WIDTH NON-JOINER (20 0C), ZERO WIDTH JOINER (20 0D), and ZERO WIDTH NO-BREAK SPACE (FE FF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at
BRE regex: we\{1,2\}p
Description: The repetition expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
Matches: weeps, wept

BRE regex: e*ne
Description: Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
Matches: never, enemy, Kleene

BRE regex: \(⟨regex⟩\)
Description: A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
Matches: ⟨regex⟩

BRE regex: ^ar
Description: At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
Matches: argument, arrow keys

BRE regex: ore$
Description: At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
Matches: iron ore, dumbledore

BRE regex: .be
Description: A period (.) matches any single character.
Matches: or not to be

BRE regex: be[ea]
Description: A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity.
Matches: beehive, grizzly bear, glass beads

BRE regex: be[^ea]
Description: A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
Matches: obeah, bend, libela

BRE regex: \^\$
Description: Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
Matches: ^$

BRE regex: \(..\).*\1
Description: A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
Matches: ara ararauna, dardanelles, nationality

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below).

ERE regex: we{1,2}p
Description: Unlike in BREs, braces aren't escaped.
Matches: weeps, wept

ERE regex: pe+rl?
Description: The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
Matches: persona, peer speech, perl

ERE regex: (⟨regex⟩)
Description: Unlike in BREs, parentheses aren't escaped.
Matches: ⟨regex⟩

ERE regex: (on|t)
Description: Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
Matches: one, two, trophy, truth

ERE regex: (..).*\1
Description: EREs do not support backreferences.
Matches: ⟨undefined⟩
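The operators of Table 1.4 can be tried out interactively. The sketch below uses Python's `re` module, whose syntax derives from Perl rather than from POSIX: braces, parentheses, and the vertical line are written unescaped, as in ERE, while backreferences are supported, as in BRE. The helper `first_match` is a name invented for this example.

```python
import re


def first_match(pattern, text):
    """Return the leftmost substring matched by the pattern, or None."""
    found = re.search(pattern, text)
    return found.group(0) if found else None


SAMPLES = [
    (r"we{1,2}p", "weeps"),        # interval expression
    (r"e*ne", "Kleene"),           # star
    (r"^ar", "arrow keys"),        # anchor at the beginning
    (r"ore$", "iron ore"),         # anchor at the end
    (r"be[ea]", "grizzly bear"),   # matching list
    (r"be[^ea]", "bend"),          # non-matching list
    (r"(on|t)", "two"),            # alternation
    (r"(..).*\1", "nationality"),  # backreference
]

for pattern, text in SAMPLES:
    print(pattern, "|", text, "|", first_match(pattern, text))
```

Running a pattern against its own sample text and inspecting the matched substring is a quick way to check whether a repetition operator applies to a single character or to a whole subexpression.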
Regex       Description
\x{⟨n⟩}     Matches the UCS character with code point ⟨n⟩ in hexadecimal
\N{⟨n⟩}     Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩
\p{⟨p⟩}     Matches any UCS character with property ⟨p⟩
\P{⟨p⟩}     Matches any UCS character without property ⟨p⟩

Property             Description
Letter               This property is satisfied by any letter
Punctuation          This property is satisfied by any punctuation
Symbol               This property is satisfied by any symbol
Mark                 This property is satisfied by any mark
Number               This property is satisfied by any number
Separator            This property is satisfied by any separator
Other                This property is satisfied by any UCS character that doesn't belong to any of the categories listed above
Block=⟨b⟩            This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.
Script=⟨s⟩           This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.
Numeric Value=⟨n⟩    This property is satisfied by any UCS character with the numeric value ⟨n⟩

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.
The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.
least partially implement the Unicode standard for Regular Expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
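Both the pitfalls and the Unicode properties can be demonstrated in Python. The standard `re` module does not implement the \p{…} syntax of Table 1.5, so the sketch below approximates it with the `unicodedata` module. The helpers `canonical`, `robust_search`, and `with_category` are assumptions of this example, and the clean-up they perform is deliberately simplified: deleting joiners, for instance, can change the meaning of text in some scripts.

```python
import re
import unicodedata

# Code points of the invisible characters, mapped to None for deletion.
INVISIBLE = dict.fromkeys([0x200B, 0x200C, 0x200D, 0xFEFF])


def canonical(text):
    """Drop invisible characters, normalize to NFC, and fold case."""
    return unicodedata.normalize("NFC", text.translate(INVISIBLE)).casefold()


def robust_search(needle, haystack):
    """Search for a literal string while ignoring encoding differences."""
    return re.search(re.escape(canonical(needle)), canonical(haystack)) is not None


def with_category(text, prefix):
    """Keep characters whose general category starts with the prefix,
    a rough stand-in for \\p{Letter} (L), \\p{Number} (N), and so on."""
    return "".join(ch for ch in text if unicodedata.category(ch).startswith(prefix))


decomposed = "Cafe\u0301 au lait"    # e followed by a combining acute accent
hidden = "type\u200bsetting"         # contains a ZERO WIDTH SPACE

print(re.search("caf\u00e9", decomposed, re.IGNORECASE))  # naive search fails
print(robust_search("caf\u00e9", decomposed))
print(re.search("typesetting", hidden))                   # naive search fails
print(robust_search("typesetting", hidden))

print(with_category("\u0158 2, \u010d!", "L"))
print(unicodedata.category("\u0158"), unicodedata.numeric("\u00be"),
      unicodedata.bidirectional("\u05d0"))
```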
The most elementary text processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool presents the user with the lines that contain one or more matches. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.
The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.
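The rudimentary behavior described above (recording changes with descriptions and authorship, then viewing and reverting them) can be sketched in a few lines of Python. This is a toy model under simplifying assumptions: it keeps the full text of a single document per revision, whereas real VCS typically store differences and track many files. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Change:
    author: str
    description: str
    content: str  # full text of the document after the change

@dataclass
class History:
    changes: list = field(default_factory=list)

    def commit(self, author, description, content):
        """Record a new version and return its 1-based revision number."""
        self.changes.append(Change(author, description, content))
        return len(self.changes)

    def log(self):
        """List (revision, author, description) for every recorded change."""
        return [(number, change.author, change.description)
                for number, change in enumerate(self.changes, start=1)]

    def checkout(self, revision):
        """Return the document text as of the given revision."""
        return self.changes[revision - 1].content

    def revert(self, revision, author):
        """Restore an earlier revision by recording it as a new change."""
        return self.commit(author, f"Revert to revision {revision}",
                           self.checkout(revision))
```

Reverting is modeled as yet another recorded change, so the undesirable version stays in the history and the revert itself has an author.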
VCS can be divided by their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally, and its operation is fully dependent on the availability of the server. An example of a centralized VCS is Subversion (SVN).
By comparison, there is no designated server in decentralized VCS, and users can upload and download new versions directly to and from one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage requirements, and the increased opportunity for users not to share their local changes frequently enough, which leads to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.
Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

Figure 1.8: The basic SVN workflow (svnadmin create, svn checkout, svn update, svn commit). After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.
Figure 1.9: The basic Git workflow (above: git init, git clone, git pull, git push, git reset, git commit) and the use of the Git program with an SVN repository (below: svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit); the latter bears all the advantages and disadvantages associated with decentralized VCS. After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.
Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.
Chapter 2
Markup
A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to compliance with the orthographic rules (such as correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.
It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.
Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

More information about the project can be found within the Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.
2.1 Meta Markup Languages
2.1.1 The General Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.
This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].
SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.
A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, is declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.
2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.
This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML, such as W3C XML Schema, RELAX NG, or Schematron, that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
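The two levels of correctness can be illustrated with Python's standard xml.etree.ElementTree module, which checks well-formedness but, to the best of our knowledge, does not validate against a DTD. The sketch below therefore re-implements a few constraints of the recipe format (Figure 2.2) by hand, using the limited XPath subset that ElementTree supports. The function names and the shortened recipe are assumptions made for illustration; a real project would rather rely on a validating tool such as xmllint.

```python
import xml.etree.ElementTree as ET

RECIPE = """<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crepe-like dish.</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients.</step>
  </stepList>
</recipe>"""

def is_well_formed(source):
    """Return True if the parser accepts the source as well-formed XML."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False

def violations(source):
    """List violated recipe constraints (a partial, hand-written check)."""
    root = ET.fromstring(source)
    found = []
    expected = ["name", "description", "ingredientList", "stepList"]
    if root.tag != "recipe":
        found.append("the root element must be recipe")
    if [child.tag for child in root] != expected:
        found.append("recipe must contain " + ", ".join(expected))
    for parent, child in (("ingredientList", "ingredient"), ("stepList", "step")):
        if not root.findall(f"{parent}/{child}"):
            found.append(f"{parent} needs at least one {child}")
    for ingredient_list in root.findall("ingredientList"):
        if "serves" not in ingredient_list.attrib:
            found.append("ingredientList is missing the serves attribute")
    for ingredient in root.findall("ingredientList/ingredient"):
        if "amount" not in ingredient.attrib:
            found.append("ingredient is missing the amount attribute")
    return found
```

A document can pass is_well_formed and still yield a non-empty list from violations, which is precisely the difference between a well-formed and a valid document.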
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish.</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>

Figure 2.1: An example XML document (recipe.xml)
<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD. DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.
element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<element name="recipe"><complexType><all>
  <element name="name" type="string" minOccurs="1"/>
  <element name="description" type="string"
           minOccurs="1"/>
  <element
      name="ingredientList"><complexType><sequence>
    <element name="ingredient" minOccurs="1"
             maxOccurs="unbounded">
      <complexType><simpleContent>
        <extension base="string">
          <attribute name="amount" type="string"/>
        </extension>
      </simpleContent></complexType>
    </element></sequence>
    <attribute name="serves" type="positiveInteger"
               use="required"/>
  </complexType></element>
  <element name="stepList"><complexType><sequence>
    <element name="step" type="string" minOccurs="1"
             maxOccurs="unbounded"/>
  </sequence></complexType></element>
</all></complexType></element>
</schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)
xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng  # Compact -> Full RELAX NG
xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature of CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML
Speech
AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!

Verse
Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? Why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].
The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook, and its source code is publicly available at http://docbook.org.
Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
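To make the role of namespaces more tangible, the following sketch mixes elements from two XML applications in one document and groups the parsed elements by the IRI of their namespace. It assumes Python's standard xml.etree.ElementTree module, which reports element names in the form {IRI}local-name. The sample document and the helper function are made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

# One document that combines elements from two XML applications.
DOCUMENT = f"""<html xmlns="{XHTML}" xmlns:svg="{SVG}">
  <body>
    <p>A circle:</p>
    <svg:svg width="20" height="20">
      <svg:circle cx="10" cy="10" r="8"/>
    </svg:svg>
  </body>
</html>"""

def split_tag(tag):
    """Split ElementTree's '{namespace}local' notation into its two parts."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag

root = ET.fromstring(DOCUMENT)
by_namespace = {}
for element in root.iter():
    namespace, local = split_tag(element.tag)
    by_namespace.setdefault(namespace, []).append(local)
```

The parser only reports the IRIs. Nothing in the document tells it where the schema of either application can be found, which is the limitation described above.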
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
2.2 Markup on the World Wide Web
2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first widely used graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.
During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as
<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and, as such, can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.
<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

<style>
body {
  font: large Verdana;
  font-size: large;
}
h1 {
  font-size: x-large;
  text-transform: uppercase;
}
abbr {
  font-style: italic;
}
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>
The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources and to reduce the number of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly in web documents through XML namespaces.
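The contrast between the two kinds of parsers can be observed with Python's standard library. The sketch below feeds the overlapping markup from Figure 2.7 to the lenient tokenizer of html.parser and to the strict XML parser of xml.etree.ElementTree. Note that html.parser only reports the tags it encounters and does not implement the error recovery described in the HTML5 specification; the EventCollector class is a hypothetical helper.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

MALFORMED = "<b>Bold <i>bold and italic</b> italic</i>"

class EventCollector(HTMLParser):
    """Record every start tag, end tag, and text run that the parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))

# The lenient HTML tokenizer processes the overlapping tags without complaint.
collector = EventCollector()
collector.feed(MALFORMED)
collector.close()

# The strict XML parser refuses the very same input.
try:
    ET.fromstring(MALFORMED)
    xml_accepts = True
except ET.ParseError:
    xml_accepts = False
```

After the code runs, collector.events holds the complete sequence of tags and text, whereas xml_accepts is False because the XML parser stops at the first mismatched end tag.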
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
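The two readings of a triplet can be written down directly. The following sketch models a triplet (p, s, o) with plain Python data classes and picks the interpretation based on whether the object is a resource or a literal. The class names and the example.org IRIs are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Resource:
    iri: str

@dataclass(frozen=True)
class Literal:
    value: str

@dataclass(frozen=True)
class Triplet:
    predicate: Resource
    subject: Resource
    obj: Union[Resource, Literal]

def interpret(triplet):
    """Describe a triplet as either a relation or a property."""
    p, s, o = triplet.predicate, triplet.subject, triplet.obj
    if isinstance(o, Resource):
        return f"{s.iri} is in the relation {p.iri} with {o.iri}"
    return f"{s.iri} has the property {p.iri} with the value {o.value!r}"

document = Resource("http://example.org/document.html")
john = Resource("http://example.org/john-smith")
creator = Resource("http://purl.org/dc/terms/creator")
name = Resource("http://xmlns.com/foaf/0.1/name")

relation = interpret(Triplet(creator, document, john))
prop = interpret(Triplet(name, john, Literal("John Smith")))
```

Keeping resources and literals as distinct types makes the choice between the two interpretations a simple type check.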
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually make obsolete the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata in the documents on the Web, a technique known as microformatting.
2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom)
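Of the three representations, N-Triples is simple enough to be produced by hand. The sketch below writes one line per triple in the order subject, predicate, object, each line terminated by a full stop. It is a simplified illustration rather than a complete serializer: only the most common characters in literals are escaped, IRIs are assumed to be already valid, and the function names are hypothetical.

```python
def iri(value):
    """Format an IRI as an N-Triples term."""
    return f"<{value}>"

def literal(text, language=None):
    """Format a string literal, optionally tagged with a language."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    suffix = f"@{language}" if language else ""
    return f'"{escaped}"{suffix}'

def ntriples(triples):
    """Serialize (subject, predicate, object) terms, one triple per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)

DOCUMENT = iri("http://example.org/document.html")
JOHN = iri("http://example.org/john-smith")

output = ntriples([
    (DOCUMENT, iri("http://purl.org/dc/terms/title"),
     literal("John's Web page", "en")),
    (DOCUMENT, iri("http://purl.org/dc/terms/creator"), JOHN),
    (JOHN, iri("http://xmlns.com/foaf/0.1/name"), literal("John Smith")),
])
```

Because every line is self-contained, such output can be concatenated, sorted, or searched with line-oriented tools such as grep.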
<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>
Figure 2.11: A graph of the RDF document in Figure 2.9. The node http://example.org/document.html is connected by dc:title to the literal "John's Web page"@en and by dc:creator to the node http://example.org/john-smith, which is in turn connected by rdf:type to foaf:Person and by foaf:name to the literal "John Smith".
categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph, The Art of Computer Programming, and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
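The decoupling described above can be illustrated with a minimal Python sketch. The class names, the style table, and the render function are hypothetical and do not correspond to the data model of any particular word processor or DTP package; they only show that the text records a class, while the design of each class lives in one place.

```python
# Hypothetical style sheet: maps logical classes to presentation settings.
STYLES = {
    "title": {"font": "Times", "size_pt": 16, "weight": "bold"},
    "body": {"font": "Times", "size_pt": 12.5, "weight": "regular"},
}

# The document only records which class each section of text belongs to.
DOCUMENT = [
    ("title", "The Cask of Amontillado"),
    ("body", "The thousand injuries of Fortunato ..."),
]


def render(document, styles):
    """Resolve each logical class to its presentation settings."""
    return [(text, styles[cls]) for cls, text in document]


# Redesigning every body paragraph means editing a single style entry;
# the document itself is left untouched.
redesigned = {**STYLES, "body": {**STYLES["body"], "font": "Palatino"}}
```

With presentation markup, the same redesign would require visiting every affected span of text individually.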
The Cask of Amontillado
by
Edgar Allan Poe
The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled – but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.
-1-
.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
              \dimexpr\hsize-3.28em 3.28em
              \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
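To give a feel for how such a conversion tool works, here is a minimal Python sketch that turns a tiny Markdown-like subset into HTML. It is not an implementation of Markdown or of any other language named above; the supported syntax (a leading "# " for a heading, asterisks for emphasis, blank lines between paragraphs) and the function name to_html are assumptions chosen for illustration.

```python
import html
import re


def to_html(source: str) -> str:
    """Convert a tiny Markdown-like subset to HTML.

    Supported: '# ' headings, *emphasis*, and paragraphs separated
    by blank lines. Anything else is passed through as plain text.
    """
    blocks = []
    for block in re.split(r"\n\s*\n", source.strip()):
        # Join the lines of a block and escape HTML special characters.
        text = html.escape(" ".join(block.split()), quote=False)
        # Punctuation carries the markup: *word* becomes emphasis.
        text = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)
        if text.startswith("# "):
            blocks.append(f"<h1>{text[2:]}</h1>")
        else:
            blocks.append(f"<p>{text}</p>")
    return "\n".join(blocks)
```

The input stays readable as plain text, which is the point of a lightweight markup language; the price is that only the few constructs the converter knows about can be expressed.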
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
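Whether a set of typefaces covers a manuscript can be checked mechanically. The following Python sketch assumes that the coverage of each font is known as a list of inclusive code point ranges; the FONTS table and its two entries are hypothetical and far smaller than the coverage of any real font.

```python
# Hypothetical coverage tables, expressed as inclusive code point ranges.
FONTS = {
    "body": [(0x0000, 0x024F), (0x2000, 0x206F)],   # Latin, punctuation
    "greek": [(0x0370, 0x03FF), (0x1F00, 0x1FFF)],  # Greek, Greek Extended
}


def font_for(char, fonts, order):
    """Return the first font in `order` that covers `char`, or None."""
    code = ord(char)
    for name in order:
        if any(low <= code <= high for low, high in fonts[name]):
            return name
    return None


def missing(text, fonts, order):
    """List the characters of `text` that no font in `order` covers."""
    return sorted({c for c in text if font_for(c, fonts, order) is None})
```

Running missing over the whole manuscript before settling on the typefaces shows at a glance which characters would still have to be borrowed from another font family.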
Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
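The thresholds in the preceding paragraph can be written down as a small decision function. This Python sketch is only one reading of the guideline; the function name and the returned labels are made up for illustration.

```python
def suggest_alignment(chars_per_line: int, columns: int = 1) -> str:
    """Apply the line-length guideline to a measure given in characters."""
    if chars_per_line < 40:
        return "ragged"      # too narrow to justify without loose spacing
    if chars_per_line > 80:
        return "too wide"    # strains the eye over extended passages
    low, high = (45, 75) if columns == 1 else (40, 50)
    if low <= chars_per_line <= high:
        return "justified"
    return "justified (outside the ideal range)"
```

For example, a single-column measure of 66 characters falls inside the ideal range, while a 30-character sidenote column is better set ragged.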
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”
1
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
% Assumption: the arguments of \newunicodechar were lost in
% extraction; a raised full stop for the Greek ano teleia is assumed.
\newunicodechar{·}{\raisebox{.8ex}{.}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F; }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF; }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt; }
  [lang=en] {
    font-size: 11pt; }
  [lang=gr] {
    font-size: 13.2pt; }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
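The arithmetic behind these figures is simple enough to sketch in Python. The function names are made up for illustration; the 20 and 45 percent bounds are the ones quoted above.

```python
def leading_range(type_size_pt: float) -> tuple:
    """Line height bounds: the type size plus 20 to 45 percent of it."""
    low = type_size_pt * 1.20
    high = type_size_pt * 1.45
    return (round(low, 2), round(high, 2))


def lead_per_side(type_size_pt: float, line_height_pt: float) -> float:
    """Lead added above and below a line for a given line height."""
    return round((line_height_pt - type_size_pt) / 2, 2)
```

For 10 pt type, leading_range gives 12 to 14.5 pt, and lead_per_side gives 1 pt and 2.25 pt at the two ends of that range, which matches the numbers in the text.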
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well.
Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger
          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8½ × 11            1 : 1.294    1.2941
Table 3.1: An overview of common paper sizes used for commercial and industrial printing
This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
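One way to satisfy the whole-multiple rule is to pad each heading with vertical space until it occupies a whole number of body lines. The Python sketch below assumes that the heading height already includes any space set above and below the heading; the function name is hypothetical.

```python
import math


def heading_padding(heading_height_pt: float, line_height_pt: float) -> float:
    """Extra vertical space that rounds a heading up to whole body lines."""
    lines = math.ceil(heading_height_pt / line_height_pt)
    return round(lines * line_height_pt - heading_height_pt, 4)
```

With a body line height of 12 pt, a heading that is 16 pt tall needs 8 pt of padding to occupy two lines, whereas a heading that is 24 pt tall needs none.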
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
– William Shakespeare, King Lear
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
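The two constraints on the text height can be combined in a short calculation. The Python sketch below assumes that the desired height-to-width proportion of the text block is given as a single number; the function name and the sample values are hypothetical.

```python
def text_height(text_width_pt, proportion, line_height_pt):
    """Text height closest to width * proportion that holds whole lines.

    Returns the number of body lines and the resulting height in points.
    """
    target = text_width_pt * proportion
    lines = max(1, round(target / line_height_pt))
    return lines, lines * line_height_pt
```

For a text block that is 320 pt wide, a proportion of 1.5, and a line height of 12 pt, the target of 480 pt already holds exactly 40 lines. With a width of 324 pt and a proportion of 1.618, the target of about 524 pt is adjusted to 44 lines, or 528 pt.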
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487
(ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
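The text does not name a measure of sufficient contrast. One widely used option on the Web, shown here as an assumption rather than as the author's method, is the contrast ratio defined in the Web Content Accessibility Guidelines (WCAG) 2.x, which compares the relative luminance of two sRGB colors. The function names in this Python sketch are made up for illustration.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb) -> float:
    """Relative luminance of an (R, G, B) triple with 0-255 channels."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background) -> float:
    """Contrast ratio between two colors, from 1 (none) up to 21."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)
```

Black on white yields the maximum ratio of 21. WCAG asks for at least 4.5 for body text at its AA level, which gives a concrete threshold against which a colored typeface on a colored background can be checked.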
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).
[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972 (cit. on pp. 5, 7).
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1986 (cit. on p. 6).
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).
[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993 (cit. on p. 8).
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).
[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008 (cit. on p. 13).
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012 (cit. on p. 13).
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006 (cit. on p. 13).
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).
[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993 (cit. on p. 14).
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986 (cit. on p. 22).
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Declaration
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Or A Foe
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML, New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard General Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
Introduction
With the advent of the digital age, typesetting has become available to virtually anyone equipped with a personal computer. Beautiful text documents can now be crafted using free and consumer-grade software, which often obviates the need for the involvement of a professional designer and typesetter. The level playing field of the Internet, coupled with the rising popularity of digital-only documents, then allows the author to bypass the publisher as well, if they so wish, without jeopardizing their chance of recognition.
The aim of this book is to provide a general overview of the tools and techniques tied with writing, designing, typesetting, and distributing text documents, one of the principal means of knowledge preservation and transfer known to man. Each chapter describes one discrete step of document preparation along with practical examples and references to literature for those interested in further study.
The chapters are filled with examples that illustrate the subject matter. These should be consulted whenever the concepts described in the text are unclear to the reader. Although care was taken not to favor any computing environment, some examples feature utilities for Unix and Unix-like operating systems. These utilities may or may not have a suitable counterpart in operating systems such as Windows. To try the corresponding examples out, the reader is advised to install a free Unix-like environment, such as Cygwin for Windows, on their computer.
Margin note: This document was prepared in accordance with William Strunk's Elements of Style, an American English style guide for general use.
Chapter 1
Writing
The essence of a document is the idea it represents. In the case of a text document, this idea is articulated through speech, which is transcribed using text, optionally accompanied by figures, and then laid out on a sheet of paper according to a design. Since the text is typically independent of the design, whose task is to support and elicit the internal structure of the text, it is writing that is the logical first step in the creation of a text document.
The essentials of writing in any given natural language include grammar rules, which specify the structure of spoken language, and orthographic rules, which impose additional requirements on written text. The complexity of either set of rules depends entirely on the language in question. Some writing systems, such as those that incorporate Chinese characters, are not phonographic, and the correspondence between the spoken words and the written symbols needs to be memorized by the writer on a word-by-word basis. Other languages may use vastly different grammar rules for speaking and for writing, which means that a spoken sentence needs to be translated first before it is written down. A writer needs to recognize these specifics.
On top of grammar and orthographic rules stand style guides, which, in order to improve consistency, codify how common language patterns are encoded. More comprehensive style guides, such as the Chicago Manual of Style or the Oxford Style Manual, often go beyond writing and provide guidelines on design and typesetting as well, making them an indispensable reference on the editorial tradition.

Zwei Trichter wandeln durch die Nacht.
Durch ihres Rumpfs verengten Schacht
      fließt weißes Mondlicht
         still und heiter
            auf ihren
             Waldweg
              u.s.w.

Figure 1.1: Exceptions that prove the rule about the separation of text and design can sometimes be encountered in poetry. Above is Christian Morgenstern's Trichter, where the text and its form are intimately intertwined.
Above all stand the typographic rules, which specify how the resulting document should be typeset so that it doesn't disturb the eye of the reader. These, as well as the orthographic rules on hyphenation, can be left out of consideration during writing, as it is the page that should be formed around the writing and not the other way around.
1.1 Text Processing
Originally the domain of the pen, the quill, the stylus, and the more recent typewriter, manuscripts of today are produced mainly using the personal computer and stored in text files. The discipline of creating and manipulating digital text is called text processing and will be the focus of this section.
1.1.1 Character Encoding
Although computing at its most primal has no use for anything but numbers, it has nevertheless been accompanied by text from the very outset. Even the earliest computers from the 1950s were programmed with both raw machine code and the textual programming language of the FORmula TRANslator (FORTRAN). The digital representation of letters, digits, and other characters was initially closely tied to each specific application and processor architecture, but with the advent of computer networking in the 1960s, mutual intelligibility became a point of concern. "We had over sixty different ways to represent characters in computers. It was a real Tower of Babel," explains Bob Bemer [1], an American computer scientist who worked at IBM during 1956–1962 and who drafted the American Standard Code for Information Interchange (ASCII) [2], a character encoding from 1963 that unified the digital representation of text across the computer industry and enabled computer networking on a large scale.

Margin note: EBCDIC by IBM was the default encoding on IBM's System/360 mainframes and was in active use until the introduction of the PC in 1981. In writing systems using Chinese characters, special encodings such as Big5, JIS, and EUC are used to this day. For brevity, the text focuses on the mainstream of international encodings.
ASCII
In ASCII, every character is represented by a number from zero to 127, which is transformed to a seven-bit integer called a character code. These 128 codes are used to encode printable characters (spanning the letters of the English alphabet, digits, punctuation, and other symbols) and control codes, as depicted in Table 1.1. Unlike printable characters, control codes have no fixed visual representation, and they were used to implement application-specific communication protocols and text formatting; their precise semantics were defined in a much later standard from 1972 [3]. Unconstrained by the bandwidth and the storage limitations of the 1960s and 1970s, today's communication protocols and text formats gravitate towards markup constructed from printable characters, which, unlike control codes, are easy for humans to read and write.
Bit 7     0     0     0     0     1     1     1     1
Bit 6     0     0     1     1     0     0     1     1
Bit 5     0     1     0     1     0     1     0     1
4 3 2 1   Ctrl codes  Symbols     Upper case  Lower case
0 0 0 0   NUL   DLE   SP    0     @     P     `     p
0 0 0 1   SOH   DC1   !     1     A     Q     a     q
0 0 1 0   STX   DC2   "     2     B     R     b     r
0 0 1 1   ETX   DC3   #     3     C     S     c     s
0 1 0 0   EOT   DC4   $     4     D     T     d     t
0 1 0 1   ENQ   NAK   %     5     E     U     e     u
0 1 1 0   ACK   SYN   &     6     F     V     f     v
0 1 1 1   BEL   ETB   '     7     G     W     g     w
1 0 0 0   BS    CAN   (     8     H     X     h     x
1 0 0 1   HT    EM    )     9     I     Y     i     y
1 0 1 0   LF    SUB   *     :     J     Z     j     z
1 0 1 1   VT    ESC   +     ;     K     [     k     {
1 1 0 0   FF    FS    ,     <     L     \     l     |
1 1 0 1   CR    GS    -     =     M     ]     m     }
1 1 1 0   SO    RS    .     >     N     ^     n     ~
1 1 1 1   SI    US    /     ?     O     _     o     DEL

Table 1.1: The ASCII encoding as specified in the 1986 revision of the standard [4]. SP stands for the space character.

Code point range    Encoding
0–127               0xxxxxxx
128–2047            110xxxxx 10xxxxxx
2048–65535          1110xxxx 10xxxxxx 10xxxxxx
65536–1114111       11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Table 1.2: The UTF-8 encoding. Each x represents one bit of the UCS code point in binary.

Character   Code point   Code point (binary)   UTF-8 encoding
Ř           344          101011000             11000101 10011000
e           101          1100101               01100101
č           269          100001101             11000100 10001101

Table 1.3: An example of the UTF-8 encoding.

The following properties make it easy to manipulate and reason about character strings encoded in ASCII:
• Each character is represented by exactly seven bits. This makes it easy to allocate space for character strings of fixed length, to measure the number of characters stored in a memory region, and to perform basic operations such as adjacent character retrieval or text truncation.
• Characters are alphabetically ordered. Character strings can therefore be collated by comparing the binary values of character codes.
• Lowercase and uppercase letters, digits, and control codes form contiguous ranges of character codes. This simplifies classification.
• There is precisely one way to encode any printable character. The conversion between lowercase and uppercase letters is a matter of inverting one bit.

This comes at the expense of support for non-English writing systems. As a temporary workaround, a set of ASCII derivatives that replaced the less-needed characters of #, $, @, [, \, ], ^, `, {, |, }, and ~ with international characters was specified in the ISO 646 standard from 1972 [3].
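The ASCII properties listed above can be checked with a few lines of Python. This is only a minimal sketch: the helper names `is_ascii_upper`, `is_ascii_lower`, and `toggle_case` are illustrative and do not come from any standard.

```python
def is_ascii_upper(ch: str) -> bool:
    """Uppercase letters occupy the contiguous range 65-90 (A-Z)."""
    return 65 <= ord(ch) <= 90


def is_ascii_lower(ch: str) -> bool:
    """Lowercase letters occupy the contiguous range 97-122 (a-z)."""
    return 97 <= ord(ch) <= 122


def toggle_case(ch: str) -> str:
    """Switch the case of an ASCII letter by inverting bit 6 (value 32)."""
    if is_ascii_upper(ch) or is_ascii_lower(ch):
        return chr(ord(ch) ^ 0b0100000)
    return ch  # digits, punctuation, and control codes are left alone


# Every ASCII character code fits into seven bits.
seven_bit = all(ord(ch) < 2**7 for ch in "Hello, world!")

# Collation by character code: all uppercase letters sort before lowercase.
words = sorted(["pear", "Apple", "apple", "Pear"])
```

Note that collation by character code places every uppercase letter before every lowercase one, which is alphabetical only within a single case.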
Eight-bit Encodings
With the byte size stabilizing at eight bits, new character encodings emerged that were based on ASCII and used the additional bit to encode characters of non-English writing systems while retaining complete backwards compatibility with ASCII. Besides the numerous vendor-specific encodings (called code pages), a set of fifteen eight-bit encodings covering all major modern writing systems whose characters fit within the space of 128 additional combinations was standardized in the ISO/IEC 8859 series, released during 1986–2001.
Compared to ASCII, eight-bit encodings introduced an additional level of complexity to text processing:
• Each character is exactly eight bits wide. String manipulation is therefore as straightforward as with ASCII.
• Character strings can no longer be collated by character code comparison. Each encoding requires separate collation tables.
• Classes of characters, such as uppercase and lowercase letters or punctuation, no longer form contiguous ranges, and their position varies among encodings. This impedes character classification.
• Idiosyncrasies such as the ligature æ and invisible hyphenation hints are included in several encodings, which makes it more difficult to determine character string equivalence. Algorithms for case conversion vary among encodings.
• There exists no standard mechanism to detect which encoding is being used. The distinction needs to be made on the application level using either heuristics, additional metadata, or human intervention. Consequently, no standard mechanism exists to use different character encodings within a single text document.
Margin note: Notable are also the seven-bit encodings of UTF-7 and Punycode, which bring Unicode support to protocols that were designed with the seven-bit ASCII in mind, such as e-mail.
A portion of this complexity is inherent in the task of encoding the characters of all modern writing systems, but the overhead caused by the character encoding fragmentation proved to be unnecessary.
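The lack of a detection mechanism can be illustrated by decoding the same bytes under two ISO/IEC 8859 encodings. The following is a small sketch using Python's built-in codecs; the three sample bytes were chosen for illustration only.

```python
# Three bytes with no encoding label attached to them.
raw = bytes([0xD8, 0x65, 0xE8])

# The same bytes read as ISO/IEC 8859-2 (Central European) ...
as_latin2 = raw.decode("iso8859_2")
# ... and as ISO/IEC 8859-1 (Western European).
as_latin1 = raw.decode("iso8859_1")

# Both decodings succeed, so nothing in the data reveals which was intended.
# The reverse direction can fail outright: ISO/IEC 8859-1 has no code for
# the letters of the Czech word below.
try:
    "Řeč".encode("iso8859_1")
    representable = True
except UnicodeEncodeError:
    representable = False
```

Only the byte 0x65, which lies in the ASCII range, is read identically under both encodings.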
The Universal Character Set and Unicode
In the early 1990s, the continual increase in the available bandwidth and storage led to the creation of the standards of Unicode [5, 6] and the Universal multiple-octet coded Character Set (UCS) [7] in an attempt to create a text encoding that would contain the characters of all the world's languages and succeed ASCII as the lingua franca of text interchange.
UCS is an ever-expanding catalogue of characters from writing systems both modern and ancient, and of symbols ranging from diacritical marks, punctuation, and ideograms to mahjong tiles, alchemical symbols, and the ancient Greek musical notation. Each of these characters is assigned a number called a code point, ranging from 0 to 2147483647 (7F FF FF FF in the hexadecimal notation), with the numbers of the most common characters in the range from 0 to 65535 (FF FF), called the Basic Multilingual Plane (BMP). The smallest units of division in UCS are blocks, which contain 256 thematically related characters. UCS encodings map code points to binary character codes and vice versa.
Three major encodings are specified in the UCS standard and its amendments [8, 9]:
1. UTF-32 directly encodes UCS characters by transforming their code points to four-byte integers. UTF-32 is also known as UCS-4.
2. UTF-16 directly encodes characters within the BMP by transforming their code points to two-byte integers. Code points in the range from 65536 to 1114111 (01 00 00–10 FF FF) are transformed into pairs of two-byte integers, called surrogate pairs, ranging from 55296 to 57343 (D8 00–DF FF). To enable the UTF-16 encoding, the code points in this range will never be assigned to characters [10, sec. 3.4, D15]. The same is true of code points above 1114111 (10 FF FF), which allows UTF-16 to encode any UCS character.
3. UTF-8 directly transforms code points ranging from 0 to 127 (7F) to one-byte integers. Since the first UCS block of the BMP matches ASCII, any text encoded in eight-bit ASCII is also encoded in UTF-8. Code points in the range from 128 to 1114111 (00 00 80–10 FF FF) are transformed into two to four one-byte integers ranging from 128 to 244 (80–F4). The encoding is illustrated in tables 1.2 and 1.3.

Margin note: One of the design goals of UCS was to avoid assigning code points to different glyphs that carry the same meaning. As a result, the visually distinctive Han characters used in the East Asian countries of China, Japan, Korea, and Vietnam were merged into a set of 75960 ideograms in a process referred to as the Han Unification [10, sec. 18.1]. This simplifies text processing, but also makes it impossible to encode a text in multiple East Asian languages without having to rely on external markup to select appropriate regional fonts. As a result, a derivative of UCS that doesn't implement the Han Unification was developed for use in operating systems based on the Real-time Operating system Nucleus (TRON) and is used in East Asia alongside UCS and region-specific encodings.

Figure 1.2: Several Han characters (餐甑逞扉牙慨) in the traditional Chinese, Japanese, Korean, and Vietnamese variants.
UTF-32 is primarily used for the fixed-width internal representation of individual UCS characters inside programs, UTF-16 fulfills a similar role in programs that only work with the BMP, and UTF-8 is used for text storage and interchange. Since 2010, the majority of text content on the Web has been encoded in ASCII and UTF-8 [11].
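The bit layout of Table 1.2 and the surrogate-pair arithmetic of UTF-16 can be written out by hand. The following is a minimal sketch: the function names `utf8_encode` and `utf16_units` are illustrative, input validation (for example rejecting the surrogate range) is omitted, and in practice Python's built-in `str.encode` would be used instead.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point following the bit layout of Table 1.2."""
    if cp < 0x80:                      # 0-127: a single byte 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                     # 128-2047: 110xxxxx 10xxxxxx
        return bytes([0b11000000 | (cp >> 6),
                      0b10000000 | (cp & 0b111111)])
    if cp < 0x10000:                   # 2048-65535: three bytes
        return bytes([0b11100000 | (cp >> 12),
                      0b10000000 | ((cp >> 6) & 0b111111),
                      0b10000000 | (cp & 0b111111)])
    return bytes([0b11110000 | (cp >> 18),   # 65536-1114111: four bytes
                  0b10000000 | ((cp >> 12) & 0b111111),
                  0b10000000 | ((cp >> 6) & 0b111111),
                  0b10000000 | (cp & 0b111111)])


def utf16_units(cp: int) -> list:
    """Return the two-byte integers that represent a code point in UTF-16."""
    if cp < 0x10000:                   # inside the BMP: the code point itself
        return [cp]
    offset = cp - 0x10000              # 20 bits, split into two halves of 10
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


# The first row of Table 1.3: the code point 344 becomes two bytes.
r_caron = [format(byte, "08b") for byte in utf8_encode(344)]

# A character outside the BMP needs a surrogate pair in UTF-16.
pair = [format(unit, "04X") for unit in utf16_units(0x1F600)]
```

Comparing the results with the built-in codecs, as in `utf8_encode(ord("č")) == "č".encode("utf-8")`, is a quick way to convince oneself that the layout was transcribed correctly.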
Unicode was a competing standard for universal text encoding that underwent a merger with UCS in version 1.1, and since then the standards have been kept closely synchronized. Unicode is a superset of UCS, which defines additional information about UCS characters (such as their general category, directionality, case, or numeric value [10, sec. 3.5 and ch. 4]), various text processing algorithms, and implementation guidelines.
Regarding text processing, Unicode and UCS represent a compromise between the simplicity of the seven-bit ASCII and the heterogeneity of eight-bit encodings:
• If simple text manipulation is preferred over space efficiency, each character can be made exactly two bytes wide using UTF-16 (for text restricted to the BMP) or four bytes wide using UTF-32.
• Although character strings cannot be collated by a simple character code comparison, a collation algorithm is defined in the Unicode specification [12], and collation tables for major locales [13] are maintained by the Unicode Consortium.
• Classes of characters (such as uppercase letters, lowercase letters, numbers, and punctuation) do not form contiguous ranges, but their position is directly specified in the standard [10, sec. 4.5].
• Although idiosyncrasies (such as ligatures, invisible hyphenation hints, and combining characters) are present in UCS, explicit normalization algorithms for character string equivalence testing are specified by the standard [10, sec. 2.12]. An algorithm for case conversion is also specified [10, sec. 3.13].
• The byte order mark (FE FF) character can be inserted at the beginning of a text as a signature of Unicode encodings. As the name suggests, the order in which the FE and FF bytes arrive also indicates the order of bytes (called endianness) that was used to encode integers. In UTF-32 and UTF-16, endianness can be chosen arbitrarily by the encoding application. In UTF-8, one-byte integers are used, and the notion of endianness is therefore meaningless.

    Ǻ = Å + ◌́ = A + ◌̊ + ◌́

Figure 1.3: Some UCS characters can be either input as a single entity or composed from several combining characters. Regarding Unicode normalization forms, all of the above representations are canonically equivalent.

    iconv -f latin2 -t utf8 -- old.txt > new.txt

Figure 1.4: Text files can be converted between encodings using the iconv command-line tool. The sample code shows the file old.txt being converted from the ISO/IEC 8859-2 encoding to UTF-8. The result of the conversion is stored in the file new.txt.
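Several of these properties, including the canonical equivalence shown in Figure 1.3, can be observed through Python's standard `unicodedata` and `codecs` modules. This is a brief sketch rather than a complete treatment; the variable names are illustrative.

```python
import codecs
import unicodedata

composed = "\u01fa"            # the letter of Figure 1.3 as a single code point
partly = "\u00c5\u0301"        # A with ring above + combining acute accent
decomposed = "A\u030a\u0301"   # A + combining ring above + combining acute

# The three strings differ code point by code point ...
distinct = len({composed, partly, decomposed})
# ... but each normalization form maps all of them to a single string.
nfc = {unicodedata.normalize("NFC", s) for s in (composed, partly, decomposed)}
nfd = {unicodedata.normalize("NFD", s) for s in (composed, partly, decomposed)}

# Character classes are looked up in the character database, not by range.
category = unicodedata.category("Ř")   # "Lu" stands for an uppercase letter

# Case conversion may change the length of a string.
folded = "Straße".casefold()

# The byte order mark reveals the byte order of UTF-16 text.
bom_big_endian = "\ufeff".encode("utf-16-be")
bom_little_endian = "\ufeff".encode("utf-16-le")
```

Normalizing both operands before comparing them is the usual way to test two strings for canonical equivalence.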
Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. Pictured is the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.

    Compose + O + R = ®
    Compose + 3 + 4 = ¾
    Compose + s + s = ß
    Compose + ~ + ' + a = ấ

Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.

    Alt + 1 + 6 + 0 = á
    Alt + 0 + 2 + 2 + 5 = á
    Alt + + + E + 1 = á

Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces a character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the corresponding UCS code point in hexadecimal.
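The three Alt sequences of Figure 1.7 can be traced with Python's codecs for the two code pages. This sketch only mirrors the lookup that the operating system performs and makes no claim about how Windows implements it.

```python
# Alt + 1 6 0: the number 160 looked up in the IBM code page 437.
from_ibm_page = bytes([160]).decode("cp437")

# Alt + 0 2 2 5: the number 225 looked up in the Windows code page 1252.
from_windows_page = bytes([225]).decode("cp1252")

# Alt + + E 1: the UCS code point E1 given in hexadecimal.
from_code_point = chr(0xE1)
```

All three lookups arrive at the same character, although each starts from a different number.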
1.1.2 Text Input
To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On hand-held devices, the use of either a numeric keypad or a touch-screen is more typical.
An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind, and as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn't apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.
1.1.3 Text Editors
A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free JOE, GNU nano, and Pico.
More advanced text editors come with support for regular expressions and version control (which will be covered in sections 1.1.5 and 1.2) and user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom, and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and Vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower the users to perform complex text editing.
1.1.4 Interactive Document Preparation Systems
Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully-formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].
The primary difference between text editors and DPSes is the fact that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files, a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus, and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.
1.1.5 Regular Expressions
The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.
Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) syntaxes [18, part 1, ch. 9] described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.
More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.

Margin note: Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.
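The basic regex constructs (intervals, anchors, character lists, alternation, and backreferences) can be tried out in Python. Note the assumption: Python's `re` module implements a Perl-derived syntax, not POSIX BRE or ERE, although for these simple patterns the notation coincides with ERE, where braces and parentheses are not escaped. The helper `matching` and the word lists are illustrative.

```python
import re


def matching(pattern: str, words: list) -> list:
    """Return the words in which the pattern matches anywhere."""
    return [word for word in words if re.search(pattern, word)]


# Interval: one or two letters e between w and p.
interval = matching(r"we{1,2}p", ["weeps", "wept", "wp", "weeeep"])

# Anchors: a caret matches the beginning, a dollar sign the end of a string.
beginning = matching(r"^ar", ["argument", "arrow keys", "bar"])
ending = matching(r"ore$", ["iron ore", "oregano"])

# Matching and non-matching lists of characters.
in_list = matching(r"be[ea]", ["beehive", "bear", "bend"])
not_in_list = matching(r"be[^ea]", ["beehive", "bear", "bend"])

# Alternation between two regexes.
either = matching(r"(on|t)", ["one", "two", "axe"])

# Backreference: any character immediately followed by itself.
doubled = matching(r"(.)\1", ["dardanelles", "nation"])
```

The same patterns could be passed to a POSIX tool in ERE mode; in BRE, the braces and parentheses would have to be escaped with backslashes.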
BRE regex, followed by its description and examples of matching text:
we\{1,2\}p
    The repetition (interval) expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m. Matches: weeps, wept.
e*ne*
    Star (*) is a repetition operator equivalent to the interval expression of \{0,\}. Matches: never, enemy, Kleene.
\(⟨regex⟩\)
    A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex. Matches: ⟨regex⟩.
^ar
    At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string. Matches: argument, arrow keys.
ore$
    At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string. Matches: iron ore, dumbledore.
.be
    A period (.) matches any single character. Matches: or not to be.
be[ea]
    A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities, omitted here for brevity. Matches: beehive, grizzly bear, glass beads.
be[^ea]
    A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match. Matches: obeah, bend, libela.
\^\$
    Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character. Matches: ^$.
\(..\).*\1
    A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched. Matches: ara ararauna, dardanelles, nationality.

ERE regex, followed by its description and examples of matching text:
we{1,2}p
    Unlike in BREs, braces aren't escaped. Matches: weeps, wept.
pe+r?l?
    The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}. Matches: persona, peer, speech, perl.
(⟨regex⟩)
    Unlike in BREs, parentheses aren't escaped. Matches: ⟨regex⟩.
(on|t)
    Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes. Matches: one, two, trophy, truth.
(..).*\1
    EREs do not support backreferences. Matches: ⟨undefined⟩.

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below).

Margin note: The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.

Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (20 0B), zero width non-joiner (20 0C), zero width joiner (20 0D), and zero width no-break space (FE FF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at least partially implement the Unicode standard for Regular Expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.

Regex, followed by its description:
\x{⟨n⟩}
    Matches the UCS character with code point ⟨n⟩ in hexadecimal.
\N{⟨n⟩}
    Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.
\p{⟨p⟩}
    Matches any UCS character with property ⟨p⟩.
\P{⟨p⟩}
    Matches any UCS character without property ⟨p⟩.

Property, followed by its description:
Letter
    This property is satisfied by any letter.
Punctuation
    This property is satisfied by any punctuation.
Symbol
    This property is satisfied by any symbol.
Mark
    This property is satisfied by any mark.
Number
    This property is satisfied by any number.
Separator
    This property is satisfied by any separator.
Other
    This property is satisfied by any UCS character that doesn't belong to any of the categories listed above.
Block=⟨b⟩
    This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.
Script=⟨s⟩
    This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.
Numeric Value=⟨n⟩
    This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.
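The Unicode pitfalls described in this section are easy to reproduce. The sketch below uses Python's standard `re` and `unicodedata` modules; since `re` does not support the property syntax of Table 1.5, the helper `letters` only emulates a Letter property test, and both helper names are illustrative. Stripping every format character is a blunt assumption made for the example, because joiners carry meaning in some writing systems.

```python
import re
import unicodedata

precomposed = "caf\u00e9"    # the last letter is a single code point
decomposed = "cafe\u0301"    # the last letter is e + combining acute accent

# 1. Canonically equivalent spellings do not match each other as they are.
raw_match = re.search(precomposed, decomposed) is not None
# After normalizing both the pattern and the text, the match is found.
normalized_match = re.search(
    unicodedata.normalize("NFC", precomposed),
    unicodedata.normalize("NFC", decomposed),
) is not None

# 2. An invisible zero width space (20 0B) causes a false negative.
hidden = "type\u200bsetting"
false_negative = re.search("typesetting", hidden) is None


def strip_format_characters(text: str) -> str:
    """Drop characters of the general category Cf, such as 20 0B."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


recovered = re.search("typesetting", strip_format_characters(hidden)) is not None

# 3. Simple case-insensitive matching misses one-to-many case mappings.
simple_ci = re.search("STRASSE", "Straße", re.IGNORECASE) is not None
folded_ci = "STRASSE".casefold() in "Straße".casefold()


# 4. A stand-in for a Letter property test on individual characters.
def letters(text: str) -> list:
    return [ch for ch in text if unicodedata.category(ch).startswith("L")]
```

Normalizing and case-folding both the pattern and the text before matching avoids the first and the third pitfall at the cost of an extra pass over the data.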
The most elementary text-processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool will present lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.
The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
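The default behavior of grep described above (presenting every line that contains at least one match) can be approximated in a few lines of Python. This is only a sketch: the function name grep and its fixed parameter are made up for the illustration, and Python's regex dialect differs from the BRE and ERE syntaxes used by the actual tool.

```python
import re


def grep(pattern, lines, fixed=False):
    """Return the lines that contain at least one match of pattern."""
    if fixed:
        # Treat the pattern as a fixed string rather than a regex.
        pattern = re.escape(pattern)
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]
```

With fixed=True the dot in a pattern such as "a.k" only matches a literal dot, which mirrors the difference between searching for fixed strings and searching for regexes.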
1.2 Version Control
When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.
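To make the rudimentary functionality concrete, the following toy model records changes to a single document together with their descriptions and authors, and allows them to be listed and reverted. It is a deliberately simplified sketch rather than a description of how any real VCS stores its data; the names Change and History are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    author: str
    description: str
    content: str  # full text of the document after the change


class History:
    """A toy, single-file version history (not a real VCS)."""

    def __init__(self):
        self.changes = []

    def commit(self, author, description, content):
        """Record a new version along with its author and description."""
        self.changes.append(Change(author, description, content))

    def log(self):
        """List the recorded changes, oldest first."""
        return [f"{i}: {c.author}: {c.description}"
                for i, c in enumerate(self.changes)]

    def revert(self, author, index):
        """Restore an earlier version by recording it as a new change."""
        old = self.changes[index]
        self.commit(author, f"Revert to version {index}", old.content)
        return old.content
```

Recording a revert as a new change, rather than deleting the unwanted versions, keeps the history itself intact; this is a design choice of the sketch.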
VCS can be dichotomized based on their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally, and its operation is fully dependent on the availability of the server. An example of a centralized VCS is SubVersioN (SVN).
By comparison, there is no designated server in decentralized VCS, and the users can exchange new versions directly with one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.
Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.
After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.
svnadmin create
svn checkout
svn update
svn commit
Figure 1.8: The basic SVN workflow
An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.
After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.
git init
git clone
git pull
git push
git reset, git commit
Figure 1.9: The diagram above depicts the basic Git workflow. The diagram below depicts the use of the Git program with an SVN repository; this bears all the advantages and disadvantages associated with decentralized VCS.
svnadmin create
git svn clone
git svn rebase
git svn dcommit
git reset, git commit
Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom)
Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file
Chapter 2
Markup
A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.
It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.
Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.
More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].
The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.
2.1 Meta Markup Languages
2.1.1 The Generalized Markup Language
The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.
This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].
SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.
A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
Although the described structure is shared by all SGML documents, the actual syntax as well as the restrictions regarding the contents and the attributes of individual elements are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.
2.1.2 The Extensible Markup Language
Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.
This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
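The lower of the two levels of correctness is easy to test in practice, since any conforming XML parser must reject a document that is not well-formed. The sketch below relies on the xml.etree.ElementTree module from the Python standard library; the function name is_well_formed is made up for this example. Note that this parser does not validate documents against a DTD or a schema, so the check says nothing about validity.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if the string parses as a well-formed XML document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True
```

A document with a missing end tag, such as "<recipe><name>Palatschinken</recipe>", is reported as not well-formed, whereas its corrected counterpart passes the check.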
Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
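As an illustration of data retrieval with XPath, the following sketch queries a shortened recipe document. It assumes Python's xml.etree.ElementTree module, which implements only a limited subset of XPath; the abbreviated document and the variable names are specific to this example.

```python
import xml.etree.ElementTree as ET

# A shortened recipe document, used here only for illustration.
RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# Select every ingredient element by its path from the root element.
names = [e.text for e in root.findall("./ingredientList/ingredient")]

# Select an element anywhere in the document by an attribute value.
milk = root.find(".//ingredient[@amount='300ml']")

# Read an attribute of a selected element.
serves = root.find("ingredientList").get("serves")
```

Full XPath, XPointer, and XQuery processors offer considerably more expressive queries than this subset.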
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>
Figure 2.1: An example XML document (recipe.xml)
DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA) >
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+) >
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>
Figure 2.2: An example DTD
element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}
Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
      name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>
Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)
xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng  # Compact -> Full RELAX NG
xmllint --noout --relaxng recipe.rng recipe.xml
Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint
A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of CONCUR, a more expressive SGML feature, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML
Speech
AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!

Verse
Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>
Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].
The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook, and its source code is publicly available at http://docbook.org.
Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
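The way namespaces keep the elements of different XML applications apart can be observed with any namespace-aware parser. The sketch below uses Python's xml.etree.ElementTree module, which expands each element name into the form {IRI}local-name; the sample document and the prefixes h and s are chosen for this illustration only.

```python
import xml.etree.ElementTree as ET

# An XHTML document that embeds an SVG drawing through a namespace.
DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <body>
    <p>A circle:</p>
    <svg:svg><svg:circle r="10"/></svg:svg>
  </body>
</html>"""

root = ET.fromstring(DOCUMENT)

# The prefixes used in queries are local to the query; only the IRIs
# have to agree with those declared in the document.
NAMESPACES = {"h": "http://www.w3.org/1999/xhtml",
              "s": "http://www.w3.org/2000/svg"}
paragraph = root.find("h:body/h:p", NAMESPACES)
circle = root.find(".//s:circle", NAMESPACES)
```

After parsing, root.tag is "{http://www.w3.org/1999/xhtml}html" and circle.tag is "{http://www.w3.org/2000/svg}circle". The parser resolves the names but has no way of locating a schema from the IRI alone, which is the limitation described above.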
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
2.2 Markup on the World Wide Web
2.2.1 The Hypertext Markup Language
In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.
JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
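The liberal acceptance of malformed HTML can be observed even in simple tools. The sketch below feeds markup with overlapping elements to the html.parser module of the Python standard library, which reports the tags in the order in which they appear instead of raising an error. The class TagLogger and the function tokenize are hypothetical names; the module is only a tokenizer and does not perform the error recovery prescribed by the HTML5 specification.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record the tags and the text of a document in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def tokenize(markup):
    """Return the list of events produced for the given markup."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


# Overlapping <b> and <i> elements are tokenized without complaint.
events = tokenize("<b>Bold <i>bold and italic</b> italic</i>")
```

An XML parser given the same input would have to reject it as not well-formed.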
Initially, HTML only comprised a mixture of logical and presentation markup with a fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.
During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the contemporary leading web browsers and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
2.2.2 The Extensible Hypertext Markup Language
Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as
<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>
Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.
<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>
Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.
<style>
  body {
    font: large Verdana;
    font-size: large;
  }
  h1 {
    font-size: x-large;
    text-transform: uppercase;
  }
  abbr {
    font-style: italic;
  }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>
The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications (such as MathML and SVG) directly into web documents through XML namespaces.
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
2.2.3 The Semantic Web and Linked Data
The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
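The two interpretations of a triplet can be written down as a small data structure. The following sketch keeps the (p, s, o) order used in the text; the names IRI, Triple, and interpret, as well as the abbreviated identifiers in the usage example, are hypothetical and serve only this illustration.

```python
from typing import NamedTuple, Union


class IRI(str):
    """A resource identified by an IRI (as opposed to a literal value)."""


class Triple(NamedTuple):
    predicate: IRI
    subject: IRI
    object: Union[IRI, str]  # either a resource or a literal value


def interpret(triple):
    """Describe a triplet as either a relation or a property."""
    p, s, o = triple
    if isinstance(o, IRI):
        return f"{s} is in the relation {p} with {o}"
    return f"{s} has the property {p} with the value {o!r}"


relation = Triple(IRI("dc:creator"), IRI("ex:document"), IRI("ex:john"))
literal = Triple(IRI("foaf:name"), IRI("ex:john"), "John Smith")
```

Calling interpret(relation) yields a description of a relation between two resources, whereas interpret(literal) yields a description of a property with a literal value.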
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
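Of the listed representations, N-Triples is the simplest: every triplet occupies one line consisting of the subject, the predicate, and the object, followed by a full stop. The sketch below produces such lines from plain tuples. The helper names are made up, and the escaping is deliberately incomplete (language tags, data types, and control characters are not handled), so it should be read as an outline of the format rather than as a conforming serializer.

```python
def ntriples_term(term, is_iri):
    """Format an object as an IRI reference or as a plain literal."""
    if is_iri:
        return f"<{term}>"
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object, object_is_iri) tuples."""
    lines = []
    for s, p, o, object_is_iri in triples:
        lines.append(f"<{s}> <{p}> {ntriples_term(o, object_is_iri)} .")
    return "\n".join(lines)


output = to_ntriples([
    ("http://example.org/john-smith",
     "http://xmlns.com/foaf/0.1/name", "John Smith", False),
])
```

Note that N-Triples places the subject before the predicate, unlike the (p, s, o) notation used in the text.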
2.3 Document Preparation Systems
Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .
Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom)
<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>
Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.
<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>
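<http://example.org/document.html> --dc:title--> "John's Web page"@en
<http://example.org/document.html> --dc:creator--> <http://example.org/john-smith>
<http://example.org/john-smith> --rdf:type--> foaf:Person
<http://example.org/john-smith> --foaf:name--> "John Smith"
Figure 2.11: A graph of the RDF document in Figure 2.9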
categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is twofold: more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems
Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.
The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by Desktop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could; but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled -- but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-
.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could; but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could; but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
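To make the idea of conversion concrete, the following Python sketch turns a very small Markdown-like subset (headings marked with #, *emphasis*, **strong emphasis**, and paragraphs separated by blank lines) into HTML. It is an illustration only and not an implementation of Markdown or of any existing converter; the function names tiny_markup_to_html and inline are made up for this example.

```python
import html
import re


def inline(text):
    """Translate **strong** and *emphasis* spans (hypothetical helper)."""
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    return re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)


def tiny_markup_to_html(text):
    """Convert a toy lightweight markup to HTML, block by block."""
    blocks = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        # Escape characters that have a special meaning in HTML.
        block = html.escape(block, quote=False)
        heading = re.fullmatch(r"(#{1,6}) +(.*)", block)
        if heading:
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{inline(heading.group(2))}</h{level}>")
        else:
            # Line breaks inside a paragraph are treated as spaces.
            blocks.append(f"<p>{inline(' '.join(block.split()))}</p>")
    return "\n".join(blocks)
```

Punctuation carries all of the structure here, which is why the input stays readable as plain text; a real converter has to handle many more constructs and edge cases.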
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
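The line-length guidance can be expressed as a quick check. The Python sketch below is illustrative only: the function name classify_line_length and the labels it returns are assumptions made for this example, while the thresholds are the ones quoted in the text.

```python
def classify_line_length(chars_per_line, columns=1):
    """Classify an average line length against the ranges quoted in the text.

    Hypothetical helper; the returned labels are invented for illustration.
    """
    if chars_per_line > 80:
        return "too wide"      # extended passages strain the eye
    if chars_per_line < 40:
        return "set ragged"    # justification would loosen the word spacing
    low, high = (45, 75) if columns == 1 else (40, 50)
    if low <= chars_per_line <= high:
        return "justify"       # inside the recommended range
    return "acceptable"        # outside the ideal range, inside the limits
```

For example, a single-column page averaging 66 characters per line falls in the recommended range, whereas a narrow sidenote of some 30 characters per line is better set ragged.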
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says, φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”
1
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
newunicodecharraisebox8ex
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says, φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F; }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF; }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt; }
  [lang=en] {
    font-size: 11pt; }
  [lang=grc] {
    font-size: 13.2pt; }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says, </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
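The arithmetic behind these figures is simple enough to sketch. The following Python functions are a minimal illustration, assuming the twenty to forty-five percent guideline quoted in the text and an even split of the lead above and below each line; the names leading_range and lead_per_side are made up for this example.

```python
def leading_range(type_size_pt):
    """Return the (minimum, maximum) line height in points.

    Assumes the lines are separated by 20-45 % of the typeface size.
    """
    return (round(type_size_pt * 1.20, 3), round(type_size_pt * 1.45, 3))


def lead_per_side(type_size_pt, line_height_pt):
    """Lead added above and below each line, assuming an even split."""
    return round((line_height_pt - type_size_pt) / 2, 3)
```

For a 10 pt typeface, leading_range(10) gives a line height between 12 and 14.5 pt, and lead_per_side(10, 14.5) gives 2.25 pt, which matches the values in the text.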
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. ¶ Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes.    A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger
          Sizes in inches    Page proportions
A4        8.27 × 11.7        1 : √2 (≈ 1 : 1.414)
B5        6.93 × 9.84        1 : √2 (≈ 1 : 1.414)
Letter    8 1/2 × 11         1 : 1.294
Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
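The requirement that a heading occupy a whole number of body lines can be checked with a short calculation. The Python sketch below is only an illustration; the function name heading_block is hypothetical, and how the extra space is distributed above and below the heading is left to the designer.

```python
import math


def heading_block(heading_height_pt, body_line_height_pt):
    """Return (lines, extra_space_pt) for a heading.

    lines is the number of body lines the heading must occupy, and
    extra_space_pt is the vertical space that has to be added around
    the heading so that its total height is a whole multiple of the
    body text line height.
    """
    lines = math.ceil(heading_height_pt / body_line_height_pt)
    extra_space = lines * body_line_height_pt - heading_height_pt
    return lines, extra_space
```

With a body line height of 12 pt, a heading that is 18 pt tall needs 6 pt of additional space to fill two body lines, whereas a heading that is 24 pt tall needs none.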
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3]:
This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
– William Shakespeare, King Lear
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
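As a rough illustration of the multiple-of-line-height rule, the Python sketch below finds the largest text height that fits into a given vertical space. The function name text_height is hypothetical, and the example figures assume a 9 in page with 0.75 in vertical margins and 72 pt to the inch, which are assumptions of this example rather than values prescribed by the text.

```python
def text_height(available_height_pt, line_height_pt):
    """Return (lines, height_pt) of the tallest text block that fits.

    height_pt is a whole multiple of the body text line height, so the
    text block can be filled completely with lines of body text.
    """
    lines = int(available_height_pt // line_height_pt)
    return lines, lines * line_height_pt
```

With 540 pt of available height, a 12 pt line height yields 45 lines and a text height of 540 pt, whereas a 14 pt line height yields 38 lines and a text height of 532 pt, leaving 8 pt to be added to the vertical margins.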
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
(ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
Bibliography
[1] Mary Brandel lsquolsquo1963 The debut of asci irsquorsquo InComputerworld(July 1999) url httpeditioncnncomTECHcomputing9907061963idg (visited on 09062015) (cit on p 5)
[2] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1963 10 East 40th Street New York 16 nyusa the American Standard Association June 1963 urlhttp worldpowersystems com J codes X3 4 - 1963
(visited on 01282015) (cit on p 5)[3] i so tc97sc2 Information technology ndash iso 7-bit coded character
set for information interchange i so 6461972 Geneva Switzer-land the International Organization for Standardization1972 (cit on pp 5 7)
[4] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1986 10 East 40th Street New York 16 ny usathe American Standard Association June 1986 (cit on p 6)
[5] Unicode Consortium the Unicode Standard Version 10 Vol 1Reading ma usa Addison-Wesley Developers Press Oct1991 isbn 0-201-56788-1 (cit on p 8)
[6] Unicode Consortium the Unicode Standard Version 10 Vol 2Reading ma usa Addison-Wesley Developers Press June1992 isbn 0-201-60845-6 (cit on p 8)
[7] isoiec jtc1sc2 Information technology ndash the Universalmultiple-octet coded Character Set (ucs) ndash Part 1 Architectureand Basic Multilingual Plane isoiec 10646-11993 Geneva
Switzerland the International Organization for Standard-ization May 1993 (cit on p 8)
[8] i soiec jtc1sc2 Transformation Format for 16 planes of group00 (utf-16) isoiec 10646-11993Amd 11996 GenevaSwitzerland the International Organization for Standard-ization Oct 1996 (cit on p 8)
[9] isoiec jtc1sc2 ucs Transformation Format 8 (utf-8)isoiec 10646-11993Amd 21996 Geneva Switzerlandthe International Organization for Standardization Oct1996 (cit on p 8)
[10] Unicode Consortium the Unicode Standard Version 90 ndash CoreSpecification Tech rep Mountain View ca usa July 2016url httpwwwunicodeorgversionsUnicode900UnicodeStandard-90pdf (visited on 09172015) (cit onpp 8ndash10)
[11] Q-Success Usage of character encodings for websites urlhttpw3techscomtechnologiesoverviewcharacter_
encodingall (visited on 09102015) (cit on p 9)[12] Unicode Consortium Unicode Technical Standard 10 Version
900 Unicode Collation Algorithm Tech rep May 2016 urlhttpwwwunicodeorgreportstr10tr10-34html
(visited on 09172016) (cit on p 10)[13] Unicode Consortium Unicode cldr Project Tech rep url
httpcldrunicodeorg (visited on 09172016) (cit onp 10)
[14] iso tc171sc2 Document management ndash Portable documentformat iso 320002008 Geneva Switzerland the Interna-tional Organization for Standardization July 2008 (cit onp 13)
[15] isoiec jtc1sc34 Document description and processing lan-guages ndash Office Open XML File Formats isoiec 295002012Geneva Switzerland the International Organization forStandardization Oct 2012 (cit on p 13)
[16] isoiec jtc1sc34 Information technology ndash Open DocumentFormat for Office Applications (OpenDocument) v10 isoiec263002006 Geneva Switzerland the International Organi-zation for Standardization Dec 2006 (cit on p 13)
[17] Noam Chomsky lsquolsquoThree models for the description of lan-guagersquorsquo In Information Theory IEEE Transactions on 23 (1956)pp 113ndash124 (cit on p 14)
[18] isoiec jtc1sc22 Information technology ndash the Portable Op-erating System Interface ndash Part 2 Shell and Utilities isoiec9945-21993 Geneva Switzerland the International Organi-zation for Standardization Dec 1993 (cit on p 14)
[19] Jeffrey E F Friedl Mastering Regular Expressions 3rd edOrsquoReilly Media 2006 p 544 isbn 978-0-596-52812-6 (citon p 14)
[20] Unicode Consortium Unicode Technical Standard 18 Version17 Unicode Regular Expressions Tech rep Nov 2013 urlhttpwwwunicodeorgreportstr18tr18-17html
(visited on 09262015) (cit on p 16)[21] Dale Dougherty and Arnold Robbins Sed amp awk Second
Edition OrsquoReilly Media 1997 i sbn 1565922255 url http docstore mik ua orelly unix sedawk (visited on09262015) (cit on p 16)
[22] Ben Collins-Sussman Brian W Fitzpatrick and C MichaelPilato Version Control with Subversion OrsquoReilly 2002 urlhttpsvnbookred-beancom (visited on 09262015)(cit on p 17)
[23] Charles F Goldfarb lsquolsquothe Roots of sgml ndash A Personal Rec-ollectionrsquorsquo In (1996) url httpwwwsgmlsourcecomhistoryrootshtm (visited on 07292015) (cit on p 22)
[24] Charles F Goldfarb lsquolsquosgml The Reason Why and the FirstPublishedHintrsquorsquo In Journal of the American Society for Informa-tion Science 48 (7 July 1997) url httpwwwsgmlsourcecomhistoryjasishtm (visited on 07292015) (cit onp 22)
[25] Charles F Goldfarb lsquolsquoIntroduction to Generalized MarkuprsquorsquoIn (1981) url http www sgmlsource com history AnnexAhtm (visited on 07292015) (cit on p 22)
[26] i soiecjtc1sc34 Information processing ndash Text and office sys-tems ndash Standard Generalized Markup Language (sgml) i soiec88791986 Geneva Switzerland the International Organi-zation for Standardization Oct 1986 (cit on p 22)
[27] Charles F Goldfarb the sgml Handbook New York NY USAOxford University Press Inc 1990 i sbn 978-0-198-53737-3(cit on p 22)
[28] Jean Paoli Tim Bray and Michael Sperberg-McQueen Ex-tensible Markup Language (xml) 10 w3c Recommendationw3c Feb 1998 url httpwwww3orgTR1998REC-xml-19980210 (visited on 07312015) (cit on pp 23 31)
[29] isoiec jtc1sc18wg8 Proposed TC for Web sgml Adap-tations for sgml isoiec N1929 the International Organi-zation for Standardization June 1997 url httpxmlcoverpagesorgwg8-n1929-ghtml (visited on 07312015)(cit on p 23)
[30] Haringkon Wium Lie and Bert Bos Cascading Style Sheets level1 Recommendation w3c Dec 1996 url httpwwww3orgTRREC-CSS1-961217 (visited on 07312015) (cit onpp 23 29)
[31] C M Sperberg-McQueen and Claus Huitfeldt lsquolsquogoddagA Data Structure for Overlapping Hierarchiesrsquorsquo In DigitalDocuments Systems and Principles 8th International Confer-ence on Digital Documents and Electronic Publishing DDEP2000 5th International Workshop on the Principles of DigitalDocument Processing PODDP 2000 Munich Germany Sep-tember 13-15 2000 Revised Papers Ed by Peter King andEthan V Munson Berlin Heidelberg Springer Berlin Hei-delberg 2004 pp 139ndash160 isbn 978-3-540-39916-2 doi101007978-3-540-39916-2_12 (cit on p 27)
[32] TimBray DaveHollander andAndrewLaymanNamespacesin xml w3c Recommendation w3c Jan 1999 url httpwwww3orgTR1999REC-xml-names-19990114 (visitedon 08212015) (cit on p 27)
[33] M Duerst the Internationalized Resource Identifiers (iris) rfc3987 rfc Editor Jan 2005 url httptoolsietforghtmlrfc3987 (visited on 08312015) (cit on p 27)
[34] Norman Walsh DocBook 5 The Definitive Guide Apr 2010url httpwwwdocbookorgtdgenhtmldocbookhtml(visited on 08182015) (cit on p 28)
[35] Tim Berners-Lee Information Management A Proposal Techrep Mar 1989 url httpwwww3orgHistory1989proposalhtml (visited on 08312015) (cit on p 28)
[36] T Berners-Lee Hypertext Markup Language ndash 20 rfc 1866rfc Editor Nov 1995 url httptoolsietforghtmlrfc1866 (visited on 07312015) (cit on p 28)
[37] Jon Postel DoD standard Transmission Control Protocol rfc761 rfc Editor Jan 1980 url httptoolsietforghtmlrfc761 (visited on 09162016) (cit on p 28)
[38] Ian Hickson et al html5 A vocabulary and associated apisfor html and xhtml Recommendation w3c Oct 2014 urlhttpwwww3orgTR2014REC-html5-20141028 (visitedon 07312015) (cit on p 29)
[39] ecma International Standard ecma-262 - ecmaScript LanguageSpecification Tech rep June 1997 url httpwwwecma-internationalorgpublicationsfilesECMA-ST-ARCH
ECMA-262201st20edition20June201997pdf (visitedon 07312015) (cit on p 29)
[40] Netscape Communications Netscape and Sun announce Java-Script the open cross-platform object scripting language for en-terprise networks and the Internet Dec 1995 url httpwpnetscapecomnewsrefprnewsrelease67html (visited on02132008) (cit on p 29)
[41] Dave Raggett et al Reformulating html in xml w3c Recom-mendation w3c Dec 1998 url httpwwww3orgTR1998WD-html-in-xml-19981205 (visited on 08202015)(cit on p 31)
[42] Steven Pemberton et al xhtmltrade 10 The Extensible HyperTextMarkup Language w3c Recommendation w3c Jan 2000url httpwwww3orgTR2000REC-xhtml1-20000126(visited on 08202015) (cit on p 31)
[43] T Berners-Lee Linked Data Tech rep 2006 url httpswwww3orgDesignIssuesLinkedDatahtml (visited on09172016) (cit on p 31)
[44] Ora Lassila and Ralph R Swick Resource Description Frame-work (rdf) Model and Syntax Specification w3c Recommen-dation w3c Feb 1999 url httpwwww3orgTR1999REC-rdf-syntax-19990222 (visited on 08182015) (cit onpp 31 32)
[45] Dan Brickley and R V Guha rdf Vocabulary DescriptionLanguage 10 rdf Schema w3c Recommendation w3c Feb2004 url httpwwww3orgTR2004REC-rdf-schema-20040210 (visited on 08182015) (cit on p 32)
[46] Deborah L McGuinness and Frank van Harmelen owl WebOntology Language w3c Recommendation w3c Feb 2004url httpwwww3orgTR2004REC-owl-features-20040210 (visited on 08182015) (cit on p 32)
[47] Dan Brickley and R V Guha json-ld 10 A JSON-basedSerialization for Linked Data w3c Recommendation w3cJan 2014 url httpwwww3orgTR2014REC-json-ld-20140116 (visited on 08192015) (cit on p 32)
[48] David Beckett et al rdf 11 Turtle w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-turtle-20140225 (visited on 08292015) (cit on p 32)
[49] David Beckett rdf 11 N-Triples w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-n-triples-20140225 (visited on 08192015) (cit on p 32)
[50] Ben Adida et al rdfa in xhtml Syntax and Processing w3cRecommendation w3c Oct 2008 url httpwwww3org TR 2008 REC - rdfa - syntax - 20081014 (visited on08192015) (cit on p 32)
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standards Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
This document was prepared in accordance with William Strunk's Elements of Style, an American English style guide for general use.
Chapter 1
Writing
The essence of a document is the idea it represents. In the case of a text document, this idea is articulated through speech, which is transcribed using text, optionally accompanied by figures, and then laid out on a sheet of paper according to a design. Since the text is typically independent of the design, whose task is to support and elicit the internal structure of the text, writing is the logical first step in the creation of a text document.
The essentials of writing in any given natural language include grammar rules, which specify the structure of spoken language, and orthographic rules, which impose additional requirements on written text. The complexity of either set of rules depends entirely on the language in question. Some writing systems, such as those that incorporate Chinese characters, are not phonographic, and the correspondence between the spoken words and the written symbols needs to be memorized by the writer on a word-by-word basis. Other languages may use vastly different grammar rules for speaking and for writing, which means that a spoken sentence first needs to be translated before it is written down. A writer needs to recognize these specifics.
On top of grammar and orthographic rules stand style guides, which codify how common language patterns are encoded in order to improve consistency. More comprehensive style guides, such as the Chicago Manual of Style or the Oxford Style Manual, often go beyond writing and provide guidelines on design and typesetting as well, making them an indispensable reference on the editorial tradition.

Zwei Trichter wandeln durch die Nacht.
Durch ihres Rumpfs verengten Schacht
fließt weißes Mondlicht
still und heiter
auf ihren
Waldweg
u. s. w.

Figure 1.1: Exceptions that prove the rule about the separation of text and design can sometimes be encountered in poetry. Above is Christian Morgenstern's Trichter, where the text and its form are intimately intertwined.
Above all stand the typographic rules, which specify how the resulting document should be typeset so that it doesn't disturb the eye of the reader. These, as well as the orthographic rules on hyphenation, can be left out of consideration during writing, as it is the page that should be formed around the writing and not the other way around.
1.1 Text Processing

Originally the domain of the pen, the quill, the stylus, and the more recent typewriter, manuscripts of today are produced mainly using the personal computer and stored in text files. The discipline of creating and manipulating digital text is called text processing and will be the focus of this section.
1.1.1 Character Encoding

Although computing at its most primal has no use for anything but numbers, it has nevertheless been accompanied by text from the very outset. Even the earliest computers from the 1950s were programmed with both raw machine code and the text programming language of the FORmula TRANslator (FORTRAN). The digital representation of letters, digits, and other characters was initially closely tied to each specific application and processor architecture, but with the advent of computer networking in the 1960s, mutual intelligibility became a point of concern. "We had over sixty different ways to represent characters in computers. It was a real Tower of Babel," explains Bob Bemer [1], an American computer scientist who worked at IBM during 1956–1962 and who drafted the American Standard Code for Information Interchange (ASCII) [2], a character encoding from 1963 that unified the digital representation of text across the computer industry and enabled computer networking on a large scale.

EBCDIC by IBM was the default encoding on IBM's System/360 mainframes and was in active use until the introduction of the PC in 1981. In writing systems using Chinese characters, special encodings such as Big5, JIS, and EUC are used to this day. For brevity, the text focuses on the main stream of international encodings.
ASCII
In ASCII, every character is represented by a number from zero to 127, which is transformed to a seven-bit integer called a character code. These 128 codes are used to encode printable characters (spanning the letters of the English alphabet, digits, punctuation, and other symbols) and control codes, as depicted in Table 1.1. Unlike printable characters, control codes have no fixed visual representation, and they were used to implement application-specific communication protocols and text formatting; their precise semantics were defined in a much later standard from 1972 [3]. Unconstrained by the bandwidth and the storage limitations of the 1960s and 1970s, today's communication protocols and text formats gravitate towards markup constructed from printable characters, which, unlike control codes, are easy for humans to read and write.
The following properties make it easy to manipulate and reason about character strings encoded in ASCII:

• Each character is represented by exactly seven bits. This makes it easy to allocate space for character strings of fixed length, to measure the number of characters stored in a memory region, and to perform basic operations such as adjacent character retrieval or text truncation.

• Characters are alphabetically ordered. Character strings can therefore be collated by comparing the binary values of character codes.

• Lowercase and uppercase letters, digits, and control codes form contiguous ranges of character codes. This simplifies classification.
Bit 7      0    0    0    0    1    1    1    1
Bit 6      0    0    1    1    0    0    1    1
Bit 5      0    1    0    1    0    1    0    1
Bits 4321  Ctrl codes  Symbols   Upper case  Lower case
0000       NUL  DLE  SP   0    @    P    `    p
0001       SOH  DC1  !    1    A    Q    a    q
0010       STX  DC2  "    2    B    R    b    r
0011       ETX  DC3  #    3    C    S    c    s
0100       EOT  DC4  $    4    D    T    d    t
0101       ENQ  NAK  %    5    E    U    e    u
0110       ACK  SYN  &    6    F    V    f    v
0111       BEL  ETB  '    7    G    W    g    w
1000       BS   CAN  (    8    H    X    h    x
1001       HT   EM   )    9    I    Y    i    y
1010       LF   SUB  *    :    J    Z    j    z
1011       VT   ESC  +    ;    K    [    k    {
1100       FF   FS   ,    <    L    \    l    |
1101       CR   GS   -    =    M    ]    m    }
1110       SO   RS   .    >    N    ^    n    ~
1111       SI   US   /    ?    O    _    o    DEL

Table 1.1: The ASCII encoding as specified in the 1986 revision of the standard [4]. SP denotes the space character.
Code point range   Encoding
0–127              0xxxxxxx
128–2047           110xxxxx 10xxxxxx
2048–65535         1110xxxx 10xxxxxx 10xxxxxx
65536–1114111      11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Table 1.2: The UTF-8 encoding. Each x represents one bit of the UCS code point in binary.
Character   Code point (decimal)   Code point (binary)   UTF-8 encoding
Ř           344                    101011000             11000101 10011000
e           101                    1100101               01100101
č           269                    100001101             11000100 10001101

Table 1.3: An example of the UTF-8 encoding.
• There is precisely one way to encode any printable character. The conversion between lowercase and uppercase letters is a matter of inverting one bit.

This comes at the expense of support for non-English writing systems. As a temporary workaround, a set of ASCII derivatives that replaced the less-needed characters of #, $, @, [, \, ], ^, `, {, |, }, and ~ with international characters was specified in the ISO 646 standard from 1972 [3].
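The ASCII properties listed above can be tried out in a few lines of Python. This is only a sketch: the function names `swap_case` and `classify` are made up for this illustration, and the input is assumed to consist of single ASCII characters.

```python
def swap_case(ch: str) -> str:
    """Swap the case of an ASCII letter by inverting bit 6 (value 32)."""
    if "A" <= ch <= "Z" or "a" <= ch <= "z":
        return chr(ord(ch) ^ 0b0100000)
    return ch  # non-letters have no case counterpart


def classify(ch: str) -> str:
    """Classify an ASCII character using contiguous code ranges only."""
    code = ord(ch)
    if code <= 31 or code == 127:
        return "control code"
    if ord("0") <= code <= ord("9"):
        return "digit"
    if ord("A") <= code <= ord("Z"):
        return "uppercase letter"
    if ord("a") <= code <= ord("z"):
        return "lowercase letter"
    return "symbol"


# Collation by character code: all uppercase letters precede lowercase ones.
collated = sorted(["pear", "Zebra", "apple"])
print(swap_case("a"), classify("7"), collated)
```

Note that collation by raw code values places "Zebra" before "apple", which is one reason why even English text usually needs more than a plain code comparison.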
Eight-bit Encodings
With the byte size stabilizing at eight bits, new character encodings emerged that were based on ASCII and used the additional bit to encode characters of non-English writing systems while retaining complete backwards compatibility with ASCII. Besides the numerous vendor-specific encodings (called code pages), a set of fifteen eight-bit encodings covering all major modern writing systems whose characters fit within the space of 128 additional combinations was standardized in the ISO/IEC 8859 series released during 1986–2001.
Compared to ASCII, eight-bit encodings introduced an additional level of complexity to text processing:

• Each character is exactly eight bits wide. String manipulation is therefore as straightforward as with ASCII.

• Character strings can no longer be collated by character code comparison. Each encoding requires separate collation tables.

• Classes of characters, such as uppercase and lowercase letters or punctuation, no longer form contiguous ranges, and their position varies among encodings. This impedes character classification.

• Idiosyncrasies such as the ligature æ and invisible hyphenation hints are included in several encodings, which makes it more difficult to determine character string equivalence. Algorithms for case conversion vary among encodings.

• There exists no standard mechanism to detect which encoding is being used. The distinction needs to be made at the application level using heuristics, additional metadata, or human intervention. Consequently, no standard mechanism exists to use different character encodings within a single text document.
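The last point can be illustrated with a short Python sketch that decodes the same byte under three encodings of the ISO/IEC 8859 series. The helper name `decode_variants` and the choice of the byte E8 are assumptions made for this example only.

```python
def decode_variants(raw: bytes, encodings) -> dict:
    """Decode the same bytes under several eight-bit encodings."""
    return {name: raw.decode(name) for name in encodings}


# One and the same byte (E8 in hexadecimal) stands for a different character
# in each encoding, and nothing in the byte itself says which one was meant.
variants = decode_variants(b"\xe8", ["iso-8859-1", "iso-8859-2", "iso-8859-7"])
print(variants)

# The lower half of every encoding in the series remains plain ASCII.
ascii_part = decode_variants(b"text", ["iso-8859-1", "iso-8859-2", "iso-8859-7"])
```

Without metadata or a heuristic, a program has no way of telling which of the three readings the author intended.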
Also notable are the seven-bit encodings of UTF-7 and Punycode, which bring Unicode support to protocols that were designed with the seven-bit ASCII in mind, such as e-mail.
A portion of this complexity is inherent in the task of encoding the characters of all modern writing systems, but the overhead caused by the character encoding fragmentation proved to be unnecessary.
The Universal Character Set and Unicode
In the early 1990s, the continual increase in the available bandwidth and storage led to the creation of the standards of Unicode [5, 6] and the Universal multiple-octet coded Character Set (UCS) [7] in an attempt to create a text encoding that would contain the characters of all the world's languages and succeed ASCII as the lingua franca of text interchange.
UCS is an ever-expanding catalogue of characters from writing systems both modern and ancient, and of symbols ranging from diacritical marks, punctuation, and ideograms to mahjong tiles, alchemical symbols, and the ancient Greek musical notation. Each of these characters is assigned a number called a code point, ranging from 0 to 2147483647 (7F FF FF FF in the hexadecimal notation), with the numbers of the most common characters in the range from 0 to 65535 (FF FF), called the Basic Multilingual Plane (BMP). The smallest units of division in UCS are blocks, which contain 256 thematically related characters. UCS encodings map code points to binary character codes and vice versa.
Three major encodings are specified in the UCS standard and its amendments [8, 9]:

1. UTF-32 directly encodes UCS characters by transforming their code points to four-byte integers. UTF-32 is also known as UCS-4.

2. UTF-16 directly encodes characters within the BMP by transforming their code points to two-byte integers. Code points in the range from 65536 to 1114111 (01 00 00–10 FF FF) are transformed into pairs of two-byte integers, called surrogate pairs, ranging from 55296 to 57343 (D8 00–DF FF). To enable the UTF-16 encoding, the code points in this range will never be assigned to characters [10, sec. 3.4, D15]. The same is true of code points above 1114111 (10 FF FF), which allows UTF-16 to encode any UCS character.

3. UTF-8 directly transforms code points ranging from 0 to 127 (7F) to one-byte integers. Since the first UCS block of the BMP matches ASCII, any text encoded in eight-bit ASCII is also encoded in UTF-8. Code points in the range from 128 to 1114111 (00 00 80–10 FF FF) are transformed into two to four one-byte integers ranging from 128 to 244 (80–F4). The encoding is illustrated in Tables 1.2 and 1.3.

One of the design goals of UCS was to avoid assigning code points to different glyphs that carry the same meaning. As a result, the visually distinctive Han characters used in the East Asian countries of China, Japan, Korea, and Vietnam were merged into a set of 75960 ideograms in a process referred to as the Han Unification [10, sec. 18.1]. This simplifies text processing, but also makes it impossible to encode a text in multiple East Asian languages without having to rely on external markup to select appropriate regional fonts. As a result, a derivative of UCS that doesn't implement the Han Unification was developed for use in operating systems based on the Real-time Operating system Nucleus (TRON) and is used in East Asia alongside UCS and region-specific encodings.

餐甑逞扉牙慨

Figure 1.2: Several Han characters in the traditional Chinese, Japanese, Korean, and Vietnamese variants.
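The bit layout of UTF-8 summarized in Table 1.2 can be sketched as a small Python function. The name `utf8_encode` is hypothetical, and the sketch deliberately ignores the surrogate range, which a production encoder would have to reject; Python's built-in `str.encode` is used only to cross-check the result.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point by filling the bit patterns of UTF-8."""
    if code_point < 0:
        raise ValueError("code points are non-negative")
    if code_point <= 0x7F:          # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:         # 110xxxxx 10xxxxxx
        return bytes([0b11000000 | (code_point >> 6),
                      0b10000000 | (code_point & 0b111111)])
    if code_point <= 0xFFFF:        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0b11100000 | (code_point >> 12),
                      0b10000000 | ((code_point >> 6) & 0b111111),
                      0b10000000 | (code_point & 0b111111)])
    if code_point <= 0x10FFFF:      # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0b11110000 | (code_point >> 18),
                      0b10000000 | ((code_point >> 12) & 0b111111),
                      0b10000000 | ((code_point >> 6) & 0b111111),
                      0b10000000 | (code_point & 0b111111)])
    raise ValueError("code point above 10 FF FF")


def as_bits(data: bytes) -> str:
    """Render bytes as space-separated groups of eight binary digits."""
    return " ".join(f"{byte:08b}" for byte in data)


for char in "Řeč":
    print(char, ord(char), as_bits(utf8_encode(ord(char))))
```

Running the loop reproduces the rows of Table 1.3 for the characters Ř, e, and č.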
UTF-32 is primarily used for the fixed-width internal representation of individual UCS characters inside programs, UTF-16 fulfills a similar role in programs that only work with the BMP, and UTF-8 is used for text storage and interchange. Since 2010, the majority of text content on the Web has been encoded in ASCII and UTF-8 [11].
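The surrogate pair mechanism of UTF-16 described above amounts to simple arithmetic on the code point. The following Python sketch shows one way to compute the pair; the function name `utf16_surrogates` is an assumption of this example, and the built-in codec serves only as a cross-check.

```python
def utf16_surrogates(code_point: int) -> tuple:
    """Split a code point outside the BMP into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP need surrogates")
    offset = code_point - 0x10000        # a 20-bit number
    high = 0xD800 + (offset >> 10)       # upper ten bits
    low = 0xDC00 + (offset & 0x3FF)      # lower ten bits
    return high, low


high, low = utf16_surrogates(0x1F600)
print(f"{high:04X} {low:04X}")

# The same character occupies a different number of bytes in each encoding.
sizes = {name: len("\U0001F600".encode(name))
         for name in ("utf-8", "utf-16-be", "utf-32-be")}
```

A program that assumes two bytes per character will therefore miscount any text that contains characters outside the BMP.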
Unicode was a competing standard for universal text encoding that underwent a merger with UCS in version 1.1, and the standards have been kept closely synchronized since then. Unicode is a superset of UCS that defines additional information about UCS characters (such as their general category, directionality, case, or numeric value [10, sec. 3.5 and ch. 4]), various text processing algorithms, and implementation guidelines.
Regarding text processing, Unicode and UCS represent a compromise between the simplicity of the seven-bit ASCII and the heterogeneity of eight-bit encodings:
Ǻ = Å + ◌́ = A + ◌̊ + ◌́

Figure 1.3: Some UCS characters can be either input as a single entity or composed from several combining characters. Regarding Unicode normalization forms, all of the above representations are canonically equivalent.
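The canonical equivalence shown in Figure 1.3 can be checked with Python's standard `unicodedata` module. The variable names below are chosen for this sketch only.

```python
import unicodedata

single = "\u01FA"            # the precomposed character
partial = "\u00C5\u0301"     # A with ring above + combining acute accent
full = "A\u030A\u0301"       # A + combining ring above + combining acute

# As raw code point sequences, the three strings differ.
all_different = len({single, partial, full}) == 3

# After normalization, all three collapse to one representation.
composed = {unicodedata.normalize("NFC", s) for s in (single, partial, full)}
decomposed = {unicodedata.normalize("NFD", s) for s in (single, partial, full)}
print(all_different, composed, decomposed)
```

Comparing strings only after bringing both to the same normalization form is what makes equivalence testing reliable.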
    iconv -f latin2 -t utf8 -- old.txt > new.txt

Figure 1.4: Text files can be converted between encodings using the iconv command-line tool. The sample code shows the file old.txt being converted from the ISO/IEC 8859-2 encoding to UTF-8. The result of the conversion is stored in the file new.txt.
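For readers without access to iconv, roughly the same conversion can be sketched in Python. The function name `convert_encoding` is hypothetical, and the temporary files below merely stand in for old.txt and new.txt.

```python
import tempfile
from pathlib import Path


def convert_encoding(source, target, from_enc="iso-8859-2", to_enc="utf-8"):
    """Re-encode a text file, leaving line endings untouched."""
    text = Path(source).read_bytes().decode(from_enc)
    Path(target).write_bytes(text.encode(to_enc))


with tempfile.TemporaryDirectory() as directory:
    old = Path(directory, "old.txt")
    new = Path(directory, "new.txt")
    old.write_bytes("Řeč".encode("iso-8859-2"))   # three bytes
    convert_encoding(old, new)
    converted = new.read_bytes()                  # five bytes

print(converted)
```

Working on bytes rather than in text mode is a design choice of this sketch: it keeps the platform from silently translating line endings during the conversion.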
• If simple text manipulation is preferred over space efficiency, each character can be made exactly four bytes wide using the UTF-32 encoding, or exactly two bytes wide using the UTF-16 encoding as long as the text stays within the BMP.

• Although character strings cannot be collated by a simple character code comparison, a collation algorithm is defined in the Unicode specification [12], and collation tables for major locales [13] are maintained by the Unicode Consortium.

• Classes of characters, such as uppercase letters, lowercase letters, numbers, and punctuation, do not form contiguous ranges, but their position is directly specified in the standard [10, sec. 4.5].

• Although idiosyncrasies such as ligatures, invisible hyphenation hints, and combining characters are present in UCS, explicit normalization algorithms for character string equivalence testing are specified by the standard [10, sec. 2.12]. An algorithm for case conversion is also specified [10, sec. 3.13].

• The byte order mark (FE FF) character can be inserted at the beginning of a text as a signature of Unicode encodings. As the name suggests, the order in which the FE and FF bytes arrive also indicates the order of bytes (called endianness) that was used to encode integers. In UTF-32 and UTF-16, endianness can be chosen arbitrarily by the encoding application. In UTF-8, one-byte integers are used and the notion of endianness is therefore meaningless.
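The role of the byte order mark can be observed with Python's standard `codecs` module. The helper `detect_utf16_byte_order` is a made-up name, and the sketch only covers UTF-16.

```python
import codecs


def detect_utf16_byte_order(data: bytes) -> str:
    """Guess the byte order of UTF-16 data from its byte order mark."""
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "little-endian"
    return "unknown"


big = codecs.BOM_UTF16_BE + "A".encode("utf-16-be")
little = codecs.BOM_UTF16_LE + "A".encode("utf-16-le")
print(big.hex(" "), detect_utf16_byte_order(big))
print(little.hex(" "), detect_utf16_byte_order(little))

# The generic "utf-16" decoder reads the mark, picks the order, and drops it.
decoded = (big.decode("utf-16"), little.decode("utf-16"))
```

In UTF-8, the mark is encoded as the fixed byte sequence EF BB BF and serves purely as a signature, which Python exposes as the `utf-8-sig` codec.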
Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. Above is the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.
Compose + O + R = ®
Compose + 3 + 4 = ¾
Compose + s + s = ß
Compose + ~ + ' + a = ấ

Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.
Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á

Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces a character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the corresponding UCS code point in hexadecimal.
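The code page lookup behind Figure 1.7 can be imitated in Python, whose standard codecs include both code pages. This is a simplified sketch for an English locale: the function names are invented here, and the special glyphs that Windows assigns to codes below 32 are not modeled.

```python
def alt_code(digits: str) -> str:
    """Return the character for digits typed while holding the Alt key."""
    # A leading zero selects the Windows code page, otherwise the IBM one.
    code_page = "cp1252" if digits.startswith("0") else "cp437"
    return bytes([int(digits)]).decode(code_page)


def alt_plus_hex(hex_digits: str) -> str:
    """Return the character for Alt, +, and a hexadecimal UCS code point."""
    return chr(int(hex_digits, 16))


print(alt_code("160"), alt_code("0225"), alt_plus_hex("E1"))
```

All three calls yield the same letter, which shows that the number typed is only meaningful together with the table it indexes.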
1.1.2 Text Input

To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On hand-held devices, the use of either a numeric keypad or a touch-screen is more typical.

An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind and, as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn't apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.
1.1.3 Text Editors

A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free joe, GNU nano, and pico.

More advanced text editors come with support for regular expressions and version control, which will be covered in Sections 1.1.5 and 1.2, and with user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower the users to perform complex text editing.
1.1.4 Interactive Document Preparation Systems

Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].

The primary difference between text editors and DPSes is that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files, a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.

1.1.5 Regular Expressions

The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.

Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.
Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) syntaxes [18, part 1, ch. 9] described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.
More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for the Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (20 0B), zero width non-joiner (20 0C), zero width joiner (20 0D), and zero width no-break space (FE FF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at
BRE regex:   we\{1,2\}p
Description: The interval expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
Matches:     weeps wept

BRE regex:   e*ne
Description: Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
Matches:     never enemy; Kleene

BRE regex:   \(⟨regex⟩\)
Description: A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
Matches:     ⟨regex⟩

BRE regex:   ^ar
Description: At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
Matches:     argument; arrow keys

BRE regex:   ore$
Description: At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
Matches:     iron ore; dumbledore

BRE regex:   .be
Description: A period (.) matches any single character.
Matches:     or not to be

BRE regex:   be[ea]
Description: A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity.
Matches:     beehive; grizzly bear; glass beads

BRE regex:   be[^ea]
Description: A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
Matches:     obeah bend; libela

BRE regex:   \^\$
Description: Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
Matches:     ^$

BRE regex:   \(..\).*\1
Description: A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
Matches:     ara ararauna; dardanelles; nationality

ERE regex:   we{1,2}p
Description: Unlike in BREs, braces aren't escaped.
Matches:     weeps wept

ERE regex:   pe+r?l?
Description: The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
Matches:     persona; peer speech; perl

ERE regex:   (⟨regex⟩)
Description: Unlike in BREs, parentheses aren't escaped.
Matches:     ⟨regex⟩

ERE regex:   (on|t)
Description: Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
Matches:     one two; trophy truth

ERE regex:   (..).*\1
Description: EREs do not support backreferences.
Matches:     ⟨undefined⟩

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below). The Matches column lists example texts in which the regex finds a match.
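Several of the constructs above can be tried out without a Unix environment. The sketch below uses Python's `re` module as a stand-in; its syntax is Perl-derived rather than POSIX, but for the unescaped ERE forms used here the two agree. The helper name `with_match` and the sample words are assumptions of this example.

```python
import re


def with_match(pattern: str, texts: list) -> list:
    """Return the texts in which the pattern matches somewhere."""
    return [text for text in texts if re.search(pattern, text)]


samples = ["weeps", "wept", "wp", "beehive", "grizzly bear",
           "bend", "argument", "iron ore", "oregano"]

interval = with_match(r"we{1,2}p", samples)      # one or two letters e
matching_list = with_match(r"be[ea]", samples)   # be followed by e or a
non_matching = with_match(r"be[^ea]", samples)   # be followed by anything else
beginning = with_match(r"^ar", samples)          # anchored at the start
end = with_match(r"ore$", samples)               # anchored at the end
alternation = with_match(r"(on|t)", samples)     # either alternative

print(interval, matching_list, non_matching, beginning, end, alternation)
```

With grep, the same experiments would use the -E option for the ERE syntax; the BRE forms differ only in the escaping of braces and parentheses.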
Regex            Description
\x{⟨n⟩}          Matches the UCS character with code point ⟨n⟩ in hexadecimal.
\N{⟨n⟩}          Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.
\p{⟨p⟩}          Matches any UCS character with property ⟨p⟩.
\P{⟨p⟩}          Matches any UCS character without property ⟨p⟩.

Property             Description
Letter               This property is satisfied by any letter.
Punctuation          This property is satisfied by any punctuation.
Symbol               This property is satisfied by any symbol.
Mark                 This property is satisfied by any mark.
Number               This property is satisfied by any number.
Separator            This property is satisfied by any separator.
Other                This property is satisfied by any UCS character that doesn't belong to any of the above-listed categories.
Block=⟨b⟩            This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.
Script=⟨s⟩           This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.
Numeric Value=⟨n⟩    This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.
The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.
least partially implement the Unicode standard for Regular Expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
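The false negatives described above are easy to reproduce. The following Python sketch uses only the standard `re` and `unicodedata` modules; the helper `canonical` is a hypothetical name, and the preprocessing it performs (dropping four zero-width characters, case folding, and NFC normalization) is one possible mitigation rather than a complete solution.

```python
import re
import unicodedata

# Translation table that deletes the invisible characters named in the text.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))


def canonical(text: str) -> str:
    """Drop zero-width characters, fold case, and normalize to NFC."""
    visible = text.translate(ZERO_WIDTH)
    return unicodedata.normalize("NFC", visible.casefold())


# "CAFE" followed by a combining acute accent versus a precomposed pattern.
accented = "CAFE\u0301"
naive_accent = re.search("caf\u00e9", accented, re.IGNORECASE)
robust_accent = re.search(canonical("caf\u00e9"), canonical(accented))

# A zero width space hidden inside a word.
hidden = "type\u200bsetting"
naive_hidden = re.search("typesetting", hidden)
robust_hidden = re.search(canonical("typesetting"), canonical(hidden))

print(naive_accent, robust_accent, naive_hidden, robust_hidden)
```

Property-based matching such as \p{Letter} is not available in Python's standard `re` module; to the best of my knowledge, the third-party `regex` package provides it.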
The most elementary text processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool will present lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.

The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
VCS can be divided based on their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally and its operation is fully dependent on the availability of the server. An example of a centralized VCS is SubVersioN (SVN).
By comparison, there is no designated server in decentralized VCS, and the users can upload and download new versions directly to and from one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.
Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.

[Diagram: svnadmin create, svn checkout, svn update, svn commit]

Figure 1.8: The basic SVN workflow.
After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.

[Diagram: git init, git clone, git pull, git push, git reset, git commit]

Figure 1.9: The diagram above depicts the basic Git workflow. The diagram below depicts the use of the Git program with an SVN repository; this bears all the advantages and disadvantages associated with decentralized VCS.

[Diagram: svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit]
Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.
Chapter 2
Markup
A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to compliance with the orthographic rules (such as correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.

Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result,
More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.
the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.
2.1 Meta Markup Languages

2.1.1 The Generalized Markup Language
The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].
SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.
A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.
2.1.2 The Extensible Markup Language
Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.
This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
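The first of the two levels can be checked with the XML parser in the Python standard library, as the following sketch suggests. The helper name is_well_formed and the sample strings are made up for illustration. The parser used here is non-validating, so it says nothing about validity against a DTD or a schema; that check needs a separate tool.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True when the parser accepts the string as XML."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


good = "<recipe><name>Palatschinken</name></recipe>"
bad = "<recipe><name>Palatschinken</recipe></name>"  # tags overlap

print(is_well_formed(good))
print(is_well_formed(bad))
```

The first string is accepted, whereas the second one is rejected because its end tags do not match the order in which the elements were opened.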
Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish.</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>

Figure 2.1: An example XML document (recipe.xml).
DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD.
element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
        name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).
xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng  # Compact -> Full RELAX NG
xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor to the more expressive SGML feature of CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML
Speech
AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!

Verse
Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? Why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].
The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org/.

Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
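How a parser sees namespaces can be observed with the Python standard library. The sketch below is an illustration only: the sample document is made up, although the two IRIs are the namespace names of XHTML and SVG. The parser merely attaches the IRI to each element name; it does not retrieve anything from the IRI and performs no validation.

```python
import xml.etree.ElementTree as ET

# A made-up document mixing two XML applications in a single tree.
document = """
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <body>
    <p>A circle:</p>
    <svg:svg><svg:circle r="10"/></svg:svg>
  </body>
</html>
"""

root = ET.fromstring(document)
# ElementTree expands each prefix into the form {namespace IRI}local-name.
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)
```

The html, body, and p elements are reported under the XHTML namespace, and the svg and circle elements under the SVG namespace, regardless of which prefixes the author chose.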
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language
In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely used graphical Web browser, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In
JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for the presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
2.2.2 The Extensible Hypertext Markup Language
Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as

<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.
<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com/. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

<style>
  body {
    font: large Verdana;
    font-size: large; }
  h1 {
    font-size: x-large;
    text-transform: uppercase; }
  abbr {
    font-style: italic; }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>
The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications (such as MathML and SVG) directly into web documents through XML namespaces.
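The contrast between the two kinds of parsers can be tried out with the Python standard library, as in the following sketch. The TagLogger class is a hypothetical helper written for this illustration. Note that html.parser only reports the tags it encounters; it does not implement the error recovery prescribed by the HTML5 specification, so the sketch shows leniency as such, not the canonical interpretation from Figure 2.7.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# The overlapping elements from the first line of Figure 2.7.
malformed = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Collects start and end tags without checking their nesting."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.events.append("</" + tag + ">")


# The lenient HTML tokenizer processes the input without complaint.
logger = TagLogger()
logger.feed(malformed)
logger.close()

# The XML parser refuses the same input as not well-formed.
try:
    ET.fromstring(malformed)
    xml_accepts = True
except ET.ParseError:
    xml_accepts = False
```

The HTML tokenizer records all four tags in the order they appear, while the XML parser stops at the first mismatched end tag.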
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

2.2.3 The Semantic Web and Linked Data
The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
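The two interpretations can be spelled out in a few lines of Python. This is a sketch under stated assumptions: the Resource, Literal, and Triple classes and the describe function are hypothetical names chosen for the illustration and do not belong to any RDF library. The IRIs are taken from the example document about John Smith's Web page.

```python
from typing import NamedTuple, Union


class Resource(NamedTuple):
    iri: str


class Literal(NamedTuple):
    value: str


class Triple(NamedTuple):
    # The field order follows the (p, s, o) notation used in the text.
    predicate: Resource
    subject: Resource
    obj: Union[Resource, Literal]


def describe(triple: Triple) -> str:
    """Spell out the interpretation of a single triplet."""
    p, s, o = triple
    if isinstance(o, Resource):
        return f"{s.iri} is in relation {p.iri} with {o.iri}"
    return f"{s.iri} has property {p.iri} with value {o.value!r}"


page = Resource("http://example.org/document.html")
john = Resource("http://example.org/john-smith")
creator = Resource("http://purl.org/dc/terms/creator")
name = Resource("http://xmlns.com/foaf/0.1/name")

relation = describe(Triple(creator, page, john))
prop = describe(Triple(name, john, Literal("John Smith")))
```

The first triplet links two resources and is read as a relation; the second ends in a literal and is read as a property with a value.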
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.

2.3 Document Preparation Systems
Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).
<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>
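[Graph: the node http://example.org/document.html has a dc:title edge to the literal "John's Web page"@en and a dc:creator edge to the node http://example.org/john-smith; the node http://example.org/john-smith has an rdf:type edge to foaf:Person and a foaf:name edge to the literal "John Smith".]

Figure 2.11: A graph of the RDF document in Figure 2.9.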
categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems
Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The
The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph, The Art of Computer Programming, and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled -- but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}\hskip-0.3ex}}
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
              \dimexpr\hsize-3.28em 3.28em
              \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
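To give a feel for what such a conversion tool does, here is a minimal Python sketch that turns a tiny Markdown-like subset into HTML. The supported syntax (one heading level, paragraphs, and asterisks for emphasis) is an assumption made for illustration; it does not implement the grammar of any real flavor, and the function name to_html is hypothetical.

```python
import html
import re


def to_html(source: str) -> str:
    """Convert a tiny Markdown-like subset to HTML.

    Supported by assumption only: '# ' headings, paragraphs separated
    by blank lines, *emphasis*, and **strong emphasis**.
    """
    blocks = []
    for block in re.split(r"\n\s*\n", source.strip()):
        # Join the lines of a block and escape HTML metacharacters.
        text = html.escape(" ".join(block.split()), quote=False)
        # Strong emphasis must be handled before plain emphasis.
        text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
        text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
        if text.startswith("# "):
            blocks.append(f"<h1>{text[2:]}</h1>")
        else:
            blocks.append(f"<p>{text}</p>")
    return "\n".join(blocks)


sample = "# Design\n\nLegibility is of *foremost* concern."
print(to_html(sample))
```

The punctuation in the input stays readable as plain text, which is the trade-off described above: the markup is unobtrusive, but it can express far less than SGML or XML.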
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
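Whether a typeface covers a manuscript can be checked mechanically once the supported Unicode ranges are known. The Python sketch below assumes the ranges are supplied by hand; the three ranges are borrowed from the Latin font declaration in Figure 3.2 for illustration only, and the function name missing_characters is hypothetical. A real check would read the ranges from the font file.

```python
# Unicode ranges that a body text font is assumed to cover, written
# as inclusive (first, last) code point pairs. Illustrative values.
LATIN_FONT_RANGES = [
    (0x0000, 0x024F),  # Basic Latin through Latin Extended-B
    (0x1E00, 0x1EFF),  # Latin Extended Additional
    (0x2000, 0x206F),  # General Punctuation
]


def missing_characters(text, ranges):
    """Return the sorted list of characters in text outside all ranges."""
    def covered(ch):
        return any(first <= ord(ch) <= last for first, last in ranges)

    return sorted({ch for ch in text if not covered(ch)})


# Greek letters fall outside the assumed Latin ranges, so a second
# typeface would be needed to set this sample.
sample = "Aristotle says: \u03c8\u03c5\u03c7\u03ae"
missing = missing_characters(sample, LATIN_FONT_RANGES)
```

An empty result means the body text typeface suffices; anything else is a list of characters that must be borrowed from another font family.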
Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
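The line length guidelines above can be turned into a simple check. The following Python sketch is one possible reading of them; the function name assess_measure and the wording of the verdicts are hypothetical, and the treatment of values between the recommended range and the hard limits is an assumption.

```python
def assess_measure(chars_per_line: int, columns: int = 1) -> str:
    """Classify a line length against the guideline ranges above."""
    # 45-75 characters for one column, 40-50 for several columns.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "too narrow to justify: set ragged"
    if chars_per_line > 80:
        return "too wide: strains the eye"
    if low <= chars_per_line <= high:
        return "within the recommended range: justify"
    return "acceptable, but outside the recommended range"


print(assess_measure(66))
print(assess_measure(35))
print(assess_measure(60, columns=2))
```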
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὀργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
                                 Ligatures=TeX]}{}
\newunicodechar{·}{\raisebox{.8ex}{.}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι
δὲ ὀργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.

<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=grc] {
    font-size: 13.2pt;
  }
</style>
<p><span lang=en>The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang=grc>φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὀργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang=en>“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
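The arithmetic behind these leading figures is easy to reproduce. In the Python sketch below, the function names are hypothetical, and the percentages are taken from the guideline cited above, which sets the line height at 120 to 145 percent of the typeface size.

```python
def leading_range(size_pt: float) -> tuple[float, float]:
    """Return the (minimum, maximum) line height for a typeface size.

    Lines are separated by 20 to 45 percent of the typeface size,
    so the line height is 120 to 145 percent of it.
    """
    return (size_pt * 120 / 100, size_pt * 145 / 100)


def lead_per_side(size_pt: float, line_height_pt: float) -> float:
    """Lead added above and below each line, split evenly."""
    return (line_height_pt - size_pt) / 2


low, high = leading_range(10)
print(low, high)
print(lead_per_side(10, low), lead_per_side(10, high))
```

For a 10 pt typeface, this reproduces the range of 12 to 14.5 pt and the 1 to 2.25 pt of lead on each side of the line.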
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. ¶ Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8½ × 11            1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
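The whole-multiple rule amounts to rounding the space taken by a heading up to the next full body text line. A minimal Python sketch follows; the function name heading_block and the example sizes are hypothetical, and how the extra space is split above and below the heading is left to the designer.

```python
import math


def heading_block(heading_height_pt, line_height_pt):
    """Fit a heading into a whole number of body text lines.

    Returns (lines, total_height_pt, extra_space_pt), where the extra
    space is what must be added around the heading to keep the lines
    on facing pages aligned.
    """
    lines = max(1, math.ceil(heading_height_pt / line_height_pt))
    total = lines * line_height_pt
    return lines, total, total - heading_height_pt


# A heading that needs 16 pt on a page with 12 pt body text leading.
lines, total, extra = heading_block(16, 12)
print(lines, total, extra)
```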
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3]:
This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
– William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing. A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
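One way to combine the two requirements is to compute the target height from the text width and the desired proportion, and then to round it down to a whole number of lines. The Python sketch below does just that; the function name text_height and the example values are hypothetical, and rounding down rather than to the nearest line is an assumption.

```python
import math


def text_height(text_width_pt, proportion, line_height_pt):
    """Derive the text height from the text width.

    The target height is width * proportion; the result is the largest
    whole number of body text lines that does not exceed the target.
    Returns (lines, height_pt).
    """
    target = text_width_pt * proportion
    lines = math.floor(target / line_height_pt)
    return lines, lines * line_height_pt


# A 300 pt wide text block with 2 : 3 proportions and 12 pt leading.
lines, height = text_height(300, 1.5, 12)
print(lines, height)
```

Here the ideal height of 450 pt falls between 37 and 38 lines, so the text block is set to 37 lines, or 444 pt.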
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
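The text does not say how to measure whether a contrast is sufficient. One widely used yardstick, assumed here and not taken from the sources cited above, is the contrast ratio defined in the W3C Web Content Accessibility Guidelines (WCAG) 2.x, which ranges from 1 for identical colors to 21 for black on white. The function names in the Python sketch below are hypothetical.

```python
def relative_luminance(rgb):
    """Relative luminance of an sRGB color given as 0-255 integers."""
    def linear(channel):
        # Undo the sRGB gamma encoding, as specified by WCAG 2.x.
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 (none) to 21."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


# Red emphasis on a white page versus black body text on white.
print(contrast_ratio((200, 0, 0), (255, 255, 255)))
print(contrast_ratio((0, 0, 0), (255, 255, 255)))
```

WCAG asks for a ratio of at least 4.5 for ordinary text at its AA level; whether that threshold suits a given print design is a separate judgment.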
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015).
[3] ISO/TC 97/SC 2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972.
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986.
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1.
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6.
[7] ISO/IEC JTC 1/SC 2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993.
[8] ISO/IEC JTC 1/SC 2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996.
[9] ISO/IEC JTC 1/SC 2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996.
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016).
[14] ISO/TC 171/SC 2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008.
[15] ISO/IEC JTC 1/SC 34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012.
[16] ISO/IEC JTC 1/SC 34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006.
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124.
[18] ISO/IEC JTC 1/SC 22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993.
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6.
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015).
[26] ISO/IEC JTC 1/SC 34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986.
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3.
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015).
[29] ISO/IEC JTC 1/SC 18/WG 8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12.
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4.
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts and Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5.
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014.
Acronyms
ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standard Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
EMACS: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
PICO: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
VI: The Visual Interactive editor
VIM: VI IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language
4 CHAPTER 1 WRITING
Zwei Trichter wandeln durch die Nacht.
Durch ihres Rumpfs verengten Schacht
fließt weißes Mondlicht
still und heiter
auf ihren
Waldweg
u. s.
w.

Figure 1.1: Exceptions that prove the rule about the separation of text and design can sometimes be encountered in poetry. Above is Christian Morgenstern's Trichter, where the text and its form are intimately intertwined.
setting as well, making them an indispensable reference on the editorial tradition.

Above all stand the typographic rules, which specify how the resulting document should be typeset so that it doesn't disturb the eye of the reader. These, as well as the orthographic rules on hyphenation, can be left out of consideration during writing, as it is the page that should be formed around the writing and not the other way around.

1.1 Text Processing

Originally the domain of the pen, the quill, the stylus, and the more recent typewriter, manuscripts of today are produced mainly using the personal computer and stored in text files. The discipline of creating and manipulating digital text is called text processing and will be the focus of this section.
1.1.1 Character Encoding

Although computing at its most primal has no use for anything but numbers, it has nevertheless been accompanied by text from the very outset. Even the earliest computers of the 1950s were programmed with both raw machine code and the textual programming language of the FORmula TRANslator (FORTRAN). The digital representation of letters, digits, and other characters was initially closely tied to each specific application and processor architecture, but with the advent of computer networking in the 1960s, mutual intelligibility became a point of concern. "We had over sixty different ways to represent characters in computers. It was a real Tower of Babel," explains Bob Bemer [1], an American computer scientist who worked at IBM during 1956–1962 and who drafted the American Standard Code for Information Interchange (ASCII) [2], a character encoding from 1963 that unified the digital representation of text across the computer industry and enabled computer networking on a large scale.

Note: EBCDIC by IBM was the default encoding on IBM's System/360 mainframes and was in active use until the introduction of the PC in 1981. In writing systems using Chinese characters, special encodings such as Big5, JIS, and EUC are used to this day. For brevity, the text focuses on the mainstream of international encodings.
ASCII
In ASCII, every character is represented by a number from zero to 127, which is transformed to a seven-bit integer called a character code. These 128 codes are used to encode printable characters (spanning the letters of the English alphabet, digits, punctuation, and other symbols) and control codes, as depicted in Table 1.1. Unlike printable characters, control codes have no fixed visual representation; they were used to implement application-specific communication protocols and text formatting, and their precise semantics were defined in a much later standard from 1972 [3]. Unconstrained by the bandwidth and storage limitations of the 1960s and 1970s, today's communication protocols and text formats gravitate towards markup constructed from printable characters, which, unlike control codes, are easy for humans to read and write.
b7 b6 b5       000   001   010   011   100   101   110   111
b4 b3 b2 b1    Control codes  Symbols     Upper case  Lower case
0  0  0  0     NUL   DLE   SP    0     @     P     `     p
0  0  0  1     SOH   DC1   !     1     A     Q     a     q
0  0  1  0     STX   DC2   "     2     B     R     b     r
0  0  1  1     ETX   DC3   #     3     C     S     c     s
0  1  0  0     EOT   DC4   $     4     D     T     d     t
0  1  0  1     ENQ   NAK   %     5     E     U     e     u
0  1  1  0     ACK   SYN   &     6     F     V     f     v
0  1  1  1     BEL   ETB   '     7     G     W     g     w
1  0  0  0     BS    CAN   (     8     H     X     h     x
1  0  0  1     HT    EM    )     9     I     Y     i     y
1  0  1  0     LF    SUB   *     :     J     Z     j     z
1  0  1  1     VT    ESC   +     ;     K     [     k     {
1  1  0  0     FF    FS    ,     <     L     \     l     |
1  1  0  1     CR    GS    -     =     M     ]     m     }
1  1  1  0     SO    RS    .     >     N     ^     n     ~
1  1  1  1     SI    US    /     ?     O     _     o     DEL

Table 1.1: The ASCII encoding as specified in the 1986 revision of the standard [4].

Code point range    Encoding
0–127               0xxxxxxx
128–2047            110xxxxx 10xxxxxx
2048–65535          1110xxxx 10xxxxxx 10xxxxxx
65536–1114111       11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Table 1.2: The UTF-8 encoding. Each x represents one bit of the UCS code point in binary.

Character   Code point (decimal)   Code point (binary)   UTF-8 encoding
Ř           344                    101011000             11000101 10011000
e           101                    1100101               01100101
č           269                    100001101             11000100 10001101

Table 1.3: An example of the UTF-8 encoding.

The following properties make it easy to manipulate and reason about character strings encoded in ASCII:

• Each character is represented by exactly seven bits. This makes it easy to allocate space for character strings of fixed length, to measure the number of characters stored in a memory region, and to perform basic operations such as adjacent character retrieval or text truncation.

• Characters are alphabetically ordered. Character strings can therefore be collated by comparing the binary values of character codes.

• Lowercase and uppercase letters, digits, and control codes form contiguous ranges of character codes. This simplifies classification.

• There is precisely one way to encode any printable character. The conversion between lowercase and uppercase letters is a matter of inverting one bit.

This comes at the expense of support for non-English writing systems. As a temporary workaround, a set of ASCII derivatives that replaced the less-needed characters #, $, @, [, \, ], ^, `, {, |, }, and ~ with international characters was specified in the ISO 646 standard from 1972 [3].
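The ASCII properties listed above can be tried out in a POSIX shell. The following is only a minimal sketch; the helper names ascii_code, ascii_char, is_upper, and toggle_case are hypothetical and were introduced here for illustration.

```shell
# Print the decimal character code of a single ASCII character.
ascii_code() { printf '%d' "'$1"; }

# Print the ASCII character that belongs to a decimal character code.
ascii_char() { printf "\\$(printf '%03o' "$1")"; }

# Letters form contiguous ranges, so classification is a range check.
is_upper() {
  code=$(ascii_code "$1")
  [ "$code" -ge 65 ] && [ "$code" -le 90 ]
}

# Uppercase and lowercase letters differ only in bit 6 (value 32),
# so inverting that bit converts between them.
toggle_case() { ascii_char $(( $(ascii_code "$1") ^ 32 )); }

toggle_case a; echo        # A
toggle_case Q; echo        # q

# Collation by character code: uppercase letters sort before lowercase ones.
printf '%s\n' b a C | LC_ALL=C sort
```

The final command sorts by raw character codes, which is why C precedes a and b. The helpers are only meaningful for ASCII letters; toggle_case applied to a digit or a punctuation mark returns an unrelated character.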
Eight-bit Encodings
With the byte size stabilizing at eight bits, new character encodings emerged that were based on ASCII and used the additional bit to encode characters of non-English writing systems while retaining complete backwards compatibility with ASCII. Besides the numerous vendor-specific encodings (called code pages), a set of fifteen eight-bit encodings, covering all major modern writing systems whose characters fit within the space of 128 additional combinations, was standardized in the ISO/IEC 8859 series released during 1986–2001.

Compared to ASCII, eight-bit encodings introduced an additional level of complexity to text processing:

• Each character is exactly eight bits wide. String manipulation is therefore as straightforward as with ASCII.

• Character strings can no longer be collated by character code comparison. Each encoding requires separate collation tables.

• Classes of characters, such as uppercase and lowercase letters or punctuation, no longer form contiguous ranges, and their position varies among encodings. This impedes character classification.

• Idiosyncrasies such as the ligature æ and invisible hyphenation hints are included in several encodings, which makes it more difficult to determine character string equivalence. Algorithms for case conversion vary among encodings.

• There exists no standard mechanism to detect which encoding is being used. The distinction needs to be made at the application level using heuristics, additional metadata, or human intervention. Consequently, no standard mechanism exists to use different character encodings within a single text document.
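The last point can be illustrated with a single byte whose meaning depends entirely on the encoding that the reader assumes. The sketch below assumes that the iconv tool is installed and that it knows the encoding names ISO-8859-1 and ISO-8859-2; the helper hex is a hypothetical name introduced here.

```shell
# Helper: print the bytes read from standard input as hexadecimal digits.
hex() { od -An -tx1 | tr -d ' \n'; }

# The single byte D8 (octal 330), read as ISO/IEC 8859-1, is the letter Ø ...
latin1=$(printf '\330' | iconv -f ISO-8859-1 -t UTF-8 | hex)

# ... while the same byte, read as ISO/IEC 8859-2, is the letter Ř.
latin2=$(printf '\330' | iconv -f ISO-8859-2 -t UTF-8 | hex)

echo "$latin1"   # c398, the UTF-8 encoding of Ø
echo "$latin2"   # c598, the UTF-8 encoding of Ř
```

Nothing in the byte itself reveals which of the two readings was intended, which is why the encoding has to be communicated separately.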
A portion of this complexity is inherent in the task of encoding the characters of all modern writing systems, but the overhead caused by the fragmentation of character encodings proved to be unnecessary.

The Universal Character Set and Unicode

In the early 1990s, the continual increase in available bandwidth and storage led to the creation of the Unicode [5, 6] and Universal multiple-octet coded Character Set (UCS) [7] standards in an attempt to create a text encoding that would contain the characters of all the world's languages and succeed ASCII as the lingua franca of text interchange.

UCS is an ever-expanding catalogue of characters from writing systems both modern and ancient, and of symbols ranging from diacritical marks, punctuation, and ideograms to mahjong tiles, alchemical symbols, and ancient Greek musical notation. Each of these characters is assigned a number called a code point, ranging from 0 to 2147483647 (7F FF FF FF in hexadecimal notation), with the most common characters occupying the range from 0 to 65535 (FF FF), called the Basic Multilingual Plane (BMP). The smallest units of division in UCS are blocks, which contain 256 thematically related characters. UCS encodings map code points to binary character codes and vice versa.

Note: The seven-bit encodings UTF-7 and Punycode are also notable; they bring Unicode support to protocols that were designed with seven-bit ASCII in mind, such as e-mail.
Three major encodings are specified in the UCS standard and its amendments [8, 9]:

1. UTF-32 directly encodes UCS characters by transforming their code points to four-byte integers. UTF-32 is also known as UCS-4.

2. UTF-16 directly encodes characters within the BMP by transforming their code points to two-byte integers. Code points in the range from 65536 to 1114111 (01 00 00–10 FF FF) are transformed into pairs of two-byte integers, called surrogate pairs, ranging from 55296 to 57343 (D8 00–DF FF). To enable the UTF-16 encoding, the code points in this range will never be assigned to characters [10, sec. 3.4, D15]. The same is true of code points above 1114111 (10 FF FF), which allows UTF-16 to encode any UCS character.
3. UTF-8 directly transforms code points ranging from 0 to 127 (7F) to one-byte integers. Since the first UCS block of the BMP matches ASCII, any text encoded in eight-bit ASCII is also encoded in UTF-8. Code points in the range from 128 to 1114111 (00 00 80–10 FF FF) are transformed into sequences of two to four one-byte integers ranging from 128 to 244 (80–F4). The encoding is illustrated in Tables 1.2 and 1.3.

Note: One of the design goals of UCS was to avoid assigning code points to different glyphs that carry the same meaning. As a result, the visually distinctive Han characters used in the East Asian countries of China, Japan, Korea, and Vietnam were merged into a set of 75960 ideograms in a process referred to as Han Unification [10, sec. 18.1]. This simplifies text processing but also makes it impossible to encode a text in multiple East Asian languages without relying on external markup to select appropriate regional fonts. As a result, a derivative of UCS that doesn't implement Han Unification was developed for use in operating systems based on the Real-time Operating system Nucleus (TRON) and is used in East Asia alongside UCS and region-specific encodings.

餐 甑 逞 扉 牙 慨

Figure 1.2: Several Han characters in the traditional Chinese, Japanese, Korean, and Vietnamese variants.
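The bit patterns of the UTF-8 encoding can be reproduced with plain shell arithmetic. The following is a minimal sketch rather than a complete encoder: the function name utf8_encode is hypothetical, the input is a decimal code point, the output is the resulting bytes in hexadecimal, and no validation (for example of the surrogate range) is attempted.

```shell
# Print the UTF-8 bytes (in hexadecimal) for a decimal code point.
utf8_encode() {
  cp=$1
  if [ "$cp" -lt 128 ]; then
    # One byte: 0xxxxxxx
    printf '%02x\n' "$cp"
  elif [ "$cp" -lt 2048 ]; then
    # Two bytes: 110xxxxx 10xxxxxx
    printf '%02x %02x\n' \
      $(( 0xC0 | (cp >> 6) )) \
      $(( 0x80 | (cp & 0x3F) ))
  elif [ "$cp" -lt 65536 ]; then
    # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
    printf '%02x %02x %02x\n' \
      $(( 0xE0 | (cp >> 12) )) \
      $(( 0x80 | ((cp >> 6) & 0x3F) )) \
      $(( 0x80 | (cp & 0x3F) ))
  else
    # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    printf '%02x %02x %02x %02x\n' \
      $(( 0xF0 | (cp >> 18) )) \
      $(( 0x80 | ((cp >> 12) & 0x3F) )) \
      $(( 0x80 | ((cp >> 6) & 0x3F) )) \
      $(( 0x80 | (cp & 0x3F) ))
  fi
}

utf8_encode 344      # c5 98 (the letter Ř)
utf8_encode 101      # 65 (the letter e)
utf8_encode 269      # c4 8d (the letter č)
```

Each branch copies the bits of the code point, six at a time from the right, into the continuation bytes and places the remaining bits into the leading byte.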
UTF-32 is primarily used for the fixed-width internal representation of individual UCS characters inside programs, UTF-16 fulfills a similar role in programs that only work with the BMP, and UTF-8 is used for text storage and interchange. Since 2010, the majority of text content on the Web has been encoded in ASCII and UTF-8 [11].
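The surrogate pairs that UTF-16 uses for code points beyond the BMP follow from simple arithmetic. The sketch below is an illustration under stated assumptions: the function name utf16_encode is hypothetical, the input is a decimal code point, and the output is one or two 16-bit units in hexadecimal, with byte order left aside.

```shell
# Print the UTF-16 code units (in hexadecimal) for a decimal code point.
utf16_encode() {
  cp=$1
  if [ "$cp" -lt 65536 ]; then
    # Inside the BMP the code point is stored directly.
    printf '%04x\n' "$cp"
  else
    # Outside the BMP, subtract 65536 and split the remaining 20 bits
    # into a high surrogate (D8 00 and up) and a low surrogate (DC 00 and up).
    v=$(( cp - 0x10000 ))
    printf '%04x %04x\n' \
      $(( 0xD800 | (v >> 10) )) \
      $(( 0xDC00 | (v & 0x3FF) ))
  fi
}

utf16_encode 344       # 0158
utf16_encode 128512    # d83d de00
```

Because the range D8 00–DF FF is never assigned to characters, a decoder can always tell a surrogate from a directly encoded BMP character.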
Unicode was a competing standard for universal text encoding that underwent a merger with UCS in version 1.1; since then, the standards have been kept closely synchronised. Unicode is a superset of UCS that defines additional information about UCS characters (such as their general category, directionality, case, or numeric value [10, sec. 3.5 and ch. 4]), various text processing algorithms, and implementation guidelines.
Ǻ = Å + ◌́ = A + ◌̊ + ◌́

Figure 1.3: Some UCS characters can be either input as a single entity or composed from several combining characters. With respect to Unicode normalization forms, all of the above representations are canonically equivalent.

iconv -f latin2 -t utf8 -- old.txt > new.txt

Figure 1.4: Text files can be converted between encodings using the iconv command-line tool. The sample code shows the file old.txt being converted from the ISO/IEC 8859-2 encoding to UTF-8. The result of the conversion is stored in the file new.txt.

Regarding text processing, Unicode and UCS represent a compromise between the simplicity of seven-bit ASCII and the heterogeneity of eight-bit encodings:
• If simple text manipulation is preferred over space efficiency, each character can be made exactly two or four bytes wide using the UTF-16 and UTF-32 encodings, respectively. The two-byte width of UTF-16 holds only for text confined to the BMP.

• Although character strings cannot be collated by a simple character code comparison, a collation algorithm is defined in the Unicode specification [12], and collation tables for major locales [13] are maintained by the Unicode Consortium.

• Classes of characters, such as uppercase letters, lowercase letters, numbers, and punctuation, do not form contiguous ranges, but their position is directly specified in the standard [10, sec. 4.5].

• Although idiosyncrasies such as ligatures, invisible hyphenation hints, and combining characters are present in UCS, explicit normalization algorithms for character string equivalence testing are specified by the standard [10, sec. 2.12]. An algorithm for case conversion is also specified [10, sec. 3.13].

• The byte order mark (FE FF) character can be inserted at the beginning of a text as a signature of Unicode encodings. As the name suggests, the order in which the FE and FF bytes arrive also indicates the order of bytes (called endianness) that was used to encode integers. In UTF-32 and UTF-16, endianness can be chosen arbitrarily by the encoding application. In UTF-8, one-byte integers are used, and the notion of endianness is therefore meaningless.
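Two of the points above lend themselves to a short experiment. The sketch below first shows that canonically equivalent spellings of one letter are different byte strings, which is why normalization is needed before comparison, and then guesses an encoding from a byte order mark. The names hex and detect_bom are hypothetical helpers; the UTF-8 signature EF BB BF is the byte order mark character encoded in UTF-8.

```shell
# Helper: print the bytes read from standard input as hexadecimal digits.
hex() { od -An -tx1 | tr -d ' \n'; }

# Two canonically equivalent spellings of the letter Ǻ, given as UTF-8 bytes.
composed=$(printf '\307\272')               # U+01FA
decomposed=$(printf 'A\314\212\314\201')    # U+0041 U+030A U+0301

# A byte-wise comparison treats them as different strings.
if [ "$composed" = "$decomposed" ]; then echo same; else echo different; fi

# Guess the encoding of a file from the byte order mark at its beginning.
detect_bom() {
  signature=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
  case $signature in
    efbbbf*)  echo UTF-8 ;;
    0000feff) echo UTF-32BE ;;
    fffe0000) echo UTF-32LE ;;
    feff*)    echo UTF-16BE ;;
    fffe*)    echo UTF-16LE ;;
    *)        echo unknown ;;
  esac
}
```

The detection is a heuristic: a file without a byte order mark is reported as unknown, and a little-endian UTF-16 text that starts with a null character would be mistaken for UTF-32.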
Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. Above is the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.

Compose + O + R = ®
Compose + 3 + 4 = ¾
Compose + s + s = ß
Compose + ~ + ' + a = ấ

Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.
Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á

Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces a character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the corresponding UCS code point in hexadecimal.
1.1.2 Text Input

To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On handheld devices, the use of either a numeric keypad or a touch-screen is more typical.

An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind, and as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn't apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.
1.1.3 Text Editors

A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free joe, GNU nano, and Pico.

More advanced text editors come with support for regular expressions and version control, which will be covered in sections 1.1.5 and 1.2, and with user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and Vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower the users to perform complex text editing.
1.1.4 Interactive Document Preparation Systems

Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].
The primary difference between text editors and DPSes is the fact that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files, a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.

1.1.5 Regular Expressions

The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.

Note: Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.

Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) [18, part 1, ch. 9], described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.

More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
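The difference between the BRE and ERE syntaxes can be observed with the grep program, which uses BRE by default and ERE with the -E option. The following is a small sketch with a made-up word list; the variable names are arbitrary.

```shell
words=$(printf '%s\n' wp weeps wept hello world one two six)

# BRE: braces and parentheses have to be escaped to gain their special meaning.
bre_interval=$(printf '%s\n' "$words" | grep 'we\{1,2\}p')
bre_backref=$(printf '%s\n' "$words" | grep '\(.\)\1')

# ERE: braces are written unescaped, and alternation (|) is available.
ere_interval=$(printf '%s\n' "$words" | grep -E 'we{1,2}p')
ere_alternation=$(printf '%s\n' "$words" | grep -E 'on|tw')

printf '%s\n' "$bre_interval"      # weeps, wept
printf '%s\n' "$bre_backref"       # weeps, hello (a doubled character)
printf '%s\n' "$ere_alternation"   # one, two
```

Both interval expressions select the same lines; only the spelling of the braces differs between the two syntaxes.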
BRE syntax

Regex: we\{1,2\}p
Description: An interval expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
Matches: weeps, wept

Regex: e*ne
Description: Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
Matches: never, enemy, Kleene

Regex: \(⟨regex⟩\)
Description: A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
Matches: ⟨regex⟩

Regex: ^ar
Description: At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
Matches: argument, arrow keys

Regex: ore$
Description: At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
Matches: iron ore, dumbledore

Regex: .be
Description: A period (.) matches any single character.
Matches: or not to be

Regex: be[ea]
Description: A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities, omitted here for brevity.
Matches: beehive, grizzly bear, glass beads

Regex: be[^ea]
Description: A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
Matches: obeah bend, libela

Regex: \^\$
Description: Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
Matches: ^$

Regex: \(.\).*\1
Description: A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
Matches: ara ararauna, dardanelles, nationality

ERE syntax (differences from BRE)

Regex: we{1,2}p
Description: Unlike in BREs, braces aren't escaped.
Matches: weeps, wept

Regex: pe+r?l?
Description: The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
Matches: persona, peer, speech, perl

Regex: (⟨regex⟩)
Description: Unlike in BREs, parentheses aren't escaped.
Matches: ⟨regex⟩

Regex: (on|t)
Description: Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
Matches: one, two, trophy, truth

Regex: (.).*\1
Description: EREs do not support backreferences.
Matches: ⟨undefined⟩

Table 1.4: An informal description of the BRE syntax (first part) and the differences in the ERE syntax (second part).

Regex: \x{⟨n⟩}
Description: Matches the UCS character with code point ⟨n⟩ in hexadecimal.

Regex: \N{⟨n⟩}
Description: Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.

Regex: \p{⟨p⟩}
Description: Matches any UCS character with property ⟨p⟩.

Regex: \P{⟨p⟩}
Description: Matches any UCS character without property ⟨p⟩.

Property: Letter
Description: This property is satisfied by any letter.

Property: Punctuation
Description: This property is satisfied by any punctuation.

Property: Symbol
Description: This property is satisfied by any symbol.

Property: Mark
Description: This property is satisfied by any mark.

Property: Number
Description: This property is satisfied by any number.

Property: Separator
Description: This property is satisfied by any separator.

Property: Other
Description: This property is satisfied by any UCS character that doesn't belong to any of the categories listed above.

Property: Block=⟨b⟩
Description: This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.

Property: Script=⟨s⟩
Description: This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.

Property: Numeric Value=⟨n⟩
Description: This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.

Note: The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.

Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for the Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (20 0B), zero width non-joiner (20 0C), zero width joiner (20 0D), and zero width no-break space (FE FF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at least partially implement the Unicode standard for regular expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
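The false negative matches caused by invisible characters are easy to reproduce. The following sketch hides a zero width space inside a word and searches for the word with grep; the variable names are arbitrary, and the removal step with sed is merely one possible workaround, not a general normalization procedure.

```shell
# U+200B ZERO WIDTH SPACE, written as its three UTF-8 bytes in octal.
zwsp=$(printf '\342\200\213')
line="zero${zwsp}width"

# On screen the line reads "zerowidth", yet the pattern does not match it.
before=$(printf '%s\n' "$line" | grep -c 'zerowidth' || true)

# Removing the invisible character first restores the match.
after=$(printf '%s\n' "$line" | sed "s/$zwsp//g" | grep -c 'zerowidth' || true)

echo "$before $after"   # 0 1
```

The counts printed at the end are the numbers of matching lines before and after the invisible character is removed.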
The most elementary text processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool will present the user with the lines that contain one or more matches. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

Note: The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
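The division of labor between grep, sed, and awk can be seen on a few lines of sample text. This is a minimal sketch; the sample lines and the variable names are made up for the purpose of the illustration.

```shell
lines=$(printf '%s\n' 'iron ore' 'argument' 'dumbledore' 'arrow keys')

# grep prints the lines that contain a match.
ore_lines=$(printf '%s\n' "$lines" | grep 'ore$')

# sed transforms the text; here it capitalizes a leading "ar".
capitalized=$(printf '%s\n' "$lines" | sed 's/^ar/AR/')

# awk splits each line into fields; here it counts the words in the text.
word_count=$(printf '%s\n' "$lines" | awk '{ total += NF } END { print total }')

printf '%s\n' "$ore_lines"     # iron ore, dumbledore
printf '%s\n' "$capitalized"   # iron ore, ARgument, dumbledore, ARrow keys
echo "$word_count"             # 6
```

All three programs read the text line by line, which makes them easy to combine in pipelines.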
1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.

VCS can be divided by their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally, and its operation is fully dependent on the availability of the server. An example of a centralized VCS is SubVersioN (SVN).

By comparison, there is no designated server in decentralized VCS, and the users can exchange new versions directly with one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage size requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.
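The recording and reverting of changes in a local repository can be sketched with Git. The example assumes that the git program is installed; the file name, the author identity, and the change descriptions are hypothetical.

```shell
# Work in a throwaway directory so that nothing else is touched.
repo=$(mktemp -d)
cd "$repo" || exit 1

git init -q
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"

# Record the first version of the manuscript.
printf 'First draft.\n' > chapter.txt
git add chapter.txt
git commit -q -m "Add first draft"

# Record a change that later turns out to be undesirable.
printf 'A sentence we will regret.\n' >> chapter.txt
git commit -q -a -m "Add a doubtful sentence"

# Revert the change; the history keeps both the change and its reversal.
git revert --no-edit HEAD > /dev/null

git log --format=%s     # the three recorded descriptions, newest first
cat chapter.txt         # First draft.
```

Since the whole history lives in the local repository, none of these steps requires a server; sharing the changes with other users is a separate step.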
Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

Note: An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

svnadmin create
svn checkout
svn update
svn commit

Figure 1.8: The basic SVN workflow.

Note: After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.
git init
git clone
git pull
git push
git reset
git commit

Figure 1.9: The commands above depict the basic Git workflow. The commands below depict the use of the Git program with an SVN repository; this bears all the advantages and disadvantages associated with decentralized VCS.

svnadmin create
git svn clone
git svn rebase
git svn dcommit
git reset
git commit

Note: After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.
Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document even though it is not a text file.
Chapter 2
Markup
A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules, such as the correct spelling, capitalization, word breaks, and punctuation, that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should be already met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.
Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The Generalized Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

More information about the project can be found within the Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24]. The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.
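The kinds of nodes listed above can be inspected programmatically. The Python standard library has no SGML parser, so the sketch below assumes an XML document instead, which shares the same building blocks; the `note` element, its `lang` attribute, and the `render` instruction are invented for the illustration.

```python
from xml.dom import Node, minidom

# A hypothetical document with one attribute, a comment, a processing
# instruction, and a run of text inside a single element.
SAMPLE = '<note lang="en"><!-- a comment --><?render draft?>Some text.</note>'

# Human-readable names for the node types used in this sketch.
NAMES = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "instruction",
}


def describe(xml_text):
    """Return the root tag, its attributes, and the kinds of its child nodes."""
    root = minidom.parseString(xml_text).documentElement
    kinds = [NAMES.get(child.nodeType, "other") for child in root.childNodes]
    attributes = dict(root.attributes.items())
    return root.tagName, attributes, kinds


if __name__ == "__main__":
    print(describe(SAMPLE))
```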
A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema
Although the described structure is shared by all SGML documents, the actual syntax as well as the restrictions regarding the contents and the attributes of individual elements are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.
This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML, such as W3C XML Schema, RELAX NG, or Schematron, that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
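The two levels of correctness can be told apart in a few lines of code. The sketch below uses the non-validating parser from the Python standard library to detect well-formedness; since that parser cannot validate against a DTD or a schema, validity is only approximated by a hand-written rule on the top-level structure of a hypothetical `recipe` application, which is an assumption of this example and no substitute for a real validator.

```python
import xml.etree.ElementTree as ET

# Stand-in for a schema: the child elements a recipe is assumed to require.
EXPECTED_CHILDREN = ["name", "description", "ingredientList", "stepList"]


def check(xml_text):
    """Classify a document as 'malformed', 'well-formed', or 'valid'."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # The parser refuses anything that is not well-formed.
        return "malformed"
    children = [child.tag for child in root]
    if root.tag == "recipe" and children == EXPECTED_CHILDREN:
        return "valid"
    return "well-formed"


if __name__ == "__main__":
    print(check("<recipe><name>Palatschinken</recipe>"))
    print(check("<recipe><name>Palatschinken</name></recipe>"))
```

The first call reports a malformed document because of the mismatched tags; the second document is well-formed, but it does not satisfy the assumed structure.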
Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
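As a small illustration of data retrieval, the following sketch queries a shortened recipe document with the limited XPath subset implemented by the `xml.etree.ElementTree` module of the Python standard library. Full XPath and XQuery require other tools; the variable names are chosen for this example only.

```python
import xml.etree.ElementTree as ET

RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# A path expression selecting every ingredient element in document order.
names = [e.text for e in root.findall("./ingredientList/ingredient")]

# A predicate selecting the ingredient whose amount attribute is "300ml".
milk = root.find(".//ingredient[@amount='300ml']")

# Attribute values are read from the selected element.
serves = root.find("ingredientList").get("serves")

if __name__ == "__main__":
    print(names, milk.text, serves)
```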
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown</step>
    <step>Repeat until there is no batter left</step>
    <step>Serve rolled and filled with jam</step>
  </stepList>
</recipe>

Figure 2.1: An example XML document (recipe.xml)
DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">

<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>

<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD
element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
      name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)
xmllint -noout --dtdvalid recipe.dtd recipe.xml
xmllint -noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng  # Compact -> Full Relax NG
xmllint -noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
Namespaces, a notable feature of XML unavailable in SGML, were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature of CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.

Speech

AASE: Peer, you’re lying!
PEER: No, I’m not!
AASE: Well then, swear to me it’s true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it’s a lie!

Verse

Peer, you’re lying! No, I’m not!
Well then, swear to me it’s true!
Swear? Why should I? See, you dare not!
Every word of it’s a lie!

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen’s Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org

Postel’s law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
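How XML namespaces tie element names to IRIs can be observed with a namespace-aware parser. The sketch below assumes a made-up document that mixes the XHTML and SVG namespaces and uses the `xml.etree.ElementTree` module of the Python standard library, which reports each element name together with the IRI of its namespace; the prefix `s` in the query is an arbitrary local choice.

```python
import xml.etree.ElementTree as ET

# A hypothetical document combining elements of two XML applications.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <body><p>A circle:</p><svg:svg><svg:circle r="5"/></svg:svg></body>
</html>"""

root = ET.fromstring(DOC)

# Each tag is reported as "{namespace IRI}local-name"; the prefixes used in
# the source text are only shorthand and do not survive parsing.
tags = [element.tag for element in root.iter()]

# Queries bind their own prefixes to the namespace IRIs.
prefixes = {"s": "http://www.w3.org/2000/svg"}
circles = root.findall(".//s:circle", prefixes)

if __name__ == "__main__":
    for tag in tags:
        print(tag)
```

Note that the parser only records the IRIs; it does not fetch or apply any schema associated with them.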
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O’Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely popular graphical Web browser, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren’t valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel’s law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for the presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can’t be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

<style>
body {
  font: large Verdana;
  font-size: large; }
h1 {
  font-size: x-large;
  text-transform: uppercase; }
abbr {
  font-style: italic; }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren’t well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly into web documents through XML namespaces.

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
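The contrast drawn in Section 2.2.2 between tolerant HTML processing and strict XML processing can be reproduced with the Python standard library. This is a rough sketch: `html.parser` is only a tokenizer and does not implement the error recovery of the HTML5 specification, but it shows that overlapping tags are accepted without complaint, whereas the XML parser rejects the same input. The class and function names are introduced for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Overlapping <b> and <i> elements wrapped in a paragraph.
OVERLAPPING = "<p><b>Bold <i>bold and italic</b> italic</i></p>"


class TagLogger(HTMLParser):
    """Record start and end tags in the order the tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def html_tag_events(markup):
    """Tokenize markup leniently and return the recorded tag events."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


def xml_parses(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(html_tag_events(OVERLAPPING))
    print(xml_parses(OVERLAPPING))
```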
2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies
If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
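The two readings of a triplet can be made concrete with a small data structure. The following is a minimal sketch and not an RDF library: the `IRI` and `Triple` types and both functions are hypothetical names introduced here, the field order follows the (p, s, o) convention of the text, and the serialization covers only a simplified subset of the N-Triples syntax (no language tags, datatypes, or escapes beyond quotes and backslashes).

```python
from typing import NamedTuple, Union


class IRI(str):
    """A resource identified by an IRI; a plain str stands for a literal."""


class Triple(NamedTuple):
    predicate: IRI
    subject: IRI
    obj: Union[IRI, str]


def interpret(t):
    """Phrase a triplet as a relation or as a property, depending on its object."""
    if isinstance(t.obj, IRI):
        return f"<{t.subject}> is in relation <{t.predicate}> with <{t.obj}>"
    return f"<{t.subject}> has property <{t.predicate}> with value {t.obj!r}"


def to_ntriples(t):
    """Serialize a triplet in subject, predicate, object order."""
    if isinstance(t.obj, IRI):
        obj = f"<{t.obj}>"
    else:
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    return f"<{t.subject}> <{t.predicate}> {obj} ."


if __name__ == "__main__":
    creator = Triple(IRI("http://purl.org/dc/terms/creator"),
                     IRI("http://example.org/document.html"),
                     IRI("http://example.org/john-smith"))
    name = Triple(IRI("http://xmlns.com/foaf/0.1/name"),
                  IRI("http://example.org/john-smith"),
                  "John Smith")
    for triple in (creator, name):
        print(interpret(triple))
        print(to_ntriples(triple))
```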
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in Attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html>
  <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html>
  <http://purl.org/dc/terms/creator>
  <http://example.org/john-smith> .
<http://example.org/john-smith>
  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
  <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith>
  <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom)

<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>

[Graph: http://example.org/document.html has the dc:title "John's Web page"@en and the dc:creator http://example.org/john-smith; http://example.org/john-smith has the rdf:type foaf:Person and the foaf:name "John Smith".]

Figure 2.11: A graph of the RDF document in Figure 2.9

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
[Output document: the title "The Cask of Amontillado", the byline "by Edgar Allan Poe", and the first paragraph of the story typeset with a three-line drop cap "T", followed by the page number 1.]

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe’s Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1\hskip-0.3ex}}}
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right)
2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
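The conversion from a lightweight markup language to HTML can be sketched in a few lines. The toy converter below handles only a tiny Markdown-like subset chosen for this example (paragraphs separated by blank lines, a leading `# ` for a top-level heading, and `*...*` for emphasis); it is an illustration of the principle and does not follow any Markdown specification.

```python
import html
import re


def to_html(source):
    """Convert a tiny Markdown-like subset to HTML, one block per line."""
    blocks = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", source.strip()):
        # Collapse line breaks inside a block and escape HTML metacharacters.
        text = html.escape(" ".join(block.split()), quote=False)
        # Punctuation doubles as markup: asterisks delimit emphasis.
        text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
        if text.startswith("# "):
            blocks.append(f"<h1>{text[2:]}</h1>")
        else:
            blocks.append(f"<p>{text}</p>")
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_html("# Design\n\nLegibility comes *first*,\nalways."))
```

Even this small subset shows the trade-off described above: the source stays legible as plain text, but the set of structures it can express is fixed by the converter.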
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document (such as headings, tables, figures, and lists); and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says, φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
\newunicodechar{·}{\raisebox{.8ex}{.}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says, φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below), and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  /* grc is the language tag for Ancient Greek; gr is not a language code */
  [lang=grc] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says, </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
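The arithmetic behind these figures can be sketched in a few lines of Python. The function names and the default ratios of 0.20 and 0.45 are assumptions that merely restate the twenty to forty-five percent rule of thumb; they are not taken from any typesetting library.

```python
def leading_range(type_size_pt, min_ratio=0.20, max_ratio=0.45):
    """Return the (lowest, highest) suggested line height in points."""
    return (type_size_pt + type_size_pt * min_ratio,
            type_size_pt + type_size_pt * max_ratio)


def lead_per_side(type_size_pt, line_height_pt):
    """Return the lead added above and below each line, in points."""
    return (line_height_pt - type_size_pt) / 2


low, high = leading_range(10)
print(f"line height: {low:g} to {high:g} pt")
print(f"lead per side: {lead_per_side(10, low):g} "
      f"to {lead_per_side(10, high):g} pt")
```

The result is only a starting range; as noted above, dark typefaces and accent-heavy texts may call for a value near its upper end.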
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941
Sidenote: This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks, in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs, and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
William Shakespeare, King Lear
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
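The multiple-of-line-height rule amounts to rounding the available height down to a whole number of lines. The following Python sketch shows the calculation; the function name fit_text_height and the sample figures of 605 pt and 12 pt are illustrative assumptions, not values from this book.

```python
def fit_text_height(available_height_pt, line_height_pt):
    """Return (number of lines, text height) that fills whole lines only."""
    if line_height_pt <= 0:
        raise ValueError("line height must be positive")
    # Round down so that no partial line is left at the bottom of the block.
    lines = int(available_height_pt // line_height_pt)
    return lines, lines * line_height_pt


lines, height = fit_text_height(605, 12)
print(f"{lines} lines, text height {height} pt")
```

Any leftover height is returned to the vertical margins rather than left as an unfillable strip inside the text block.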
3.4 Color
In both print and web design, it is perfectly reasonable to use just black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to use colored type only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the type color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standards Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015).
[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972.
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standards Association, June 1986.
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1.
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6.
[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993.
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996.
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996.
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016).
[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008.
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012.
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006.
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124.
[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993.
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6.
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986.
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3.
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12.
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028/ (visited on 07/31/2015).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4.
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5.
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014.
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standards Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML, New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  Vi IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
EBCDIC by IBM was the default encoding on IBM's System/360 mainframes and was in active use until the introduction of the PC in 1981. In writing systems using Chinese characters, special encodings such as Big5, JIS, and EUC are used to this day. For brevity, the text focuses on the mainstream of international encodings.
tied to each specific application and processor architecture, but with the advent of computer networking in the 1960s, mutual intelligibility became a point of concern. "We had over sixty different ways to represent characters in computers. It was a real Tower of Babel," explains Bob Bemer [1], an American computer scientist who worked at IBM during 1956–1962 and who drafted the American Standard Code for Information Interchange (ASCII) [2], a character encoding from 1963 that unified the digital representation of text across the computer industry and enabled computer networking on a large scale.
ASCII
In ASCII, every character is represented by a number from zero to 127, which is transformed to a seven-bit integer called a character code. These 128 codes are used to encode printable characters (spanning the letters of the English alphabet, digits, punctuation, and other symbols) and control codes, as depicted in Table 1.1. Unlike printable characters, control codes have no fixed visual representation, and they were used to implement application-specific communication protocols and text formatting; their precise semantics were defined in a much later standard from 1972 [3]. Unconstrained by the bandwidth and the storage limitations of the 1960s and 1970s, today's communication protocols and text formats gravitate towards markup constructed from printable characters, which, unlike control codes, are easy to read and write by humans.
The following properties make it easy to manipulate and reason about character strings encoded in ASCII:
• Each character is represented by exactly seven bits. This makes it easy to allocate space for character strings of fixed length, to measure the number of characters stored in a memory region, and to perform basic operations such as adjacent character retrieval or text truncation.
• Characters are alphabetically ordered. Character strings can therefore be collated by comparing character code binary values.
• Lowercase and uppercase letters, digits, and control codes form contiguous ranges of character codes. This simplifies classification.
Bits 7-5     000   001   010   011   100   101   110   111
Bits 4-1     Ctrl codes  Symbols     Upper case  Lower case
0000         NUL   DLE   SP    0     @     P     `     p
0001         SOH   DC1   !     1     A     Q     a     q
0010         STX   DC2   "     2     B     R     b     r
0011         ETX   DC3   #     3     C     S     c     s
0100         EOT   DC4   $     4     D     T     d     t
0101         ENQ   NAK   %     5     E     U     e     u
0110         ACK   SYN   &     6     F     V     f     v
0111         BEL   ETB   '     7     G     W     g     w
1000         BS    CAN   (     8     H     X     h     x
1001         HT    EM    )     9     I     Y     i     y
1010         LF    SUB   *     :     J     Z     j     z
1011         VT    ESC   +     ;     K     [     k     {
1100         FF    FS    ,     <     L     \     l     |
1101         CR    GS    -     =     M     ]     m     }
1110         SO    RS    .     >     N     ^     n     ~
1111         SI    US    /     ?     O     _     o     DEL
Table 1.1: The ASCII encoding as specified in the 1986 revision of the standard [4]. SP denotes the space character.
Code point range    Encoding
0–127               0xxxxxxx
128–2047            110xxxxx 10xxxxxx
2048–65535          1110xxxx 10xxxxxx 10xxxxxx
65536–1114111       11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
Table 1.2: The UTF-8 encoding. Each x represents one bit of the UCS code point in binary.
Character   Code point (decimal)   Code point (binary)   UTF-8 encoding
Ř           344                    101011000             11000101 10011000
e           101                    1100101               01100101
č           269                    100001101             11000100 10001101
Table 1.3: An example of the UTF-8 encoding.
• There is precisely one way to encode any printable character. The conversion between the lower- and uppercase letters is a matter of inverting one bit.
This comes at the expense of support for non-English writing systems. As a temporary workaround, a set of ASCII derivatives that replaced the less-needed characters of #, $, @, [, \, ], ^, `, {, |, }, and ~ with international characters was specified in the ISO 646 standard from 1972 [3].
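The ASCII properties listed above can be sketched in a few lines of Python. This is only an illustration: the function names (`is_upper`, `is_lower`, `is_digit`, `swap_case`) are made up for this example and do not come from any standard.

```python
def is_upper(code: int) -> bool:
    """Uppercase letters A-Z occupy the contiguous codes 65-90."""
    return 0x41 <= code <= 0x5A


def is_lower(code: int) -> bool:
    """Lowercase letters a-z occupy the contiguous codes 97-122."""
    return 0x61 <= code <= 0x7A


def is_digit(code: int) -> bool:
    """Digits 0-9 occupy the contiguous codes 48-57."""
    return 0x30 <= code <= 0x39


def swap_case(text: str) -> str:
    """Swap the case of ASCII letters by inverting bit 6 (value 32)."""
    swapped = bytearray()
    for code in text.encode("ascii"):
        if is_upper(code) or is_lower(code):
            code ^= 0x20  # 'A' (1000001) <-> 'a' (1100001)
        swapped.append(code)
    return swapped.decode("ascii")


# Collation reduces to a comparison of character codes.
words = sorted(["pear", "apple", "fig"])
```

Classification is a pair of integer comparisons and case conversion is a single exclusive-or, which is exactly what the contiguous ranges of Table 1.1 make possible.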
Eight-bit Encodings
With the byte size stabilizing at eight bits, new character encodings emerged that were based on ASCII and used the additional bit to encode characters of non-English writing systems while retaining complete backwards compatibility with ASCII. Besides the numerous vendor-specific encodings (called code pages), a set of fifteen eight-bit encodings covering all major modern writing systems whose characters fit within the space of 128 additional combinations was standardized in the ISO/IEC 8859 series released during 1986–2001.
Compared to ASCII, eight-bit encodings introduced an additional level of complexity to text processing:
• Each character is exactly eight bits wide. The manipulation of strings is therefore as straightforward as with ASCII.
• Character strings can no longer be collated by character code comparison. Each encoding requires separate collation tables.
• Classes of characters, such as uppercase and lowercase letters or punctuation, no longer form contiguous ranges, and their position varies among encodings. This impedes character classification.
• Idiosyncrasies such as the ligature æ and invisible hyphenation hints are included in several encodings, which makes it more difficult to determine character string equivalence. Algorithms for case conversion vary among encodings.
• There exists no standard mechanism to detect which encoding is being used. The distinction needs to be made on the application level using either heuristics, additional metadata, or human intervention. Consequently, no standard mechanism exists to use different character encodings within a single text document.
Notable are also the seven-bit encodings of UTF-7 and Punycode, which bring Unicode support to protocols that were designed with the seven-bit ASCII in mind, such as e-mail.
A portion of this complexity is inherent in the task of encoding the characters of all modern writing systems, but the overhead caused by the character encoding fragmentation proved to be unnecessary.
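The lack of a detection mechanism in eight-bit encodings can be illustrated with a minimal Python sketch. The helper `decode_candidates` and the sample bytes are assumptions made for this example; the codec names are those of Python's standard library.

```python
def decode_candidates(raw, encodings):
    """Decode the same bytes under each candidate encoding.

    Hypothetical helper: nothing in the bytes themselves says which
    result is the intended one.
    """
    return {name: raw.decode(name) for name in encodings}


# Three bytes with no accompanying metadata.
raw = bytes([0xD8, 0x65, 0xE8])
candidates = decode_candidates(raw, ["iso8859-1", "iso8859-2"])
# candidates["iso8859-1"] is "Øeè" (Western European reading)
# candidates["iso8859-2"] is "Řeč" (Central European reading, Czech for "speech")
```

Both results are valid strings of three characters, so only heuristics, metadata, or a human reader can tell which encoding the author intended.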
The Universal Character Set and Unicode
In the early 1990s, the continual increase in the available bandwidth and storage led to the creation of the standards of Unicode [5, 6] and the Universal multiple-octet coded Character Set (UCS) [7] in an attempt to create a text encoding that would contain the characters of all the world's languages and succeed ASCII as the lingua franca of text interchange.
UCS is an ever-expanding catalogue of characters from writing systems both modern and ancient, and of symbols ranging from diacritical marks, punctuation, and ideograms to mahjong tiles, alchemical symbols, and the ancient Greek musical notation. Each of these characters is assigned a number called a code point, ranging from 0 to 2147483647 (7F FF FF FF in the hexadecimal notation), with the numbers of the most common characters in the range from 0 to 65535 (FF FF), called the Basic Multilingual Plane (BMP). The smallest units of division in UCS are blocks, which contain 256 thematically related characters. UCS encodings map code points to binary character codes and vice versa.
Three major encodings are specified in the UCS standard and its amendments [8, 9]:
1. UTF-32 directly encodes UCS characters by transforming their code points to four-byte integers. UTF-32 is also known as UCS-4.
2. UTF-16 directly encodes characters within the BMP by transforming their code points to two-byte integers. Code points in the range from 65536 to 1114111 (01 00 00–10 FF FF) are transformed into pairs of two-byte integers, called surrogate pairs, ranging from 55296 to 57343 (D8 00–DF FF). To enable the UTF-16 encoding, the code points in this range will never be assigned to characters [10, sec. 3.4, D15]. The same is true of code points above 1114111 (10 FF FF), which allows UTF-16 to encode any UCS character.
3. UTF-8 directly transforms code points ranging from 0 to 127 (7F) to one-byte integers. Since the first UCS block of the BMP matches ASCII, any text encoded in eight-bit ASCII is also encoded in UTF-8. Code points in the range from 128 to 1114111 (00 00 80–10 FF FF) are transformed into two to four one-byte integers ranging from 128 to 244 (80–F4). The encoding is illustrated in Tables 1.2 and 1.3.
One of the design goals of UCS was to avoid assigning separate code points to different glyphs that carry the same meaning. As a result, the visually distinctive Han characters used in the East Asian countries of China, Japan, Korea, and Vietnam were merged into a set of 75960 ideograms in a process referred to as the Han Unification [10, sec. 18.1]. This simplifies text processing, but also makes it impossible to encode a text in multiple East Asian languages without having to rely on external markup to select appropriate regional fonts. As a result, a derivative of UCS that doesn't implement the Han Unification was developed for use in operating systems based on the Real-time Operating system Nucleus (TRON) and is used in East Asia alongside UCS and region-specific encodings.
餐甑逞扉牙慨
Figure 1.2: Several Han characters in the traditional Chinese, Japanese, Korean, and Vietnamese variants.
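The bit patterns of Table 1.2 can be turned into code directly. The following is a minimal sketch, assuming a hypothetical function name `utf8_encode`; in practice, Python's built-in `str.encode("utf-8")` does the same job and is used here only to cross-check the result.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode a single UCS code point following the rows of Table 1.2."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the range 0-1114111")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are never assigned")
    if code_point <= 0x7F:                       # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:                      # 110xxxxx 10xxxxxx
        return bytes([0b11000000 | (code_point >> 6),
                      0b10000000 | (code_point & 0b111111)])
    if code_point <= 0xFFFF:                     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0b11100000 | (code_point >> 12),
                      0b10000000 | ((code_point >> 6) & 0b111111),
                      0b10000000 | (code_point & 0b111111)])
    return bytes([0b11110000 | (code_point >> 18),   # 11110xxx and three
                  0b10000000 | ((code_point >> 12) & 0b111111),
                  0b10000000 | ((code_point >> 6) & 0b111111),
                  0b10000000 | (code_point & 0b111111)])


# The first row of Table 1.3: code point 344 becomes 11000101 10011000.
encoded = utf8_encode(344)
```

Because the first branch returns the code point unchanged, every ASCII text is already valid UTF-8, as noted in the list above.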
UTF-32 is primarily used for the fixed-space internal representation of individual UCS characters inside programs, UTF-16 fulfills a similar role in programs that only work with the BMP, and UTF-8 is used for text storage and interchange. Since 2010, the majority of text content on the Web has been encoded in ASCII and UTF-8 [11].
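The surrogate-pair mechanism of UTF-16 can be sketched as follows. The function names `utf16_units` and `utf16_be` are illustrative assumptions; the arithmetic follows the usual split of the 20-bit offset into two 10-bit halves.

```python
def utf16_units(code_point: int) -> list:
    """Return the two-byte integers that represent a code point in UTF-16."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable UCS code point")
    if code_point <= 0xFFFF:
        return [code_point]                 # BMP: one two-byte integer
    offset = code_point - 0x10000           # 20 significant bits remain
    high = 0xD800 + (offset >> 10)          # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)         # lower 10 bits
    return [high, low]


def utf16_be(code_point: int) -> bytes:
    """Serialize the units in big-endian byte order."""
    return b"".join(unit.to_bytes(2, "big") for unit in utf16_units(code_point))


bmp_units = utf16_units(344)          # a character inside the BMP
pair = utf16_units(0x1F600)           # a character outside the BMP
```

A BMP character such as code point 344 occupies two bytes in UTF-16 and in UTF-8 but four bytes in UTF-32, which is the space trade-off behind the division of roles described above.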
Unicode was a competing standard for universal text encoding that underwent a merger with UCS in version 1.1, and since then the standards have been kept closely synchronised. Unicode is a superset of UCS, which defines additional information about UCS characters (such as their general category, directionality, case, or numeric value [10, sec. 3.5 and ch. 4]), various text processing algorithms, and implementation guidelines.
Regarding text processing, Unicode and UCS represent a compromise between the simplicity of the seven-bit ASCII and the heterogeneity of eight-bit encodings:
Ǻ (U+01FA) = Å (U+00C5) + combining acute accent (U+0301) = A (U+0041) + combining ring above (U+030A) + combining acute accent (U+0301)
Figure 1.3: Some UCS characters can be either input as a single entity or composed from several combining characters. Regarding Unicode normalization forms, all of the above representations are canonically equivalent.
iconv -f latin2 -t utf8 -- old.txt > new.txt
Figure 1.4: Text files can be converted between encodings using the iconv command-line tool. The sample code shows the file old.txt being converted from the ISO/IEC 8859-2 encoding to UTF-8. The result of the conversion is stored in the file new.txt.
• If simple text manipulation is preferred over space efficiency, each character can be made exactly two or four bytes wide using the UTF-16 and UTF-32 encodings.
• Although character strings cannot be collated by a simple character code comparison, a collation algorithm is defined in the Unicode specification [12], and collation tables for major locales [13] are maintained by the Unicode Consortium.
• Classes of characters (such as uppercase letters, lowercase letters, numbers, and punctuation) do not form contiguous ranges, but their position is directly specified in the standard [10, sec. 4.5].
• Although idiosyncrasies (such as ligatures, invisible hyphenation hints, and combining characters) are present in UCS, explicit normalization algorithms for character string equivalence testing are specified by the standard [10, sec. 2.12]. An algorithm for case conversion is also specified [10, sec. 3.13].
• The byte order mark (FE FF) character can be inserted at the beginning of a text as a signature of Unicode encodings. As the name suggests, the order in which the FE and FF bytes arrive also indicates the order of bytes (called endianity) that was used to encode integers. In UTF-32 and UTF-16, endianity can be chosen arbitrarily by the encoding application. In UTF-8, one-byte integers are used, and the notion of endianity is therefore meaningless.
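Several of the points above can be tried out with Python's standard `unicodedata` and `codecs` modules. This is a sketch, not a complete treatment: the helper names `canonically_equal` and `utf16_byte_order` are assumptions made for the example, and the byte order check only covers UTF-16.

```python
import codecs
import unicodedata

# Three canonically equivalent spellings of the same character.
single = "\u01FA"            # precomposed A with ring above and acute
partial = "\u00C5\u0301"     # A with ring above + combining acute accent
full = "A\u030A\u0301"       # A + combining ring above + combining acute


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to the NFC form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def utf16_byte_order(data: bytes):
    """Guess the endianity of UTF-16 data from its byte order mark."""
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "little-endian"
    return None                                # no signature present


# Character classes are looked up as properties, not as code ranges.
categories = [unicodedata.category(c) for c in "\u0158\u010d7!"]
```

The plain comparison `single == full` is false even though both strings render identically, which is why equivalence testing has to go through a normalization form first.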
Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. The figure shows the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.
Compose + O + R = ®
Compose + 3 + 4 = ¾
Compose + s + s = ß
Compose + ~ + ' + a = ấ
Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.
Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á
Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces a character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the corresponding UCS code point in hexadecimal.
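The lookup behind the first two Alt sequences of Figure 1.7 can be imitated with the `cp437` and `cp1252` codecs of Python's standard library. The function `alt_code` is a hypothetical illustration that assumes an English locale; it does not reproduce every detail of the Windows behavior.

```python
def alt_code(digits: str) -> str:
    """Return the character an Alt sequence would produce.

    Assumption: English locale, so IBM code page 437 is used without
    a leading zero and Windows code page 1252 with one.
    """
    encoding = "cp1252" if digits.startswith("0") else "cp437"
    return bytes([int(digits)]).decode(encoding)


from_ibm_page = alt_code("160")        # Alt + 1 + 6 + 0
from_windows_page = alt_code("0225")   # Alt + 0 + 2 + 2 + 5
from_code_point = chr(0xE1)            # Alt + + + E + 1
```

All three expressions yield the same letter, although the number typed differs each time because it indexes a different table.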
1.1.2 Text Input
To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On hand-held devices, the use of either a numeric keypad or a touch-screen is more typical.
An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind, and as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn't apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.
1.1.3 Text Editors
A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free joe, GNU nano, and pico.
More advanced text editors come with support for regular expressions and version control (which will be covered in sections 1.1.5 and 1.2) and user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower the users to perform complex text editing.
1.1.4 Interactive Document Preparation Systems
Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully-formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].
The primary difference between text editors and DPSes is the fact that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files, a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.
1.1.5 Regular Expressions
The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.
Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) syntaxes [18, part 1, ch. 9] described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.
More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.
Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (20 0B), zero width non-joiner (20 0C), zero width joiner (20 0D), and zero width no-break space (FE FF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at least partially implement the Unicode standard for Regular Expressions [20] (such as those of Perl 5.2 or Java 7) are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.
BRE regex, description, and examples of matching strings:
we\{1,2\}p
  The interval expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
  Matches: weeps, wept
e*ne*
  Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
  Matches: never enemy, Kleene
\(⟨regex⟩\)
  A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
  Matches: ⟨regex⟩
^ar
  At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
  Matches: argument, arrow keys
ore$
  At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
  Matches: iron ore, dumbledore
.be
  A period (.) matches any single character.
  Matches: or not to be
be[ea]
  A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity.
  Matches: beehive, grizzly bear, glass beads
be[^ea]
  A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
  Matches: obeah bend, libela
\^\$
  Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
  Matches: ^$
\(.\).*\1
  A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
  Matches: ara ararauna, dardanelles, nationality
Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below).
ERE regex, description, and examples of matching strings:
we{1,2}p
  Unlike in BREs, braces aren't escaped.
  Matches: weeps, wept
pe+r?l?
  The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
  Matches: persona, peer speech, perl
(⟨regex⟩)
  Unlike in BREs, parentheses aren't escaped.
  Matches: ⟨regex⟩
(on|t)
  Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
  Matches: one two, trophy truth
(.).*\1
  EREs do not support backreferences.
  Matches: ⟨undefined⟩
Regex        Description
\x{⟨n⟩}      Matches the UCS character with code point ⟨n⟩ in hexadecimal.
\N{⟨n⟩}      Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.
\p{⟨p⟩}      Matches any UCS character with property ⟨p⟩.
\P{⟨p⟩}      Matches any UCS character without property ⟨p⟩.
Property             Description
Letter               This property is satisfied by any letter.
Punctuation          This property is satisfied by any punctuation.
Symbol               This property is satisfied by any symbol.
Mark                 This property is satisfied by any mark.
Number               This property is satisfied by any number.
Separator            This property is satisfied by any separator.
Other                This property is satisfied by any UCS character that doesn't belong to any of the categories listed above.
Block=⟨b⟩            This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.
Script=⟨s⟩           This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.
Numeric Value=⟨n⟩    This property is satisfied by any UCS character with the numeric value ⟨n⟩.
Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.
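The pitfalls of matching UCS text described above can be reproduced with Python's standard `re` module, whose syntax is Perl-like rather than BRE or ERE and which does not implement the \p{…} properties of Table 1.5. The helper names `strip_invisible` and `robust_search` are assumptions made for this sketch, which handles only literal search strings.

```python
import re
import unicodedata

# The four invisible characters named in the text.
INVISIBLE = "\u200b\u200c\u200d\ufeff"


def strip_invisible(text: str) -> str:
    """Remove zero width characters that would break up a match."""
    return text.translate({ord(c): None for c in INVISIBLE})


def robust_search(literal: str, text: str):
    """Search for a literal string after normalizing both sides to NFC."""
    literal = unicodedata.normalize("NFC", strip_invisible(literal))
    text = unicodedata.normalize("NFC", strip_invisible(text))
    return re.search(re.escape(literal), text, re.IGNORECASE)


word = "\u0158e\u010d"   # three characters, five bytes in UTF-8

# Byte-oriented matching: the period matches one byte, not one character.
bytewise = re.fullmatch(rb"...", word.encode("utf-8"))
# Character-oriented matching: the period matches one UCS character.
charwise = re.fullmatch(r"...", word)

# A zero width space and a decomposed letter both hide a match.
hidden = re.search("zerowidth", "zero\u200bwidth")
decomposed = re.search("\u01fa", "A\u030a\u0301")
```

Here `bytewise`, `hidden`, and `decomposed` are all `None`, whereas `robust_search` finds the corresponding matches because it removes the invisible characters and normalizes both strings first.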
The most elementary text processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool will present lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for the Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.
1.2 Version Control
When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.
The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
VCS can be dichotomized based on their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally, and its operation is fully dependent on the availability of the server. An example of a centralized VCS is Subversion (SVN).
By comparison, there is no designated server in decentralized VCS, and the users can upload and download new versions directly from one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage size requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.
Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.
An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.
svnadmin create
svn checkout
svn update
svn commit
Figure 1.8: The basic SVN workflow. After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.
Basic Git workflow:
git init
git clone
git pull
git push
git reset
git commit
Git with an SVN repository:
svnadmin create
git svn clone
git svn rebase
git svn dcommit
git reset
git commit
Figure 1.9: The first diagram depicts the basic Git workflow. The second diagram depicts the use of the Git program with an SVN repository; this bears all the advantages and disadvantages associated with decentralized VCS. After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.
Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).
Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.
Chapter 2
Markup
A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.
It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.
Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.
2.1 Meta Markup Languages
2.1.1 The Generalized Markup Language
The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.
This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].
More information about the project can be found within the Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].
SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.
The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.
Sidenote: A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
Although the described structure is shared by all SGML documents, the actual syntax as well as the restrictions regarding the contents and the attributes of individual elements are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.
2.1.2 The Extensible Markup Language
Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.
This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML, such as W3C XML Schema, RELAX NG, or Schematron, that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
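The lower of the two levels of correctness can be checked with very little code. The following is a minimal sketch in Python, assuming only the standard library; the helper name `is_well_formed` and the two sample strings are hypothetical. Note that the parser in `xml.etree.ElementTree` is non-validating: it reports whether a document is well-formed, but it does not check validity against a DTD or a schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as an XML document, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample inputs: the second one closes its elements in the
# wrong order, so the elements overlap instead of nesting.
GOOD = "<recipe><name>Palatschinken</name></recipe>"
BAD = "<recipe><name>Palatschinken</recipe></name>"

print(is_well_formed(GOOD))
print(is_well_formed(BAD))
```

Checking validity requires a validating tool in addition to a check like this one.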
Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
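To give a feel for data retrieval with path expressions, here is a small sketch in Python. It assumes the standard library module `xml.etree.ElementTree`, which supports only a limited subset of XPath rather than the full language; the shortened recipe document and the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A shortened, hypothetical recipe document used only for this sketch.
RECIPE = """
<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>
"""

root = ET.fromstring(RECIPE)

# Select the text of all ingredient elements by their path from the root.
names = [e.text for e in root.findall("./ingredientList/ingredient")]

# Select one ingredient anywhere in the document by an attribute value.
milk = root.find(".//ingredient[@amount='300ml']")

# Read an attribute of a selected element.
serves = int(root.find("ingredientList").get("serves"))

print(names, milk.text, serves)
```

The three queries touch the element, text, and attribute nodes described earlier in this section.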
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish.</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>
Figure 2.1: An example XML document (recipe.xml)
Sidenote: DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.
<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>
Figure 2.2: An example DTD
element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}
Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
        name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>
Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)
xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng  # Compact -> Full Relax NG
xmllint --noout --relaxng recipe.rng recipe.xml
Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint
A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature called CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML
Speech
AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!

Verse
Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? Why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>
Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].
Sidenote: The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook, and its source code is publicly available at http://docbook.org/.
Sidenote: Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
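How namespaces keep elements from different XML applications apart can be seen in a short sketch. The example below assumes Python's standard `xml.etree.ElementTree` module; the tiny document mixing XHTML and SVG is made up for illustration, and the prefixes `h` and `s` in the lookup table are arbitrary local choices. Only the two namespace IRIs are the real identifiers of XHTML and SVG.

```python
import xml.etree.ElementTree as ET

# A hypothetical document mixing two XML applications.
DOC = """
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <body>
    <p>A circle:</p>
    <svg:svg width="20" height="20">
      <svg:circle cx="10" cy="10" r="8"/>
    </svg:svg>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# The parser expands every prefix, so an element is identified by the
# pair (namespace IRI, local name), not by the prefix used in the file.
print(root.tag)

# Prefixes used in queries are local to this table and need not match
# the prefixes that appear in the document.
NS = {"h": "http://www.w3.org/1999/xhtml",
      "s": "http://www.w3.org/2000/svg"}
paragraph = root.find(".//h:p", NS)
circle = root.find(".//s:circle", NS)
print(paragraph.text, circle.tag, circle.get("r"))
```

The parser happily reads both vocabularies, but nothing in the IRIs tells it where a schema for either of them could be found.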
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
2.2 Markup on the World Wide Web
2.2.1 The Hypertext Markup Language
In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely popular graphical Web browser, which paved the way for the Web browsers of today. In 1994, Berners-Lee formed the W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In
Sidenote: JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.
Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for the presentation markup.
During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
2.2.2 The Extensible Hypertext Markup Language
Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as
<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>
Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.
<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>
Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com/. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.
<style>
  body {
    font: large Verdana;
    font-size: large; }
  h1 {
    font-size: x-large;
    text-transform: uppercase; }
  abbr {
    font-style: italic; }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>
Sidenote: The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications (such as MathML and SVG) directly in web documents through XML namespaces.
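The difference in strictness can be observed with two parsers from the Python standard library. This is a rough sketch only: `html.parser.HTMLParser` is a lenient tokenizer and does not implement the full error recovery that browsers perform, and the `TagLogger` class is a hypothetical helper written for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Markup with overlapping elements: <b> is closed before <i>.
SNIPPET = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Record start and end tags in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


# The lenient HTML tokenizer reports the tags without complaint.
logger = TagLogger()
logger.feed(SNIPPET)
logger.close()
print(logger.events)

# The XML parser refuses the same input because it is not well-formed.
try:
    ET.fromstring(SNIPPET)
    accepted_as_xml = True
except ET.ParseError:
    accepted_as_xml = False
print(accepted_as_xml)
```

The XML side needs no recovery logic at all, which is the architectural simplicity referred to above.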
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
2.2.3 The Semantic Web and Linked Data
The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.
Sidenote: A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
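The two readings of a triplet can be made concrete with a small data model. The sketch below is written in Python under stated assumptions: the classes `IRI`, `Literal`, and `Triple` and the function `describe` are hypothetical names invented here, not part of any RDF library, and the sample IRIs are placeholders on example.org combined with two well-known vocabulary terms.

```python
from typing import NamedTuple, Union


class IRI(str):
    """A resource identified by an IRI."""


class Literal(str):
    """A plain literal value such as a name or a title."""


class Triple(NamedTuple):
    # Field order follows the (p, s, o) notation used in the text.
    predicate: IRI
    subject: IRI
    object: Union[IRI, Literal]


def describe(t: Triple) -> str:
    """Spell out the reading of a triplet based on the kind of its object."""
    if isinstance(t.object, IRI):
        return f"<{t.subject}> is in relation <{t.predicate}> with <{t.object}>"
    return f'<{t.subject}> has property <{t.predicate}> with value "{t.object}"'


relation = Triple(
    IRI("http://purl.org/dc/terms/creator"),
    IRI("http://example.org/document.html"),
    IRI("http://example.org/john-smith"),
)
prop = Triple(
    IRI("http://xmlns.com/foaf/0.1/name"),
    IRI("http://example.org/john-smith"),
    Literal("John Smith"),
)

print(describe(relation))
print(describe(prop))
```

Because an RDF document is a set of such triplets, a plain Python set of `Triple` values is already a workable in-memory model for small experiments.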
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in Attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
2.3 Document Preparation Systems
Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type
        rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .
Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom)
<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>
Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.
<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>
[Graph. Nodes: http://example.org/document.html, "John's Web page"@en, http://example.org/john-smith, foaf:Person, "John Smith". Edges: dc:title, dc:creator, rdf:type, foaf:name.]
Figure 2.11: A graph of the RDF document in Figure 2.9
categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems
Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The
Sidenote: The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by Desktop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled—but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.
.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right)
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
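To illustrate what such a conversion tool does, here is a deliberately tiny sketch in Python. It is not an implementation of Markdown or of any other language named above: the two rules it understands (a leading "# " for a heading and asterisks for emphasis) are assumptions chosen for the example, and the function name `to_html` is hypothetical.

```python
import html
import re


def to_html(source: str) -> str:
    """Convert a toy lightweight markup to HTML.

    Blocks are separated by blank lines. A block starting with "# "
    becomes a heading, any other block becomes a paragraph, and text
    between asterisks is emphasized.
    """
    blocks = re.split(r"\n\s*\n", source.strip())
    out = []
    for block in blocks:
        # Join the lines of the block and escape HTML special characters.
        text = html.escape(" ".join(block.split()), quote=False)
        text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
        if text.startswith("# "):
            out.append(f"<h1>{text[2:]}</h1>")
        else:
            out.append(f"<p>{text}</p>")
    return "\n".join(out)


SAMPLE = """# Palatschinken

Combine the ingredients and whisk until
you have a *smooth* batter.
"""

print(to_html(SAMPLE))
```

The source stays readable as plain text, while the structure still maps cleanly onto the logical markup of HTML.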
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other, the design and the positioning of the structural elements of the document (such as headings, tables, figures, and lists), and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
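A quick way to find out whether a manuscript mixes writing systems is to inspect its characters. The Python sketch below is a rough heuristic under stated assumptions: it takes the first word of each letter's Unicode character name as the name of its writing system, which is not the official Unicode script property, and it says nothing about which glyphs a particular font actually contains. The function name `scripts_used` is hypothetical.

```python
import unicodedata


def scripts_used(text: str) -> set:
    """Guess the writing systems in `text` from Unicode character names."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            # For example "GREEK SMALL LETTER PHI" yields "GREEK".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


SAMPLE = "Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι"
print(sorted(scripts_used(SAMPLE)))
```

A result with more than one entry is a hint that the body text typeface should be checked for coverage and possibly paired with a second typeface.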
Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
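For plain-text output, the recommended line length can be approximated by counting characters. The following Python sketch assumes the standard `textwrap` module; the default measure of 66 characters is an assumption picked from within the 45–75 range, and the function names are hypothetical. It produces ragged lines and does not justify them.

```python
import textwrap


def wrap_to_measure(text: str, measure: int = 66) -> list:
    """Break a paragraph into ragged lines of at most `measure` characters."""
    return textwrap.wrap(" ".join(text.split()), width=measure)


def lines_outside_range(lines: list, low: int = 45, high: int = 75) -> list:
    """Return the lines whose length falls outside the given range.

    The last line of a paragraph is naturally short and is not checked.
    """
    return [ln for ln in lines[:-1] if not low <= len(ln) <= high]


PARAGRAPH = (
    "The thousand injuries of Fortunato I had borne as I best could, "
    "but when he ventured upon insult I vowed revenge. I must not only "
    "punish but punish with impunity. A wrong is unredressed when "
    "retribution overtakes its redresser."
)

lines = wrap_to_measure(PARAGRAPH)
for line in lines:
    print(f"{len(line):2d} {line}")
print(lines_outside_range(lines))
```

A character count is only a proxy for the physical line width, which also depends on the typeface and its size.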
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι "The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved."
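\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
% Switch to Alegreya Sans whenever Latin characters begin.
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
% Switch to GFS Neohellenic (scaled to 13.2 pt) for Greek characters.
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
                                  Ligatures=TeX]}{}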
\newunicodechar{·}{\raisebox{.8ex}{.}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says, φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὀργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F; }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF; }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt; }
  [lang=en] {
    font-size: 11pt; }
  [lang=gr] {
    font-size: 13.2pt; }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says, </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὀργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with a leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
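The arithmetic behind these figures can be checked with a short Python sketch. The function name leading_range and its parameters are illustrative assumptions rather than part of any typesetting tool; the 20 % and 45 % bounds are the ones quoted in the text.

```python
def leading_range(type_size_pt, low=0.20, high=0.45):
    """Return the (minimum, maximum) line height in points.

    The extra space between lines is taken as a fraction of the
    typeface size. The default fractions follow the guideline of
    twenty to forty-five percent; the name and signature of this
    helper are hypothetical.
    """
    return (type_size_pt + type_size_pt * low,
            type_size_pt + type_size_pt * high)


minimum, maximum = leading_range(10)
print(f"line height: {minimum:.2f} to {maximum:.2f} pt")
# Half of the extra space goes above and half below each line.
print(f"lead per side: {(minimum - 10) / 2:.3f} to {(maximum - 10) / 2:.3f} pt")
```

For a 10 pt typeface the sketch reports a line height of 12 to 14.5 pt, that is, 1 to 2.25 pt of lead on either side of a line.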
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well.
Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
         Sizes in inches   Page proportions
A4       8.27 × 11.7       2 : √2      1.41421
B5       6.93 × 9.84       1 : √2      0.707
Letter   8 1/2 × 11        1 : 1.294   1.2941
Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface as the surrounding text, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, secs. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks, in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs, and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3]:
This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity; fools by heavenly compulsion; knaves, thieves, and treachers by spherical predominance; drunkards, liars, and adulterers by an enforced obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
– William Shakespeare, King Lear
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
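The requirement that the text height be a whole multiple of the line height can be sketched in a few lines of Python. The helper fit_text_height and the sample measurements are assumptions made for illustration only; they do not come from the cited literature.

```python
def fit_text_height(available_height_pt, line_height_pt):
    """Return (number of lines, text height in points).

    The text height is the largest whole multiple of the line
    height that does not exceed the available height. This is a
    hypothetical helper, not a function of any layout tool.
    """
    if line_height_pt <= 0:
        raise ValueError("line height must be positive")
    lines = int(available_height_pt // line_height_pt)
    return lines, lines * line_height_pt


# Assumed sample values: 650 pt of available height, 12 pt leading.
lines, height = fit_text_height(available_height_pt=650, line_height_pt=12)
print(lines, height)
```

With the assumed values, the text block holds 54 lines and is 648 pt tall; the remaining 2 pt are returned to the vertical margins.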
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to use colored type only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
Bibliography
[1] Mary Brandel lsquolsquo1963 The debut of asci irsquorsquo InComputerworld(July 1999) url httpeditioncnncomTECHcomputing9907061963idg (visited on 09062015) (cit on p 5)
[2] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1963 10 East 40th Street New York 16 nyusa the American Standard Association June 1963 urlhttp worldpowersystems com J codes X3 4 - 1963
(visited on 01282015) (cit on p 5)[3] i so tc97sc2 Information technology ndash iso 7-bit coded character
set for information interchange i so 6461972 Geneva Switzer-land the International Organization for Standardization1972 (cit on pp 5 7)
[4] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1986 10 East 40th Street New York 16 ny usathe American Standard Association June 1986 (cit on p 6)
[5] Unicode Consortium the Unicode Standard Version 10 Vol 1Reading ma usa Addison-Wesley Developers Press Oct1991 isbn 0-201-56788-1 (cit on p 8)
[6] Unicode Consortium the Unicode Standard Version 10 Vol 2Reading ma usa Addison-Wesley Developers Press June1992 isbn 0-201-60845-6 (cit on p 8)
[7] isoiec jtc1sc2 Information technology ndash the Universalmultiple-octet coded Character Set (ucs) ndash Part 1 Architectureand Basic Multilingual Plane isoiec 10646-11993 Geneva
Switzerland the International Organization for Standard-ization May 1993 (cit on p 8)
[8] i soiec jtc1sc2 Transformation Format for 16 planes of group00 (utf-16) isoiec 10646-11993Amd 11996 GenevaSwitzerland the International Organization for Standard-ization Oct 1996 (cit on p 8)
[9] isoiec jtc1sc2 ucs Transformation Format 8 (utf-8)isoiec 10646-11993Amd 21996 Geneva Switzerlandthe International Organization for Standardization Oct1996 (cit on p 8)
[10] Unicode Consortium the Unicode Standard Version 90 ndash CoreSpecification Tech rep Mountain View ca usa July 2016url httpwwwunicodeorgversionsUnicode900UnicodeStandard-90pdf (visited on 09172015) (cit onpp 8ndash10)
[11] Q-Success Usage of character encodings for websites urlhttpw3techscomtechnologiesoverviewcharacter_
encodingall (visited on 09102015) (cit on p 9)[12] Unicode Consortium Unicode Technical Standard 10 Version
900 Unicode Collation Algorithm Tech rep May 2016 urlhttpwwwunicodeorgreportstr10tr10-34html
(visited on 09172016) (cit on p 10)[13] Unicode Consortium Unicode cldr Project Tech rep url
httpcldrunicodeorg (visited on 09172016) (cit onp 10)
[14] iso tc171sc2 Document management ndash Portable documentformat iso 320002008 Geneva Switzerland the Interna-tional Organization for Standardization July 2008 (cit onp 13)
[15] isoiec jtc1sc34 Document description and processing lan-guages ndash Office Open XML File Formats isoiec 295002012Geneva Switzerland the International Organization forStandardization Oct 2012 (cit on p 13)
[16] isoiec jtc1sc34 Information technology ndash Open DocumentFormat for Office Applications (OpenDocument) v10 isoiec263002006 Geneva Switzerland the International Organi-zation for Standardization Dec 2006 (cit on p 13)
[17] Noam Chomsky lsquolsquoThree models for the description of lan-guagersquorsquo In Information Theory IEEE Transactions on 23 (1956)pp 113ndash124 (cit on p 14)
[18] isoiec jtc1sc22 Information technology ndash the Portable Op-erating System Interface ndash Part 2 Shell and Utilities isoiec9945-21993 Geneva Switzerland the International Organi-zation for Standardization Dec 1993 (cit on p 14)
[19] Jeffrey E F Friedl Mastering Regular Expressions 3rd edOrsquoReilly Media 2006 p 544 isbn 978-0-596-52812-6 (citon p 14)
[20] Unicode Consortium Unicode Technical Standard 18 Version17 Unicode Regular Expressions Tech rep Nov 2013 urlhttpwwwunicodeorgreportstr18tr18-17html
(visited on 09262015) (cit on p 16)[21] Dale Dougherty and Arnold Robbins Sed amp awk Second
Edition OrsquoReilly Media 1997 i sbn 1565922255 url http docstore mik ua orelly unix sedawk (visited on09262015) (cit on p 16)
[22] Ben Collins-Sussman Brian W Fitzpatrick and C MichaelPilato Version Control with Subversion OrsquoReilly 2002 urlhttpsvnbookred-beancom (visited on 09262015)(cit on p 17)
[23] Charles F Goldfarb lsquolsquothe Roots of sgml ndash A Personal Rec-ollectionrsquorsquo In (1996) url httpwwwsgmlsourcecomhistoryrootshtm (visited on 07292015) (cit on p 22)
[24] Charles F Goldfarb lsquolsquosgml The Reason Why and the FirstPublishedHintrsquorsquo In Journal of the American Society for Informa-tion Science 48 (7 July 1997) url httpwwwsgmlsourcecomhistoryjasishtm (visited on 07292015) (cit onp 22)
[25] Charles F Goldfarb lsquolsquoIntroduction to Generalized MarkuprsquorsquoIn (1981) url http www sgmlsource com history AnnexAhtm (visited on 07292015) (cit on p 22)
[26] i soiecjtc1sc34 Information processing ndash Text and office sys-tems ndash Standard Generalized Markup Language (sgml) i soiec88791986 Geneva Switzerland the International Organi-zation for Standardization Oct 1986 (cit on p 22)
[27] Charles F Goldfarb the sgml Handbook New York NY USAOxford University Press Inc 1990 i sbn 978-0-198-53737-3(cit on p 22)
[28] Jean Paoli Tim Bray and Michael Sperberg-McQueen Ex-tensible Markup Language (xml) 10 w3c Recommendationw3c Feb 1998 url httpwwww3orgTR1998REC-xml-19980210 (visited on 07312015) (cit on pp 23 31)
[29] isoiec jtc1sc18wg8 Proposed TC for Web sgml Adap-tations for sgml isoiec N1929 the International Organi-zation for Standardization June 1997 url httpxmlcoverpagesorgwg8-n1929-ghtml (visited on 07312015)(cit on p 23)
[30] Haringkon Wium Lie and Bert Bos Cascading Style Sheets level1 Recommendation w3c Dec 1996 url httpwwww3orgTRREC-CSS1-961217 (visited on 07312015) (cit onpp 23 29)
[31] C M Sperberg-McQueen and Claus Huitfeldt lsquolsquogoddagA Data Structure for Overlapping Hierarchiesrsquorsquo In DigitalDocuments Systems and Principles 8th International Confer-ence on Digital Documents and Electronic Publishing DDEP2000 5th International Workshop on the Principles of DigitalDocument Processing PODDP 2000 Munich Germany Sep-tember 13-15 2000 Revised Papers Ed by Peter King andEthan V Munson Berlin Heidelberg Springer Berlin Hei-delberg 2004 pp 139ndash160 isbn 978-3-540-39916-2 doi101007978-3-540-39916-2_12 (cit on p 27)
[32] TimBray DaveHollander andAndrewLaymanNamespacesin xml w3c Recommendation w3c Jan 1999 url httpwwww3orgTR1999REC-xml-names-19990114 (visitedon 08212015) (cit on p 27)
[33] M Duerst the Internationalized Resource Identifiers (iris) rfc3987 rfc Editor Jan 2005 url httptoolsietforghtmlrfc3987 (visited on 08312015) (cit on p 27)
[34] Norman Walsh DocBook 5 The Definitive Guide Apr 2010url httpwwwdocbookorgtdgenhtmldocbookhtml(visited on 08182015) (cit on p 28)
[35] Tim Berners-Lee Information Management A Proposal Techrep Mar 1989 url httpwwww3orgHistory1989proposalhtml (visited on 08312015) (cit on p 28)
[36] T Berners-Lee Hypertext Markup Language ndash 20 rfc 1866rfc Editor Nov 1995 url httptoolsietforghtmlrfc1866 (visited on 07312015) (cit on p 28)
[37] Jon Postel DoD standard Transmission Control Protocol rfc761 rfc Editor Jan 1980 url httptoolsietforghtmlrfc761 (visited on 09162016) (cit on p 28)
[38] Ian Hickson et al html5 A vocabulary and associated apisfor html and xhtml Recommendation w3c Oct 2014 urlhttpwwww3orgTR2014REC-html5-20141028 (visitedon 07312015) (cit on p 29)
[39] ecma International Standard ecma-262 - ecmaScript LanguageSpecification Tech rep June 1997 url httpwwwecma-internationalorgpublicationsfilesECMA-ST-ARCH
ECMA-262201st20edition20June201997pdf (visitedon 07312015) (cit on p 29)
[40] Netscape Communications Netscape and Sun announce Java-Script the open cross-platform object scripting language for en-terprise networks and the Internet Dec 1995 url httpwpnetscapecomnewsrefprnewsrelease67html (visited on02132008) (cit on p 29)
[41] Dave Raggett et al Reformulating html in xml w3c Recom-mendation w3c Dec 1998 url httpwwww3orgTR1998WD-html-in-xml-19981205 (visited on 08202015)(cit on p 31)
[42] Steven Pemberton et al xhtmltrade 10 The Extensible HyperTextMarkup Language w3c Recommendation w3c Jan 2000url httpwwww3orgTR2000REC-xhtml1-20000126(visited on 08202015) (cit on p 31)
[43] T Berners-Lee Linked Data Tech rep 2006 url httpswwww3orgDesignIssuesLinkedDatahtml (visited on09172016) (cit on p 31)
[44] Ora Lassila and Ralph R Swick Resource Description Frame-work (rdf) Model and Syntax Specification w3c Recommen-dation w3c Feb 1999 url httpwwww3orgTR1999REC-rdf-syntax-19990222 (visited on 08182015) (cit onpp 31 32)
[45] Dan Brickley and R V Guha rdf Vocabulary DescriptionLanguage 10 rdf Schema w3c Recommendation w3c Feb2004 url httpwwww3orgTR2004REC-rdf-schema-20040210 (visited on 08182015) (cit on p 32)
[46] Deborah L McGuinness and Frank van Harmelen owl WebOntology Language w3c Recommendation w3c Feb 2004url httpwwww3orgTR2004REC-owl-features-20040210 (visited on 08182015) (cit on p 32)
[47] Manu Sporny, Gregg Kellogg, and Markus Lanthaler. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116/ (visited on 08/19/2015) (cit. on p. 32).
[48] David Beckett et al rdf 11 Turtle w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-turtle-20140225 (visited on 08292015) (cit on p 32)
[49] David Beckett rdf 11 N-Triples w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-n-triples-20140225 (visited on 08192015) (cit on p 32)
[50] Ben Adida et al rdfa in xhtml Syntax and Processing w3cRecommendation w3c Oct 2008 url httpwwww3org TR 2008 REC - rdfa - syntax - 20081014 (visited on08192015) (cit on p 32)
[51] Peter Schaffter What exactly is mom 2015 url httpwwwschafftercamommom-01html (visited on 09162016)(cit on p 37)
[52] Donald Ervin Knuth Digital Typography The Center for theStudy of Language and Information Publications 1998 i sbn978-0-387-98269-4 (cit on p 36)
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
Bit 7          0    0    0    0    1    1    1    1
Bit 6          0    0    1    1    0    0    1    1
Bit 5          0    1    0    1    0    1    0    1
Bits 4 3 2 1   Ctrl codes     Symbols      Upper case    Lower case
0 0 0 0        NUL  DLE       SP   0       @    P        `    p
0 0 0 1        SOH  DC1       !    1       A    Q        a    q
0 0 1 0        STX  DC2       "    2       B    R        b    r
0 0 1 1        ETX  DC3       #    3       C    S        c    s
0 1 0 0        EOT  DC4       $    4       D    T        d    t
0 1 0 1        ENQ  NAK       %    5       E    U        e    u
0 1 1 0        ACK  SYN       &    6       F    V        f    v
0 1 1 1        BEL  ETB       '    7       G    W        g    w
1 0 0 0        BS   CAN       (    8       H    X        h    x
1 0 0 1        HT   EM        )    9       I    Y        i    y
1 0 1 0        LF   SUB       *    :       J    Z        j    z
1 0 1 1        VT   ESC       +    ;       K    [        k    {
1 1 0 0        FF   FS        ,    <       L    \        l    |
1 1 0 1        CR   GS        -    =       M    ]        m    }
1 1 1 0        SO   RS        .    >       N    ^        n    ~
1 1 1 1        SI   US        /    ?       O    _        o    DEL
Table 1.1: The ASCII encoding as specified in the 1986 revision of the standard [4].
Code point range   Encoding
0–127              0xxxxxxx
128–2047           110xxxxx 10xxxxxx
2048–65535         1110xxxx 10xxxxxx 10xxxxxx
65536–1114111      11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
Table 1.2: The UTF-8 encoding. Each x represents one bit of the UCS code point in binary.
Character   Code point (decimal)   Code point (binary)   Encoding
Ř           344                    101011000             11000101 10011000
e           101                    1100101               01100101
č           269                    100001101             11000100 10001101
Table 1.3: An example of the UTF-8 encoding.
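The bit layout of UTF-8 can be traced by hand in a short Python sketch. The functions utf8_encode and bits are illustrative names chosen here, not a library API, and the sketch deliberately ignores the rule that surrogate code points must not be encoded; in practice the built-in str.encode("utf-8") should be preferred.

```python
def utf8_encode(code_point):
    """Encode one UCS code point into UTF-8 bytes.

    Hypothetical helper: the leading byte carries a length prefix
    (0, 110, 1110, or 11110) and every following byte starts with
    10 and carries six bits of the code point.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range")
    if code_point <= 0x7F:        # 0-127: one byte
        return bytes([code_point])
    if code_point <= 0x7FF:       # 128-2047: two bytes
        return bytes([0b11000000 | (code_point >> 6),
                      0b10000000 | (code_point & 0b111111)])
    if code_point <= 0xFFFF:      # 2048-65535: three bytes
        return bytes([0b11100000 | (code_point >> 12),
                      0b10000000 | ((code_point >> 6) & 0b111111),
                      0b10000000 | (code_point & 0b111111)])
    return bytes([0b11110000 | (code_point >> 18),   # four bytes
                  0b10000000 | ((code_point >> 12) & 0b111111),
                  0b10000000 | ((code_point >> 6) & 0b111111),
                  0b10000000 | (code_point & 0b111111)])


def bits(data):
    """Render bytes as space-separated groups of eight binary digits."""
    return " ".join(f"{byte:08b}" for byte in data)


for character in "Řeč":
    print(character, ord(character), bits(utf8_encode(ord(character))))
```

The loop prints the decimal code point and the encoded bytes of each character of the Czech word Řeč, which makes it easy to compare a hand-made encoding against the built-in one.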
• There is precisely one way to encode any printable character. The conversion between lowercase and uppercase letters is a matter of inverting one bit.
This comes at the expense of support for non-English writing systems. As a temporary workaround, a set of ASCII derivatives that replaced the less-needed characters #, $, @, [, \, ], ^, `, {, |, }, and ~ with international characters was specified in the ISO 646 standard from 1972 [3].
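The one-bit case conversion mentioned in the list above can be demonstrated in Python. The name toggle_case and the constant CASE_BIT are assumptions introduced for this sketch; the bit in question is bit 6 in the numbering of Table 1.1, which has the value 32.

```python
CASE_BIT = 0b0100000  # bit 6 of the seven-bit ASCII code (value 32)


def toggle_case(letter):
    """Switch an ASCII letter between upper and lower case.

    Hypothetical helper: inverting a single bit is enough because
    the codes of 'A'-'Z' and 'a'-'z' differ only in that bit.
    """
    code = ord(letter)
    is_upper = ord("A") <= code <= ord("Z")
    is_lower = ord("a") <= code <= ord("z")
    if not (is_upper or is_lower):
        raise ValueError("not an ASCII letter")
    return chr(code ^ CASE_BIT)


for letter in "Az":
    print(letter, f"{ord(letter):07b}", "->",
          toggle_case(letter), f"{ord(toggle_case(letter)):07b}")
```

The check for ASCII letters matters: applying the same bit flip to digits or punctuation would produce unrelated characters.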
Eight-bit Encodings
With the byte size stabilizing at eight bits, new character encodings emerged that were based on ASCII and used the additional bit to encode characters of non-English writing systems, while retaining complete backwards compatibility with ASCII. Besides the numerous vendor-specific encodings (called code pages), a set of fifteen eight-bit encodings, covering all major modern writing systems whose characters fit within the space of 128 additional combinations, was standardized in the ISO/IEC 8859 series released during 1986–2001.
Compared to ASCII, eight-bit encodings introduced an additional level of complexity to text processing:
• Each character is exactly eight bits wide. String manipulation is therefore as straightforward as with ASCII.
• Character strings can no longer be collated by character code comparison. Each encoding requires separate collation tables.
• Classes of characters, such as uppercase and lowercase letters or punctuation, no longer form contiguous ranges, and their position varies among encodings. This impedes character classification.
• Idiosyncrasies such as the æ ligature and invisible hyphenation hints are included in several encodings, which makes it more difficult to determine character string equivalence. Algorithms for case conversion vary among encodings.
• There exists no standard mechanism to detect which encoding is being used. The distinction needs to be made on the application level using either heuristics, additional metadata, or human intervention. Consequently, no standard mechanism exists to use different character encodings within a single text document.
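The last point of the list above can be illustrated with Python's built-in codecs. The byte values below are chosen for this example and do not come from the text: the same three bytes are decoded once as ISO/IEC 8859-1 (Western European) and once as ISO/IEC 8859-2 (Central European), and nothing in the bytes themselves indicates which reading was intended.

```python
# Three bytes of unknown origin; only the middle one is plain ASCII.
raw = bytes([0xD8, 0x65, 0xE8])

# Decode the same bytes under two members of the ISO/IEC 8859 series.
for encoding in ("iso-8859-1", "iso-8859-2"):
    print(encoding, raw.decode(encoding))

# Both decodings succeed, so the application must know the encoding
# from metadata or guess it with heuristics.
```

Under ISO/IEC 8859-2 the bytes spell the Czech word Řeč, whereas under ISO/IEC 8859-1 they yield the unrelated string Øeè.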
Also notable are the seven-bit encodings UTF-7 and Punycode, which bring Unicode support to protocols that were designed with seven-bit ASCII in mind, such as e-mail.
A portion of this complexity is inherent in the task of encoding the characters of all modern writing systems, but the overhead caused by the fragmentation of character encodings proved to be unnecessary.
The Universal Character Set and Unicode
In the early 1990s, the continual increase in the available bandwidth and storage led to the creation of the standards of Unicode [5, 6] and the Universal multiple-octet coded Character Set (UCS) [7] in an attempt to create a text encoding that would contain the characters of all the world's languages and succeed ASCII as the lingua franca of text interchange.

UCS is an ever-expanding catalogue of characters from writing systems both modern and ancient, and of symbols ranging from diacritical marks, punctuation, and ideograms to mahjong tiles, alchemical symbols, and the ancient Greek musical notation. Each of these characters is assigned a number called a code point, ranging from 0 to 2147483647 (7F FF FF FF in hexadecimal notation), with the numbers of the most common characters in the range from 0 to 65535 (FF FF), called the Basic Multilingual Plane (BMP). The smallest units of division in UCS are blocks, which contain 256 thematically related characters. UCS encodings map code points to binary character codes and vice versa.
Three major encodings are specified in the UCS standard and its amendments [8, 9]:

1. UTF-32 directly encodes UCS characters by transforming their code points to four-byte integers. UTF-32 is also known as UCS-4.

2. UTF-16 directly encodes characters within the BMP by transforming their code points to two-byte integers. Code points in the range from 65536 to 1114111 (01 00 00–10 FF FF) are transformed into pairs of two-byte integers called surrogate pairs, ranging from 55296 to 57343 (D8 00–DF FF). To enable the UTF-16 encoding, the code points in this range will never be assigned to characters [10, sec. 3.4, D15]. The same is true of code points above 1114111 (10 FF FF), which allows UTF-16 to encode any UCS character.
3. UTF-8 directly transforms code points ranging from 0 to 127 (7F) to one-byte integers. Since the first UCS block of the BMP matches ASCII, any ASCII text stored in eight-bit bytes is also encoded in UTF-8. Code points in the range from 128 to 1114111 (00 00 80–10 FF FF) are transformed into two to four one-byte integers ranging from 128 to 244 (80–F4). The encoding is illustrated in Tables 1.2 and 1.3.

One of the design goals of UCS was to avoid assigning code points to different glyphs that carry the same meaning. As a result, the visually distinctive Han characters used in the East Asian countries of China, Japan, Korea, and Vietnam were merged into a set of 75960 ideograms in a process referred to as the Han Unification [10, sec. 18.1]. This simplifies text processing, but also makes it impossible to encode a text in multiple East Asian languages without having to rely on external markup to select appropriate regional fonts. As a result, a derivative of UCS that doesn't implement the Han Unification was developed for use in operating systems based on The Real-time Operating system Nucleus (TRON) and is used in East Asia alongside UCS and region-specific encodings.

Figure 1.2: Several Han characters (餐甑逞扉牙慨) in the traditional Chinese, Japanese, Korean, and Vietnamese variants.
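The three encodings listed above can be compared side by side with the codecs that ship with Python. The following is only a sketch: the helper names `encoded_forms` and `surrogate_pair` are invented for this example, and the big-endian codec variants are chosen so that no byte order mark is added to the output.

```python
def encoded_forms(ch: str) -> dict:
    """Return the code point of one character and its bytes in three encodings."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf-8": ch.encode("utf-8").hex(" "),
        "utf-16": ch.encode("utf-16-be").hex(" "),
        "utf-32": ch.encode("utf-32-be").hex(" "),
    }


def surrogate_pair(code_point: int) -> tuple:
    """Split a code point above the BMP into its two UTF-16 surrogates."""
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # lower 10 bits
    return high, low


# One ASCII letter, two BMP characters, and one character outside the BMP
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    print(encoded_forms(ch))
```

The ASCII letter occupies one byte in UTF-8, whereas the character outside the BMP needs four bytes in every encoding, two of them forming the first surrogate in UTF-16.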
UTF-32 is primarily used for the fixed-width internal representation of individual UCS characters inside programs, UTF-16 fulfills a similar role in programs that only work with the BMP, and UTF-8 is used for text storage and interchange. Since 2010, the majority of text content on the Web has been encoded in ASCII and UTF-8 [11].

Unicode was a competing standard for universal text encoding that underwent a merger with UCS in version 1.1, and since then the standards have been kept closely synchronised. Unicode is a superset of UCS, which defines additional information about UCS characters (such as their general category, directionality, case, or numeric value [10, sec. 3.5 and ch. 4]), various text processing algorithms, and implementation guidelines.

Regarding text processing, Unicode and UCS represent a compromise between the simplicity of seven-bit ASCII and the heterogeneity of eight-bit encodings:
Ǻ = Å + ◌́ = A + ◌̊ + ◌́

Figure 1.3: Some UCS characters can be either input as a single entity or composed from several combining characters. With regard to Unicode normalization forms, all of the above representations are canonically equivalent.
iconv -f latin2 -t utf8 -- old.txt > new.txt

Figure 1.4: Text files can be converted between encodings using the iconv command-line tool. The sample code shows the file old.txt being converted from the ISO/IEC 8859-2 encoding to UTF-8. The result of the conversion is stored in the file new.txt.
• If simple text manipulation is preferred over space efficiency, each character can be made exactly two or four bytes wide using the UTF-16 and UTF-32 encodings.

• Although character strings cannot be collated by a simple character code comparison, a collation algorithm is defined in the Unicode specification [12], and collation tables for major locales [13] are maintained by the Unicode Consortium.

• Classes of characters, such as uppercase letters, lowercase letters, numbers, and punctuation, do not form contiguous ranges, but their position is directly specified in the standard [10, sec. 4.5].

• Although idiosyncrasies such as ligatures, invisible hyphenation hints, and combining characters are present in UCS, explicit normalization algorithms for character string equivalence testing are specified by the standard [10, sec. 2.12]. An algorithm for case conversion is also specified [10, sec. 3.13].

• The byte order mark (FE FF) character can be inserted at the beginning of a text as a signature of Unicode encodings. As the name suggests, the order in which the FE and FF bytes arrive also indicates the order of bytes (called endianness) that was used to encode integers. In UTF-32 and UTF-16, endianness can be chosen arbitrarily by the encoding application. In UTF-8, one-byte integers are used and the notion of endianness is therefore meaningless.
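Two of the mechanisms from the list above, normalization and the byte order mark, can be tried out with the Python standard library. This is a minimal sketch rather than a complete encoding detector: the names `canonically_equivalent`, `SIGNATURES`, and `sniff_bom` are made up for the example, and text without a signature is simply reported as unknown.

```python
import codecs
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Compare two strings after bringing both to normalization form NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Longer signatures are checked first, because the little-endian UTF-32
# mark (FF FE 00 00) begins with the little-endian UTF-16 mark (FF FE).
SIGNATURES = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading byte order mark, or None."""
    for signature, name in SIGNATURES:
        if data.startswith(signature):
            return name
    return None


precomposed = "\u01FA"        # a single code point
combining = "A\u030A\u0301"   # A, combining ring above, combining acute accent
print(precomposed == combining)                        # False
print(canonically_equivalent(precomposed, combining))  # True
print(sniff_bom("text".encode("utf-8-sig")))           # utf-8
```

In UTF-8 the encoded mark is the fixed byte sequence EF BB BF, so it serves only as a signature and carries no byte order information.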
Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. Above is the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.
Compose + O + R = ®
Compose + 3 + 4 = ¾
Compose + s + s = ß
Compose + ~ + ' + a = ấ

Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.
Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á

Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces a character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the corresponding UCS code point in hexadecimal.
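The three Alt sequences shown above can be reproduced with the code page codecs of the Python standard library. The function `alt_code` is a hypothetical helper written for this example and a simplification: it assumes an English locale, and it does not reproduce the graphical symbols that Windows assigns to the lowest numbers of the IBM code page.

```python
def alt_code(digits: str) -> str:
    """Character produced by Alt plus the given digits in an English locale."""
    number = int(digits)
    # A leading zero selects the Windows code page, otherwise the IBM one
    codec = "cp1252" if digits.startswith("0") else "cp437"
    return bytes([number]).decode(codec)


print(alt_code("160"))   # position 160 in IBM code page 437
print(alt_code("0225"))  # position 225 in Windows code page 1252
print(chr(0xE1))         # UCS code point E1
```

All three lines print the same letter, which occupies a different position in each of the three character sets.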
1.1.2 Text Input

To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On hand-held devices, the use of either a numeric keypad or a touch-screen is more typical.

An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind and, as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn't apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.
1.1.3 Text Editors

A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free joe, GNU nano, and pico.

More advanced text editors come with support for regular expressions and version control, which will be covered in Sections 1.1.5 and 1.2, and with user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom, and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower users to perform complex text editing.

1.1.4 Interactive Document Preparation Systems

Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].
The primary difference between text editors and DPSes is the fact that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files, a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus, and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.

1.1.5 Regular Expressions

The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.

Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.
Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) syntaxes [18, part 1, ch. 9] described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.

More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.

Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (20 0B), zero width non-joiner (20 0C), zero width joiner (20 0D), and zero width no-break space (FE FF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at
BRE regex: we\{1,2\}p
Description: The interval expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
Matches: weeps, wept

BRE regex: e*ne
Description: Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
Matches: never, enemy, Kleene

BRE regex: \(⟨regex⟩\)
Description: A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
Matches: ⟨regex⟩

BRE regex: ^ar
Description: At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
Matches: argument, arrow keys

BRE regex: ore$
Description: At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
Matches: iron ore, dumbledore

BRE regex: .be
Description: A period (.) matches any single character.
Matches: or not to be

BRE regex: be[ea]
Description: A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity.
Matches: beehive, grizzly bear, glass beads

BRE regex: be[^ea]
Description: A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
Matches: obeah, bend, libela

BRE regex: \^\$
Description: Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
Matches: ^$

BRE regex: \(⟨regex⟩\)\1
Description: A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
Matches: ara, ararauna, dardanelles, nationality

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below).
ERE regex: we{1,2}p
Description: Unlike in BREs, braces aren't escaped.
Matches: weeps, wept

ERE regex: pe+rl?
Description: The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
Matches: persona, peer speech, perl

ERE regex: (⟨regex⟩)
Description: Unlike in BREs, parentheses aren't escaped.
Matches: ⟨regex⟩

ERE regex: (on|t)
Description: Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
Matches: one, two, trophy, truth

ERE regex: (⟨regex⟩)\1
Description: EREs do not support backreferences.
Matches: ⟨undefined⟩
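The constructs described in the table above can be tried out with the re module of the Python standard library. Note that this is a substitution made for the sake of a runnable sketch: Python implements a Perl-style syntax rather than BRE or ERE, so braces and parentheses are written unescaped as in ERE, while backreferences are available as in BRE. The helper `matching_words` and the word lists are invented for the example.

```python
import re


def matching_words(pattern: str, words: list) -> list:
    """Return the words in which the pattern matches at least once."""
    return [word for word in words if re.search(pattern, word)]


# Interval expression: one or two repetitions of "e"
print(matching_words(r"we{1,2}p", ["weeps", "wept", "wp"]))
# Anchors: beginning and end of a string
print(matching_words(r"^ar", ["argument", "bazaar"]))
print(matching_words(r"ore$", ["iron ore", "oregano"]))
# Non-matching list expression
print(matching_words(r"be[^ea]", ["bend", "bear", "beehive"]))
# Alternation
print(matching_words(r"on|t", ["one", "sky"]))
# Backreference: any character immediately followed by itself
print(matching_words(r"(.)\1", ["dardanelles", "nationality"]))
```

The same patterns can be passed to tools such as grep after adjusting the escaping to the syntax that the tool expects.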
Regex: \x{⟨n⟩}
Description: Matches the UCS character with code point ⟨n⟩ in hexadecimal.

Regex: \N{⟨n⟩}
Description: Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.

Regex: \p{⟨p⟩}
Description: Matches any UCS character with property ⟨p⟩.

Regex: \P{⟨p⟩}
Description: Matches any UCS character without property ⟨p⟩.

Property: Letter
Description: This property is satisfied by any letter.

Property: Punctuation
Description: This property is satisfied by any punctuation.

Property: Symbol
Description: This property is satisfied by any symbol.

Property: Mark
Description: This property is satisfied by any mark.

Property: Number
Description: This property is satisfied by any number.

Property: Separator
Description: This property is satisfied by any separator.

Property: Other
Description: This property is satisfied by any UCS character that doesn't belong to any of the above-listed categories.

Property: Block=⟨b⟩
Description: This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.

Property: Script=⟨s⟩
Description: This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.

Property: Numeric Value=⟨n⟩
Description: This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.
The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.
least partially implement the Unicode standard for Regular Expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
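The re module of the Python standard library does not support the property syntax discussed above, but the underlying Unicode character properties are exposed by the unicodedata module. The following sketch uses them to select characters by general category and to work around a false negative match caused by an invisible character. The helpers `find_all` and `strip_format_characters` are hypothetical names introduced for this example only.

```python
import re
import unicodedata


def find_all(text: str, category_prefix: str) -> list:
    """Characters whose general category starts with the prefix (L, N, P, S, ...)."""
    return [ch for ch in text
            if unicodedata.category(ch).startswith(category_prefix)]


def strip_format_characters(text: str) -> str:
    """Remove invisible format characters (general category Cf)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


sample = "a1,\u20ac"           # a letter, a number, punctuation, and a symbol
print(find_all(sample, "L"))   # letters
print(find_all(sample, "N"))   # numbers
print(find_all(sample, "P"))   # punctuation
print(find_all(sample, "S"))   # symbols

hidden = "regu\u200blar"       # a zero width space inside the word
print(re.search("regular", hidden))                           # None
print(re.search("regular", strip_format_characters(hidden)))  # a match object
```

Whether format characters may be discarded depends on the text; in some writing systems the zero width joiner and non-joiner affect how words are rendered, so the stripping step is best applied only to a copy used for searching.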
The most elementary text processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool will present lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for the Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.

The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
VCS can be divided by their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally and its operation is fully dependent on the availability of the server. An example of a centralized VCS is Subversion (SVN).

By comparison, there is no designated server in decentralized VCS, and the users can upload and download new versions directly from one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage size requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.
Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.

Figure 1.8: The basic SVN workflow (svnadmin create, svn checkout, svn update, svn commit).
After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.

Figure 1.9: The diagram above depicts the basic Git workflow (git init, git clone, git pull, git push, git reset, git commit). The diagram below depicts the use of the Git program with an SVN repository (svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit); this bears all the advantages and disadvantages associated with decentralized VCS.
Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.
Chapter 2
Markup
A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.
Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The General Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

More information about the project can be found in The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.

The authoritative resource on SGML is The SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.
A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML, such as W3C XML Schema, RELAX NG, or Schematron, that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
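The weaker of the two levels of correctness, well-formedness, can be checked with the XML parser from the Python standard library. This is a minimal sketch with a made-up helper name, `is_well_formed`; the xml.etree.ElementTree module used here does not validate documents against a DTD or a schema, so the stronger notion of validity needs a dedicated tool.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<recipe><name>Palatschinken</name></recipe>"))  # True
print(is_well_formed("<recipe><name>Palatschinken</recipe>"))         # False
```

The second document is rejected because the name element is never closed, regardless of what any DTD or schema might say about recipes.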
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>

Figure 2.1: An example XML document (recipe.xml).
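Data can be retrieved from a document like the recipe shown above with the limited subset of XPath that the Python standard library supports. The sketch below embeds a shortened copy of the recipe so that it stays self-contained; full XPath and XQuery processors offer considerably more than what is shown here.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the recipe document, embedded for self-containment
RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# Attribute of a child element
serves = int(root.find("ingredientList").get("serves"))

# All ingredient elements on a given path, as (amount, name) pairs
ingredients = [(item.get("amount"), item.text)
               for item in root.findall("./ingredientList/ingredient")]

# A predicate on an attribute value
milk = root.find(".//ingredient[@amount='300ml']")

print(serves)
print(ingredients)
print(milk.text)
```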
DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD.
element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
      name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).
xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng  # Compact -> Full RELAX NG
xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
21 META MARKUP LANGUAGES 27
A notable feature of xml unavailable in sgml are namespaceswhich were added to the xml specification [32] in 1999 Name-spaces enable the inclusion of elements and attributes from differ-ent xml applications within a single xml document each applica-tion is uniquely identified through an the Internationalized ResourceIdentifiers (ir is) [33] Namespaces in xml are a spiritual successorof a more expressive sgml feature of CONCUR which makes it pos-sible to mark up several structural views of a single documentUnlike with CONCUR which ties each view to an sgml dtd thereexists no general mechanism for the translation of the ir is to xml
Speech
AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!

Verse
Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!
<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? Why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>
Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].
The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook, and its source code is publicly available at http://docbook.org.
Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
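The way namespaces keep the vocabularies of different XML applications apart can be illustrated with a minimal sketch that uses the ElementTree module of the Python standard library. The sample document and the helper name `namespaces_used` are assumptions made for this illustration only; they do not come from the text above.

```python
import xml.etree.ElementTree as ET

# A hypothetical document that mixes two XML applications: XHTML and SVG.
DOCUMENT = """\
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <body>
    <p>A circle:</p>
    <svg:svg width="20" height="20">
      <svg:circle cx="10" cy="10" r="8"/>
    </svg:svg>
  </body>
</html>
"""


def namespaces_used(xml_text):
    """Return the set of namespace IRIs of all elements in the document."""
    root = ET.fromstring(xml_text)
    found = set()
    for element in root.iter():
        # ElementTree expands qualified names to the form {IRI}local-name.
        if element.tag.startswith("{"):
            found.add(element.tag[1:].split("}", 1)[0])
    return found


if __name__ == "__main__":
    for iri in sorted(namespaces_used(DOCUMENT)):
        print(iri)
```

The parser only reports which IRI each element belongs to. As the text notes, it has no general way of finding the schema that corresponds to an IRI, so validation would still require that mapping to be supplied separately.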
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
2.2 Markup on the World Wide Web
2.2.1 The Hypertext Markup Language
In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, one of the first graphical Web browsers, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed the W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers, faced with a malformed document, would act in accordance with Postel's law and try to render the document despite its deficiencies. In
JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
an attempt to unify the way malformed HTML documents were rendered across Web browsers, the W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.
Initially, HTML only comprised a mixture of logical and presentation markup with a fixed visual interpretation. This changed with the specification of CSS, which was introduced by the W3C in 1996. The language made it possible to specify the visual properties of any HTML element, which separated document markup from design and effectively eliminated the need for presentation markup.
During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
2.2.2 The Extensible Hypertext Markup Language
Ever since the release of XML in 1998, the W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as
<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>
Figure 2.7: The first line contains overlapping elements and, as such, can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.
<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>
Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.
<style>
body {
  font: large Verdana;
  font-size: large; }
h1 {
  font-size: x-large;
  text-transform: uppercase; }
abbr {
  font-style: italic; }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>
The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the number of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly in web documents through XML namespaces.
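The contrast between lenient HTML processing and strict XML processing can be sketched with two modules of the Python standard library. This is only an illustration: `html.parser` is a tolerant tokenizer and does not implement the error recovery algorithm of the HTML5 specification, and the names `TagCollector`, `html_events`, and `is_well_formed_xml` are hypothetical helpers introduced here.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Markup with overlapping elements, as in the first line of Figure 2.7.
OVERLAPPING = "<b>Bold <i>bold and italic</b> italic</i>"


class TagCollector(HTMLParser):
    """Record every start and end tag without checking the nesting."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.events.append("</" + tag + ">")


def html_events(markup):
    """Tokenize the markup leniently and return the tags that were seen."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.events


def is_well_formed_xml(markup):
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(html_events(OVERLAPPING))
    print(is_well_formed_xml(OVERLAPPING))
```

The lenient tokenizer reports all four tags and leaves the recovery to the caller, whereas the XML parser rejects the input outright, which is the behavior the XML specification requires.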
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
2.2.3 The Semantic Web and Linked Data
The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, the W3C created the Resource Description Framework (RDF) [44], a language for the description of resources on the Web, in 1999.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
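The two readings of a triplet described above can be sketched in a few lines of Python. The `Triple` type, its `obj_is_resource` flag, and the `interpret` function are illustrative assumptions of this sketch rather than a part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A triplet (p, s, o) in the order used in the text."""
    predicate: str
    subject: str
    obj: str
    obj_is_resource: bool  # True if the object is an IRI, False for a literal


def interpret(triple):
    """Phrase a triplet as either a relation or a property."""
    if triple.obj_is_resource:
        return (f"{triple.subject} is in the relation "
                f"{triple.predicate} with {triple.obj}")
    return (f"{triple.subject} has the property "
            f"{triple.predicate} with the value {triple.obj!r}")


if __name__ == "__main__":
    john = "http://example.org/john-smith"
    page = "http://example.org/document.html"
    print(interpret(Triple("http://purl.org/dc/terms/creator",
                           page, john, True)))
    print(interpret(Triple("http://xmlns.com/foaf/0.1/name",
                           john, "John Smith", False)))
```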
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in the documents on the Web, a technique known as microformatting.
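To make the notion of a representation more concrete, the following sketch writes a single statement in the line-based N-Triples syntax. It covers only IRIs and plain or language-tagged literals with a handful of escapes, and the function name `to_ntriples` and its parameters are assumptions of this illustration, not an API of an existing RDF toolkit.

```python
def to_ntriples(subject, predicate, obj, is_literal=False, lang=None):
    """Serialize one statement as a single N-Triples line.

    The subject and the predicate are IRIs. The object is an IRI unless
    is_literal is True, in which case it is written as a quoted string
    with an optional language tag.
    """
    if is_literal:
        escaped = (obj.replace("\\", "\\\\")
                      .replace('"', '\\"')
                      .replace("\n", "\\n")
                      .replace("\r", "\\r"))
        term = '"' + escaped + '"'
        if lang:
            term += "@" + lang
    else:
        term = "<" + obj + ">"
    return f"<{subject}> <{predicate}> {term} ."


if __name__ == "__main__":
    print(to_ntriples("http://example.org/document.html",
                      "http://purl.org/dc/terms/title",
                      "John's Web page", is_literal=True, lang="en"))
```

The same statement could equally be written in RDF/XML, Turtle, or JSON-LD; only the surface syntax changes, not the underlying set of triplets.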
2.3 Document Preparation Systems
Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>
<http://example.org/document.html>
  <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html>
  <http://purl.org/dc/terms/creator>
  <http://example.org/john-smith> .
<http://example.org/john-smith>
  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
  <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith>
  <http://xmlns.com/foaf/0.1/name> "John Smith" .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .
Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).
<!DOCTYPE html>
<html lang="en">
<head>
<link rel="meta" type="application/rdf+xml"
      href="john.rdf">
<link rel="meta" type="text/turtle" href="john.ttl">
<link rel="meta" type="application/n-triples"
      href="john.nt">
<title>John's Web page</title>
</head>
<body>
Hi, I'm John Smith.
</body>
</html>
Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.
<!DOCTYPE html>
<html lang="en">
<head vocab="http://purl.org/dc/terms/"
      about="http://example.org/document.html">
<title property="title" lang="en">John's Web
page</title>
<meta property="creator"
      href="http://example.org/john-smith">
</head>
<body vocab="http://xmlns.com/foaf/0.1/"
      about="http://example.org/john-smith"
      typeof="Person">
Hi, I'm <span property="name">John Smith</span>.
</body>
</html>
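[Graph: the node http://example.org/document.html is connected by the edge dc:title to the literal "John's Web page"@en and by the edge dc:creator to the node http://example.org/john-smith, which is in turn connected by the edge rdf:type to the node foaf:Person and by the edge foaf:name to the literal "John Smith".]
Figure 2.11: A graph of the RDF document in Figure 2.9.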
categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is twofold: the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems
Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The
The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by Donald Knuth, an American professor of computer science, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by Desktop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
The Cask of Amontillado
by
Edgar Allen Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that gave utterance to a threat. At length I would be avenged; this was a point definitely settled – but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-
.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allen Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's The Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}}\centerline{#1}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip.01ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-.03ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
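The conversion from a lightweight markup language to HTML can be illustrated by a toy converter for a tiny Markdown-like subset. This is a minimal sketch under stated assumptions: it supports only headings introduced by number signs, paragraphs separated by blank lines, and emphasis marked with asterisks, and the function names `inline` and `lightweight_to_html` are hypothetical. Real converters handle far more syntax and many more edge cases.

```python
import html
import re


def inline(text):
    """Convert **strong** and *emphasized* spans to HTML elements."""
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    return re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)


def lightweight_to_html(source):
    """Convert a tiny Markdown-like subset to HTML."""
    # Blocks are separated by one or more blank lines.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", source) if b.strip()]
    output = []
    for block in blocks:
        # Escape characters that carry a special meaning in HTML.
        escaped = html.escape(block, quote=False)
        heading = re.fullmatch(r"(#{1,6}) +(.+)", escaped)
        if heading:
            level = len(heading.group(1))
            output.append(f"<h{level}>{inline(heading.group(2))}</h{level}>")
        else:
            # Join the lines of a paragraph with single spaces.
            output.append(f"<p>{inline(' '.join(escaped.split()))}</p>")
    return "\n".join(output)


if __name__ == "__main__":
    print(lightweight_to_html(
        "# Title\n\nSome *emphasized* and **strong** text."))
```

The input remains perfectly legible as plain text, which is the main selling point of lightweight markup, while the output is ordinary logical HTML markup that can be styled with CSS.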
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure unless the text would benefit from the dissonance [54, sec. 5.1.2].
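Checking whether a typeface covers a manuscript amounts to a set difference between the characters used and the characters supported. The sketch below assumes that the coverage of the font is already available as a set of characters; how that set is obtained from a font file is outside the scope of this example, and the names `missing_characters` and `describe` are hypothetical.

```python
import unicodedata


def missing_characters(text, supported):
    """Return the characters of the text that the font does not cover.

    The characters are listed in the order of their first use, and white
    space is ignored.
    """
    missing = []
    for character in text:
        if character.isspace() or character in supported:
            continue
        if character not in missing:
            missing.append(character)
    return missing


def describe(characters):
    """Label each character with its code point and its Unicode name."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, 'UNKNOWN')}"
            for c in characters]


if __name__ == "__main__":
    # A hypothetical font that covers only the basic Latin letters.
    latin_only = set("abcdefghijklmnopqrstuvwxyz"
                     "ABCDEFGHIJKLMNOPQRSTUVWXYZ.,")
    gaps = missing_characters("Soul, \u03c8\u03c5\u03c7\u03ae", latin_only)
    print("\n".join(describe(gaps)))
```

Characters reported as missing are the ones that would have to be borrowed from another font family, as the paragraph above describes.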
Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
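The guideline for the line length on a single-column page can be checked mechanically. The following sketch wraps a paragraph with the `textwrap` module of the Python standard library and reports the lines that fall outside the recommended range. The function name `measure_report` and the default width of 60 characters, the midpoint of the 45–75 range, are assumptions of this example; counting characters is also only a rough stand-in for the width of lines set in a proportional typeface.

```python
import textwrap


def measure_report(paragraph, width=60, lower=45, upper=75):
    """Wrap a paragraph and report lines outside the recommended range.

    Returns the wrapped lines, the lines shorter than lower, and the
    lines longer than upper. The last line of a paragraph is naturally
    short, so it is never reported as too short.
    """
    lines = textwrap.wrap(paragraph, width=width)
    too_short = [line for line in lines[:-1] if len(line) < lower]
    too_long = [line for line in lines if len(line) > upper]
    return lines, too_short, too_long


if __name__ == "__main__":
    sample = ("As the base units of linguistic thought in prose, "
              "paragraphs split the text into coherent portions "
              "ready for consumption.")
    wrapped, short, long_ = measure_report(sample)
    for line in wrapped:
        print(f"{len(line):2d} {line}")
```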
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}
newunicodecharraisebox8ex
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
@font-face {
  font-family: "Alegreya Sans";
  src: url("AlegreyaSans-Regular.ttf")
       format("truetype");
  unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                 U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F; }
@font-face {
  font-family: "GFS Neohellenic";
  src: url("GFSNeohellenic.otf") format("opentype");
  unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                 U+102E0-102FF; }
p {
  font-family: "Alegreya Sans", "GFS Neohellenic",
               sans-serif;
  line-height: 14pt; }
[lang=en] {
  font-size: 11pt; }
[lang=gr] {
  font-size: 13.2pt; }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
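The arithmetic behind these figures can be written down as a short calculation. The function names `leading_range` and `lead_per_side` are hypothetical, and the percentages are passed as whole numbers so that the results for common type sizes come out exact.

```python
def leading_range(type_size_pt, min_percent=20, max_percent=45):
    """Return the smallest and the largest recommended line height.

    The line height is the typeface size plus twenty to forty-five
    percent of the typeface size.
    """
    lowest = type_size_pt * (100 + min_percent) / 100
    highest = type_size_pt * (100 + max_percent) / 100
    return lowest, highest


def lead_per_side(type_size_pt, line_height_pt):
    """Return the lead added above and below each line."""
    return (line_height_pt - type_size_pt) / 2


if __name__ == "__main__":
    lowest, highest = leading_range(10)
    print(lowest, highest)             # the range for 10 pt type
    print(lead_per_side(10, lowest))   # lead per side at the minimum
    print(lead_per_side(10, highest))  # lead per side at the maximum
```

For 10 pt type this yields line heights of 12 and 14.5 pt, that is, 1 and 2.25 pt of lead on each side of a line, in agreement with the figures given above.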
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and set ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger
        Sizes in inches   Page proportions
A4      8.27 × 11.7       2 : √2      1.41421
B5      6.93 × 9.84       1 : √2      0.707
Letter  8 1/2 × 11        1 : 1.294   1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
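The requirement that a heading occupies a whole multiple of the body text line height can be turned into a small calculation of the vertical space that has to be added around the heading. The function name `heading_padding` is hypothetical, and the example values are given in whole points; with fractional values, the usual caveats of floating-point arithmetic apply.

```python
import math


def heading_padding(heading_height_pt, body_leading_pt):
    """Fit a heading into a whole number of body text lines.

    Returns the number of body text lines the heading should occupy
    and the extra vertical space, in points, to distribute above and
    below the heading.
    """
    lines = max(1, math.ceil(heading_height_pt / body_leading_pt))
    return lines, lines * body_leading_pt - heading_height_pt


if __name__ == "__main__":
    # A heading that is 18 pt high in a text with 12 pt leading.
    lines, extra = heading_padding(18, 12)
    print(lines, extra)
```

With these values, the heading takes up two lines of the body text and needs 6 pt of additional space, so the lines that follow stay aligned with the lines on the facing page.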
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface as the surrounding text, treat the columns of tables the same way you treat columns in the text, and keep the number of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations

Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks, in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding vertical space above and below the block paragraphs and, optionally, also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
3.3 Page Layout

The page consists of a text block surrounded by margins. The width of the text block is largely determined by the number of columns and the body text size (as described in Section 3.2.1), as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
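Deriving the text height from the text width while keeping it a multiple of the line height can be sketched as follows. The function name `text_height`, the 300 pt text width, the 1 : 1.5 proportion, and the 12 pt line height are all hypothetical values chosen for illustration; rounding down rather than up is likewise an assumption of this sketch.

```python
import math


def text_height(text_width: float, proportion: float, line_height: float) -> tuple[int, float]:
    """Return (number of lines, text height) for a text block.

    The ideal height is text_width * proportion; it is rounded down to
    the nearest whole multiple of line_height so that the block can be
    filled completely with lines of body text.
    """
    if min(text_width, proportion, line_height) <= 0:
        raise ValueError("all arguments must be positive")
    ideal = text_width * proportion
    lines = math.floor(ideal / line_height)
    return lines, lines * line_height


lines, height = text_height(text_width=300, proportion=1.5, line_height=12)
print(f"{lines} lines, text block {height} pt tall")
```

With these sample values the ideal height of 450 pt is not a multiple of 12 pt, so the block is shortened to the nearest full line.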
3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to use colored typefaces only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015) (cit. on p. 5).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015) (cit. on p. 5).
[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).
[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).
[12] Unicode Consortium. Unicode Technical Standard 10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).
[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).
[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).
[20] Unicode Consortium. Unicode Technical Standard 18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015) (cit. on p. 16).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Declaration
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
Emacs  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
joe  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
pico  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML, New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
vi  The Visual Interactive editor
vim  vi IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
• There is precisely one way to encode any printable character. The conversion between lowercase and uppercase letters is a matter of inverting one bit.

This comes at the expense of support for non-English writing systems. As a temporary workaround, a set of ASCII derivatives that replaced the less-needed characters of $, [, ], ^, `, |, and ~ with international characters was specified in the ISO 646 standard from 1972 [3].
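The one-bit case conversion mentioned above is easy to try out. The sketch below is an illustration only; the function name `toggle_ascii_case` is made up for this example. It relies on the fact that in ASCII an uppercase letter and its lowercase counterpart differ only in the bit with the value 32 (20 in hexadecimal).

```python
CASE_BIT = 0x20  # 'A' is 0x41 and 'a' is 0x61; they differ in this bit only


def toggle_ascii_case(text: str) -> str:
    """Swap the case of ASCII letters by inverting a single bit."""
    result = []
    for char in text:
        if char.isascii() and char.isalpha():
            result.append(chr(ord(char) ^ CASE_BIT))
        else:
            # Digits, punctuation, and non-ASCII characters are left alone.
            result.append(char)
    return "".join(result)


print(toggle_ascii_case("Hello, World!"))
```

The same trick does not carry over to the eight-bit encodings or to UCS, where letters with diacritics are not laid out this regularly.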
Eight-bit Encodings
With the byte size stabilizing at eight bits, new character encodings emerged that were based on ASCII and used the additional bit to encode characters of non-English writing systems while retaining complete backwards compatibility with ASCII. Besides the numerous vendor-specific encodings (called code pages), a set of fifteen eight-bit encodings, covering all major modern writing systems whose characters fit within the space of 128 additional combinations, was standardized in the ISO/IEC 8859 series released during 1986–2001.
Compared to ASCII, eight-bit encodings introduced an additional level of complexity to text processing:
• Each character is exactly eight bits wide. String manipulation is therefore as straightforward as with ASCII.

• Character strings can no longer be collated by character code comparison. Each encoding requires separate collation tables.

• Classes of characters, such as uppercase and lowercase letters or punctuation, no longer form contiguous ranges, and their position varies among encodings. This impedes character classification.

• Idiosyncrasies such as the æ ligature and invisible hyphenation hints are included in several encodings, which makes it more difficult to determine character string equivalence. Algorithms for case conversion vary among encodings.

• There exists no standard mechanism to detect which encoding is being used. The distinction needs to be made on the application level using heuristics, additional metadata, or human intervention. Consequently, no standard mechanism exists to use different character encodings within a single text document.
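The last point can be demonstrated with a few bytes. In the following sketch, the byte values and the choice of the three encodings (ISO/IEC 8859-1, ISO/IEC 8859-2, and the IBM code page 437, under the codec names used by Python) are assumptions picked for illustration. The same bytes decode without any error in each of them, but to different characters.

```python
# Three bytes with the eighth bit set; nothing in the data itself says
# which eight-bit encoding produced them.
RAW = bytes([0xE6, 0xF8, 0xE5])


def decode_with(encodings, data=RAW):
    """Decode the same bytes under several encodings."""
    return {name: data.decode(name) for name in encodings}


readings = decode_with(["iso8859_1", "iso8859_2", "cp437"])
for name, text in readings.items():
    print(f"{name:10} {text}")
```

Every reading is equally valid from the point of view of the decoder, which is why the choice has to come from metadata, heuristics, or the user.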
Also notable are the seven-bit encodings of UTF-7 and Punycode, which bring Unicode support to protocols that were designed with the seven-bit ASCII in mind, such as e-mail.
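Both seven-bit encodings can be tried out with the codecs that ship with Python. The sample word in this sketch is an arbitrary choice; the point is only that the encoded forms consist solely of ASCII bytes and can be decoded back without loss.

```python
word = "b\u00fccher"  # "bücher", a word with one non-ASCII letter

utf7 = word.encode("utf-7")
puny = word.encode("punycode")

print(utf7)
print(puny)

# Both results fit into seven bits per byte ...
seven_bit = all(byte < 128 for byte in utf7 + puny)
# ... and both conversions are reversible.
round_trip = utf7.decode("utf-7") == word and puny.decode("punycode") == word
print(seven_bit, round_trip)
```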
A portion of this complexity is inherent in the task of encoding the characters of all modern writing systems, but the overhead caused by the fragmentation of character encodings proved to be unnecessary.
The Universal Character Set and Unicode
In the early 1990s, the continual increase in the available bandwidth and storage led to the creation of the standards of Unicode [5, 6] and the Universal multiple-octet coded Character Set (UCS) [7] in an attempt to create a text encoding that would contain the characters of all the world's languages and succeed ASCII as the lingua franca of text interchange.
UCS is an ever-expanding catalogue of characters from writing systems both modern and ancient, and of symbols ranging from diacritical marks, punctuation, and ideograms to mahjong tiles, alchemical symbols, and the ancient Greek musical notation. Each of these characters is assigned a number called a code point, ranging from 0 to 2,147,483,647 (7F FF FF FF in hexadecimal notation), with the numbers of the most common characters in the range from 0 to 65,535 (FF FF), called the Basic Multilingual Plane (BMP). The smallest units of division in UCS are blocks, which contain 256 thematically related characters. UCS encodings map code points to binary character codes and vice versa.
Three major encodings are specified in the UCS standard and its amendments [8, 9]:
1. UTF-32 directly encodes UCS characters by transforming their code points to four-byte integers. UTF-32 is also known as UCS-4.
2. UTF-16 directly encodes characters within the BMP by transforming their code points to two-byte integers. Code points in the range from 65,536 to 1,114,111 (01 00 00–10 FF FF) are transformed into pairs of two-byte integers, called surrogate pairs, ranging from 55,296 to 57,343 (D8 00–DF FF). To enable the UTF-16 encoding, the code points in this range will never be assigned to characters [10, sec. 3.4, D15]. The same is true of code points above 1,114,111 (10 FF FF), which allows UTF-16 to encode any UCS character.
3. UTF-8 directly transforms code points ranging from 0 to 127 (7F) to one-byte integers. Since the first UCS block of the BMP matches ASCII, any ASCII text stored in eight-bit bytes is also valid UTF-8. Code points in the range from 128 to 1,114,111 (00 00 80–10 FF FF) are transformed into two to four one-byte integers ranging from 128 to 244 (80–F4). The encoding is illustrated in Tables 1.2 and 1.3.

One of the design goals of UCS was to avoid assigning separate code points to different glyphs that carry the same meaning. As a result, the visually distinctive Han characters used in the East Asian countries of China, Japan, Korea, and Vietnam were merged into a set of 75,960 ideograms in a process referred to as the Han Unification [10, sec. 18.1]. This simplifies text processing, but also makes it impossible to encode a text in multiple East Asian languages without having to rely on external markup to select appropriate regional fonts. As a result, a derivative of UCS that doesn't implement the Han Unification was developed for use in operating systems based on the Real-time Operating system Nucleus (TRON) and is used in East Asia alongside UCS and region-specific encodings.

餐甑逞扉牙慨

Figure 1.2: Several Han characters in the traditional Chinese, Japanese, Korean, and Vietnamese variants.
UTF-32 is primarily used for the fixed-width internal representation of individual UCS characters inside programs, UTF-16 fulfills a similar role in programs that only work with the BMP, and UTF-8 is used for text storage and interchange. Since 2010, the majority of text content on the Web has been encoded in ASCII and UTF-8 [11].
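The three encodings described above can be compared side by side. The following Python sketch is an illustration under a few assumptions: the helper names `utf16_surrogates`, `hex_bytes`, and `encodings` are made up for this example, the big-endian byte order is chosen arbitrarily, and the four sample characters (a Latin letter, an accented letter, the euro sign, and a mahjong tile from outside the BMP) are arbitrary picks.

```python
def utf16_surrogates(code_point: int) -> tuple[int, int]:
    """Split a code point from outside the BMP into a surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points 0x10000..0x10FFFF need surrogates")
    offset = code_point - 0x10000      # 20 significant bits remain
    high = 0xD800 + (offset >> 10)     # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # lower 10 bits
    return high, low


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hexadecimal pairs."""
    return data.hex(" ").upper()


def encodings(char: str) -> dict[str, str]:
    """Encode one character in the three UCS encodings (big-endian)."""
    return {
        "UTF-32": hex_bytes(char.encode("utf-32-be")),
        "UTF-16": hex_bytes(char.encode("utf-16-be")),
        "UTF-8": hex_bytes(char.encode("utf-8")),
    }


for char in ("A", "\u00e9", "\u20ac", "\U0001F004"):
    print(f"U+{ord(char):04X}", encodings(char))

# The manually computed pair matches what the UTF-16 codec emits.
high, low = utf16_surrogates(0x1F004)
print(f"{high:04X} {low:04X}")
```

Only the last character needs a surrogate pair in UTF-16 and four bytes in UTF-8; the letter A occupies a single byte in UTF-8, which is identical to its ASCII code.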
Unicode was a competing standard for universal text encoding that underwent a merger with UCS in version 1.1; since then, the standards have been kept closely synchronized. Unicode is a superset of UCS that defines additional information about UCS characters (such as their general category, directionality, case, or numeric value [10, sec. 3.5 and ch. 4]), various text processing algorithms, and implementation guidelines.
Regarding text processing, Unicode and UCS represent a compromise between the simplicity of the seven-bit ASCII and the heterogeneity of eight-bit encodings:
Ǻ (U+01FA) = Å (U+00C5) + U+0301 = A (U+0041) + U+030A + U+0301

Figure 1.3: Some UCS characters can be either input as a single entity or composed from several combining characters. Regarding Unicode normalization forms, all of the above representations are canonically equivalent.
iconv -f latin2 -t utf8 -- old.txt > new.txt

Figure 1.4: Text files can be converted between encodings using the iconv command-line tool. The sample code shows the file old.txt being converted from the ISO/IEC 8859-2 encoding to UTF-8. The result of the conversion is stored in the file new.txt.
• If simple text manipulation is preferred over space efficiency, each character can be made exactly four bytes wide using the UTF-32 encoding, or exactly two bytes wide using the UTF-16 encoding as long as the text stays within the BMP.

• Although character strings cannot be collated by a simple character code comparison, a collation algorithm is defined in the Unicode specification [12], and collation tables for major locales [13] are maintained by the Unicode Consortium.

• Classes of characters (such as uppercase letters, lowercase letters, numbers, and punctuation) do not form contiguous ranges, but their position is directly specified in the standard [10, sec. 4.5].

• Although idiosyncrasies (such as ligatures, invisible hyphenation hints, and combining characters) are present in UCS, explicit normalization algorithms for character string equivalence testing are specified by the standard [10, sec. 2.12]. An algorithm for case conversion is also specified [10, sec. 3.13].

• The byte order mark (FE FF) character can be inserted at the beginning of a text as a signature of Unicode encodings. As the name suggests, the order in which the FE and FF bytes arrive also indicates the order of bytes (called endianness) that was used to encode integers. In UTF-32 and UTF-16, endianness can be chosen arbitrarily by the encoding application. In UTF-8, one-byte integers are used and the notion of endianness is therefore meaningless.
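Several of the facilities listed above are exposed by the Python standard library and can be tried out directly. The sketch below is an illustration only: the three function names are made up for this example, and the byte order check assumes that the data is already known to be UTF-16, because the little-endian signature of UTF-32 begins with the same two bytes.

```python
import codecs
import unicodedata


def general_categories(text: str) -> list[str]:
    """Look up the general category of each character in the standard's
    data instead of testing for a range of character codes."""
    return [unicodedata.category(char) for char in text]


def canonically_equal(first: str, second: str) -> bool:
    """Compare two strings after bringing both to the same normalization
    form (here the fully decomposed form, NFD)."""
    return unicodedata.normalize("NFD", first) == unicodedata.normalize("NFD", second)


def utf16_byte_order(data: bytes) -> str:
    """Read the byte order mark at the start of UTF-16 data."""
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "little-endian"
    return "unknown"


print(general_categories("Aa5!"))

single = "\u01FA"           # one precomposed character
composed = "A\u030A\u0301"  # a letter followed by two combining characters
print(single == composed, canonically_equal(single, composed))

print(utf16_byte_order(codecs.BOM_UTF16_LE + "A".encode("utf-16-le")))
```

The two strings in the middle example differ when compared code point by code point, but are treated as equivalent once both are normalized.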
Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. Above is the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.
Compose + O + R = ®
Compose + 3 + 4 = ¾
Compose + s + s = ß
Compose + ~ + ' + a = ấ
Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.
Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á
Figure 17 On the Windows operating system holding the Alt keyand typing a sequence of numbers produces a character with thecorresponding number fromeither an ibm code page if the numberhas no leading zero or from a Windows code page otherwiseThe code pages vary depending on the current locale in Englishlocales the ibm code page 437 and theWindows code page 1252 areused After a Windows Registry modification it is also possible todirectly produce ucs characters by holding the Alt key and typingthe corresponding ucs code point in hexadecimal
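The three Alt sequences in Figure 1.7 can be checked with a few lines of Python. This is only a sketch of the lookup rule stated in the caption, not of how Windows implements it; the function names alt_code and alt_plus_hex are hypothetical, and the code pages default to those of an English locale.

```python
def alt_code(digits: str, oem: str = "cp437", ansi: str = "cp1252") -> str:
    """Decode a decimal Alt code: a leading zero selects the Windows code page."""
    codepage = ansi if digits.startswith("0") else oem
    return bytes([int(digits)]).decode(codepage)


def alt_plus_hex(hexdigits: str) -> str:
    """Decode the Alt + '+' form, which names a UCS code point in hexadecimal."""
    return chr(int(hexdigits, 16))
```

Under these assumptions, alt_code("160"), alt_code("0225"), and alt_plus_hex("E1") all yield the same character, the letter a with an acute accent.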
1.1.2 Text Input

To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On hand-held devices, the use of either a numeric keypad or a touch-screen is more typical.

An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind, and as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn't apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.
1.1.3 Text Editors

A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free joe, GNU nano, and pico.

More advanced text editors come with support for regular expressions and version control (which will be covered in sections 1.1.5 and 1.2) and user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower the users to perform complex text editing.
1.1.4 Interactive Document Preparation Systems

Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully-formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].
The primary difference between text editors and DPSes is the fact that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files, a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.

Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.
1.1.5 Regular Expressions

The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.

Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) syntaxes [18, part 1, ch. 9] described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.

More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
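A backreference is the usual way in which a regex steps outside the regular languages. As a minimal sketch, assuming Python's re module (a Perl-derived syntax, not BRE or ERE), the pattern below accepts a run of the letter a, a single b, and then a run of a of exactly the same length. No regular grammar can describe this language, because doing so requires remembering an unbounded count. The names SAME_RUN and same_run_on_both_sides are hypothetical.

```python
import re

# (a+) captures a run of "a"; \1 then demands the very same run again.
SAME_RUN = re.compile(r"(a+)b\1")


def same_run_on_both_sides(text: str) -> bool:
    """True when text is n letters a, one b, and n letters a again."""
    return SAME_RUN.fullmatch(text) is not None
```

For instance, same_run_on_both_sides("aaabaaa") holds, whereas the unbalanced string "aaabaa" is rejected.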
Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within BMP and exhibit the same problem with characters outside BMP. More subtle, but no less precarious, can be the lack of support for Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text (such as the zero width space (U+200B), zero width non-joiner (U+200C), zero width joiner (U+200D), and zero width no-break space (U+FEFF)) is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at
Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below).

BRE regex, description, and matches:

    we\{1,2\}p
        The interval expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
        Matches: weeps, wept

    ene*
        Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
        Matches: never, enemy, Kleene

    \(⟨regex⟩\)
        A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
        Matches: ⟨regex⟩

    ^ar
        At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
        Matches: argument, arrow keys

    ore$
        At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
        Matches: iron ore, dumbledore

    be.
        A period (.) matches any single character.
        Matches: or not to be

    be[ea]
        A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity.
        Matches: beehive, grizzly bear, glass beads

    be[^ea]
        A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
        Matches: obeah bend, libela

    \^\$
        Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
        Matches: ^$

    \(..\).*\1
        A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
        Matches: ara ararauna, dardanelles, nationality

ERE regex, description, and matches:

    we{1,2}p
        Unlike in BREs, braces aren't escaped.
        Matches: weeps, wept

    pe+rl?
        The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
        Matches: persona, peer speech, perl

    (⟨regex⟩)
        Unlike in BREs, parentheses aren't escaped.
        Matches: ⟨regex⟩

    (on|t)
        Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
        Matches: one two, trophy truth

    (..).*\1
        EREs do not support backreferences.
        Matches: ⟨undefined⟩
Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.

Regex and description:

    \x{⟨n⟩}    Matches the UCS character with code point ⟨n⟩ in hexadecimal.
    \N{⟨n⟩}    Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.
    \p{⟨p⟩}    Matches any UCS character with property ⟨p⟩.
    \P{⟨p⟩}    Matches any UCS character without property ⟨p⟩.

Property and description:

    Letter               This property is satisfied by any letter.
    Punctuation          This property is satisfied by any punctuation.
    Symbol               This property is satisfied by any symbol.
    Mark                 This property is satisfied by any mark.
    Number               This property is satisfied by any number.
    Separator            This property is satisfied by any separator.
    Other                This property is satisfied by any UCS character that doesn't belong to any of the categories listed above.
    Block=⟨b⟩            This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.
    Script=⟨s⟩           This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.
    Numeric Value=⟨n⟩    This property is satisfied by any UCS character with the numeric value ⟨n⟩.
The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.
least partially implement the Unicode standard for Regular Expressions [20] (such as those of Perl 5.2 or Java 7) are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
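The pitfalls listed above (several encodings of one character, case-insensitive matching, and invisible characters) can be worked around before any pattern is applied. The following Python sketch is one possible approach, not a complete implementation of the Unicode algorithms: it relies on the standard unicodedata module, and the names ZERO_WIDTH, match_key, and in_major_category are assumptions made for the example. Dropping the zero width joiner and non-joiner is acceptable for loose comparison, but it changes the rendering of some scripts and should not be applied to text that will be stored.

```python
import unicodedata

# Invisible characters named in the text: zero width space, zero width
# non-joiner, zero width joiner, and zero width no-break space.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}


def match_key(text: str) -> str:
    """Reduce text to a key suitable for loose equality tests."""
    visible = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    decomposed = unicodedata.normalize("NFD", visible)
    # Case folding is applied to the decomposed form, and the result is
    # recomposed so that equivalent inputs end up with identical keys.
    return unicodedata.normalize("NFC", decomposed.casefold())


def in_major_category(ch: str, major: str) -> bool:
    """Rough stand-in for the Letter, Number, ... properties (L, N, ...)."""
    return unicodedata.category(ch).startswith(major)
```

With these helpers, a precomposed and a decomposed spelling of the same accented word receive the same key, and in_major_category("\u0663", "N") recognizes the Arabic-Indic digit three as a number.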
The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.

The most elementary text processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool will present lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for the Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.
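The default behavior of grep described above, printing every line that contains at least one match, can be sketched in a few lines. The Python function below is an illustration only: it uses Python's own regex syntax rather than BRE or ERE, and the name grep and its fixed parameter (loosely modeled on the fixed-string mode) are assumptions of this example.

```python
import re


def grep(pattern: str, lines, fixed: bool = False):
    """Yield the lines that contain one or more matches of pattern."""
    # In fixed-string mode every character of the pattern is literal.
    regex = re.compile(re.escape(pattern) if fixed else pattern)
    for line in lines:
        if regex.search(line):
            yield line
```

Applied to the words beehive, obeah, and bend, the pattern be[ea] selects the first two, while the fixed string a.c selects only lines that contain a literal period between a and c.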
1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.

VCS can be divided based on their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally, and its operation is fully dependent on the availability of the server. An example of a centralized VCS is Subversion (SVN).

By comparison, there is no designated server in decentralized VCS, and the users can upload and download new versions directly to and from one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage size requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.
Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

Figure 1.8: The basic SVN workflow. After a remote repository has been established (svnadmin create), users download the latest version of the document (svn checkout) and then keep downloading the latest changes by other users (svn update) and uploading changes of their own (svn commit).
Figure 1.9: The diagram above depicts the basic Git workflow. After a remote repository has been established (git init), users make local copies of the entire repository (git clone) and then store changes in their local repositories (git commit) or revert changes from their local repositories (git reset). Users periodically download the latest changes by other users (git pull) and upload changes of their own (git push). The diagram below depicts the use of the Git program with an SVN repository (svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit); this bears all the advantages and disadvantages associated with decentralized VCS.
Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.
Chapter 2

Markup

A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should be already met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.
Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.
2.1 Meta Markup Languages

2.1.1 The General Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.
A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
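The first of the two levels of correctness can be tested with nothing more than an XML parser. The sketch below assumes Python's standard xml.etree.ElementTree module, whose parser checks well-formedness only and does not validate against a DTD or a schema; the function name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Report whether the parser accepts the document as well-formed XML."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True
```

A document such as <recipe><name>Palatschinken</name></recipe> passes the test, while the same document with the closing name tag left out does not. Checking validity would require a validating tool in addition.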
Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE recipe SYSTEM "recipe.dtd">
    <recipe>
      <name>Palatschinken</name>
      <description>A Slavic crêpe-like dish</description>
      <ingredientList serves="8">
        <ingredient amount="120g">Plain flour</ingredient>
        <ingredient amount="2">Egg</ingredient>
        <ingredient amount="300ml">Milk</ingredient>
        <ingredient amount="1 tblspn">Oil</ingredient>
        <ingredient amount="1 pinch">Salt</ingredient>
      </ingredientList>
      <stepList>
        <step>Combine the ingredients and whisk until
          you have a smooth batter.</step>
        <step>Heat oil on a pan, pour in a tablespoonful
          of the batter, fry until golden brown.</step>
        <step>Repeat until there is no batter left.</step>
        <step>Serve rolled and filled with jam.</step>
      </stepList>
    </recipe>

Figure 2.1: An example XML document (recipe.xml)
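Data retrieval with XPath can be tried on an abridged copy of the recipe from Figure 2.1. The sketch below assumes Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath; the variable names are introduced for the example.

```python
import xml.etree.ElementTree as ET

# An abridged copy of the recipe document from Figure 2.1.
RECIPE = """\
<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>
"""

root = ET.fromstring(RECIPE)

# A path expression selects every ingredient element in document order.
ingredients = [(item.get("amount"), item.text)
               for item in root.findall("./ingredientList/ingredient")]

# A predicate narrows the selection down by an attribute value.
egg = root.find("./ingredientList/ingredient[@amount='2']")

# Attribute values are plain strings until the application converts them.
serves = int(root.find("ingredientList").get("serves"))
```

A full XPath or XQuery processor offers considerably more, such as functions and axes, but the selection by path and by predicate shown here is the core of the idea.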
    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd">

    <!DOCTYPE recipe SYSTEM "recipe.dtd">

    <!DOCTYPE recipe [
      <!ELEMENT recipe (name, description, ingredientList,
                        stepList)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
      <!ELEMENT ingredientList (ingredient+)>
      <!ATTLIST ingredientList serves CDATA #REQUIRED>
      <!ELEMENT ingredient (#PCDATA)>
      <!ATTLIST ingredient amount CDATA #REQUIRED>
      <!ELEMENT stepList (step+)>
      <!ELEMENT step (#PCDATA)> ]>

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd" [
      <!-- Omitted for brevity --> ]>

    <!DOCTYPE recipe SYSTEM "recipe.dtd" [
      <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD. DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.
    element recipe {
      element name { text },
      element description { text },
      element ingredientList {
        attribute serves { xsd:positiveInteger },
        element ingredient {
          attribute amount { text }, text
        }+
      },
      element stepList {
        element step { text }+
      }
    }

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
      <element name="recipe"><complexType><all>
        <element name="name" type="string" minOccurs="1"/>
        <element name="description" type="string"
                 minOccurs="1"/>
        <element
          name="ingredientList"><complexType><sequence>
          <element name="ingredient" minOccurs="1"
                   maxOccurs="unbounded">
            <complexType><simpleContent>
              <extension base="string">
                <attribute name="amount" type="string"/>
              </extension>
            </simpleContent></complexType>
          </element></sequence>
          <attribute name="serves" type="positiveInteger"
                     use="required"/>
        </complexType></element>
        <element name="stepList"><complexType><sequence>
          <element name="step" type="string" minOccurs="1"
                   maxOccurs="unbounded"/>
        </sequence></complexType></element>
      </all></complexType></element>
    </schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)
    xmllint --noout --dtdvalid recipe.dtd recipe.xml
    xmllint --noout --schema recipe.xsd recipe.xml
    trang recipe.rnc recipe.rng   # Compact -> Full Relax NG
    xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
Speech:

    AASE: Peer, you're lying!
    PEER: No, I'm not!
    AASE: Well then, swear to me it's true!
    PEER: Swear? Why should I?
    AASE: See, you dare not! Every word of it's a lie!

Verse:

    Peer, you're lying! No, I'm not!
    Well then, swear to me it's true!
    Swear? Why should I? See, you dare not!
    Every word of it's a lie!

Markup:

    <(V)line>
    <(S)speech who="Aase">Peer, you're lying!</(S)speech>
    <(S)speech who="Peer">No, I'm not!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Aase">Well then,
    swear to me it's true!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Peer">Swear? Why should I?</(S)speech>
    <(S)speech who="Aase">See, you dare not!
    </(V)line><(V)line>
    Every word of it's a lie!</(S)speech>
    </(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook, and its source code is publicly available at http://docbook.org.

Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

A notable feature of XML unavailable in SGML are namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature of CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
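To illustrate how namespaces keep the vocabularies of two XML applications apart, the sketch below parses a small document that mixes XHTML and SVG elements. It assumes Python's standard xml.etree.ElementTree module; the sample document, the prefix map NS, and the variable names are made up for the example, whereas the two namespace names are the ones published by W3C for XHTML and SVG.

```python
import xml.etree.ElementTree as ET

# A made-up document that embeds an SVG drawing in an XHTML page.
DOCUMENT = """\
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <body>
    <p>A circle:</p>
    <svg:svg width="20" height="20">
      <svg:circle cx="10" cy="10" r="8"/>
    </svg:svg>
  </body>
</html>
"""

# The prefixes used in queries are local shorthands; only the namespace
# names themselves identify the XML applications.
NS = {"h": "http://www.w3.org/1999/xhtml",
      "s": "http://www.w3.org/2000/svg"}

root = ET.fromstring(DOCUMENT)
paragraph = root.find(".//h:p", NS)
circle = root.find(".//s:circle", NS)
```

The parser reports each element under its namespace name rather than its prefix, so circle.tag reads {http://www.w3.org/2000/svg}circle. Nothing in the document tells the parser where a schema for either namespace can be found, which is the limitation described above.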
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for the presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the contemporary leading web browsers and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
222 The Extensible Hypertext Markup LanguageEver since the release of xml in 1998 w3c entertained the idea ofturning html into an application of xml rather than of sgml as
ltbgtBold ltigtbold and italicltbgt italicltigt
ltbgtBold ltbgtltigtltbgtbold and italicltbgt italicltigt
Figure 27 The first line contains overlapping elements and assuch canrsquot be a part of a valid html document Neverthelessbrowsers should handle it identically to the second line
30 CHAPTER 2 MARKUP
ltfont face=Verdana size=4gt
ltfont size=+2gtltbgtSO WHAT IS THIS ABOUTltbgtltfontgt
ltbrgtltbrgtThere is a continuing need to show the power of
ltigtCSSltigt The Zen Garden aims to excite inspire
and encourage participation To begin view some of the
existing designs in the list Clicking on any one will
load the style sheet into this very page The ltigtHTML
ltigt remains the same the only thing that has changed
is the external ltigtCSSltigt file Yes really
ltfontgt
Figure 28 An excerpt from the Web site of the css Zen Zardenlocated at httpcsszengardencom The document above wascreated using the html presentation markup The document be-low achieves the same appearance by the combination of logicalmarkup and css
ltstylegt
body
font large Verdana
font-size large
h1
font-size x-large
text-transform uppercase
abbr
font-style italic
ltstylegt
lth1gtSo what is this aboutlth1gt
ltpgtThere is a continuing need to show the power of
ltabbrgtCSSltabbrgt The Zen Garden aims to excite inspire
and encourage participation To begin view some of the
existing designs in the list Clicking on any one will
load the style sheet into this very page The
ltabbrgtHTMLltabbrgt remains the same the only thing that
has changed is the external ltabbrgtCSSltabbrgt file Yes
reallyltpgt
22 MARKUP ON THE WORLD WIDE WEB 31
The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2 Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly into web documents through XML namespaces.
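The difference in strictness can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: Python's xml.etree.ElementTree stands in for a conforming XML parser, and html.parser.HTMLParser stands in for a lenient HTML processor, although it is merely a tokenizer and does not perform the error recovery of a real browser. The names OVERLAPPING, TagCollector, xml_accepts, and html_events are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The overlapping elements from Figure 2.7, wrapped in a paragraph.
OVERLAPPING = "<p><b>Bold <i>bold and italic</b> italic</i></p>"


class TagCollector(HTMLParser):
    """Records every start and end tag reported by the lenient tokenizer."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def xml_accepts(markup):
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # A conforming XML parser must refuse malformed input.
        return False


def html_events(markup):
    """Tokenize the markup as HTML without rejecting malformed nesting."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.events


if __name__ == "__main__":
    print(xml_accepts(OVERLAPPING))
    print(html_events(OVERLAPPING))
```

The XML parser stops at the first mismatched end tag, whereas the HTML tokenizer reports all six tags and leaves the recovery to whoever consumes them, which is where the complexity of HTML processing comes from.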
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
2.2.3 The Semantic Web and Linked Data
The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
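The two readings of a triplet can be sketched in Python as follows. This is a toy model rather than an RDF library: the IRI and Triple classes and the interpret function are hypothetical names introduced here, and an object is treated as a resource exactly when it is an instance of IRI.

```python
from typing import NamedTuple, Union


class IRI(str):
    """A resource identifier; a plain str stands for a literal value."""


class Triple(NamedTuple):
    """A triplet (p, s, o) in the order used by the text."""
    predicate: IRI
    subject: IRI
    obj: Union[IRI, str]


def interpret(triple):
    """Spell out the reading of a triplet described in the text."""
    p, s, o = triple
    if isinstance(o, IRI):
        # The object is a resource: s is in a relation p with o.
        return f"{s} is in relation {p} with {o}"
    # The object is a literal: s has a property p with the value o.
    return f"{s} has property {p} with value {o!r}"


if __name__ == "__main__":
    john = IRI("http://example.org/john-smith")
    page = IRI("http://example.org/document.html")
    print(interpret(Triple(IRI("http://purl.org/dc/terms/creator"), page, john)))
    print(interpret(Triple(IRI("http://xmlns.com/foaf/0.1/name"), john, "John Smith")))
```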
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in Attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
2.3 Document Preparation Systems
Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith" />
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>
<http://example.org/document.html>
  <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html>
  <http://purl.org/dc/terms/creator>
  <http://example.org/john-smith> .
<http://example.org/john-smith>
  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
  <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith>
  <http://xmlns.com/foaf/0.1/name> "John Smith" .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .
Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).
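To show how little machinery the N-Triples representation needs, the following Python sketch serializes the four triples of the example, writing one triple per line in subject, predicate, object order. It is an illustration under assumptions, not a complete serializer: the helper names iri, literal, and to_ntriples are invented here, and only backslashes and double quotes are escaped in literals, whereas a real implementation would also have to escape control characters such as newlines.

```python
def iri(value):
    """Wrap an IRI in angle brackets."""
    return f"<{value}>"


def literal(value, lang=None):
    """Quote a literal, optionally with a language tag such as 'en'."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    tag = f"@{lang}" if lang else ""
    return f'"{escaped}"{tag}'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one triple per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


DOC = iri("http://example.org/document.html")
JOHN = iri("http://example.org/john-smith")

TRIPLES = [
    (DOC, iri("http://purl.org/dc/terms/title"), literal("John's Web page", "en")),
    (DOC, iri("http://purl.org/dc/terms/creator"), JOHN),
    (JOHN, iri("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
     iri("http://xmlns.com/foaf/0.1/Person")),
    (JOHN, iri("http://xmlns.com/foaf/0.1/name"), literal("John Smith")),
]

if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
```

Turtle expresses the same data more compactly because prefixes and the shared subject are written only once.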
<!DOCTYPE html>
<html lang="en">
<head>
<link rel="meta" type="application/rdf+xml"
      href="john.rdf">
<link rel="meta" type="text/turtle" href="john.ttl">
<link rel="meta" type="application/n-triples"
      href="john.nt">
<title>John's Web page</title>
</head>
<body>
Hi, I'm John Smith.
</body>
</html>
Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.
<!DOCTYPE html>
<html lang="en">
<head vocab="http://purl.org/dc/terms/"
      about="http://example.org/document.html">
<title property="title" lang="en">John's Web
page</title>
<meta property="creator"
      href="http://example.org/john-smith">
</head>
<body vocab="http://xmlns.com/foaf/0.1/"
      about="http://example.org/john-smith"
      typeof="Person">
Hi, I'm <span property="name">John Smith</span>.
</body>
</html>
[Graph: the document, its title literal, the resource for John Smith, the class foaf:Person, and the name literal drawn as nodes joined by labeled edges.]
Figure 2.11: A graph of the RDF document in Figure 2.9.
categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is twofold: more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems
Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The
The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
[Typeset output: the centered title “The Cask of Amontillado”, the byline “by Edgar Allan Poe”, and the opening paragraph of the story set with a three-line drop cap T, followed by the page number 1.]
.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could; but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could; but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
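How such a conversion tool might work can be sketched in a few lines of Python. The markup accepted below is a made-up subset that merely resembles Markdown (blank lines separate blocks, leading # characters mark headings, and asterisks mark emphasis); it does not implement any of the real languages listed above, and the function name to_html is an assumption of this example.

```python
import html
import re


def to_html(source):
    """Convert a tiny Markdown-like subset to HTML."""
    # Blocks are separated by one or more blank lines.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", source) if b.strip()]
    out = []
    for block in blocks:
        # Join wrapped lines and escape characters special to HTML.
        text = html.escape(" ".join(block.split()), quote=False)
        # *emphasis* becomes logical markup.
        text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
        heading = re.match(r"(#{1,6}) (.*)", text)
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            out.append(f"<p>{text}</p>")
    return "\n".join(out)


if __name__ == "__main__":
    print(to_html("# Title\n\nSome *plain* text\nhere."))
```

The input stays readable as plain text, while the output carries only logical markup, so its appearance can still be decided later by a style sheet.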
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
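The line-length guidelines above can be condensed into a small Python helper. The thresholds come straight from the text; the function name assess_measure and the wording of the returned advice are assumptions of this sketch, and a real design decision would of course weigh more than a character count.

```python
def assess_measure(chars_per_line, columns=1):
    """Classify an average line length according to the guidelines above."""
    if chars_per_line < 40:
        # Too narrow for justification without loose word spacing.
        return "too narrow to justify: set ragged"
    if chars_per_line > 80:
        return "too wide: strains the eye"
    # 45-75 characters for a single column, 40-50 for multiple columns.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if low <= chars_per_line <= high:
        return "within the recommended range: justify"
    return "outside the recommended range"


if __name__ == "__main__":
    for chars, cols in [(66, 1), (35, 1), (90, 1), (60, 2)]:
        print(chars, cols, assess_measure(chars, cols))
```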
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
[Typeset output: the paragraph from the source below, with the English text set in Alegreya Sans and the Greek quotation set in GFS Neohellenic, followed by the page number 1.]
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin{
  \fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek{
  \fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}
\newunicodechar{}{\raisebox{.8ex}{}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
@font-face {
  font-family: "Alegreya Sans";
  src: url("AlegreyaSans-Regular.ttf")
    format("truetype");
  unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
    U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F; }
@font-face {
  font-family: "GFS Neohellenic";
  src: url("GFSNeohellenic.otf") format("opentype");
  unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
    U+102E0-102FF; }
p {
  font-family: "Alegreya Sans", "GFS Neohellenic",
    sans-serif;
  line-height: 14pt; }
[lang=en] {
  font-size: 11pt; }
[lang=gr] {
  font-size: 13.2pt; }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
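The arithmetic behind these figures is simple enough to check in Python. The sketch below assumes that the twenty to forty-five percent guideline is applied to the typeface size and that the extra lead is split evenly above and below the line; the function names leading_range and lead_per_side are hypothetical.

```python
def leading_range(size_pt, low=0.20, high=0.45):
    """Return the (minimum, maximum) line height in points for a typeface size."""
    return size_pt * (1 + low), size_pt * (1 + high)


def lead_per_side(size_pt, leading_pt):
    """Return the lead added above and below each line, in points."""
    return (leading_pt - size_pt) / 2


if __name__ == "__main__":
    minimum, maximum = leading_range(10)
    print(minimum, maximum)
    print(lead_per_side(10, minimum), lead_per_side(10, maximum))
```

For a 10 pt typeface, this gives a line height of roughly 12 to 14.5 pt and 1 to 2.25 pt of lead on each side, which agrees with the values quoted in the text.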
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well.
Paragraphs can also be separated by graphical symbols such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger
         Sizes in inches   Page proportions
A4       8.27 × 11.7       2 : √2      1.41421
B5       6.93 × 9.84       1 : √2      0.707
Letter   8 1/2 × 11        1 : 1.294   1.2941
Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
William Shakespeare, King Lear
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout
The page consists of a textblock surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
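Fitting the text height to the line height amounts to rounding down to a whole number of lines, as the following Python sketch shows. It assumes that all lengths are given in points and that the available height (the page height minus the vertical margins) has already been decided; the function name text_height and the sample values are illustrative only.

```python
def text_height(available_pt, line_height_pt):
    """Return (lines, height): the largest whole number of lines that fits
    into the available height and the resulting text height in points."""
    lines = int(available_pt // line_height_pt)
    return lines, lines * line_height_pt


if __name__ == "__main__":
    # A 9 in page with 0.75 in margins at the top and the bottom leaves
    # 7.5 in, which is 540 pt at 72 pt per inch.
    print(text_height(540, 12))
    print(text_height(540, 14))
```

With a 12 pt line height the 540 pt fit exactly; with a 14 pt line height the text height has to shrink to 532 pt, and the remaining 8 pt go to the margins.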
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
(ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
Bibliography
[1] Mary Brandel lsquolsquo1963 The debut of asci irsquorsquo InComputerworld(July 1999) url httpeditioncnncomTECHcomputing9907061963idg (visited on 09062015) (cit on p 5)
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standards Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015).
[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972.
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standards Association, June 1986.
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1.
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6.
[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993.
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996.
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996.
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016).
[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008.
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012.
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006.
[17] Noam Chomsky. "Three models for the description of language". In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124.
[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993.
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6.
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015).
[23] Charles F. Goldfarb. "The Roots of SGML – A Personal Recollection". In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015).
[24] Charles F. Goldfarb. "SGML: The Reason Why and the First Published Hint". In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015).
[25] Charles F. Goldfarb. "Introduction to Generalized Markup". In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986.
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3.
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. "GODDAG: A Data Structure for Overlapping Hierarchies". In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12.
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4.
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5.
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014.
Acronyms
ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standards Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
EMACS: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MATHML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
PICO: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFA: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
VI: The Visual Interactive editor
VIM: VI IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language
A portion of this complexity is inherent in the task of encoding the characters of all modern writing systems, but the overhead caused by the character encoding fragmentation proved to be unnecessary.

Also notable are the seven-bit encodings UTF-7 and Punycode, which bring Unicode support to protocols, such as e-mail, that were designed with the seven-bit ASCII in mind.
The Universal Character Set and Unicode
In the early 1990s, the continual increase in the available bandwidth and storage led to the creation of the standards of Unicode [5, 6] and the Universal multiple-octet coded Character Set (UCS) [7] in an attempt to create a text encoding that would contain the characters of all the world's languages and succeed ASCII as the lingua franca of text interchange.

UCS is an ever-expanding catalogue of characters from writing systems both modern and ancient, and of symbols ranging from diacritical marks, punctuation, and ideograms to mahjong tiles, alchemical symbols, and the ancient Greek musical notation. Each of these characters is assigned a number called a code point, ranging from 0 to 2,147,483,647 (7F FF FF FF in hexadecimal notation), with the numbers of the most common characters in the range from 0 to 65,535 (FF FF), called the Basic Multilingual Plane (BMP). The smallest units of division in UCS are blocks, which are contiguous ranges of thematically related characters. UCS encodings map code points to binary character codes and vice versa.
Three major encodings are specified in the UCS standard and its amendments [8, 9]:

1. UTF-32 directly encodes UCS characters by transforming their code points to four-byte integers. UTF-32 is also known as UCS-4.

2. UTF-16 directly encodes characters within the BMP by transforming their code points to two-byte integers. Code points in the range from 65,536 to 1,114,111 (01 00 00–10 FF FF) are transformed into pairs of two-byte integers, called surrogate pairs, ranging from 55,296 to 57,343 (D8 00–DF FF). To enable the UTF-16 encoding, the code points in this range will never be assigned to characters [10, sec. 3.4, D15]. The same is true of code points above 1,114,111 (10 FF FF), which allows UTF-16 to encode any UCS character.
3. UTF-8 directly transforms code points ranging from 0 to 127 (7F) to one-byte integers. Since the first UCS block of the BMP matches ASCII, any text encoded in ASCII is also encoded in UTF-8. Code points in the range from 128 to 1,114,111 (00 00 80–10 FF FF) are transformed into sequences of two to four one-byte integers ranging from 128 to 244 (80–F4). The encoding is illustrated in tables 1.2 and 1.3.

餐甑逞扉牙慨

Figure 1.2: Several Han characters in the traditional Chinese, Japanese, Korean, and Vietnamese variants.

One of the design goals of UCS was to avoid assigning code points to different glyphs that carry the same meaning. As a result, the visually distinctive Han characters used in the East Asian countries of China, Japan, Korea, and Vietnam were merged into a set of 75,960 ideograms in a process referred to as the Han Unification [10, sec. 18.1]. This simplifies text processing, but also makes it impossible to encode a text in multiple East Asian languages without having to rely on external markup to select appropriate regional fonts. As a result, a derivative of UCS that doesn't implement the Han Unification was developed for use in operating systems based on the Real-time Operating system Nucleus (TRON) and is used in East Asia alongside UCS and region-specific encodings.
UTF-32 is primarily used for the fixed-width internal representation of individual UCS characters inside programs. UTF-16 fulfills a similar role in programs that only work with the BMP, and UTF-8 is used for text storage and interchange. Since 2010, the majority of text content on the Web has been encoded in ASCII and UTF-8 [11].
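The difference between the three encodings can be observed on a handful of sample characters. The following is a minimal Python sketch that relies on Python's built-in codecs; the sample characters and the helper names `surrogate_pair` and `encoded_lengths` are illustrative assumptions and do not come from the standards.

```python
def surrogate_pair(code_point):
    """Split a code point above the BMP into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points outside the BMP need surrogates")
    offset = code_point - 0x10000      # 20 significant bits remain
    high = 0xD800 + (offset >> 10)     # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # lower 10 bits
    return high, low


def encoded_lengths(character):
    """Return the byte lengths of one character in UTF-8, UTF-16 and UTF-32."""
    return tuple(
        len(character.encode(codec))
        for codec in ("utf-8", "utf-16-be", "utf-32-be")
    )


if __name__ == "__main__":
    # U+1F000 is a mahjong tile, which lies outside the BMP.
    for character in ("A", "\u00E9", "\u9910", "\U0001F000"):
        print(f"U+{ord(character):04X}", encoded_lengths(character))
    print([hex(unit) for unit in surrogate_pair(0x1F000)])
```

The explicit `-be` (big-endian) codec names are used so that Python does not prepend a byte order mark, which would otherwise inflate the reported lengths.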
Unicode was a competing standard for universal text encoding that underwent a merger with UCS in version 1.1, and since then the standards have been kept closely synchronized. Unicode is a superset of UCS, which defines additional information about UCS characters (such as their general category, directionality, case, or numeric value [10, sec. 3.5 and ch. 4]), various text processing algorithms, and implementation guidelines.
Regarding text processing, Unicode and UCS represent a compromise between the simplicity of the seven-bit ASCII and the heterogeneity of eight-bit encodings:

• If simple text manipulation is preferred over space efficiency, each character can be made exactly two bytes wide (within the BMP) or four bytes wide using the UTF-16 and UTF-32 encodings, respectively.

• Although character strings cannot be collated by a simple character code comparison, a collation algorithm is defined in the Unicode specification [12], and collation tables for major locales [13] are maintained by the Unicode Consortium.

• Classes of characters, such as uppercase letters, lowercase letters, numbers, and punctuation, do not form contiguous ranges, but their position is directly specified in the standard [10, sec. 4.5].

• Although idiosyncrasies such as ligatures, invisible hyphenation hints, and combining characters are present in UCS, explicit normalization algorithms for character string equivalence testing are specified by the standard [10, sec. 2.12]. An algorithm for case conversion is also specified [10, sec. 3.13].

• The byte order mark (FE FF) character can be inserted at the beginning of a text as a signature of Unicode encodings. As the name suggests, the order in which the FE and FF bytes arrive also indicates the order of bytes (called endianness) that was used to encode integers. In UTF-32 and UTF-16, endianness can be chosen arbitrarily by the encoding application. In UTF-8, one-byte integers are used, and the notion of endianness is therefore meaningless.

Ǻ = Å + U+0301 (combining acute accent) = A + U+030A (combining ring above) + U+0301 (combining acute accent)

Figure 1.3: Some UCS characters can be either input as a single entity or composed from several combining characters. Under the Unicode normalization forms, all of the above representations are canonically equivalent.

iconv -f latin2 -t utf8 -- old.txt > new.txt

Figure 1.4: Text files can be converted between encodings using the iconv command-line tool. The sample code shows the file old.txt being converted from the ISO/IEC 8859-2 encoding to UTF-8. The result of the conversion is stored in the file new.txt.
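Two of the points above, normalization and the byte order mark, can be tried out directly. The following Python sketch uses the standard `unicodedata` and `codecs` modules; the function names `canonically_equivalent` and `detect_byte_order` are hypothetical helpers introduced only for this illustration, and the detection logic is a simplification rather than a complete encoding sniffer.

```python
import codecs
import unicodedata

# Three spellings of the same letter, built from explicit code points.
precomposed = "\u01FA"               # a single code point
partly_composed = "\u00C5\u0301"     # A with ring above + combining acute accent
fully_decomposed = "A\u030A\u0301"   # A + combining ring above + combining acute


def canonically_equivalent(left, right):
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", left) == unicodedata.normalize("NFD", right)


def detect_byte_order(data):
    """Guess the encoding of a byte string from a leading byte order mark."""
    # The UTF-32 marks are tested first because the little-endian UTF-32 mark
    # starts with the same two bytes as the little-endian UTF-16 mark.
    marks = [
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
    ]
    for mark, name in marks:
        if data.startswith(mark):
            return name
    return None  # no signature present


if __name__ == "__main__":
    print(precomposed == fully_decomposed)                         # plain comparison
    print(canonically_equivalent(precomposed, fully_decomposed))   # after normalization
    print(detect_byte_order(codecs.BOM_UTF16_LE + "x".encode("utf-16-le")))
```

A plain comparison of the three spellings fails because their code point sequences differ; only the normalized comparison treats them as the same text.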
Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. Above is the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.

Compose + O + R = ®
Compose + 3 + 4 = ¾
Compose + s + s = ß
Compose + ~ + ' + a = ấ

Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.
Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á

Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces a character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the plus sign followed by the corresponding UCS code point in hexadecimal.
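The three key sequences from the figure can be checked against the code pages themselves. The short Python sketch below assumes the English-locale code pages named in the caption and uses Python's `cp437` and `cp1252` codecs; the variable names are illustrative only.

```python
# Alt+160: no leading zero, so the number is looked up in IBM code page 437.
ibm_character = bytes([160]).decode("cp437")

# Alt+0225: the leading zero selects Windows code page 1252.
windows_character = bytes([225]).decode("cp1252")

# Alt, plus sign, E1: the hexadecimal number is taken as a UCS code point.
ucs_character = chr(0xE1)

if __name__ == "__main__":
    print(ibm_character, windows_character, ucs_character)
```

All three lookups arrive at the same character even though the numbers differ, which is why the sequences in the figure are interchangeable in an English locale.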
1.1.2 Text Input

To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On hand-held devices, the use of either a numeric keypad or a touch-screen is more typical.

An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind and, as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn't apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.
1.1.3 Text Editors

A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free JOE, GNU nano, and Pico.

More advanced text editors come with support for regular expressions and version control (which will be covered in sections 1.1.5 and 1.2) and with user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom, and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and Vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower the users to perform complex text editing.

1.1.4 Interactive Document Preparation Systems

Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].
The primary difference between text editors and DPSes is the fact that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files, a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus, and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.

1.1.5 Regular Expressions

The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.

Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.
Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) syntaxes [18, part 1, ch. 9] described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.

More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
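A backreference is the usual example of a regex feature that goes beyond regular grammars. The following sketch uses Python's `re` module, whose syntax is derived from that of Perl, as a stand-in for the extended syntaxes mentioned above; the pattern and the helper name `is_mirrored` are assumptions chosen for illustration.

```python
import re

# The language of strings "a...a b a...a" with two equally long runs of "a"
# is not regular: recognizing it requires remembering how many times "a"
# occurred before the "b". A backreference (\1) describes it anyway.
mirrored = re.compile(r"(a+)b\1")


def is_mirrored(text):
    """Return True if text consists of a run of a, one b, and the same run."""
    return mirrored.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ("aba", "aaabaaa", "aabaaa"):
        print(sample, is_mirrored(sample))
```

The first two samples are accepted and the third is rejected, because its two runs of `a` differ in length.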
Many regex syntaxes and the software that implements themwere designed for the processing of asci i text and may behavein surprising ways when confronted with ucs characters Thesoftware may assume that each character is exactly one byte wideand fail to recognize any character that occupies several bytes Itmay also assume that all ucs characters fall within bmp and exhibitthe same problem with characters outside bmp More subtle butno less precarious can be the lack of support for Unicode caseconversion and normalization algorithms which makes it difficultto perform robust case-insensitive matching and the matchingof characters that can be encoded in several different ways Thelack of awareness of the invisible characters that can appear inucs textmdashsuch as the zero width space (20 0B) zero widthnon-joiner (20 0C) zero width joiner (20 0D) and zero widthno-break space (FE FF)mdash is also problematic and can lead tofalse negative matches Conversely modern regex syntaxes that at
11 TEXT PROCESSING 15
bre regex Description Matcheswe12p The repetition expression in the form of
119888119898119899matches the character 119888 repeated119896 isin ⟨119898 119899⟩ times Other forms include 119888119898
for 119896 isin ⟨119898 infin) and 119888119898 for 119896 = 119898
weeps wept
ene Star () is a repetition operator equivalent to theinterval expression of 0
never enemyKleene
(⟨regex⟩) A subexpression is a parenthesized regex Anyinterval expression or repetition operator usedimmediately after a subexpression applies tothe entire parenthesized regex
⟨regex⟩
^ar At the beginning of a regex or a subexpressiona caret (^) matches the beginning of a string
argumentarrow keys
ore$ At the end of a regex or a subexpression thedollar sign ($) matches the end of a string
iron oredumbledore
be A period () matches any single character or not to bebe[ea] A matching list expression is enclosed in square
brackets ([ ]) and contains a list of charactersthat the bracket expression matches It maycontain other entities omitted here for brevity
beehivegrizzly bearglass beads
be[^ea] A non-matching list expression contains a caret(^) as its first character and matches anycharacter that the corresponding matching listexpression would not match
obeah bendlibela
^$ Backslash () is an escape character that eithersuppresses or activates the special meaning ofthe following character
^$
()1 A backreference in the form of an escapednumber 119899 isin ⟨1 9⟩ (1 2 hellip 9) matchesanything the 119899th subexpression matched
ara araraunadardanellesnationality
Table 14 An informal description of the bre syntax (above) andthe differences in the ere syntax (below)
ere regex Description Matcheswe12p Unlike in bres braces arenrsquot escaped weeps weptpe+rl The plus sign (+) and the question mark () are
repetition operators equivalent to the intervalexpressions of 1 and 01
personapeer speechperl
(⟨regex⟩) Unlike in bres parentheses arenrsquot escaped ⟨regex⟩(on|t) Vertical line (|) is an alternation operator that
separates multiple regexes The whole regexmatches any of the alternative regexes
one twotrophy truth
()1 eres do not support backreferences ⟨undefined⟩
16 CHAPTER 1 WRITING
Regex and description:

\x{⟨n⟩}
    Matches the UCS character with code point ⟨n⟩ in hexadecimal.
\N{⟨n⟩}
    Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.
\p{⟨p⟩}
    Matches any UCS character with property ⟨p⟩.
\P{⟨p⟩}
    Matches any UCS character without property ⟨p⟩.

Property and description:

Letter
    This property is satisfied by any letter.
Punctuation
    This property is satisfied by any punctuation.
Symbol
    This property is satisfied by any symbol.
Mark
    This property is satisfied by any mark.
Number
    This property is satisfied by any number.
Separator
    This property is satisfied by any separator.
Other
    This property is satisfied by any UCS character that doesn't belong to any of the categories listed above.
Block=⟨b⟩
    This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.
Script=⟨s⟩
    This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.
Numeric Value=⟨n⟩
    This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.
least partially implement the Unicode standard for Regular Expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.

The authoritative resource on grep, sed, and awk is sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.
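The regex elements summarized in Tables 1.4 and 1.5 can be tried out in most programming languages. The following is a minimal sketch in Python, under a few assumptions: Python's built-in `re` module implements neither BRE nor ERE exactly (it leaves parentheses and braces unescaped like ERE, yet it supports backreferences), and it has no `\p{…}` syntax, so the general category test is emulated here with the standard `unicodedata` module. The helper names `matches` and `has_general_category` are made up for this illustration.

```python
import re
import unicodedata


def matches(pattern, text):
    """Return every non-overlapping match of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]


def has_general_category(char, prefix):
    """Emulate a property test such as \\p{Letter} ("L") or \\p{Number} ("N").

    This is a stand-in for Unicode-aware regex engines; Python's re module
    does not understand \\p{...} itself.
    """
    return unicodedata.category(char).startswith(prefix)


# Anchors: the caret and the dollar sign tie a match to the string boundaries.
start = matches(r"^ar", "arrow keys")
end = matches(r"ore$", "iron ore")

# Matching and non-matching list expressions.
listed = matches(r"be[ea]", "glass beads")
not_listed = matches(r"be[^ea]", "bend")

# Interval expressions, repetition operators, and alternation.
interval = matches(r"we{1,2}p", "weeps wept")
repetition = matches(r"pe+r?l?", "peer speech perl")
alternation = matches(r"on|t", "one two")

# A backreference: \1 matches whatever the first subexpression matched.
doubled = matches(r"(.)\1", "balloon")

# Property-based selection of characters by general category.
letters = [c for c in "Aé7,λ" if has_general_category(c, "L")]
```

Each variable holds the list of matched substrings, which makes it easy to compare the behavior of a pattern against the informal descriptions in the tables.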
The most elementary text-processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool will present lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.

VCS can be dichotomized based on their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally, and its operation is fully dependent on the availability of the server. An example of a centralized VCS is SubVersioN (SVN).

The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
By comparison, there is no designated server in decentralized VCS, and the users can upload and download new versions directly from one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage size requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.
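The bookkeeping that even the most rudimentary VCS performs, namely recording changes together with their descriptions and authorship, and then viewing and reverting them, can be sketched in a few lines. The following toy model is an illustration only: the class `ToyRepository` and its methods are hypothetical, every version is stored in full in memory, and nothing here reflects how SVN or Git are implemented.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class Version:
    author: str
    message: str
    text: str


@dataclass
class ToyRepository:
    """A hypothetical in-memory repository that keeps every version in full."""

    versions: list = field(default_factory=list)

    def commit(self, author, message, text):
        """Record a new version and return its number."""
        self.versions.append(Version(author, message, text))
        return len(self.versions) - 1

    def log(self):
        """List the version numbers with their authors and descriptions."""
        return [(n, v.author, v.message) for n, v in enumerate(self.versions)]

    def diff(self, old, new):
        """Return the line-based changes between two versions."""
        before = self.versions[old].text.splitlines()
        after = self.versions[new].text.splitlines()
        return list(difflib.unified_diff(
            before, after, f"version {old}", f"version {new}", lineterm=""))

    def revert(self, author, number):
        """Revert by committing the text of an older version again."""
        old = self.versions[number]
        return self.commit(author, f"Revert to version {number}", old.text)


repo = ToyRepository()
repo.commit("alice", "Initial draft", "Hello\nworld")
repo.commit("bob", "Address the readers", "Hello\nreaders")
changes = repo.diff(0, 1)          # contains "-world" and "+readers"
latest = repo.revert("alice", 0)   # version 2 has the text of version 0
```

Note that the revert is itself recorded as a new version, so the history of the document is never rewritten; real systems make a similar design choice for changes that have already been shared.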
Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. (An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.) Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

[Diagram: svnadmin create, svn checkout, svn update, svn commit]

Figure 1.8: The basic SVN workflow. After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.
[Diagram: git init, git clone, git pull, git push, git commit, git reset]

[Diagram: svnadmin create, git svn clone, git svn rebase, git svn dcommit, git commit, git reset]

Figure 1.9: The diagram above depicts the basic Git workflow. The diagram below depicts the use of the Git program with an SVN repository; this bears all the advantages and disadvantages associated with decentralized VCS. After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.
Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.
Chapter 2
Markup
A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules, such as the correct spelling, capitalization, word breaks, and punctuation, that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should be already met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.
It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.
Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The Generalized Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

More information about the project can be found in The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.

The authoritative resource on SGML is The SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.
Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML, such as W3C XML Schema, RELAX NG, or Schematron, that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).

A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents; the Cascading Style Sheets language (CSS) [30] for the specification of XML document design; and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
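The distinction between well-formedness and validity, as well as the retrieval of data from an XML document, can be illustrated with the `xml.etree.ElementTree` module from the Python standard library. This is a sketch under stated assumptions: the parser only checks well-formedness (it does not validate against a DTD or a schema), its `findall` method understands only a small subset of XPath, and the helper names `is_well_formed` and `ingredient_amounts` as well as the shortened recipe are made up for the example.

```python
import xml.etree.ElementTree as ET

# A shortened, hypothetical recipe document used only for this example.
RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
  </ingredientList>
</recipe>"""


def is_well_formed(source):
    """Return True if the source can be parsed as XML at all.

    A well-formed document may still be invalid against a particular DTD
    or schema; this function does not check validity.
    """
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


def ingredient_amounts(source):
    """Map each ingredient name to its amount using an XPath-like query."""
    root = ET.fromstring(source)
    return {
        ingredient.text: ingredient.get("amount")
        for ingredient in root.findall("./ingredientList/ingredient")
    }


ok = is_well_formed(RECIPE)
# The <name> element is never closed, so the parser must refuse the input.
broken = is_well_formed("<recipe><name>Palatschinken</recipe>")
amounts = ingredient_amounts(RECIPE)
```

Checking validity in addition to well-formedness requires a validating parser together with the corresponding DTD or schema.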
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish.</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>

Figure 2.1: An example XML document (recipe.xml).
<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD. DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.
element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
        name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).
xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng  # Compact -> Full RELAX NG
xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature, CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.

Speech
AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!

Verse
Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? Why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook, and its source code is publicly available at http://docbook.org/.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

Initially, HTML only comprised a mixture of logical and presentation markup with a fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language made it possible to specify the visual properties of any HTML element, which allowed the separation of document markup and design, effectively eliminating the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the contemporary leading web browsers and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
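The lenient handling of malformed HTML discussed earlier in this section can be contrasted with the strictness of an XML parser using two modules from the Python standard library. The sketch below makes several assumptions: `html.parser` merely reports the tags it encounters and does not implement the error recovery that the HTML5 specification prescribes for browsers, and the names `TagLogger`, `html_events`, and `xml_accepts` are invented for the illustration. The input is the overlapping markup from the first line of Figure 2.7.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

OVERLAPPING = "<b>Bold <i>bold and italic</b> italic</i>"
NESTED = "<b>Bold <i>bold and italic</i></b>"


class TagLogger(HTMLParser):
    """Collect the start tags, end tags, and text reported by the parser."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def html_events(markup):
    """Feed the markup to the tolerant HTML parser and return its events."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


def xml_accepts(markup):
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# The HTML parser reports the overlapping tags without complaint ...
tags = [event for event in html_events(OVERLAPPING) if event[0] != "text"]
# ... whereas the XML parser refuses anything that is not well-formed.
strict_overlapping = xml_accepts(OVERLAPPING)
strict_nested = xml_accepts(NESTED)
```

The HTML parser leaves the decision about what the overlapping elements mean to its caller, which is precisely the ambiguity that the HTML5 specification removes by documenting one canonical interpretation.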
<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

<style>
body {
  font: large Verdana;
  font-size: large;
}
h1 {
  font-size: x-large;
  text-transform: uppercase;
}
abbr {
  font-style: italic;
}
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com/. The document above was created using HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly in web documents through XML namespaces.

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44], a language for the description of resources on the Web, in 1999.

The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
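The data model described above, a set of triplets whose subjects and predicates are IRIs and whose objects are either IRIs or literal values, is simple enough to sketch directly. The following Python fragment is an illustration under stated assumptions: the classes `IRI` and `Literal` and the function `to_ntriples` are hypothetical, the escaping of literals is simplified to backslashes and double quotes, and the example.org resources are placeholders. The output follows the N-Triples convention of writing one triplet per line in the order subject, predicate, object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IRI:
    value: str

    def serialize(self):
        return f"<{self.value}>"


@dataclass(frozen=True)
class Literal:
    value: str
    lang: Optional[str] = None

    def serialize(self):
        # Simplified escaping: only backslashes and double quotes are handled.
        escaped = self.value.replace("\\", "\\\\").replace('"', '\\"')
        suffix = f"@{self.lang}" if self.lang else ""
        return f'"{escaped}"{suffix}'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one triplet per line."""
    return "\n".join(
        f"{s.serialize()} {p.serialize()} {o.serialize()} ."
        for s, p, o in triples
    )


DOCUMENT = IRI("http://example.org/document.html")
JOHN = IRI("http://example.org/john-smith")

TRIPLES = [
    # The object is a literal: the subject has a property with a value.
    (DOCUMENT, IRI("http://purl.org/dc/terms/title"),
     Literal("John's Web page", "en")),
    # The object is a resource: the subject is in a relation with the object.
    (DOCUMENT, IRI("http://purl.org/dc/terms/creator"), JOHN),
    (JOHN, IRI("http://xmlns.com/foaf/0.1/name"), Literal("John Smith")),
]

serialized = to_ntriples(TRIPLES)
```

Because every resource is spelled out as a full IRI, two such documents written by different authors can be merged by simply concatenating their lines.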
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .

<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .

<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).

<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

[Graph: the node http://example.org/document.html has a dc:title edge to the literal "John's Web page"@en and a dc:creator edge to the node http://example.org/john-smith, which has an rdf:type edge to foaf:Person and a foaf:name edge to the literal "John Smith"]

Figure 2.11: A graph of the RDF document in Figure 2.9.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is twofold: the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the use of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems

Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled – but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-
.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right)
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
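As a rough illustration of the conversion tools mentioned above, the following Python sketch translates a tiny Markdown-like subset (paragraphs separated by blank lines, `#` headings, `*emphasis*`, and `**strong emphasis**`) into HTML. The function name `lightweight_to_html` and the choice of subset are assumptions made for this example; real converters support far more syntax and handle many edge cases that this sketch ignores.

```python
import html
import re


def lightweight_to_html(text):
    """Convert a tiny Markdown-like subset to HTML (illustrative only)."""
    # Blocks are separated by one or more blank lines.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    out = []
    for block in blocks:
        # A single-line block that starts with 1-6 '#' characters is a heading.
        heading = re.fullmatch(r"(#{1,6})\s+(.+)", block)
        raw = heading.group(2) if heading else " ".join(block.split())
        # Escape characters that are special in HTML before adding tags.
        body = html.escape(raw, quote=False)
        # Strong emphasis must be handled before plain emphasis.
        body = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", body)
        body = re.sub(r"\*(.+?)\*", r"<em>\1</em>", body)
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{body}</h{level}>")
        else:
            out.append(f"<p>{body}</p>")
    return "\n".join(out)


if __name__ == "__main__":
    print(lightweight_to_html("# Title\n\nSome *emphasized* and **strong** text."))
```

The punctuation-based source stays legible on a plain text terminal, while the generated HTML carries the same structure as explicit logical markup.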
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other, the design and the positioning of the structural elements of the document (such as headings, tables, figures, and lists), and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
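The thresholds above can be turned into a small check. The following Python sketch is only an illustration: the function name `assess_measure` and the verdict labels (in particular "acceptable" for lengths that fall between the ideal range and the stated limits) are assumptions of this example, not terminology from the cited sources.

```python
def assess_measure(chars_per_line, columns=1):
    """Classify a line length (in characters) using the ranges given in the text."""
    # 45-75 characters for a single column, 40-50 for multiple columns.
    ideal_low, ideal_high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        # Too narrow to justify without loose word spacing.
        return "set ragged"
    if chars_per_line > 80:
        # Long lines strain the eye of the reader.
        return "too wide"
    if ideal_low <= chars_per_line <= ideal_high:
        return "ideal"
    return "acceptable"


if __name__ == "__main__":
    for length, cols in [(66, 1), (66, 2), (35, 2), (90, 1)]:
        print(length, cols, assess_measure(length, cols))
```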
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

1
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin{%
  \fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek{%
  \fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}
\newunicodechar{…}{\raisebox{.8ex}{…}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=grc] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
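The arithmetic behind these figures can be sketched in a few lines of Python. The function names and the default bounds of 20 % and 45 % are taken from the guideline above; everything else (the names `leading_range` and `lead_per_side`) is an assumption of this example.

```python
def leading_range(type_size_pt, min_extra=0.20, max_extra=0.45):
    """Return the (minimum, maximum) line height in points.

    The extra separation is expressed as a fraction of the typeface size.
    """
    return type_size_pt * (1 + min_extra), type_size_pt * (1 + max_extra)


def lead_per_side(type_size_pt, leading_pt):
    """Lead added above and below each line, in points."""
    return (leading_pt - type_size_pt) / 2


if __name__ == "__main__":
    low, high = leading_range(10)
    print(f"leading: {low:g} to {high:g} pt")
    print(f"lead per side: {lead_per_side(10, low):g} to {lead_per_side(10, high):g} pt")
```

For a 10 pt typeface, this reproduces the range of roughly 12 to 14.5 pt of leading, or 1 to 2.25 pt of lead on each side of a line.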
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well.

Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

          Size in inches    Page proportions
A4        8.27 × 11.7       1 : √2 (1 : 1.41421)
B5        6.93 × 9.84       1 : √2 (1 : 1.41421)
Letter    8 1/2 × 11        1 : 1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing

[Sidenote: This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.]
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
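One way to satisfy the whole-multiple rule is to pad each heading with just enough vertical space to round its height up to the next full body line. The Python sketch below illustrates the idea; the function name `heading_block` and the decision to round up (rather than shrink the heading) are assumptions of this example.

```python
import math


def heading_block(heading_height_pt, body_leading_pt):
    """Return (lines, padding) for a heading on the body text's baseline grid.

    `lines` is the number of body lines the heading occupies, and `padding`
    is the extra vertical space in points that has to be distributed above
    and below the heading to fill those lines exactly.
    """
    lines = math.ceil(heading_height_pt / body_leading_pt)
    padding = lines * body_leading_pt - heading_height_pt
    return lines, padding


if __name__ == "__main__":
    # A heading that is 16 pt high over body text with 12 pt leading.
    print(heading_block(16, 12))
```

With a 12 pt body leading, a heading that is 16 pt high would occupy two body lines and need 8 pt of additional space, so that the text below it stays aligned with the lines on the facing page.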
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, which are a common use for notes in academic writing.

2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
This is the excellent foppery of the world, that, when we are sick in fortune – often the surfeit of our own behavior – we make guilty of our disasters the sun, the moon, and the stars: as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

– William Shakespeare, King Lear

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size (as described in Section 3.2.1), as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
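The requirement that the text height be a multiple of the line height amounts to rounding the available height down to a whole number of lines. A minimal Python sketch follows; the function name `text_block` and the sample value of 540 pt of available height (7.5 in at 72 pt per inch) are assumptions chosen for illustration.

```python
import math


def text_block(available_height_pt, leading_pt):
    """Return (lines, text_height) for a text block that fills whole lines.

    `lines` is the largest number of body text lines that fit into the
    available height, and `text_height` is the resulting height in points.
    """
    lines = math.floor(available_height_pt / leading_pt)
    return lines, lines * leading_pt


if __name__ == "__main__":
    print(text_block(540, 12))
    print(text_block(540, 16.8))
```

Any height left over after rounding can be added to the vertical margins.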
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487
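The text does not say how "sufficient contrast" should be measured. One widely used measure, assumed here purely for illustration, is the contrast ratio defined in the W3C Web Content Accessibility Guidelines (WCAG) 2.x, which compares the relative luminance of two sRGB colors. The function names in the following Python sketch are hypothetical.

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to a linear-light value in 0..1."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color as defined in WCAG 2.x."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 (none) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    # Black text on a white background, and red text on the same background.
    print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))
    print(round(contrast_ratio((200, 0, 0), (255, 255, 255)), 2))
```

WCAG recommends a ratio of at least 4.5 for normal-sized text; whether that threshold suits a given print or web design is a judgment the designer still has to make.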
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).
[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972 (cit. on pp. 5, 7).
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1986 (cit. on p. 6).
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).
[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993 (cit. on p. 8).
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016) (cit. on p. 10).
[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008 (cit. on p. 13).
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012 (cit. on p. 13).
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006 (cit. on p. 13).
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).
[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993 (cit. on p. 14).
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986 (cit. on p. 22).
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028/ (visited on 07/31/2015) (cit. on p. 29).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
60 ACRONYMS
dtd Document Type Declarationdtp DeskTop Publishingebcdic The Extended Binary Coded Decimal Interchange Codeecma The European Computer Manufacturers Associationem The End of Mediumemacs The Eventually Munches All Computer Storage editorenq The ENQuiry charactereot The End Of Transmissionere The Extended Regular Expressionsesc The ESCape characteretb The End of Transmission Blocketx The End of TeXteuc The Extended Unix Codeff The Form Feed characterfoaf Friend Or A Foefortran The FORmula TRANslatorfs The File Separatorfsm The Free Software Movementgml The General Markup Languagegnu gnu is Not Unixgs The Group Separatorgui Graphical User Interfaceht The Horizontal Tabhtml The HyperText Markup Languageibm The International Business Machines Corporationiec The International Electrotechnical Commissionime Input Method Editoriri The Internationalized Resource Identifieriso The International Organization for Standardizationj is The Japanese Industrial Standards encodingjoe The Joersquos Own Editorjson The JavaScript Object Notationjson-ld json for ldjtc A Joint tcld Linked Datalf The Line Feedma Massachusettsmathml The Mathematical Markup Languagenak The Negative-AcKnowledgement characternul The NULl character
ACRONYMS 61
ny New Yorkocr Optical Character Recognitionodf The Open Document Format for office applicationsooxml The Office Open XML formatowl The Web Ontology Languagepc The ibm Personal Computerpdf The Portable Document Formatpico The PIne COmposerposix The Portable Operating System Interfacerdf The Resource Description Frameworkrdfa rdf in attributesrelax ng The REgular LAnguage for xml New Generationrfc A Request For Commentsrs The Record Separatorsc A SubCommitteesgml The Standard General Markup Languagesi The Shift In characterso The Shift Out charactersoh The Start of Headingsr Sound Recognitionstx The Start of Textsub The SUBstitute charactersvg The Scalable Vector Graphics languagesvn SubVersioNsyn The SYNchronous Idle charactertc A Technical Committeetei The Text Encoding Initiativetron The Real-time Operating system Nucleusucs The Universal multiple-octet coded Character Setus The Unit Separatorusa The United States of Americautf The ucs Transformation Formatvcs Version Control Systemsvi The Visual Interactive editorvim vi IMprovedvt The Vertical Tabw3c The World Wide Web Consortiumwg AWorking Groupwysiwyg What You See Is What You Getxhtml The eXtensible HyperText Markup Language
62 ACRONYMS
xml The eXtensible Markup Language
One of the design goals of UCS was to avoid assigning separate code points to different glyphs that carry the same meaning. As a result, the visually distinctive Han characters used in the East Asian countries of China, Japan, Korea, and Vietnam were merged into a set of 75,960 ideograms in a process referred to as the Han Unification [10, sec. 18.1]. This simplifies text processing, but also makes it impossible to encode a text in multiple East Asian languages without having to rely on external markup to select appropriate regional fonts. As a result, a derivative of UCS that doesn't implement the Han Unification was developed for use in operating systems based on the Real-time Operating system Nucleus (TRON) and is used in East Asia alongside UCS and region-specific encodings.

Figure 1.2: Several Han characters (餐甑逞扉牙慨) in the traditional Chinese, Japanese, Korean, and Vietnamese variants.
are transformed into two to four one-byte integers ranging from 128 to 253 (80–FD). The encoding is illustrated in Tables 1.2 and 1.3.
UTF-32 is primarily used for the fixed-width internal representation of individual UCS characters inside programs, UTF-16 fulfills a similar role in programs that only work with the BMP, and UTF-8 is used for text storage and interchange. Since 2010, the majority of text content on the Web has been encoded in ASCII and UTF-8 [11].
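The trade-off between the three transformation formats can be observed directly. The following is a minimal sketch in Python; the function name encoded_sizes and the sample characters are illustrative choices that do not come from the text.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Return the number of bytes needed to store `text` in each encoding."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # The "-le" codec variants are used so that no byte order mark
        # is prepended to the encoded text.
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


# An ASCII letter, an accented Latin letter, a Han character inside the BMP,
# and an emoji outside the BMP.
for sample in ("A", "\u00e1", "\u9910", "\U0001F600"):
    print(repr(sample), encoded_sizes(sample))
```

Only UTF-32 uses the same number of bytes for every sample; UTF-16 does so for the first three samples, which all lie within the BMP.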
Unicode was a competing standard for universal text encoding that underwent a merger with UCS in version 1.1; since then, the two standards have been kept closely synchronized. Unicode is a superset of UCS that defines additional information about UCS characters (such as their general category, directionality, case, or numeric value [10, sec. 3.5 and ch. 4]), various text processing algorithms, and implementation guidelines.
With regard to text processing, Unicode and UCS represent a compromise between the simplicity of the seven-bit ASCII and the heterogeneity of eight-bit encodings:
Ǻ = Å + ◌́ = A + ◌̊ + ◌́

Figure 1.3: Some UCS characters can be either input as a single entity or composed from several combining characters. With regard to Unicode normalization forms, all of the above representations are canonically equivalent.
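The canonical equivalence of the three representations can be checked with the unicodedata module of the Python standard library. This is a sketch only; the variable names and the helper canonically_equivalent are illustrative.

```python
import unicodedata

precomposed = "\u01fa"              # Ǻ as a single code point
partly_composed = "\u00c5\u0301"    # Å followed by a combining acute accent
fully_decomposed = "A\u030a\u0301"  # A, combining ring above, combining acute accent


def canonically_equivalent(a: str, b: str) -> bool:
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# A plain comparison sees three different code point sequences ...
print(precomposed == fully_decomposed)
# ... whereas a comparison of the normalized strings treats them as equal.
print(canonically_equivalent(precomposed, partly_composed))
print(canonically_equivalent(precomposed, fully_decomposed))
```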
iconv -f latin2 -t utf8 -- old.txt > new.txt

Figure 1.4: Text files can be converted between encodings using the iconv command-line tool. The sample code shows the file old.txt being converted from the ISO/IEC 8859-2 encoding to UTF-8. The result of the conversion is stored in the file new.txt.
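Where iconv is not available, the same conversion can be approximated in a few lines of Python. The sketch below is not a replacement for iconv; the function names transcode and convert_file are hypothetical, and the whole file is read into memory at once.

```python
def transcode(data: bytes, source: str = "iso8859_2", target: str = "utf-8") -> bytes:
    """Decode bytes from the source encoding and re-encode them in the target one."""
    return data.decode(source).encode(target)


def convert_file(old_path: str, new_path: str) -> None:
    """Convert a whole file, in the manner of `iconv -f latin2 -t utf8`."""
    with open(old_path, "rb") as old_file:
        converted = transcode(old_file.read())
    with open(new_path, "wb") as new_file:
        new_file.write(converted)


# The letter ř occupies one byte in ISO/IEC 8859-2 and two bytes in UTF-8.
print(transcode("\u0159".encode("iso8859_2")))
```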
• If simple text manipulation is preferred over space efficiency, each character can be made exactly four bytes wide using the UTF-32 encoding, or exactly two bytes wide using the UTF-16 encoding as long as the text stays within the BMP.
• Although character strings cannot be collated by a simple character code comparison, a collation algorithm is defined in the Unicode specification [12] and collation tables for major locales [13] are maintained by the Unicode Consortium.
• Classes of characters (such as uppercase letters, lowercase letters, numbers, and punctuation) do not form contiguous ranges, but their position is directly specified in the standard [10, sec. 4.5].
• Although idiosyncrasies (such as ligatures, invisible hyphenation hints, and combining characters) are present in UCS, explicit normalization algorithms for character string equivalence testing are specified by the standard [10, sec. 2.12]. An algorithm for case conversion is also specified [10, sec. 3.13].
• The byte order mark (U+FEFF) character can be inserted at the beginning of a text as a signature of Unicode encodings. As the name suggests, the order in which the FE and FF bytes arrive also indicates the order of bytes (called endianness) that was used to encode integers. In UTF-32 and UTF-16, the endianness can be chosen arbitrarily by the encoding application. In UTF-8, one-byte integers are used and the notion of endianness is therefore meaningless.
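The use of the byte order mark as an encoding signature can be sketched as follows. The function name sniff_bom is hypothetical; the byte order mark constants come from the codecs module of the Python standard library. The check is only a heuristic: a text without a byte order mark yields no answer, and a UTF-16 text that begins with a byte order mark followed by a NUL character is indistinguishable from UTF-32.

```python
import codecs

# The UTF-32 signatures are tested first, because the little-endian UTF-32
# signature (FF FE 00 00) begins with the little-endian UTF-16 one (FF FE).
_SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding announced by a leading byte order mark, or None."""
    for signature, encoding in _SIGNATURES:
        if data.startswith(signature):
            return encoding
    return None


print(sniff_bom(b"\xfe\xff\x00h\x00i"))  # FE arrives first: big-endian UTF-16
print(sniff_bom(b"\xff\xfeh\x00i\x00"))  # FF arrives first: little-endian UTF-16
```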
Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. Above is the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.
Compose + O + R = ®
Compose + 3 + 4 = ¾
Compose + s + s = ß
Compose + ~ + ' + a = ấ

Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.
Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á

Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces a character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the corresponding UCS code point in hexadecimal.
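The code page lookup behind the first two sequences can be imitated with the cp437 and cp1252 codecs of the Python standard library. The sketch assumes an English locale and a number between 32 and 255; the function name alt_code is hypothetical, and the sketch does not model how Windows treats the numbers below 32.

```python
def alt_code(digits: str) -> str:
    """Look up a decimal Alt code in the code page selected by its leading zero."""
    number = int(digits)
    # A leading zero selects the Windows code page, otherwise the IBM one is used.
    code_page = "cp1252" if digits.startswith("0") else "cp437"
    return bytes([number]).decode(code_page)


print(alt_code("160"))   # position 160 of the IBM code page 437
print(alt_code("0225"))  # position 225 of the Windows code page 1252
print(chr(0xE1))         # the UCS code point E1 in hexadecimal
```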
1.1.2 Text Input

To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On hand-held devices, the use of either a numeric keypad or a touch-screen is more typical.

An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind and, as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn't apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.
1.1.3 Text Editors

A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free joe, GNU nano, and pico.

More advanced text editors come with support for regular expressions and version control, which will be covered in Sections 1.1.5 and 1.2, and for user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom, and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower the users to perform complex text editing.
1.1.4 Interactive Document Preparation Systems

Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].
The primary difference between text editors and DPSes is the fact that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files, a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus, and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.

1.1.5 Regular Expressions

The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.

Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.
Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standardized syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) syntaxes [18, part 1, ch. 9] described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.
More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
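Backreferences are one feature that takes regexes beyond regular grammars. The sketch below uses the re module of Python, whose syntax derives from that of Perl, to recognize strings that consist of the same substring written twice, a language that no regular grammar describes. The names SQUARE and is_square are illustrative.

```python
import re

# A subexpression followed by a backreference to whatever it matched.
SQUARE = re.compile(r"(.+)\1")


def is_square(text: str) -> bool:
    """Tell whether the whole text is some non-empty string repeated twice."""
    return SQUARE.fullmatch(text) is not None


for candidate in ("abcabc", "abcab", "aa", "a"):
    print(candidate, is_square(candidate))
```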
Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for the Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (U+200B), zero width non-joiner (U+200C), zero width joiner (U+200D), and zero width no-break space (U+FEFF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at
BRE regex: we\{1,2\}p
Description: The repetition expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
Matches: weeps, wept

BRE regex: e*ne
Description: Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
Matches: never, enemy, Kleene

BRE regex: \(⟨regex⟩\)
Description: A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
Matches: ⟨regex⟩

BRE regex: ^ar
Description: At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
Matches: argument, arrow keys

BRE regex: ore$
Description: At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
Matches: iron ore, dumbledore

BRE regex: .be
Description: A period (.) matches any single character.
Matches: or not to be

BRE regex: be[ea]
Description: A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity.
Matches: beehive, grizzly bear, glass beads

BRE regex: be[^ea]
Description: A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
Matches: obeah, bend, libela

BRE regex: \^\$
Description: Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
Matches: ^$

BRE regex: \(.\).*\1
Description: A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
Matches: ara, ararauna, dardanelles, nationality

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below)

ERE regex: we{1,2}p
Description: Unlike in BREs, braces aren't escaped.
Matches: weeps, wept

ERE regex: pe+r?l?
Description: The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
Matches: persona, peer, speech, perl

ERE regex: (⟨regex⟩)
Description: Unlike in BREs, parentheses aren't escaped.
Matches: ⟨regex⟩

ERE regex: (on|t)
Description: Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
Matches: one, two, trophy, truth

ERE regex: (.).*\1
Description: EREs do not support backreferences.
Matches: ⟨undefined⟩
Regex: \x{⟨n⟩}
Description: Matches the UCS character with code point ⟨n⟩ in hexadecimal

Regex: \N{⟨n⟩}
Description: Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩

Regex: \p{⟨p⟩}
Description: Matches any UCS character with property ⟨p⟩

Regex: \P{⟨p⟩}
Description: Matches any UCS character without property ⟨p⟩

Property: Letter
Description: This property is satisfied by any letter

Property: Punctuation
Description: This property is satisfied by any punctuation

Property: Symbol
Description: This property is satisfied by any symbol

Property: Mark
Description: This property is satisfied by any mark

Property: Number
Description: This property is satisfied by any number

Property: Separator
Description: This property is satisfied by any separator

Property: Other
Description: This property is satisfied by any UCS character that doesn't belong to any of the above-listed categories

Property: Block=⟨b⟩
Description: This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.

Property: Script=⟨s⟩
Description: This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.

Property: Numeric Value=⟨n⟩
Description: This property is satisfied by any UCS character with the numeric value ⟨n⟩

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.

The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.
least partially implement the Unicode standard for Regular Expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
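When the regex software at hand is not aware of UCS, some of the pitfalls described above can be softened by preprocessing both the pattern and the text. The following Python sketch removes invisible format characters, folds case, and normalizes the result before searching for a fixed string. It only approximates the caseless matching defined by Unicode, and the names fold_for_matching and contains are hypothetical.

```python
import unicodedata


def fold_for_matching(text: str) -> str:
    """Prepare a string for case-insensitive, encoding-insensitive comparison."""
    # Drop invisible format characters (general category Cf), which include
    # U+200B, U+200C, U+200D, and U+FEFF.
    visible = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # Fold the case and bring the result to a single normalization form.
    folded = unicodedata.normalize("NFD", visible).casefold()
    return unicodedata.normalize("NFC", folded)


def contains(haystack: str, needle: str) -> bool:
    """Search for a fixed string after folding both operands."""
    return fold_for_matching(needle) in fold_for_matching(haystack)


# A naive lowercase comparison trips over the zero width space and the sharp s.
print("stra\u00dfe" in "Stra\u200bSSE".lower())
print(contains("Stra\u200bSSE", "stra\u00dfe"))
```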
The most elementary text processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool will present the user with the lines that contain one or more matches. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for the Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.

The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
VCS can be divided based on their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally and its operation is fully dependent on the availability of the server. An example of a centralized VCS is Subversion (SVN).
By comparison, there is no designated server in decentralized VCS, and the users can upload and download new versions directly to and from one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage size requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.
Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.

[Diagram: svnadmin create, svn checkout, svn update, svn commit]

Figure 1.8: The basic SVN workflow
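The reason why text files suit version control so well is that a change between two versions can be expressed as a short, human-readable list of removed and added lines. The sketch below produces such a listing with the difflib module of the Python standard library; the two versions and the file names draft-r1.txt and draft-r2.txt are invented for the illustration, and actual VCS use their own implementations.

```python
import difflib

# Two hypothetical versions of the same text file, one list item per line.
old_version = ["Typesetting is hard.\n", "Markup helps.\n"]
new_version = ["Typesetting is hard.\n", "Logical markup helps.\n"]

# Removed lines are prefixed with a minus sign, added lines with a plus sign.
diff = list(
    difflib.unified_diff(
        old_version,
        new_version,
        fromfile="draft-r1.txt",
        tofile="draft-r2.txt",
    )
)
print("".join(diff))
```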
After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.

[Diagram: git init, git clone, git pull, git push, git reset, git commit]

Figure 1.9: The diagram above depicts the basic Git workflow. The diagram below depicts the use of the Git program with an SVN repository; this bears all the advantages and disadvantages associated with decentralized VCS.

[Diagram: svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit]
Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom)

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.
Chapter 2
Markup
A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules, such as the correct spelling, capitalization, word breaks, and punctuation, that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.
Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document, but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The Generalized Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

More information about the project can be found within the Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.

The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.
21 META MARKUP LANGUAGES 23
A list of tools forthe manipula-tion of files in xmlschema languages ismaintained on theWeb site of w3c athttpwwww3org
XMLSchema
Although the described structure is shared by all sgml docu-ments the actual syntax as well as the restrictions regarding thecontents and the attributes of individual elements are declaredwithin a Document Type Declaration (dtd) which can be differentfor each document It is worth noting that a dtd only declaresthe syntax of an sgml document the semantics of the individualelements and their attributes are left to the interpretation of theprogram processing the document The syntax and the constraintsimposed by a dtd define an application of sgml An sgml documentis considered to be a valid instance of an sgml application whenit conforms to the corresponding dtd
212 The Extensible Markup LanguageAlthough sgml was designed to be the general format for dataexchange the complexity of the specification and the lack of sup-port for Unicode (see Section 111) proved to be a major hindrancepreventing its wider adoption and the development of sgml toolsIn a response the World Wide Web Consortium (w3c) published aspecification of the eXtensible Markup Language (xml) [28] in 1998Along with the introduction of xml the sgml specification re-ceived a technical corrigendum [29] which turned xml into ansgml application defined through a dtd
This dtd completely fixes the syntax of xml documents whichmakes it possible to differentiate between two levels of correct-ness An xml document is considered to be well-formed when itconforms to the dtd that specifies the syntax of xml and to thexml specification An xml document is considered to be validagainst an dtd when it is well-formed and conforms to the saiddtd Along with dtds there exists a wealth of schema languages forxmlmdashsuch as w3c xml Schema relax ng or Schematronmdashthatcan be used to check the validity of an xml document instead of adtd The constrains imposed by either a dtd or a schema definean application of xml (also language or format)
Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish.</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat, until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>
Figure 2.1: An example XML document (recipe.xml)
DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.
<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>
Figure 2.2: An example DTD
element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}
Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
      name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>
Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)
xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng  # Compact -> Full Relax NG
xmllint --noout --relaxng recipe.rng recipe.xml
Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature called CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML
Speech
AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!

Verse
Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>
Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].
The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org/.
Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
schemata. This makes it impossible to validate namespaced XML documents, unless all the IRIs and their schemata are known to the parser.
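To see why a parser can only report the IRIs and has no built-in way to locate the schemata behind them, it helps to look at what a namespace-aware parser actually exposes. The sketch below uses Python's standard library; the sample document and the IRI http://example.org/ns/article are made up for illustration, while the MathML namespace IRI is the real one.

```python
import xml.etree.ElementTree as ET

# A hypothetical document that mixes two XML applications.
DOC = """<doc xmlns="http://example.org/ns/article"
     xmlns:m="http://www.w3.org/1998/Math/MathML">
  <para>Euler's number: <m:math><m:mi>e</m:mi></m:math></para>
</doc>"""


def namespaces_used(text: str) -> set:
    """Collect the namespace IRIs of all elements in an XML document.

    ElementTree stores a namespaced element name as "{iri}local-name",
    so the IRI can be cut out of the tag itself.
    """
    found = set()
    for element in ET.fromstring(text).iter():
        if element.tag.startswith("{"):
            found.add(element.tag[1:].split("}", 1)[0])
    return found


if __name__ == "__main__":
    for iri in sorted(namespaces_used(DOC)):
        print(iri)
```

The result is a set of opaque identifiers; mapping each of them to a schema is left entirely to the application.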
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
2.2 Markup on the World Wide Web
2.2.1 The Hypertext Markup Language
In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely used graphical Web browser, which paved the way for the Web browsers of today. In 1994, Berners-Lee formed W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In
JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.
Initially, HTML only comprised a mixture of logical and presentation markup with a fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language made it possible to specify the visual properties of any HTML element, which separated document markup from design and effectively eliminated the need for the presentation markup.
During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading Web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into Web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
2.2.2 The Extensible Hypertext Markup Language
Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as
<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>
Figure 2.7: The first line contains overlapping elements and, as such, can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.
<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>
Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com/. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.
<style>
  body {
    font: large Verdana;
    font-size: large; }
  h1 {
    font-size: x-large;
    text-transform: uppercase; }
  abbr {
    font-style: italic; }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>
The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2 Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for Web documents and to include instances of other XML applications, such as MathML and SVG, directly into Web documents through XML namespaces.
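The difference in strictness is easy to observe with the two parsers that ship with Python. The following sketch feeds the overlapping markup from Figure 2.7 to both; the class and function names are illustrative. Note one assumption stated plainly: html.parser is only a lenient tokenizer and does not implement the error recovery of the HTML5 specification, so it reports the tags as they come instead of repairing the tree.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The overlapping markup from the first line of Figure 2.7.
MALFORMED = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Record start and end tags in the order in which they are seen."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def lenient_events(text: str) -> list:
    """Tokenize `text` leniently; malformed nesting is not an error."""
    parser = TagLogger()
    parser.feed(text)
    parser.close()
    return parser.events


def strict_accepts(text: str) -> bool:
    """Return True only if an XML parser accepts `text` as well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(lenient_events(MALFORMED))
    print(strict_accepts(MALFORMED))
```

The lenient parser carries on past the overlap, whereas the XML parser stops at the first mismatched end tag, which is the behavior the XML specification demands.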
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
2.2.3 The Semantic Web and Linked Data
The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
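The two readings of a triplet can be made concrete with a small data structure. The following is a minimal sketch in Python, not an RDF library: the class names IRI, Literal, and Triple and the function describe are illustrative assumptions, and the field order follows the (p, s, o) notation used above.

```python
from typing import NamedTuple, Union


class IRI(str):
    """A resource identified by an IRI (illustrative wrapper type)."""


class Literal(str):
    """A literal value (illustrative wrapper type)."""


class Triple(NamedTuple):
    predicate: IRI
    subject: IRI
    obj: Union[IRI, Literal]


def describe(triple: Triple) -> str:
    """Read a triplet as a relation or as a property, depending on its object."""
    if isinstance(triple.obj, IRI):
        return (f"{triple.subject} is in relation {triple.predicate} "
                f"with {triple.obj}")
    return (f"{triple.subject} has property {triple.predicate} "
            f"with value {triple.obj}")


if __name__ == "__main__":
    doc = IRI("http://example.org/document.html")
    john = IRI("http://example.org/john-smith")
    print(describe(Triple(IRI("http://purl.org/dc/terms/creator"), doc, john)))
    print(describe(Triple(IRI("http://xmlns.com/foaf/0.1/name"), john,
                          Literal("John Smith"))))
```

Because the object is the only position that may hold a literal, its type alone decides which of the two interpretations applies.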
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing Web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually replace microformatting, the loosely defined technique of using HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in the documents on the Web.
2.3 Document Preparation Systems
Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type
        rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html>
  <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html>
  <http://purl.org/dc/terms/creator>
  <http://example.org/john-smith> .
<http://example.org/john-smith>
  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
  <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith>
  <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .
Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom). The N-Triples statements are wrapped to fit the page; in an actual N-Triples file, each statement occupies a single line.
<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>
Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.
<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>
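Figure 2.11: A graph of the RDF document in Figure 2.9. The node http://example.org/document.html points to the literal "John's Web page"@en through dc:title and to the node http://example.org/john-smith through dc:creator. The node http://example.org/john-smith points to foaf:Person through rdf:type and to the literal "John Smith" through foaf:name.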
categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems
Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The
The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that gave utterance to a threat. At length I would be avenged; this was a point definitely settled – but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-
.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the Web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
              \dimexpr\hsize-3.28em 3.28em
              \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right)
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
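The conversion to a more general markup language can be illustrated on a toy scale. The sketch below, written in Python, handles a deliberately tiny, made-up subset inspired by Markdown: blank lines separate blocks, a block starting with "# " becomes a heading, and text between asterisks is emphasized. It is an assumption-laden illustration of the idea and does not implement Markdown or any other real lightweight markup language.

```python
import html
import re


def to_html(source: str) -> str:
    """Convert a tiny Markdown-like subset to HTML.

    Supported (by assumption): blank-line separated blocks, "# " headings,
    and *emphasis*. Everything else is passed through as paragraph text.
    """
    blocks = [b for b in re.split(r"\n\s*\n", source) if b.strip()]
    output = []
    for block in blocks:
        # Collapse line breaks inside a block and escape HTML specials.
        text = html.escape(" ".join(block.split()), quote=False)
        text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
        if text.startswith("# "):
            output.append(f"<h1>{text[2:]}</h1>")
        else:
            output.append(f"<p>{text}</p>")
    return "\n".join(output)


if __name__ == "__main__":
    print(to_html("# Design\n\nPunctuation does the work of *markup* here."))
```

Even at this scale, the source text stays readable on a plain text terminal, while the structure it encodes maps directly onto logical HTML elements.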
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In Web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
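Before combining typefaces, it is useful to find out which characters of a manuscript fall outside the repertoire of the body text typeface. The following Python sketch makes a strong simplifying assumption: the repertoire is given as a hand-written list of Unicode code point ranges (here, three Latin-oriented ranges chosen for illustration), whereas a real check would have to inspect the character map of the actual font file.

```python
# Illustrative code point ranges standing in for a font's repertoire:
# Basic Latin through Latin Extended-B, Latin Extended Additional,
# and General Punctuation. A real font may cover more or less than this.
LATIN_RANGES = [(0x0000, 0x024F), (0x1E00, 0x1EFF), (0x2000, 0x206F)]


def uncovered(text: str, ranges: list) -> list:
    """Return the sorted distinct characters of `text` outside all `ranges`."""
    return sorted(
        {ch for ch in text
         if not any(low <= ord(ch) <= high for low, high in ranges)}
    )


if __name__ == "__main__":
    sample = "Soul \u03c8\u03c5\u03c7\u03ae"  # "Soul" followed by a Greek word
    for ch in uncovered(sample, LATIN_RANGES):
        print(f"U+{ord(ch):04X} {ch}")
```

Every character reported this way has to come from a second typeface, which is the situation that the Greek passage in Figure 3.1 deals with.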
Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
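These guidelines are simple enough to be written down as a pair of rules. The following Python sketch encodes the ranges quoted above; the function names and the returned labels are illustrative choices, and the thresholds are the ones stated in the text rather than values taken from any typesetting system.

```python
def comfortable_range(multi_column: bool = False) -> tuple:
    """Recommended line length in characters, as quoted in the text."""
    return (40, 50) if multi_column else (45, 75)


def check_measure(characters_per_line: int, multi_column: bool = False) -> str:
    """Classify a line length as "short", "ok", or "long"."""
    low, high = comfortable_range(multi_column)
    if characters_per_line < low:
        return "short"
    if characters_per_line > high:
        return "long"
    return "ok"


def should_set_ragged(characters_per_line: int) -> bool:
    """Lines that cannot hold 40 characters are better set ragged."""
    return characters_per_line < 40


if __name__ == "__main__":
    for length in (30, 48, 66, 90):
        print(length, check_measure(length), should_set_ragged(length))
```

A narrow sidenote column of about 30 characters, for example, is classified as short and flagged for ragged setting.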
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
newunicodecharraisebox8ex
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=gr] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says, </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
32 STRUCTURAL ELEMENTS 45
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
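The arithmetic in the two paragraphs above can be sketched in a few lines of Python. The sketch assumes that the quoted range of line heights refers to a 10 pt body text, which is consistent with the 1 to 2.25 pt of lead; the function names `lead_per_side` and `indent_range` are hypothetical and serve only as an illustration.

```python
def lead_per_side(type_size_pt, line_height_pt):
    """Lead added above and below each line (assumed to be split evenly)."""
    return (line_height_pt - type_size_pt) / 2


def indent_range(type_size_pt):
    """Smallest (1 en) and largest (3 em) first-line indent for a type size."""
    en = type_size_pt / 2   # one en is half the type size
    em = type_size_pt       # one em equals the type size
    return en, 3 * em


# Assumed example: 10 pt body text.
tight = lead_per_side(10, 12)      # line height of 12 pt
loose = lead_per_side(10, 14.5)    # line height of 14.5 pt
smallest_indent, largest_indent = indent_range(10)
```

For a 10 pt typeface, the indent therefore ranges from 5 pt to 30 pt.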
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger
          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8½ × 11            1 : 1.294    1.2941
Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks, in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer’s viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
– William Shakespeare, King Lear
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
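The requirement that the text height be a multiple of the line height can be illustrated by a short Python sketch. The function name `fit_text_height` and the 500 pt of available vertical space are assumptions made for the sake of the example; they do not describe the layout of this book.

```python
def fit_text_height(available_pt, line_height_pt):
    """Return the number of whole lines that fit into the available
    vertical space and the text height those lines add up to."""
    lines = int(available_pt // line_height_pt)
    return lines, lines * line_height_pt


# Hypothetical page: 500 pt of vertical space, 12 pt line height.
lines, height = fit_text_height(500, 12)
```

With these numbers, 41 lines fit and the text height is 492 pt; the remaining 8 pt would be distributed between the vertical margins.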
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are
34 COLOR 49
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
(ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).
[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972 (cit. on pp. 5, 7).
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1986 (cit. on p. 6).
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).
[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993 (cit. on p. 8).
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).
[12] Unicode Consortium. Unicode Technical Standard 10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016) (cit. on p. 10).
[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008 (cit. on p. 13).
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012 (cit. on p. 13).
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006 (cit. on p. 13).
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).
[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993 (cit. on p. 14).
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O’Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).
[20] Unicode Consortium. Unicode Technical Standard 18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O’Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986 (cit. on p. 22).
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: httpwwwsazbacztypoglosytypo101pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe’s Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
10 CHAPTER 1 WRITING
Ǻ = Å + ◌́ = A + ◌̊ + ◌́
Figure 1.3: Some UCS characters can be either input as a single entity or composed from several combining characters. Regarding Unicode normalization forms, all of the above representations are canonically equivalent.
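The canonical equivalence shown in Figure 1.3 can be checked with a short Python sketch that uses the standard `unicodedata` module. The variable names are illustrative only; the code points are those of the three representations in the figure.

```python
import unicodedata

# Three ways to write "Latin capital letter A with ring above and acute".
precomposed = "\u01FA"              # Ǻ as a single code point
partly_composed = "\u00C5\u0301"    # Å + combining acute accent
fully_decomposed = "A\u030A\u0301"  # A + combining ring above + combining acute

forms = [precomposed, partly_composed, fully_decomposed]

# The raw strings differ in length and therefore compare as unequal ...
lengths = [len(s) for s in forms]

# ... but canonical normalization maps all three to a single representation.
nfc = {unicodedata.normalize("NFC", s) for s in forms}  # composed form
nfd = {unicodedata.normalize("NFD", s) for s in forms}  # decomposed form
```

Comparing normalized strings rather than raw strings is what makes equivalence testing reliable.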
iconv -f latin2 -t utf8 -- old.txt > new.txt
Figure 1.4: Text files can be converted between encodings using the iconv command-line tool. The sample code shows the file old.txt being converted from the ISO/IEC 8859-2 encoding to UTF-8. The result of the conversion is stored in the file new.txt.
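The same conversion can be sketched in Python for readers without access to iconv. The function name `convert_encoding` and the sample string are assumptions made for illustration; the sketch works on byte strings in memory rather than on files.

```python
def convert_encoding(data, source="iso8859_2", target="utf-8"):
    """Decode bytes from the source encoding and re-encode them in the target."""
    return data.decode(source).encode(target)


# A Czech sample whose accented letters are all present in ISO/IEC 8859-2.
text = "žluťoučký kůň"
latin2_bytes = text.encode("iso8859_2")   # one byte per character
utf8_bytes = convert_encoding(latin2_bytes)
```

Each of the six accented letters takes one byte in ISO/IEC 8859-2 and two bytes in UTF-8, so the converted text is six bytes longer.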
• If simple text manipulation is preferred over space efficiency, each character can be made exactly four bytes wide using the UTF-32 encoding. In UTF-16, every character of the BMP is exactly two bytes wide, and the remaining characters are encoded as four-byte surrogate pairs.
• Although character strings cannot be collated by a simple character code comparison, a collation algorithm is defined in the Unicode specification [12], and collation tables for major locales [13] are maintained by the Unicode Consortium.
• Classes of characters (such as uppercase letters, lowercase letters, numbers, and punctuation) do not form contiguous ranges, but their position is directly specified in the standard [10, sec. 4.5].
• Although idiosyncrasies (such as ligatures, invisible hyphenation hints, and combining characters) are present in UCS, explicit normalization algorithms for character string equivalence testing are specified by the standard [10, sec. 2.12]. An algorithm for case conversion is also specified [10, sec. 3.13].
• The byte order mark (FE FF) character can be inserted at the beginning of a text as a signature of Unicode encodings. As the name suggests, the order in which the FE and FF bytes arrive also indicates the order of bytes (called endianness) that was used to encode integers. In UTF-32 and UTF-16, endianness can be chosen arbitrarily by the encoding application. In UTF-8, one-byte integers are used, and the notion of endianness is therefore meaningless.
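The last point of the list above can be made concrete with a short Python sketch built on the standard `codecs` module. The helper `detect_utf16_byte_order` is a hypothetical name introduced for illustration; the sketch only inspects the first two bytes and makes no attempt to guess the encoding of unmarked text.

```python
import codecs

text = "A"

# The same character serialized with both byte orders.
utf16_be = text.encode("utf-16-be")  # most significant byte first
utf16_le = text.encode("utf-16-le")  # least significant byte first

# The byte order mark U+FEFF serialized with both byte orders.
bom_be = "\ufeff".encode("utf-16-be")
bom_le = "\ufeff".encode("utf-16-le")

# A character outside the BMP needs a surrogate pair in UTF-16.
clef = "\U0001D11E"  # musical symbol G clef
clef_utf16_len = len(clef.encode("utf-16-be"))
clef_utf32_len = len(clef.encode("utf-32-be"))


def detect_utf16_byte_order(data):
    """Report the byte order announced by a leading byte order mark."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little-endian"
    return "unknown"
```

Python's generic `utf-16` codec performs the same detection when decoding: it consumes the byte order mark and interprets the remaining bytes accordingly.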
11 TEXT PROCESSING 11
Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. Above is the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.
Compose + O + R = ®
Compose + 3 + 4 = ¾
Compose + s + s = ß
Compose + ~ + ’ + a = ấ
Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.
Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á
Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces a character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the corresponding UCS code point in hexadecimal.
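The three key sequences of Figure 1.7 can be traced in Python, which ships with codecs for both code pages. The variable names are illustrative; the sketch only looks the numbers up in the respective tables and does not simulate keyboard input.

```python
# Alt + 160: decimal 160 looked up in the IBM code page 437.
alt_160 = bytes([160]).decode("cp437")

# Alt + 0225: decimal 225 looked up in the Windows code page 1252.
alt_0225 = bytes([225]).decode("cp1252")

# Alt + +E1: hexadecimal E1 taken directly as a UCS code point.
alt_plus_e1 = chr(0xE1)
```

All three lookups yield the same character, á, although the numbers differ between the code pages.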
1.1.2 Text Input
To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On hand-held devices, the use of either a numeric keypad or a touch-screen is more typical.
An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind, and as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This does not apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.
1.1.3 Text Editors
A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free joe, GNU nano, and pico.
More advanced text editors come with support for regular expressions and version control, which will be covered in Sections 1.1.5 and 1.2, and for user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom, and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower the users to perform complex text editing.
1.1.4 Interactive Document Preparation Systems
Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully-formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].
The primary difference between text editors and DPSes is the fact that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files, a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus, and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.
Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.
1.1.5 Regular Expressions
The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.
Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) syntaxes [18, part 1, ch. 9] described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.
More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
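The claim that such syntaxes are stronger than regular can be illustrated with a backreference. The following sketch uses Python's `re` module, whose syntax derives from Perl's; the pattern and the function name `is_square` are illustrative assumptions. It recognizes strings that consist of some block repeated twice, a language that no regular grammar can describe.

```python
import re

# A non-empty block followed by an exact copy of itself.
# The backreference \1 is what takes the pattern beyond regular languages.
SQUARE = re.compile(r"(.+)\1")

def is_square(text: str) -> bool:
    """Return True when the whole text is some block repeated twice."""
    return SQUARE.fullmatch(text) is not None

print(is_square("abcabc"))  # the block is "abc"
print(is_square("abcab"))   # no block can be repeated to give this string
```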
Many regex syntaxes, and the software that implements them, were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for the Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (U+200B), zero width non-joiner (U+200C), zero width joiner (U+200D), and zero width no-break space (U+FEFF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at
BRE regex: we\{1,2\}p
  Description: The repetition expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
  Matches: weeps, wept

BRE regex: e*ne
  Description: Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
  Matches: never, enemy, Kleene

BRE regex: \(⟨regex⟩\)
  Description: A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
  Matches: ⟨regex⟩

BRE regex: ^ar
  Description: At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
  Matches: argument, arrow keys

BRE regex: ore$
  Description: At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
  Matches: iron ore, dumbledore

BRE regex: be.
  Description: A period (.) matches any single character.
  Matches: or not to be

BRE regex: be[ea]
  Description: A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity.
  Matches: beehive, grizzly bear, glass beads

BRE regex: be[^ea]
  Description: A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
  Matches: obeah, bend, libela

BRE regex: \^\$
  Description: Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
  Matches: ^$

BRE regex: \(…\)\1
  Description: A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
  Matches: ara, ararauna, dardanelles, nationality

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below).

ERE regex: we{1,2}p
  Description: Unlike in BREs, braces aren't escaped.
  Matches: weeps, wept

ERE regex: pe+r?l?
  Description: The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
  Matches: persona, peer, speech, perl

ERE regex: (⟨regex⟩)
  Description: Unlike in BREs, parentheses aren't escaped.
  Matches: ⟨regex⟩

ERE regex: (on|t)
  Description: Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
  Matches: one, two, trophy, truth

ERE regex: (…)\1
  Description: EREs do not support backreferences.
  Matches: ⟨undefined⟩
Regex: \x{⟨n⟩}
  Description: Matches the UCS character with code point ⟨n⟩ in hexadecimal.
Regex: \N{⟨n⟩}
  Description: Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.
Regex: \p{⟨p⟩}
  Description: Matches any UCS character with property ⟨p⟩.
Regex: \P{⟨p⟩}
  Description: Matches any UCS character without property ⟨p⟩.

Property: Letter
  Description: This property is satisfied by any letter.
Property: Punctuation
  Description: This property is satisfied by any punctuation.
Property: Symbol
  Description: This property is satisfied by any symbol.
Property: Mark
  Description: This property is satisfied by any mark.
Property: Number
  Description: This property is satisfied by any number.
Property: Separator
  Description: This property is satisfied by any separator.
Property: Other
  Description: This property is satisfied by any UCS character that doesn't belong to any of the above-listed categories.
Property: Block=⟨b⟩
  Description: This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.
Property: Script=⟨s⟩
  Description: This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.
Property: Numeric Value=⟨n⟩
  Description: This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.
The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.
least partially implement the Unicode standard for Regular Expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
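The pitfalls of normalization, case conversion, and invisible characters can be sidestepped by bringing both the pattern and the text into a canonical form before matching. The sketch below uses only the Python standard library; the helper names `canonical` and `contains` are illustrative. Note that discarding the zero width joiner and non-joiner is an assumption that suits plain searching, but these characters carry meaning in some writing systems.

```python
import unicodedata

# Translation table that deletes the four invisible characters named above.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))

def canonical(text: str) -> str:
    """Strip zero-width characters, normalize to NFC, and case-fold."""
    visible = text.translate(ZERO_WIDTH)
    return unicodedata.normalize("NFC", visible).casefold()

def contains(haystack: str, needle: str) -> bool:
    """Search for needle in haystack, ignoring case and encoding variants."""
    return canonical(needle) in canonical(haystack)

decomposed = "Cafe\u0301 au lait"  # "e" followed by a combining acute accent
print("CAFÉ".lower() in decomposed.lower())  # naive comparison
print(contains(decomposed, "CAFÉ"))          # canonicalized comparison
```

The naive comparison fails because the precomposed and the decomposed forms of the accented letter are different code point sequences, whereas the canonicalized comparison succeeds.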
The most elementary text processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in default of an advanced text editor. Unless configured otherwise, the tool will present lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming
The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
language, albeit a more advanced one than that of sed. Originally developed for Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.
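The default behavior of grep, presenting every line that contains at least one match, can be approximated in a few lines of Python. This is only a sketch of the idea: the function name `grep` is borrowed for illustration, and Python's `re` module implements a Perl-like syntax rather than BRE or ERE, so only simple patterns carry over unchanged.

```python
import re

def grep(pattern: str, lines: list[str]) -> list[str]:
    """Return the lines that contain at least one match of the pattern."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

lines = ["iron ore", "ore deposit", "dumbledore"]
for line in grep(r"ore$", lines):
    print(line)
```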
1.2 Version Control
When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCSs) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCSs are a convenient alternative to manual version archival. With several contributors, VCSs become an essential tool.
VCSs can be dichotomized based on their architecture, which is either centralized or decentralized. Centralized VCSs store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally and its operation is fully dependent on the availability of the server. An example of a centralized VCS is Subversion (SVN).
By comparison, there is no designated server in decentralized VCSs, and the users can upload and download new versions directly to and from one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage size requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCSs include Git, Mercurial, and Bazaar.
Although VCSs can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCSs to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.
Figure 1.8: The basic SVN workflow (svnadmin create, svn checkout, svn update, svn commit). After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.
An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.
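Why text files are easy to display along with their changes can be seen from a line-based comparison of two versions. The sketch below uses Python's standard `difflib` module rather than any particular VCS; the file name and revision labels are hypothetical.

```python
import difflib

# Two versions of a hypothetical text file, one list item per line.
old = ["Combine the ingredients.\n", "Fry until golden.\n"]
new = ["Combine the ingredients.\n", "Fry until golden brown.\n",
       "Serve with jam.\n"]

diff = list(difflib.unified_diff(
    old, new, fromfile="recipe.txt (r1)", tofile="recipe.txt (r2)"))
print("".join(diff), end="")
```

Lines prefixed with a minus sign were removed and lines prefixed with a plus sign were added. No comparably simple comparison exists for binary output files, which is why they call for the specialized interfaces described above.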
Figure 1.9: The first diagram depicts the basic Git workflow (git init, git clone, git pull, git push, git reset, git commit). The second diagram depicts the use of the Git program with an SVN repository (svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit); this bears all the advantages and disadvantages associated with decentralized VCSs. After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories, or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.
Figure 1.10: The built-in VCSs of Microsoft Word (top) and Apache OpenOffice (bottom).
Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.
Chapter 2
Markup
A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules, such as the correct spelling, capitalization, word breaks, and punctuation, that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.
It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.
Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result,
More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].
The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.
the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.
2.1 Meta Markup Languages
2.1.1 The Generalized Markup Language
The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.
This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].
SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.
A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.
2.1.2 The Extensible Markup Language
Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.
This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML, such as W3C XML Schema, RELAX NG, or Schematron, that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
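The lower of the two levels of correctness, well-formedness, can be checked with any XML parser. The sketch below relies on the `xml.etree.ElementTree` module of the Python standard library; the function name `is_well_formed` is illustrative. The module does not validate documents against a DTD or a schema, so checking validity would require an external tool.

```python
import xml.etree.ElementTree as ET

def is_well_formed(document: str) -> bool:
    """Return True when the parser accepts the document as well-formed XML."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<recipe><name>Palatschinken</name></recipe>"))
print(is_well_formed("<recipe><name>Palatschinken</recipe>"))  # unclosed <name>
```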
Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown</step>
    <step>Repeat until there is no batter left</step>
    <step>Serve rolled and filled with jam</step>
  </stepList>
</recipe>
Figure 2.1: An example XML document (recipe.xml).
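Retrieving data from such a document with path expressions can be sketched with the Python standard library, which implements only a small subset of XPath. The document below is an abridged copy of the recipe, embedded as a string so that the example is self-contained.

```python
import xml.etree.ElementTree as ET

# An abridged copy of the recipe document.
RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# Every ingredient, selected by its path from the root element.
ingredients = [(item.get("amount"), item.text)
               for item in root.findall("./ingredientList/ingredient")]

# A single ingredient, selected by the value of its attribute.
milk = root.find(".//ingredient[@amount='300ml']")

print(ingredients)
print(milk.text)
```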
<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD. DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.
element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}
Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
      name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>
Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).
xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng  # Compact -> Full RELAX NG
xmllint --noout --relaxng recipe.rng recipe.xml
Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
A notable feature of XML unavailable in SGML are namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature, CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML
Speech
AASE: Peer, you're lying
PEER: No, I'm not
AASE: Well then, swear to me it's true
PEER: Swear? Why should I?
AASE: See, you dare not. Every word of it's a lie

Verse
Peer, you're lying / No, I'm not
Well then, swear to me it's true
Swear? Why should I? / See, you dare not
Every word of it's a lie

<(V)line>
<(S)speech who="Aase">Peer, you're lying</(S)speech>
<(S)speech who="Peer">No, I'm not</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? Why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not.
</(V)line><(V)line>
Every word of it's a lie</(S)speech>
</(V)line>
Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].
The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.
Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
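How namespaces keep the elements of different XML applications apart can be sketched with the Python standard library. Both namespace IRIs below, the `energy` element, and its value are hypothetical and serve only as an illustration; as noted above, the IRIs themselves tell the parser nothing about the schemata.

```python
import xml.etree.ElementTree as ET

# Both IRIs are hypothetical and do not point to any real schema.
RECIPE_NS = "http://www.example.com/ns/recipe"
NUTRITION_NS = "http://www.example.com/ns/nutrition"

DOCUMENT = f"""<recipe xmlns="{RECIPE_NS}" xmlns:n="{NUTRITION_NS}">
  <name>Palatschinken</name>
  <n:energy unit="kJ">850</n:energy>
</recipe>"""

root = ET.fromstring(DOCUMENT)

# The parser expands every element name to "{namespace IRI}local name".
tags = [element.tag for element in root.iter()]

# Lookups need a prefix-to-IRI mapping; the prefix itself is arbitrary.
energy = root.find("n:energy", {"n": NUTRITION_NS})

print(tags)
print(energy.get("unit"), energy.text)
```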
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
2.2 Markup on the World Wide Web
2.2.1 The Hypertext Markup Language
In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers, faced with a malformed document, would act in accordance with Postel's law and try to render the document despite its deficiencies. In
JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.
Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.
During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
2.2.2 The Extensible Hypertext Markup Language
Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as
<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>
Figure 2.7: The first line contains overlapping elements and, as such, can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.
<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>
Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.
<style>
  body {
    font: large Verdana;
    font-size: large;
  }
  h1 {
    font-size: x-large;
    text-transform: uppercase;
  }
  abbr {
    font-style: italic;
  }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>
The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly into web documents through XML namespaces.
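The contrast between the two kinds of parsers can be observed with the Python standard library. In the sketch below, the class name `TagLogger` is illustrative, and `html.parser` only tokenizes its input; it does not implement the error recovery prescribed by the HTML5 specification, so the example merely shows leniency as opposed to strict refusal.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Overlapping elements: not well-formed XML and not valid HTML.
MALFORMED = "<b>Bold <i>bold and italic</b> italic</i>"

class TagLogger(HTMLParser):
    """Record the tags that the lenient HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.events.append(f"</{tag}>")

logger = TagLogger()
logger.feed(MALFORMED)
logger.close()
print(logger.events)  # the HTML tokenizer carries on without complaint

try:
    ET.fromstring(MALFORMED)
    xml_accepts = True
except ET.ParseError as error:
    xml_accepts = False
    print("XML parser refused the document:", error)
```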
The idea was brought to fruition in the xml application of theeXtensible HyperText Markup Language (xhtml) [42] However thesupposed benefits proved to be too marginal to warrant migrationfrom html The speed advantages of the simplified processingwere largely offset by the lack of support for incremental renderingsince it is impossible to validate and render partially downloadedxhtml documents and the advances in the area of mobile devicesmadehtmlprocessing sufficiently fast The lack ofways to providealternative content for browsers that would not support the xmlapplications instantiated in the xhtml documents also reducedthe usefulness of the xml namespaces in xhtml considerably Asa result xhtml has yet to succeed in replacing html and remainsa minority markup language on the Web
223 The Semantic Web and Linked DataTheWeb is based on the idea of a distributed and globally availablenetwork of human knowledge The languages ofhtml xhtml cssand JavaScript form the foundation of the human-readable partsof the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents Drawingfrom the research in the field of knowledge representation w3ccreated the Resource Description Framework (rdf) [44] in 1999mdashalanguage for the description of resources on the Web
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
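The two readings of a triplet can be sketched in Python. This is only an illustration and not the interface of any RDF library: the `Triple` and `Literal` types and the `describe` function are hypothetical names, and IRIs are modelled as plain strings.

```python
from typing import NamedTuple, Union


class Literal(NamedTuple):
    """A literal value, as opposed to a resource named by an IRI."""
    value: str


class Triple(NamedTuple):
    """A triplet in the (p, s, o) order used in the text."""
    predicate: str             # IRI of the predicate p
    subject: str               # IRI of the subject s
    obj: Union[str, Literal]   # IRI of a resource, or a literal value


def describe(t: Triple) -> str:
    """Spell out the interpretation of a triplet in plain English."""
    if isinstance(t.obj, Literal):
        # Literal object: s has a property p with the value o.
        return (f"<{t.subject}> has property <{t.predicate}> "
                f"with value {t.obj.value!r}")
    # Resource object: s is in a relation p with o.
    return f"<{t.subject}> is in relation <{t.predicate}> with <{t.obj}>"


creator = Triple("http://purl.org/dc/terms/creator",
                 "http://example.org/document.html",
                 "http://example.org/john-smith")
name = Triple("http://xmlns.com/foaf/0.1/name",
              "http://example.org/john-smith",
              Literal("John Smith"))
print(describe(creator))
print(describe(name))
```

Keeping literals in a separate type makes the distinction between the two interpretations explicit, instead of guessing it from the shape of a string.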
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, besides the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually make obsolete the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in the documents on the Web, a technique known as microformatting.
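Of the listed representations, N-Triples is the simplest to produce by hand: one statement per line, in the order subject, predicate, object, terminated by a period. The following Python sketch writes three statements about the hypothetical resources from example.org in that form. The helper names are made up for this illustration, and the escaping covers only backslashes, double quotes, and line breaks, which is an assumption that holds for simple text but is not a full implementation of the N-Triples grammar.

```python
def escape(text: str) -> str:
    """Escape backslashes, double quotes, and line breaks in a literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets."""
    return f"<{value}>"


def literal(text: str, lang: str = "") -> str:
    """Quote a literal and append an optional language tag."""
    quoted = f'"{escape(text)}"'
    return f"{quoted}@{lang}" if lang else quoted


def ntriples_line(subject: str, predicate: str, obj: str) -> str:
    """Format one statement; obj must come from iri() or literal()."""
    return f"{iri(subject)} {iri(predicate)} {obj} ."


doc = "http://example.org/document.html"
john = "http://example.org/john-smith"
lines = [
    ntriples_line(doc, "http://purl.org/dc/terms/title",
                  literal("John's Web page", "en")),
    ntriples_line(doc, "http://purl.org/dc/terms/creator", iri(john)),
    ntriples_line(john, "http://xmlns.com/foaf/0.1/name",
                  literal("John Smith")),
]
print("\n".join(lines))
```

For anything beyond a toy, an established RDF library is the safer choice, since it also validates IRIs and handles datatypes.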
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).

<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>

Figure 2.11: A graph of the RDF document in Figure 2.9. The node http://example.org/document.html is linked by dc:title to the literal "John's Web page"@en and by dc:creator to the node http://example.org/john-smith, which is in turn linked by rdf:type to foaf:Person and by foaf:name to the literal "John Smith".

2.3 Document Preparation Systems
Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is paid in more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and in the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems
Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by Donald Knuth, an American professor of computer science, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by Desktop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled—but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's The Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em \dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em \hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that reflect their use cases.
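To give a sense of how such a conversion tool works, here is a toy converter written in Python. It is a sketch of the general idea only: the supported syntax (lines starting with `#` for headings, blank lines between paragraphs, and asterisks for emphasis) is a made-up subset that merely resembles Markdown, and the function `to_html` is a hypothetical name rather than part of any existing tool.

```python
import html
import re


def to_html(source: str) -> str:
    """Convert a tiny Markdown-like subset to HTML.

    Supported: '#' headings (levels 1 to 6), paragraphs separated by
    blank lines, and *emphasis* inside text.
    """
    blocks = []
    for block in re.split(r"\n\s*\n", source.strip()):
        # Join the hard-wrapped lines of a block into one line of text.
        text = " ".join(line.strip() for line in block.splitlines())
        # Escape &, < and > so that the input cannot inject markup.
        text = html.escape(text, quote=False)
        text = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)
        heading = re.match(r"(#{1,6}) +(.*)", text)
        if heading:
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            blocks.append(f"<p>{text}</p>")
    return "\n".join(blocks)


sample = """# The Cask of Amontillado

The thousand injuries of Fortunato I had borne
as I *best* could."""
print(to_html(sample))
```

The source stays legible as plain text, while the converter supplies the verbose markup. Real tools differ mainly in how many constructs they recognize and how carefully they resolve ambiguous input.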
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says, φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
\newunicodechar{·}{\raisebox{.8ex}{.}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says, φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.

<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=grc] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says, </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
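The arithmetic behind the leading guideline can be sketched in a few lines of Python. The function names are illustrative only, and the bounds of 20 and 45 percent are the approximate figures quoted above rather than hard limits.

```python
from typing import Tuple


def leading_range(type_size_pt: float) -> Tuple[float, float]:
    """Return the (smallest, largest) suggested line height in points."""
    # 120 % and 145 % of the type size, i.e. 20 % to 45 % of extra space.
    return (type_size_pt * 120 / 100, type_size_pt * 145 / 100)


def lead_per_side(type_size_pt: float, line_height_pt: float) -> float:
    """Return the lead added above and below each line, in points."""
    return (line_height_pt - type_size_pt) / 2


low, high = leading_range(10)
print(low, high)                                        # 12.0 14.5
print(lead_per_side(10, low), lead_per_side(10, high))  # 1.0 2.25
```

For a 10 pt typeface, this reproduces the values from the text: a line height of 12 to 14.5 pt, which corresponds to 1 to 2.25 pt of lead on either side of a line.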
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well.
Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and set ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

          Sizes in inches   Page proportions
A4        8.27 × 11.7       2 : √2       1.41421
B5        6.93 × 9.84       1 : √2       0.707
Letter    8½ × 11           1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
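One way to satisfy the whole-multiple rule is to pad each heading with vertical space until it occupies a whole number of body text lines. The Python sketch below computes that padding. The function name and the decision to return the total padding (to be split above and below the heading as the designer sees fit) are assumptions made for this illustration.

```python
import math
from typing import Tuple


def heading_on_grid(heading_height_pt: float,
                    leading_pt: float) -> Tuple[int, float]:
    """Fit a heading into a whole number of body text lines.

    Returns the number of lines the heading occupies and the extra
    vertical space (in points) to distribute above and below it.
    """
    lines = max(1, math.ceil(heading_height_pt / leading_pt))
    extra_space = lines * leading_pt - heading_height_pt
    return lines, extra_space


print(heading_on_grid(18, 12))  # (2, 6)
print(heading_on_grid(24, 12))  # (2, 0)
```

With a body text leading of 12 pt, a heading that is 18 pt tall takes up two lines and needs 6 pt of padding, whereas a 24 pt heading already sits on the grid.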
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

(William Shakespeare, King Lear)

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
3.3 Page Layout
The page consists of a text block surrounded by margins. The width of the text block is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
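The requirement that the text height be a multiple of the line height translates into a simple calculation, sketched here in Python. The function name and the sample page (9 in tall with vertical margins of 0.75 in) are assumptions chosen for illustration, as is the conversion factor of 72 points per inch. TeX, for instance, uses 72.27 points per inch instead.

```python
from typing import Tuple

POINTS_PER_INCH = 72  # assumption: DTP points; TeX's pt is 1/72.27 in


def text_height(page_height_in: float, top_margin_in: float,
                bottom_margin_in: float,
                leading_pt: float) -> Tuple[int, float]:
    """Return the number of lines and the text height in points.

    The text height is the largest multiple of the leading that fits
    between the top and the bottom margin.
    """
    available_pt = ((page_height_in - top_margin_in - bottom_margin_in)
                    * POINTS_PER_INCH)
    lines = int(available_pt // leading_pt)
    return lines, lines * leading_pt


print(text_height(9, 0.75, 0.75, 12))  # (45, 540)
print(text_height(9, 0.75, 0.75, 14))  # (38, 532)
```

With a leading of 14 pt, the available 540 pt cannot be filled exactly, so the text block shrinks to 38 lines (532 pt) and the remaining 8 pt go to the margins.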
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to use colored type only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015) (cit. on p. 5).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015) (cit. on p. 5).
[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).
[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).
[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).
[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015) (cit. on p. 16).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3.
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015).
[29] ISO/IEC JTC 1/SC 18/WG 8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12.
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015).
[39] Ecma International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4.
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: httpwwwsazbacztypoglosytypo101pdf (visited on 10/20/2015).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts and Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5.
[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014.
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Declaration
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe’s Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. The figure shows the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.
Compose + O + R = ®
Compose + 3 + 4 = ¾
Compose + s + s = ß
Compose + ~ + ' + a = ấ
Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.
Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á
Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces the character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the plus sign followed by the corresponding UCS code point in hexadecimal.
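The rule in the caption can be checked with a short Python sketch. It assumes that Python's built-in `cp437` and `cp1252` codecs correspond to the IBM and Windows code pages named above; the helper name `alt_code` is hypothetical and models only the leading-zero rule, not the behavior of a real keyboard driver.

```python
def alt_code(digits: str) -> str:
    """Return the character an Alt code would produce in an English locale.

    A leading zero selects the Windows code page 1252; otherwise the IBM
    code page 437 is used. Only single-byte values (0-255) are handled.
    """
    codec = "cp1252" if digits.startswith("0") else "cp437"
    return bytes([int(digits)]).decode(codec)


print(alt_code("160"))   # looked up in the IBM code page 437
print(alt_code("0225"))  # looked up in the Windows code page 1252
print(chr(0xE1))         # UCS code point E1 given in hexadecimal
```

All three lines print the same character, á, which matches the three key sequences of the figure.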
1.1.2 Text Input
To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On hand-held devices, the use of either a numeric keypad or a touch-screen is more typical.

An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind and, as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn’t apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.
1.1.3 Text Editors
A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free joe, GNU nano, and pico.

More advanced text editors come with support for regular expressions and version control, which will be covered in sections 1.1.5 and 1.2, and with user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower users to perform complex text editing.
1.1.4 Interactive Document Preparation Systems
Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully-formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].
The primary difference between text editors and DPSes is that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.

Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.
1.1.5 Regular Expressions

The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.

Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) [18, part 1, ch. 9], described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.

More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (U+200B), zero width non-joiner (U+200C), zero width joiner (U+200D), and zero width no-break space (U+FEFF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at
BRE regex: we\{1,2\}p
Description: The repetition expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
Matches: weeps, wept

BRE regex: e*ne
Description: Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
Matches: never, enemy, Kleene

BRE regex: \(⟨regex⟩\)
Description: A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
Matches: ⟨regex⟩

BRE regex: ^ar
Description: At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
Matches: argument, arrow keys

BRE regex: ore$
Description: At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
Matches: iron ore, dumbledore
BRE regex: .be
Description: A period (.) matches any single character.
Matches: or not to be

BRE regex: be[ea]
Description: A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity.
Matches: beehive, grizzly bear, glass beads

BRE regex: be[^ea]
Description: A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
Matches: obeah, bend, libela

BRE regex: \^\$
Description: Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
Matches: ^$

BRE regex: \(..\).*\1
Description: A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
Matches: ara ararauna, dardanelles, nationality
Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below).
ERE regex: we{1,2}p
Description: Unlike in BREs, braces aren’t escaped.
Matches: weeps, wept

ERE regex: pe+r?l?
Description: The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
Matches: persona, peer, speech, perl

ERE regex: (⟨regex⟩)
Description: Unlike in BREs, parentheses aren’t escaped.
Matches: ⟨regex⟩

ERE regex: (on|t)
Description: Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
Matches: one, two, trophy, truth

ERE regex: (..).*\1
Description: EREs do not support backreferences.
Matches: ⟨undefined⟩
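Several rows of Table 1.4 can be tried out in Python. This is only a sketch under a stated assumption: Python's `re` module implements a Perl-derived syntax rather than POSIX BRE or ERE, so braces and parentheses are written unescaped as in ERE, while backreferences are supported as in BRE. The helper `first_match` is hypothetical, and the backreference pattern was picked for illustration.

```python
import re


def first_match(pattern: str, text: str):
    """Return the first substring of `text` matched by `pattern`, or None."""
    found = re.search(pattern, text)
    return found.group(0) if found else None


examples = [
    (r"we{1,2}p", "weeps"),         # interval expression
    (r"we{1,2}p", "wept"),
    (r"^ar", "arrow keys"),         # beginning of a string
    (r"ore$", "iron ore"),          # end of a string
    (r"be[ea]", "grizzly bear"),    # matching list expression
    (r"be[^ea]", "bend"),           # non-matching list expression
    (r"(on|t)", "two"),             # alternation
    (r"(..).*\1", "dardanelles"),   # backreference to the first group
]

for pattern, text in examples:
    print(f"{pattern!r} in {text!r}: {first_match(pattern, text)!r}")
```

The raw string prefix (`r"..."`) keeps Python from interpreting the backslash in `\1` before the regex engine sees it.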
Regex: \x{⟨n⟩}
Description: Matches the UCS character with code point ⟨n⟩ in hexadecimal.

Regex: \N{⟨n⟩}
Description: Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.

Regex: \p{⟨p⟩}
Description: Matches any UCS character with property ⟨p⟩.

Regex: \P{⟨p⟩}
Description: Matches any UCS character without property ⟨p⟩.

Property: Letter
Description: This property is satisfied by any letter.

Property: Punctuation
Description: This property is satisfied by any punctuation.

Property: Symbol
Description: This property is satisfied by any symbol.

Property: Mark
Description: This property is satisfied by any mark.

Property: Number
Description: This property is satisfied by any number.

Property: Separator
Description: This property is satisfied by any separator.

Property: Other
Description: This property is satisfied by any UCS character that doesn’t belong to any of the categories listed above.

Property: Block=⟨b⟩
Description: This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.

Property: Script=⟨s⟩
Description: This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.

Property: Numeric Value=⟨n⟩
Description: This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.
The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.
least partially implement the Unicode standard for Regular Expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
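The pitfalls of normalization, case conversion, and invisible characters can be worked around by hand. The following Python sketch is one possible approach, not a full implementation of the Unicode standard: it removes the four zero-width characters named in the text, normalizes to NFC, and applies case folding before matching. The helper names `canonical`, `contains`, and `letters` are hypothetical. Python's `re` module does not offer the `\p{…}` syntax, so the general category is read from the standard `unicodedata` module instead.

```python
import re
import unicodedata

# Translation table that deletes the zero-width characters named in the text.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))


def canonical(text: str) -> str:
    """Drop zero-width characters, normalize to NFC, and case-fold."""
    stripped = text.translate(ZERO_WIDTH)
    return unicodedata.normalize("NFC", stripped).casefold()


def contains(needle: str, haystack: str) -> bool:
    """Search for a fixed string after both sides were made canonical."""
    pattern = re.escape(canonical(needle))
    return re.search(pattern, canonical(haystack)) is not None


def letters(text: str) -> list:
    """Keep the characters whose general category is Letter (L*)."""
    return [ch for ch in text if unicodedata.category(ch).startswith("L")]


decomposed = "Cafe\u0301 au lait"  # "e" followed by a combining acute accent
print(re.search("café", decomposed, re.IGNORECASE))  # naive search fails
print(contains("café", decomposed))                  # canonical search succeeds
print(contains("strasse", "Stra\u200bße"))           # zero width space, ß folded
print(letters("a1!β"))
```

Case folding rather than lowercasing is used because it maps ß to ss, which a plain `lower()` call would not do.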
The most elementary text processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool will present lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for the Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
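The line-oriented behavior of grep and the simplest use of sed can be imitated in a few lines of Python. This is a rough sketch with hypothetical function names, not a reimplementation of either tool; in particular, it relies on the regex syntax of Python's `re` module, which differs from the BRE syntax that grep and sed use by default.

```python
import re
from typing import Iterable, Iterator


def grep(pattern: str, lines: Iterable[str], fixed: bool = False) -> Iterator[str]:
    """Yield the lines that contain at least one match of `pattern`.

    With fixed=True the pattern is treated as a fixed string, not a regex.
    """
    regex = re.compile(re.escape(pattern) if fixed else pattern)
    for line in lines:
        if regex.search(line):
            yield line


def substitute(pattern: str, replacement: str, lines: Iterable[str]) -> Iterator[str]:
    """Replace every match of `pattern` on every line, like a global sed substitution."""
    regex = re.compile(pattern)
    for line in lines:
        yield regex.sub(replacement, line)


text = ["iron ore", "ore carrier", "dumbledore", "a.c", "abc"]
print(list(grep(r"ore$", text)))             # lines ending in "ore"
print(list(grep("a.c", text, fixed=True)))   # the period is taken literally
print(list(substitute(r"ore\b", "ORE", ["iron ore", "ore carrier"])))
```

Generators are used so that the functions can process a file line by line without loading it whole, which is also how the original tools operate on streams.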
1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCSs) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCSs are a convenient alternative to manual version archival. With several contributors, VCSs become an essential tool.

VCSs can be divided by their architecture into centralized and decentralized systems. Centralized VCSs store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally, and its operation is fully dependent on the availability of the server. An example of a centralized VCS is SubVersioN (SVN).

By comparison, there is no designated server in decentralized VCSs, and the users can exchange new versions directly with one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage size requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCSs include Git, Mercurial, and Bazaar.
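The rudimentary functionality described above (recording changes with a description and an author, viewing them, and reverting them) can be modeled in a toy Python sketch. The classes `Change` and `History` are hypothetical and keep every version in memory as full text; real VCSs such as SVN or Git use far more elaborate storage and are not represented here.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class Change:
    author: str
    description: str
    text: str  # full text of the document after the change


@dataclass
class History:
    changes: list = field(default_factory=list)

    def commit(self, author: str, description: str, text: str) -> None:
        """Record a new version together with its author and description."""
        self.changes.append(Change(author, description, text))

    def log(self) -> list:
        """List all recorded changes, oldest first."""
        return [f"{i}: {c.author}: {c.description}" for i, c in enumerate(self.changes)]

    def diff(self, old: int, new: int) -> list:
        """Return the line differences between two versions as a unified diff."""
        a = self.changes[old].text.splitlines()
        b = self.changes[new].text.splitlines()
        return list(difflib.unified_diff(a, b, f"version {old}", f"version {new}", lineterm=""))

    def revert(self, author: str, version: int) -> None:
        """Record a new change that restores the text of an earlier version."""
        self.commit(author, f"Revert to version {version}", self.changes[version].text)


history = History()
history.commit("alice", "Initial draft", "Hello\nworld")
history.commit("bob", "Fix greeting", "Hello\nreader")
print("\n".join(history.diff(0, 1)))
history.revert("alice", 0)
print("\n".join(history.log()))
```

Reverting is modeled as one more recorded change rather than as a deletion, so the history itself is never rewritten.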
Although VCSs can be used to keep track of any kind of file, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCSs to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.

[Diagram: svnadmin create, svn checkout, svn update, svn commit]

Figure 1.8: The basic SVN workflow.
After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.

[Diagram: git init, git clone, git pull, git push, git reset, git commit]

Figure 1.9: The diagram above depicts the basic Git workflow. The diagram below depicts the use of the Git program with an SVN repository; this bears all the advantages and disadvantages associated with decentralized VCSs.

[Diagram: svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit]
Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.
Chapter 2
Markup
A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules, such as the correct spelling, capitalization, word breaks, and punctuation, that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should be already met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.
Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The General Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the General Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

The authoritative resource on SGML is The SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.
sgml documents consist of text mixed with tags which delimitmeaningful sections of the document called elements Elementsmaycarry additional information in attributes Additionally sgml doc-uments may contain miscellaneous instructions for the programsthat are processing them as well as human-readable commentsAn umbrella term for the various parts of sgml document is nodesRepeated strings of text can be declared as entities that can be usedthroughout the document in place of the original strings
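The kinds of nodes named above can be made concrete with a short sketch. The Python standard library ships no SGML parser, so the sketch assumes XML (the simplified successor of SGML described in Section 2.1.2), which shares the same node structure. The sample document, its element and attribute names, and the list_nodes helper are all made up for illustration.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# A hypothetical document; the element and attribute names are made up.
SOURCE = """<?xml version="1.0"?>
<?render columns="2"?>
<!-- A human-readable comment -->
<note lang="en">Remember the <em>milk</em>.</note>
"""

# Human-readable names for the node types discussed in the text.
NODE_KINDS = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}


def list_nodes(node, depth=0, out=None):
    """Collect (depth, kind, label) triples for a node and its descendants."""
    if out is None:
        out = []
    kind = NODE_KINDS.get(node.nodeType)
    if kind is not None:
        if node.nodeType == Node.ELEMENT_NODE:
            label = node.tagName
        elif node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
            label = node.target
        else:
            label = node.data.strip()
        out.append((depth, kind, label))
    for child in node.childNodes:
        list_nodes(child, depth + 1, out)
    return out


document = minidom.parseString(SOURCE)
nodes = list_nodes(document)
for depth, kind, label in nodes:
    print("  " * depth + f"{kind}: {label}")

# Attributes are stored on their element rather than among its children.
note = document.documentElement
print(dict(note.attributes.items()))
```

The instruction for the processing program and the comment appear as siblings of the root element, while the text and the nested element appear as its children.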
Sidenote: A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
Although the described structure is shared by all SGML documents, the actual syntax as well as the restrictions regarding the contents and the attributes of individual elements are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.
2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.
This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
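The two levels of correctness can be sketched in a few lines of Python. The ElementTree parser in the standard library is non-validating, so it can only decide well-formedness. The follows_recipe_outline function is a hand-written, hypothetical stand-in for a schema and checks only the order of the top-level elements of the recipe from Figure 2.1; a real validity check would delegate to a validating tool such as xmllint.

```python
import xml.etree.ElementTree as ET

# Top-level structure of the recipe format, written out by hand for this sketch.
REQUIRED = ["name", "description", "ingredientList", "stepList"]


def is_well_formed(text):
    """Return True if `text` parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def follows_recipe_outline(text):
    """Toy stand-in for validation: check the children of the root element."""
    root = ET.fromstring(text)
    return root.tag == "recipe" and [child.tag for child in root] == REQUIRED


OVERLAPPING = "<recipe><name>Palatschinken</recipe></name>"
INCOMPLETE = "<recipe><name>Palatschinken</name></recipe>"
COMPLETE = (
    "<recipe><name>Palatschinken</name><description/>"
    "<ingredientList/><stepList/></recipe>"
)

print(is_well_formed(OVERLAPPING))         # the elements overlap
print(is_well_formed(INCOMPLETE))          # syntactically fine
print(follows_recipe_outline(INCOMPLETE))  # but three children are missing
print(follows_recipe_outline(COMPLETE))
```

The second document shows why the levels are distinct: it is accepted by every XML parser, yet it is not an instance of the recipe format.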
Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
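As a rough illustration of data retrieval, the following sketch queries a shortened copy of the recipe from Figure 2.1. It assumes the ElementTree module of the Python standard library, which implements only a limited subset of XPath; a complete XPath or XQuery processor offers considerably more.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the recipe from Figure 2.1.
RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# A path expression: every ingredient inside the ingredient list.
names = [e.text for e in root.findall("./ingredientList/ingredient")]

# A predicate: the ingredient whose amount attribute equals "300ml".
milk = root.find(".//ingredient[@amount='300ml']")

# An attribute of a single element.
serves = root.find("ingredientList").get("serves")

print(names, milk.text, serves)
```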
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish.</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>

Figure 2.1: An example XML document (recipe.xml)
Sidenote: DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD
element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
        name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)
xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng  # Compact -> Full Relax NG
xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature called CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.

Speech:
  AASE: Peer, you're lying!
  PEER: No, I'm not!
  AASE: Well then, swear to me it's true!
  PEER: Swear? Why should I?
  AASE: See, you dare not! Every word of it's a lie!

Verse:
  Peer, you're lying! No, I'm not!
  Well then, swear to me it's true!
  Swear? Why should I? See, you dare not!
  Every word of it's a lie!

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

Sidenote: The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org/.

Sidenote: Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
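The namespace mechanism described earlier in this section can be observed when instances of two XML applications share one document. The sketch below mixes XHTML and SVG elements and assumes the ElementTree module of the Python standard library; the document content is made up, while the two namespace IRIs are the ones published by W3C for XHTML and SVG.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

# A made-up document mixing elements of two XML applications.
SOURCE = f"""<html xmlns="{XHTML}" xmlns:svg="{SVG}">
  <body>
    <p>A circle:</p>
    <svg:svg width="20" height="20">
      <svg:circle cx="10" cy="10" r="8"/>
    </svg:svg>
  </body>
</html>"""

root = ET.fromstring(SOURCE)

# ElementTree spells qualified names as "{namespace IRI}local-name".
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)

# Queries map a prefix of our own choosing to the namespace IRI.
circles = root.findall(".//svg:circle", {"svg": SVG})
print(len(circles), circles[0].get("r"))
```

The parser resolves the prefixes to IRIs but never fetches or checks a schema for either namespace, which matches the limitation noted above.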
2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for the presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

Sidenote: JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com/. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

<style>
body {
  font: large Verdana;
  font-size: large; }
h1 {
  font-size: x-large;
  text-transform: uppercase; }
abbr {
  font-style: italic; }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications (such as MathML and SVG) directly in web documents through XML namespaces.

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

Sidenote: The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
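The contrast between lenient HTML processing and strict XML processing can be observed with the Python standard library. The sketch below feeds the overlapping markup from Figure 2.7 to both kinds of parser. Note the assumption: html.parser is only a tolerant tokenizer and does not implement the error recovery of the HTML5 specification, so the example shows acceptance versus refusal rather than the canonical interpretation.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# The malformed first line of Figure 2.7.
OVERLAPPING = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Record start and end tags in the order in which they are seen."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


# The HTML tokenizer reports the tags as they come, overlap and all.
logger = TagLogger()
logger.feed(OVERLAPPING)
logger.close()
print(logger.events)

# The XML parser must refuse input that is not well-formed.
try:
    ET.fromstring(OVERLAPPING)
    xml_accepts = True
except ET.ParseError as error:
    xml_accepts = False
    print("XML parser refused the input:", error)
```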
2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

Sidenote: A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
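The two readings of a triplet described above can be sketched with plain Python data structures. The IRI, Literal, and Triple classes and the interpret function are hypothetical helpers written for this illustration, not a part of any RDF library; the sample data reuses the IRIs from Figure 2.9, and the field order (predicate, subject, object) follows the notation (p, s, o) used in the text.

```python
from typing import NamedTuple, Union


class IRI(str):
    """A resource identified by an IRI."""


class Literal(str):
    """A literal value, such as a name or a title."""


class Triple(NamedTuple):
    predicate: IRI
    subject: IRI
    object: Union[IRI, Literal]


def interpret(triple):
    """Render a triplet as a sentence according to the kind of its object."""
    p, s, o = triple
    if isinstance(o, IRI):
        return f"<{s}> is in the relation <{p}> with <{o}>"
    return f'<{s}> has the property <{p}> with the value "{o}"'


triples = [
    Triple(IRI("http://purl.org/dc/terms/creator"),
           IRI("http://example.org/document.html"),
           IRI("http://example.org/john-smith")),
    Triple(IRI("http://xmlns.com/foaf/0.1/name"),
           IRI("http://example.org/john-smith"),
           Literal("John Smith")),
]

for triple in triples:
    print(interpret(triple))
```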
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in Attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata in the documents on the Web, a technique known as microformatting.
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html>
  <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html>
  <http://purl.org/dc/terms/creator>
  <http://example.org/john-smith> .
<http://example.org/john-smith>
  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
  <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith>
  <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom). In the N-Triples listing, each statement is wrapped over several lines to fit the page; in an actual file, each statement occupies a single line.

<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>

Figure 2.11: A graph of the RDF document in Figure 2.9. [Graph: the node http://example.org/document.html has a dc:title edge to the literal "John's Web page"@en and a dc:creator edge to the node http://example.org/john-smith, which in turn has an rdf:type edge to foaf:Person and a foaf:name edge to the literal "John Smith".]

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price of the gentle learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems

Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph, The Art of Computer Programming, and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

Sidenote: The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
[Output document: a 6 in by 9 in page carrying the centered title "The Cask of Amontillado", the byline "by Edgar Allan Poe", the opening paragraph of the story set with a three-line drop cap "T" and the words "At length" in italics, and the page number 1.]

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
    \dimexpr\hsize-3.28em 3.28em
    \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
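The conversion from a lightweight markup language to HTML can be sketched in a few lines of Python. The to_html function below is a toy written for this illustration: it understands only a tiny Markdown-like subset (headings introduced by number signs, emphasis between asterisks, and paragraphs separated by blank lines) and should not be mistaken for an implementation of any actual flavor of Markdown.

```python
import html
import re


def to_html(source):
    """Convert a tiny Markdown-like subset to HTML."""
    blocks = []
    # Blocks of text are separated by blank lines.
    for block in re.split(r"\n\s*\n", source.strip()):
        # Join the lines of a block and escape characters special to HTML.
        text = html.escape(" ".join(block.split()), quote=False)
        # *emphasis* becomes a logical <em> element.
        text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
        # One to six number signs introduce a heading of that level.
        heading = re.match(r"(#{1,6}) (.*)", text)
        if heading:
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            blocks.append(f"<p>{text}</p>")
    return "\n".join(blocks)


SAMPLE = """# Palatschinken

A *thin* pancake,
served rolled & filled with jam.
"""

result = to_html(SAMPLE)
print(result)
```

The punctuation in the source reads naturally as plain text, yet it carries enough structure to be turned into logical HTML elements.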
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
32 Structural Elements
321 Paragraphs and StanzasAs the base units of linguistic thought in prose paragraphs splitthe text into coherent portions ready for consumption A line in aparagraph of the body text should be 45ndash75 characters long on asingle-column page or 40ndash50 characters long on a multi-columnpage and justified (spread horizontally to fit the column width)Extended passages of lines wider than 80 characters strain theeye of the reader whereas justified lines that are too narrow toaccommodate 40 characters may make the word spacing entirelytoo loose In the latter case the text should be set ragged insteadas seen in the sidenotes throughout this book [54 sec 212]
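The line-length guidelines above can be condensed into a small helper. The following Python sketch is an assumption-laden illustration: the function name and the wording of the returned advice are invented here, and only the numeric thresholds come from the text.

```python
def advise_measure(chars_per_line, columns=1):
    """Return a short recommendation for a given average line length."""
    # Preferred ranges quoted in the text for single- and multi-column pages.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        # Justifying such narrow lines loosens the word spacing too much.
        return "set ragged"
    if chars_per_line > 80:
        # Long passages of such wide lines strain the eye.
        return "too wide"
    if low <= chars_per_line <= high:
        return "justify"
    return "usable, but outside the preferred range"


if __name__ == "__main__":
    for length, cols in [(66, 1), (35, 2), (90, 1), (60, 2)]:
        print(length, cols, advise_measure(length, cols))
```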
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
% Switch fonts automatically whenever the script changes.
\setTransitionsForLatin{%
  \fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek{%
  \fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
                                 Ligatures=TeX]}{}
% \newunicodechar{...}{\raisebox{...8ex}{...}}  % arguments not recoverable
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below), and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  /* "grc" is the language tag for Ancient Greek; "gr" is not a valid tag. */
  [lang=grc] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with a leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
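The arithmetic behind these figures is easy to reproduce. The Python sketch below uses function names invented for this illustration and assumes, as the text does, that the extra lead is split evenly above and below each line.

```python
def leading_range(size_pt, low=0.20, high=0.45):
    """Return the (minimum, maximum) line height for a typeface size."""
    return size_pt + size_pt * low, size_pt + size_pt * high


def lead_per_side(size_pt, leading_pt):
    """Lead added above and below each line, assuming an even split."""
    return (leading_pt - size_pt) / 2


def extra_percent(size_pt, leading_pt):
    """Line separation expressed as a percentage of the typeface size."""
    return (leading_pt - size_pt) / size_pt * 100


if __name__ == "__main__":
    print(leading_range(10))      # the 10 pt example from the text
    print(extra_percent(10, 12))  # the body text of the book
    print(extra_percent(11, 14))  # the settings used in Figure 3.1
```

With these definitions, the 11 pt type and 14 pt leading of Figure 3.1 work out to roughly 27 percent, which sits inside the recommended range.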
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and set ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941
Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
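The whole-multiple rule for headings can be illustrated with a short calculation. The following Python sketch is one possible reading of the rule, not a prescribed method: it rounds the heading's natural height up to the next full line of body text and reports how much extra space must be distributed around the heading. The function name is invented for this example.

```python
import math


def heading_block(heading_height_pt, body_leading_pt):
    """Return (lines occupied, padding in points) for a heading.

    The heading is given the smallest whole number of body text lines
    that can hold it; the padding is the space left to distribute above
    and below the heading so that the baseline grid stays intact.
    """
    lines = math.ceil(heading_height_pt / body_leading_pt)
    padding = lines * body_leading_pt - heading_height_pt
    return lines, padding


if __name__ == "__main__":
    # An 18 pt heading over body text with 12 pt leading.
    print(heading_block(18, 12))
```

How the padding is divided between the space above and the space below the heading is a design decision that the sketch leaves open.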
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface as the surrounding text, treat the columns of tables the same way you treat columns in the text, and keep the number of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks, in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding vertical space above and below the block paragraphs and, optionally, also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
– William Shakespeare, King Lear
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
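Deriving the text height from the text width while keeping it a multiple of the line height can be sketched as follows. The Python function below is an illustration only: its name, the rounding to the nearest whole line, and the sample proportion of 1 : √2 are assumptions made for this example rather than recommendations from the cited sources.

```python
def fit_text_height(text_width_pt, proportion, leading_pt):
    """Return (lines per page, text height in points).

    The ideal height is the text width multiplied by the chosen
    height-to-width proportion; it is then rounded to the nearest whole
    number of body text lines so that the text block can be filled
    completely.
    """
    target = text_width_pt * proportion
    lines = max(1, round(target / leading_pt))
    return lines, lines * leading_pt


if __name__ == "__main__":
    # A 300 pt wide text block with 12 pt leading and a 1 : sqrt(2) shape.
    print(fit_text_height(300, 2 ** 0.5, 12))
```

Rounding up or down instead of to the nearest line would be equally defensible; the point is only that the final height is an exact multiple of the leading.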
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to use colored typefaces only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
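The text does not say how "sufficient contrast" should be measured. One widely used yardstick, which is an assumption of this example and not something the sources above prescribe, is the contrast ratio defined in the W3C Web Content Accessibility Guidelines (WCAG) for sRGB colors. A minimal Python sketch of that formula:

```python
def _linearize(channel_8bit):
    """Convert an 8-bit sRGB channel to linear light (WCAG definition)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) triple with 8-bit channels."""
    r, g, b = (_linearize(channel) for channel in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG contrast ratio, from 1 (no contrast) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    print(contrast_ratio((0, 0, 0), (255, 255, 255)))    # black on white
    print(contrast_ratio((255, 0, 0), (255, 255, 255)))  # pure red on white
```

WCAG suggests a ratio of at least 4.5 for body-sized text; pure red on white falls slightly short of that, which is consistent with the advice to reserve color for emphasis.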
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).
[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972 (cit. on pp. 5, 7).
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1986 (cit. on p. 6).
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).
[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993 (cit. on p. 8).
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016) (cit. on p. 10).
[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008 (cit. on p. 13).
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012 (cit. on p. 13).
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006 (cit. on p. 13).
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).
[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993 (cit. on p. 14).
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986 (cit. on p. 22).
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).
[47] Manu Sporny, Gregg Kellogg, and Markus Lanthaler. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typo/glosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML, New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
vi  The Visual Interactive editor
vim  vi IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
Index
ACK 6
Adobe FrameMaker 14
Adobe InDesign 14, 39
alignment
  justified 42
  ragged 42
Anton Koberger 49
Apache OpenOffice 13, 20, 39
API 55
ASA 51
ASCII 5–9, 11, 12, 14, 51
AsciiDoc 39
AT&T 35
Atom 13
awk 16, 17

Bazaar 17
BEL 6
BMP 8, 9, 14
Bob Berner 5
body text 41
BRE 14–16
  alternation operator 15
  backreference 15
  escape character 15
  matching list expression 15
  non-matching list expression 15
  repetition operator 15
  subexpression 15
BS 6
BSD 13

CA 52
CAN 6
CERN 28
character code 5
character encoding 5
Chomsky hierarchy 14
Christian Morgenstern 4
CLDR 52
CLI 13, 16
code page 7
code point 8
Compose key 11
CONCUR 27
control code 5
CR 6
Creole 39
CSS 23, 29–32, 44

DC 32, 33
DC1 6
DC2 6
DC3 6
DC4 6
DEL 6
DLE 6
Donald Knuth 36
DPS 13, 17, 18, 32, 35, 36, 39
  batch-oriented 35
  interactive 13, 35
    desktop publishing 36
    word processing 36
DTD 23, 25–27
DTP 36

EBCDIC 5
ECMA 55
Edgar Allan Poe 37
Elements of Style 3
EM 6
Emacs 13
endianity 10
endnote 47
ENQ 6
EOT 6
ERE 14–16
  alternation operator 15
  backreference 15
  escape character 15
  matching list expression 15
  non-matching list expression 15
  repetition operator 15
  subexpression 15
ESC 6
ETB 6
ε-TeX 38
ETX 6
EUC 5

F. M. Cornford 43
FF 6
FOAF 32, 33
footnote 47
formal grammar 14
FORTRAN 4
From Religion to Philosophy: A Study in the Origins of Western Speculation 43
FS 6
FSM 35

Git 17
GML 22
GNU 13, 14, 35
  Linux 13
  nano 13
Google Documents 18
Google Pinyin 11
grep 16, 17
groff, see troff
GS 6
GUI 13, 35

Han Unification 9
heading 45
Henrik Ibsen 27
HT 6
HTML 28–32, 34, 39, 44, 55

IBM 5, 12, 22
iconv 10
IEC 7, 10, 51–54
IME 12
IRI 27, 28, 31, 32, 54
ISO 7, 10, 51–54

JavaScript 29
Jeffrey E. F. Friedl 14
JIS 5
joe 13
JScript 29
JSON 32
JSON-LD 32, 56
JTC 51–54
justification, see alignment

King Lear 48

LaTeX 36, 43
Latin Vulgate Bible 49
LD 31, 32, 55
leading, see line spacing
Leafpad 13
LF 6
lightweight markup language 39
line height 45
list 46

MA 51
MakeDoc 39
Markdown 39
markup
  logical 21, 29, 30, 35, 36
  presentation 21, 29, 30, 35, 36
MathML 28, 31
Mercurial 17
microformatting 32
Microsoft Word 14, 20, 39

N-Triples 32, 33
NAK 6
Noam Chomsky 14
  hierarchy 14
note 46
Notepad++ 13
Notepad 13
nroff, see troff
NUL 6
NY 51

OCR 12
ODF 13
OOXML 13
OWL 32, 56

paragraph 42
  block 47
  indented 45
  outdented 45
paragraphs
  block 45
PC 5, 11
PDF 13
pdfTeX 38
Peer Gynt 27
Perl 14
pico 13
pinyin 11
plain TeX 38
POSIX 53
printable character 5
Punycode 8

QuarkXPress 14
quotation
  block 47
  run-in 47

rag, see alignment
RDF 28, 31–35, 56
  literal 32
  object 31
  ontology 32
  predicate 31
  resource 31
  subject 31
  triplet 31
RDFa 32, 34, 56
regex, see regular expression
regular expression 13, 14
regular grammar 14
RELAX NG 23, 25
RFC 54, 55
RS 6

sans-serif 41
SC 51–54
Scribus 13, 14, 39
sed 16, 17
serif 41
Setext 39
SGML 22, 23, 25, 27–29, 39, 53, 54
  application 23
  attribute 22
  element 22
  entity 22
  node 22
  tag 22
SGML: The Reason Why and the First Published Hint 22
SI 6
sidenote 46
small capitals 45
SO 6
SOH 6
SR 12
STX 6
style guide 3
SUB 6
Sublime Text 13
surrogate pair 8
SVG 28, 31
SVN 17–20
SYN 6

table 46
TC 51, 52
TEI 28
text editor 13
text file 4
text processing 4
TextEdit 13, 14
The Art of Computer Programming 36
The Cask of Amontillado 37
The Chicago Manual of Style 3
The Oxford Style Manual 3
the Subversion book 17
Tim Berners-Lee 31
Timothy John Berners-Lee 28
TortoiseSVN 18, 20
Trichter 4
troff 35
  man 36
  me 36
  mom 36
TRON 9
Turtle 32, 33
typeface 41

UCS 6, 8–12, 14, 16, 51, 52
  block 8
  UCS-4 8
Unicode
  case conversion 10
  normalization 10
US 6
USA 51, 52
UTF 6, 8–10, 52
  UTF-16 8, 52
  UTF-32 8
  UTF-7 8
  UTF-8 8, 52

VBScript 29
VCS 17–20
  centralized 17
  decentralized 17
version control 13
vi 13
vim 13
VT 6

W3C 23, 28, 29, 31, 32, 54–56
WG 54
Wikicode 39
William Shakespeare 48
William Strunk 3
Word Online 18
writing rules
  grammar 3
  orthography 3
  typography 4
WYSIWYG 35

X Window System 11
XeTeX 43
XHTML 28, 31, 32, 55, 56
XML 23–29, 31–33, 39, 54, 55
  application 23
  DocBook 28
  format 23
  language 23
  namespace 27
  schema language 23
  Schema 23, 26
  validity 23
  well-formedness 23
xmllint 26
XPath 23
XPointer 23
XQuery 23
Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á
Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces a character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the corresponding UCS code point in hexadecimal.
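The three key sequences in the figure can be checked against the code pages in question. The short Python sketch below uses the standard `cp437` and `cp1252` codecs; the variable names are chosen for this example only.

```python
# Alt + 160: byte 160 looked up in the IBM code page 437.
from_ibm_code_page = bytes([160]).decode("cp437")

# Alt + 0225: byte 225 looked up in the Windows code page 1252.
from_windows_code_page = bytes([225]).decode("cp1252")

# Alt + "+" + E1: the UCS code point U+00E1 given in hexadecimal.
from_code_point = chr(0xE1)

if __name__ == "__main__":
    print(from_ibm_code_page, from_windows_code_page, from_code_point)
```

All three expressions yield the same character, the letter a with an acute accent, even though the numbers typed differ between the code pages.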
1.1.2 Text Input
To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On hand-held devices, the use of either a numeric keypad or a touch-screen is more typical.
An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind and, as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn't apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.
1.1.3 Text Editors
A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free joe, GNU nano, and pico.
More advanced text editors come with support for regular expressions and version control, which will be covered in Sections 1.1.5 and 1.2, and with user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom, and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower the users to perform complex text editing.
1.1.4 Interactive Document Preparation Systems
Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].
The primary difference between text editors and DPSes is that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files, a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus, and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.
1.1.5 Regular Expressions
The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.
Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.
Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) [18, part 1, ch. 9], described in Table 1.4, which are supported by most text-processing programs on Unix and Unix-like operating systems.
More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
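To see why the term regular is a misnomer, consider the following sketch in Python, whose re module implements a Perl-derived regex syntax. The pattern and the function name is_square are illustrative and do not come from the text. The sketch recognizes strings that consist of some non-empty string written twice in a row; this set of strings is a textbook example of a language that no regular grammar can describe.

```python
import re

# (.+) captures any non-empty string, and the backreference \1 demands
# that exactly the same string follows immediately.
SQUARE = re.compile(r"(.+)\1")

def is_square(text: str) -> bool:
    """Return True if the whole text is one string repeated twice."""
    return SQUARE.fullmatch(text) is not None

for word in ["abab", "murmur", "aba", "abcab"]:
    print(word, is_square(word))
```

The backreference is what pushes the syntax beyond regular grammars: the matcher has to remember an arbitrarily long captured string, which a finite automaton cannot do.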
BRE regex: we\{1,2\}p
Description: The interval expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
Matches: weeps, wept

BRE regex: e*ne
Description: Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
Matches: never, enemy, Kleene

BRE regex: \(⟨regex⟩\)
Description: A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
Matches: ⟨regex⟩

BRE regex: ^ar
Description: At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
Matches: argument, arrow keys

BRE regex: ore$
Description: At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
Matches: iron ore, dumbledore

BRE regex: .be
Description: A period (.) matches any single character.
Matches: or not to be

BRE regex: be[ea]
Description: A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity.
Matches: beehive, grizzly bear, glass beads

BRE regex: be[^ea]
Description: A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
Matches: obeah, bend, libela

BRE regex: \^\$
Description: Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
Matches: ^$

BRE regex: \(..\).*\1
Description: A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
Matches: ara ararauna, dardanelles, nationality

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below).

ERE regex: we{1,2}p
Description: Unlike in BREs, braces aren't escaped.
Matches: weeps, wept

ERE regex: pe+rl?
Description: The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
Matches: persona, peer speech, perl

ERE regex: (⟨regex⟩)
Description: Unlike in BREs, parentheses aren't escaped.
Matches: ⟨regex⟩

ERE regex: (on|t)
Description: Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
Matches: one, two, trophy, truth

ERE regex: (..).*\1
Description: EREs do not support backreferences.
Matches: ⟨undefined⟩

Regex       Description
\x{⟨n⟩}     Matches the UCS character with code point ⟨n⟩ in hexadecimal.
\N{⟨n⟩}     Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.
\p{⟨p⟩}     Matches any UCS character with property ⟨p⟩.
\P{⟨p⟩}     Matches any UCS character without property ⟨p⟩.

Property            Description
Letter              This property is satisfied by any letter.
Punctuation         This property is satisfied by any punctuation.
Symbol              This property is satisfied by any symbol.
Mark                This property is satisfied by any mark.
Number              This property is satisfied by any number.
Separator           This property is satisfied by any separator.
Other               This property is satisfied by any UCS character that doesn't belong to any of the categories listed above.
Block=⟨b⟩           This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.
Script=⟨s⟩          This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.
Numeric Value=⟨n⟩   This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.

The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.
Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (U+200B), zero width non-joiner (U+200C), zero width joiner (U+200D), and zero width no-break space (U+FEFF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at least partially implement the Unicode standard for Regular Expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
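Where a regex implementation lacks Unicode awareness, some of the pitfalls named in this section (several encodings of one character, case conversion, invisible characters) can be softened by preprocessing the text before matching. The following Python sketch shows one possible approach rather than a complete solution: the helper search_key is a hypothetical name, and the choice of the NFC normalization form is an assumption.

```python
import unicodedata

# Zero width space, non-joiner, joiner, and no-break space, mapped to None
# so that str.translate deletes them.
INVISIBLE = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))

def search_key(text: str) -> str:
    """Reduce text to a form suited to robust case-insensitive comparison."""
    text = text.translate(INVISIBLE)            # drop invisible characters
    text = unicodedata.normalize("NFC", text)   # unify equivalent encodings
    text = text.casefold()                      # Unicode-aware case folding
    return unicodedata.normalize("NFC", text)   # folding may denormalize

precomposed = "V\u00edt"     # i with acute accent as a single code point
decomposed = "VI\u0301T"     # capital I followed by a combining acute accent
with_space = "v\u200b\u00edt"  # hidden zero width space after the first letter

print(precomposed == decomposed)                           # naive comparison
print(search_key(precomposed) == search_key(decomposed))
print(search_key(precomposed) == search_key(with_space))
```

The naive comparison fails although all three strings look like the same word to a reader, whereas the comparison of the derived keys succeeds.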
The most elementary text-processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool presents the user with the lines that contain one or more matches. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for the Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.
1.2 Version Control
When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.
The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
VCS can be dichotomized based on their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally and its operation is fully dependent on the availability of the server. An example of a centralized VCS is Subversion (SVN).
By comparison, there is no designated server in decentralized VCS, and the users can upload and download new versions directly to and from one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.
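The decentralized model can be illustrated with a toy sketch in Python. It is not how Git or any other real VCS is implemented; the classes Version and Repository and all of their methods are hypothetical names, and a repository is reduced to a plain list of versions of a single document.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class Version:
    author: str
    description: str
    content: str

@dataclass
class Repository:
    """A toy repository: the list of all versions of one document."""
    history: List[Version] = field(default_factory=list)

    def commit(self, author: str, description: str, content: str) -> None:
        self.history.append(Version(author, description, content))

    def revert(self) -> Version:
        """Undo the latest change and return it."""
        return self.history.pop()

    def clone(self) -> "Repository":
        # A thick client copies every existing version.
        return Repository(list(self.history))

    def pull(self, other: "Repository") -> None:
        # Accept the other history only if it extends the local one.
        if other.history[:len(self.history)] != self.history:
            raise ValueError("histories have diverged and need to be merged")
        self.history = list(other.history)

alice = Repository()
alice.commit("Alice", "First draft", "Hello")
bob = alice.clone()                       # Bob now holds the full history
bob.commit("Bob", "Fix greeting", "Hello, world")
alice.pull(bob)                           # Alice receives Bob's change

# Both users now change the document without sharing their work.
alice.commit("Alice", "Add emphasis", "Hello, world!")
bob.commit("Bob", "Ask instead", "Hello, world?")
try:
    alice.pull(bob)
    collided = False
except ValueError:
    collided = True
print([version.author for version in alice.history], collided)
```

The final pull fails because both users committed on top of the same version, which is the kind of collision that becomes more likely when local changes are shared infrequently.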
Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.
An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.
svnadmin create
svn checkout
svn update
svn commit
Figure 1.8: The basic SVN workflow. After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.
After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.
git init
git clone
git pull
git push
git reset, git commit
Figure 1.9: The diagram above depicts the basic Git workflow. The diagram below depicts the use of the Git program with an SVN repository; this bears all the advantages and disadvantages associated with decentralized VCS.
svnadmin create
git svn clone
git svn rebase
git svn dcommit
git reset, git commit
Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).
Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.
Chapter 2
Markup
A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.
It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.
Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.
2.1 Meta Markup Languages
2.1.1 The Generalized Markup Language
The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.
This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].
More information about the project can be found in The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].
SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.
The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.
Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.
2.1.2 The Extensible Markup Language
Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.
This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML, such as W3C XML Schema, RELAX NG, or Schematron, that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
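The first of the two levels of correctness, well-formedness, can be tested with any XML parser. The following is a minimal sketch using the xml.etree.ElementTree module of the Python standard library; the function name is_well_formed is illustrative. The module is a non-validating parser, so the sketch says nothing about validity, which requires a DTD or a schema and a validating tool.

```python
import xml.etree.ElementTree as ET

def is_well_formed(document: str) -> bool:
    """Return True if the document parses as XML, False otherwise."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True

# Properly nested elements.
print(is_well_formed("<recipe><name>Palatschinken</name></recipe>"))
# The end tags are swapped, so the elements overlap.
print(is_well_formed("<recipe><name>Palatschinken</recipe></name>"))
```

A well-formed document may still be invalid: a recipe element without a name parses without error, although a DTD that requires the name element would reject it.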
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>
Figure 2.1: An example XML document (recipe.xml)
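Once a document such as the one in Figure 2.1 is well-formed, a program can retrieve data from it. The sketch below uses the xml.etree.ElementTree module of the Python standard library, which understands a small subset of XPath. To stay self-contained, it embeds an abbreviated copy of the recipe instead of reading recipe.xml, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# An abbreviated copy of the document from Figure 2.1.
RECIPE = """\
<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>
"""

root = ET.fromstring(RECIPE)
name = root.findtext("name")
serves = int(root.find("ingredientList").get("serves"))

# A path expression selects every ingredient element.
ingredients = [(item.get("amount"), item.text)
               for item in root.findall("./ingredientList/ingredient")]

# A predicate selects an element by the value of its attribute.
milk = root.find(".//ingredient[@amount='300ml']").text

print(name, serves)
print(ingredients)
print(milk)
```

Note that the parser treats the amount attribute as an opaque string; whether "300ml" is a sensible amount is a matter of semantics that is left to the program processing the document.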
DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.
<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>
Figure 2.2: An example DTD
element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}
Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
      name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>
Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)
xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng # Compact -> Full RELAX NG
xmllint --noout --relaxng recipe.rng recipe.xml
Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
A notable feature of XML unavailable in SGML are namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of the more expressive SGML feature CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
Speech
AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!
Verse
Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!
<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? Why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>
Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].
The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.
Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
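Namespaces are what allows several of these XML applications to share one document. The following sketch, which uses the xml.etree.ElementTree module of the Python standard library, parses a small XHTML document with an embedded SVG drawing and prints the name of each element. The document itself and the prefixes x and s are made up for the example; the two namespace IRIs are the ones assigned to XHTML and SVG by W3C.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """\
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>A circle:</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="20" height="20">
      <circle cx="10" cy="10" r="8"/>
    </svg>
  </body>
</html>
"""

root = ET.fromstring(DOCUMENT)

# ElementTree expands every name to the form "{namespace IRI}local name".
names = [element.tag for element in root.iter()]
for name in names:
    print(name)

# Prefixes are local shorthands; only the IRIs identify the applications.
prefixes = {"x": "http://www.w3.org/1999/xhtml",
            "s": "http://www.w3.org/2000/svg"}
circle = root.find("x:body/s:svg/s:circle", prefixes)
print(circle.get("r"))
```

The parser can tell the XHTML elements from the SVG elements, but it cannot validate either group without being told which schema belongs to which IRI.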
2.2 Markup on the World Wide Web
2.2.1 The Hypertext Markup Language
In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first widely used graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers, faced with a malformed document, would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.
Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for the presentation markup.
During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>
Figure 2.7: The first line contains overlapping elements and, as such, can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.
<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>
Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.
<style>
body {
  font: large Verdana;
  font-size: large;
}
h1 {
  font-size: x-large;
  text-transform: uppercase;
}
abbr {
  font-style: italic;
}
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>
The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
2.2.2 The Extensible Hypertext Markup Language
Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly into web documents through XML namespaces.
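The difference in strictness can be observed with two parsers from the Python standard library. This is only a rough sketch: html.parser is a lenient tokenizer that reports tags as they appear and does not build the corrected document tree that a browser would, and the class name TagLogger is hypothetical. The input consists of two overlapping elements, one inside the other.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

MISNESTED = "<b>Bold <i>bold and italic</b> italic</i>"

class TagLogger(HTMLParser):
    """Record the start tags and end tags in the order they appear."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.events.append("</" + tag + ">")

# The HTML tokenizer processes the overlapping elements without complaint.
logger = TagLogger()
logger.feed(MISNESTED)
logger.close()
print(logger.events)

# The XML parser is required to refuse the very same input.
try:
    ET.fromstring(MISNESTED)
    xml_accepted = True
except ET.ParseError as error:
    xml_accepted = False
    print("XML parser refused the input:", error)
```

The XML parser can give up at the first mismatched end tag, which is what makes it simple; a conforming HTML parser has to carry additional logic to decide what the author probably meant.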
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
2.2.3 The Semantic Web and Linked Data
The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from research in the field of knowledge representation, the W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
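The two readings of a triplet can be sketched in a few lines of Python. The classes IRI, Literal, and Triple and the function interpret are hypothetical names introduced only for this illustration; they do not belong to any RDF library. The sample IRIs are the ones used in Figure 2.9.

```python
from typing import NamedTuple, Union


class IRI(str):
    """A resource identified by an IRI."""


class Literal(str):
    """A literal value."""


class Triple(NamedTuple):
    """A triplet in the (p, s, o) order used in the text."""
    predicate: IRI
    subject: IRI
    obj: Union[IRI, Literal]


def interpret(triple: Triple) -> str:
    """Read a triplet as a relation or as a property."""
    p, s, o = triple
    if isinstance(o, IRI):
        # The object is a resource: s is in relation p with o.
        return f"<{s}> is in relation <{p}> with <{o}>"
    # The object is a literal: s has property p with value o.
    return f'<{s}> has property <{p}> with value "{o}"'


creator = Triple(
    IRI("http://purl.org/dc/terms/creator"),
    IRI("http://example.org/document.html"),
    IRI("http://example.org/john-smith"),
)
name = Triple(
    IRI("http://xmlns.com/foaf/0.1/name"),
    IRI("http://example.org/john-smith"),
    Literal("John Smith"),
)

print(interpret(creator))
print(interpret(name))
```

Keeping resources and literals as distinct types is a design choice of this sketch; it lets the interpretation depend only on the type of the object.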
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The use of RDF in conjunction with HTML and XHTML is intended to gradually supersede the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in documents on the Web, a technique known as microformatting.
2.3 Document Preparation Systems
Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).
<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>
Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.
<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>
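Figure 2.11: A graph of the RDF document in Figure 2.9. The node http://example.org/document.html has a dc:title edge to the literal "John's Web page"@en and a dc:creator edge to the node http://example.org/john-smith, which in turn has an rdf:type edge to foaf:Person and a foaf:name edge to the literal "John Smith".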
categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also known as What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the gentle learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the use of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems
Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The
The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
The Cask of Amontillado
by
Edgar Allan Poe
The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled, but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.
-1-
.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's The Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1\hskip-0.3ex}}}
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that reflect their use cases.
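To give a feel for such conversion tools, the following Python sketch translates a tiny, made-up subset of a Markdown-like language into HTML. The subset (a heading introduced by "# ", paragraphs separated by blank lines, and emphasis between asterisks) and the function name to_html are assumptions chosen for brevity; real converters handle far more syntax and many edge cases that this sketch ignores.

```python
import html
import re


def to_html(source: str) -> str:
    """Convert a tiny Markdown-like subset to HTML."""
    blocks = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", source.strip()):
        # Join wrapped lines and escape characters special to HTML.
        text = html.escape(" ".join(block.split()), quote=False)
        # *emphasis* becomes <em>emphasis</em>.
        text = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)
        if text.startswith("# "):
            blocks.append(f"<h1>{text[2:]}</h1>")
        else:
            blocks.append(f"<p>{text}</p>")
    return "\n".join(blocks)


sample = """# Design

Punctuation and indentation carry
the *markup* here."""

print(to_html(sample))
```

The input remains legible as plain text, while the output carries the same structure in a more general markup language.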
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
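The ranges quoted above lend themselves to a simple check. The Python sketch below classifies an average line length; the function name assess_measure and the wording of the returned messages are hypothetical, and the thresholds are merely those stated in the paragraph above.

```python
def assess_measure(chars_per_line: int, columns: int = 1) -> str:
    """Classify a line length against the ranges quoted in the text."""
    if chars_per_line > 80:
        return "too wide: strains the eye"
    if chars_per_line < 40:
        return "too narrow to justify: set ragged"
    # 45-75 characters for one column, 40-50 for several columns.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if low <= chars_per_line <= high:
        return "within the recommended range"
    return "acceptable, but outside the recommended range"


print(assess_measure(66))
print(assess_measure(35, columns=2))
```

A check like this can only flag candidates for attention; the final judgment still depends on the typeface and the text.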
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὀργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”
1
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}
\newunicodechar{·}{\raisebox{.8ex}{.}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι
δὲ ὀργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  /* grc is the language tag for Ancient Greek */
  [lang=grc] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὀργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
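The leading arithmetic for a 10 pt typeface can be reproduced with a short Python sketch. The function names leading_range and lead_per_side are hypothetical, and the default percentages are simply the twenty to forty-five percent guideline quoted in the text, not a fixed typographic rule.

```python
def leading_range(type_size_pt, min_percent=20, max_percent=45):
    """Return the (minimum, maximum) line height in points."""
    low = type_size_pt + type_size_pt * min_percent / 100
    high = type_size_pt + type_size_pt * max_percent / 100
    return low, high


def lead_per_side(type_size_pt, line_height_pt):
    """Return the lead added above and below each line in points."""
    return (line_height_pt - type_size_pt) / 2


low, high = leading_range(10)
print(low, high)                 # 12.0 14.5
print(lead_per_side(10, low))    # 1.0
print(lead_per_side(10, high))   # 2.25
```

Splitting the extra space evenly above and below the line mirrors the description in the text; a layout engine may distribute it differently.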
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger
          Sizes in inches    Page proportions
A4        8.27 × 11.7        1 : √2 (1 : 1.41421)
B5        6.93 × 9.84        1 : √2 (1 : 1.41421)
Letter    8½ × 11            1 : 1.2941
Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the number of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced
1. This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
William Shakespeare, King Lear
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
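The requirement that the text height be a multiple of the line height can be sketched as follows. The function name fit_text_height is hypothetical and all measurements are assumed to be given in points; Fraction is used so that decimal inputs such as 0.1 divide exactly instead of suffering from binary rounding.

```python
from fractions import Fraction


def fit_text_height(available_pt, line_height_pt):
    """Return (lines, text_height_pt) for the tallest block that fits.

    The text height is the largest whole multiple of the line height
    that does not exceed the available height.
    """
    available = Fraction(str(available_pt))
    line_height = Fraction(str(line_height_pt))
    lines = available // line_height  # whole lines only
    return lines, float(lines * line_height)


# 550 pt of available height with a 14.5 pt line height
print(fit_text_height(550, 14.5))
```

Any space left over after the last whole line would then be assigned to the vertical margins rather than to the text block.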
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
(ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).
[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).
[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2016) (cit. on pp. 8–10).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).
[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).
[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standards Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
Emacs: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The Generalized Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe’s Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
Pico: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
vi: The Visual Interactive editor
Vim: Vi IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language
Index
ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42
Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17
sect
Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15
bre 14ndash16bs 6bsd 13
sect
ca 52can 6cern 28
character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44
sect
dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35
dps 13 17 18 32 35 36 39dtd 23 25ndash27dtp 36
sect
ebcdic 5ecma 55Edgar Allen Poe 37
Elements of Style 3em 6Emacs 13endianity 10endnote 47enq 6eot 6erealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15
ere 14ndash16esc 6etb 6120576-TEX 38etx 6euc 5
sectF M Cornford 43ff 6foaf 32 33footnote 47formal grammar 14fortran 4From Religion to Philosophy A Study in
the Origins of Western Speculation 43fs 6fsm 35
sectGit 17gml 22gnuLinux 13nano 13
gnu 13 14 35Google Documents 18Google Pinyin 11grep 16 17groff see troffgs 6gui 13 35
sectHan Unification 9heading 45Henrik Ibsen 27ht 6
html 28ndash32 34 39 44 55sect
ibm 5 12 22iconv 10iec 7 10 51ndash54ime 12ir i 27 28 31 32 54iso 7 10 51ndash54
sectJavaScript 29Jeffrey E F Friedl 14j is 5joe 13JScript 29json 32json-ld 32 56jtc 51ndash54justification see alignment
sectKing Lear 48
sectLATEX 36 43Latin Vulgate Bible 49ld 31 32 55leading see line spacingLeafpad 13lf 6lightweight markup language 39line height 45list 46
sectma 51MakeDoc 39Markdown 39markuplogical 21 29 30 35 36presentation 21 29 30 35 36
mathml 28 31Mercurial 17microformatting 32Microsoft Word 14 20 39
sectN-Triples 32 33nak 6Noam Chomskyhierarchy 14
Noam Chomsky 14note 46Notepad++ 13Notepad 13
nroff see troffnul 6ny 51
sectocr 12odf 13ooxml 13owl 32 56
sectparagraphblock 47indented 45outdented 45
paragraph 42paragraphsblock 45
pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8
sectQuarkXPress 14quotationblock 47run-in 47
sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31
rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6
sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22
sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-
lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6
secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff
man 36
me 36mom 36
troff 35tron 9Turtle 32 33typeface 41
sectucsblock 8ucs-4 8
ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10
us 6usa 51 52utf
utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8
utf 6 8ndash10 52sect
VBScript 29vcscentralized 17decentralized 17
vcs 17ndash20version control 13vi 13vim 13
vt 6sect
w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4
wysiwyg 35sect
XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23
xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23
1.1.3 Text Editors
A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free JOE, GNU nano, and Pico.
More advanced text editors come with support for regular expressions and version control (both covered in Sections 1.1.5 and 1.2) and for user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and Vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower users to perform complex text editing.
1.1.4 Interactive Document Preparation Systems
Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].
The primary difference between text editors and DPSes is that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files, a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.
1.1.5 Regular Expressions
The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.
Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.
Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) syntaxes [18, part 1, ch. 9] described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.
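As a rough illustration, the following Python sketch tries a few interval, anchoring, bracket, and alternation patterns with the standard `re` module. Python's regex syntax is Perl-derived rather than POSIX, so treating these patterns as ERE equivalents is an assumption that holds only for the simple operators used here; the sample words and the `matching_words` helper are made up for the example.

```python
import re

# Patterns written with unescaped braces and a bare vertical line, as in ERE.
# Assumption: for these simple operators Python's `re` behaves like ERE.
PATTERNS = {
    r"we{1,2}p": ["weeps", "wept", "wp"],
    r"^ar": ["argument", "radar"],
    r"ore$": ["iron ore", "oregano"],
    r"be[^ea]": ["bend", "bee"],
    r"on|tw": ["one", "two", "three"],
}


def matching_words(pattern, words):
    """Return the words that contain at least one match of `pattern`."""
    regex = re.compile(pattern)
    return [word for word in words if regex.search(word)]


for pattern, words in PATTERNS.items():
    print(pattern, "->", matching_words(pattern, words))
```

In a BRE, the braces in the first pattern would have to be escaped with backslashes, and the alternation in the last pattern would not be available in the standard syntax.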
More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
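A small sketch of why such syntaxes exceed regular grammars: with a backreference, a regex can recognize strings that consist of some text followed by an exact copy of itself, a language that no regular grammar can describe. The example uses Python's `re` module as a stand-in for a Perl-derived syntax; the sample words and the `is_doubled` helper are illustrative assumptions.

```python
import re

# `(.+)\1` matches some non-empty text followed by an exact copy of itself.
# The set of such "doubled" strings is not a regular language.
DOUBLED = re.compile(r"(.+)\1")


def is_doubled(text):
    """Return True if the whole of `text` has the form ww for a non-empty w."""
    return DOUBLED.fullmatch(text) is not None


for sample in ["bonbon", "murmur", "banana"]:
    print(sample, is_doubled(sample))
```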
Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for the Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (U+200B), zero width non-joiner (U+200C), zero width joiner (U+200D), and zero width no-break space (U+FEFF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at least partially implement the Unicode standard for Regular Expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.

BRE regex | Description | Matches
we\{1,2\}p | The repetition expression (also called an interval expression) in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m. | weeps, wept
e*ne | Star (*) is a repetition operator equivalent to the interval expression of \{0,\}. | never, enemy, Kleene
\(⟨regex⟩\) | A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex. | ⟨regex⟩
^ar | At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string. | argument, arrow keys
ore$ | At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string. | iron ore, dumbledore
.be | A period (.) matches any single character. | or not to be
be[ea] | A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity. | beehive, grizzly bear, glass beads
be[^ea] | A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match. | obeah, bend, libela
\^\$ | Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character. | ^$
\(..\).*\1 | A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched. | ara ararauna, dardanelles, nationality

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below).

ERE regex | Description | Matches
we{1,2}p | Unlike in BREs, braces aren’t escaped. | weeps, wept
pe+rl? | The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}. | persona, peer speech, perl
(⟨regex⟩) | Unlike in BREs, parentheses aren’t escaped. | ⟨regex⟩
(on|t) | Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes. | one, two, trophy, truth
(..).*\1 | EREs do not support backreferences. | ⟨undefined⟩

Regex | Description
\x{⟨n⟩} | Matches the UCS character with code point ⟨n⟩ in hexadecimal.
\N{⟨n⟩} | Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.
\p{⟨p⟩} | Matches any UCS character with property ⟨p⟩.
\P{⟨p⟩} | Matches any UCS character without property ⟨p⟩.

Property | Description
Letter | This property is satisfied by any letter.
Punctuation | This property is satisfied by any punctuation.
Symbol | This property is satisfied by any symbol.
Mark | This property is satisfied by any mark.
Number | This property is satisfied by any number.
Separator | This property is satisfied by any separator.
Other | This property is satisfied by any UCS character that doesn’t belong to any of the categories listed above.
Block=⟨b⟩ | This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.
Script=⟨s⟩ | This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.
Numeric Value=⟨n⟩ | This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.

The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.
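The pitfalls with UCS text described above can be reproduced in a few lines. The sketch below uses Python, whose standard `re` module does not implement the \p{…} property syntax, so the general category is looked up through the `unicodedata` module instead. The helper names (`canonical`, `strip_invisible`, `is_letter`) and the sample strings are assumptions made for the illustration, and normalizing before case folding is only one of several possible strategies.

```python
import re
import unicodedata


def canonical(text):
    """Normalize to NFC and case-fold so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", text).casefold()


def strip_invisible(text):
    """Remove the four zero-width characters named in the text."""
    return re.sub("[\u200b\u200c\u200d\ufeff]", "", text)


def is_letter(char):
    """Return True if the general category of `char` is one of the letter categories."""
    return unicodedata.category(char).startswith("L")


composed = "Caf\u00e9"      # the letter é as a single code point
decomposed = "cafe\u0301"   # the letter e followed by a combining acute accent
hidden = "wo\u200brd"       # "word" with a zero width space inside

print(composed == decomposed)                        # raw comparison
print(canonical(composed) == canonical(decomposed))  # after normalization and case folding
print(re.search("word", hidden) is not None)         # the invisible character blocks the match
print(re.search("word", strip_invisible(hidden)) is not None)
print(is_letter("\u00e9"), is_letter("\u200b"))
```

The raw comparison and the first search fail even though a human reader would consider the strings identical, which is the kind of false negative the paragraph warns about.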
The most elementary text processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool will present lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.
1.2 Version Control
When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.
The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
VCS can be divided by architecture into centralized and decentralized systems. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally and its operation is fully dependent on the availability of the server. An example of a centralized VCS is SubVersioN (SVN).
By comparison, there is no designated server in decentralized VCS, and users can exchange new versions directly with one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include a more complex workflow, greater storage requirements, and the increased opportunity for users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.
Figure 1.8: The basic SVN workflow (svnadmin create, svn checkout, svn update, svn commit). After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.
Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.
An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.
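To see why text files lend themselves to version control, the following sketch compares two versions of a short text file line by line and prints the change in the unified diff format. It relies on Python's standard `difflib` module purely for illustration; this is not a claim about how SVN or Git compute differences internally, and the file name, the sample sentences, and the `describe_change` helper are made up.

```python
import difflib

old_version = [
    "A text editor is an application.\n",
    "It edits text files.\n",
]
new_version = [
    "A text editor is an application.\n",
    "It creates and modifies text files.\n",
]


def describe_change(old_lines, new_lines):
    """Return a unified diff between two versions of a text file."""
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile="draft.txt (old)",
        tofile="draft.txt (new)",
    )
    return "".join(diff)


print(describe_change(old_version, new_version))
```

A binary output file of an interactive DPS offers no such line structure, which is why the changes have to be recorded or displayed by the DPS itself.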
Figure 1.9: The upper diagram depicts the basic Git workflow (git init, git clone, git pull, git push, git reset, git commit). The lower diagram depicts the use of the Git program with an SVN repository (svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit); this bears all the advantages and disadvantages associated with decentralized VCS. After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.
Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).
Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.
Chapter 2
Markup
A manuscript can be a seamless current of words and still make perfect sense to its author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to compliance with the orthographic rules (such as correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.
It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.
Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.
2.1 Meta Markup Languages
2.1.1 The Generalized Markup Language
The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.
This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].
More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].
The authoritative resource on SGML is The SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.
SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.
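The kinds of nodes listed above can be made visible with a small sketch. Python's standard library ships no SGML parser, so the example leans on `html.parser`, which reads HTML, a language historically defined as an SGML application; using it as a stand-in for a real SGML parser is an assumption, as are the `NodeLister` class and the sample fragment.

```python
from html.parser import HTMLParser


class NodeLister(HTMLParser):
    """Collect the kinds of nodes encountered while reading a document."""

    def __init__(self):
        # Keep entity references as separate nodes instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        self.nodes.append(("start tag", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.nodes.append(("end tag", tag))

    def handle_data(self, data):
        if data.strip():
            self.nodes.append(("text", data.strip()))

    def handle_entityref(self, name):
        self.nodes.append(("entity reference", name))

    def handle_comment(self, data):
        self.nodes.append(("comment", data.strip()))

    def handle_pi(self, data):
        self.nodes.append(("processing instruction", data))


# A made-up fragment with an element, an attribute, an entity reference,
# a comment, and a processing instruction.
SAMPLE = '<p class="note">Salt &amp; oil<!-- to taste --><?render bold></p>'

lister = NodeLister()
lister.feed(SAMPLE)
lister.close()
for node in lister.nodes:
    print(node)
```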
21 META MARKUP LANGUAGES 23
A list of tools forthe manipula-tion of files in xmlschema languages ismaintained on theWeb site of w3c athttpwwww3org
XMLSchema
Although the described structure is shared by all sgml docu-ments the actual syntax as well as the restrictions regarding thecontents and the attributes of individual elements are declaredwithin a Document Type Declaration (dtd) which can be differentfor each document It is worth noting that a dtd only declaresthe syntax of an sgml document the semantics of the individualelements and their attributes are left to the interpretation of theprogram processing the document The syntax and the constraintsimposed by a dtd define an application of sgml An sgml documentis considered to be a valid instance of an sgml application whenit conforms to the corresponding dtd
2.1.2 The Extensible Markup Language
Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.
This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML, such as W3C XML Schema, RELAX NG, or Schematron, that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also called a language or a format).
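The first of the two levels of correctness can be checked with any conforming XML parser. The following is a minimal sketch in Python that assumes the non-validating parser of the standard library; the helper name is_well_formed and the two sample strings are our own illustration. Checking validity against a DTD or a schema requires a validating tool, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True when the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Properly nested elements: well-formed.
good = "<recipe><name>Palatschinken</name></recipe>"
# Overlapping elements: not well-formed, the parser must refuse it.
bad = "<recipe><name>Palatschinken</recipe></name>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

Note that a document that passes this check may still be invalid: the parser never consults the DTD named in a DOCTYPE declaration.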
Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
24 CHAPTER 2 MARKUP
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish.</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>
Figure 2.1: An example XML document (recipe.xml)
DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.
<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA) >
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+) >
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>
Figure 2.2: An example DTD
element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}
Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
      name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>
Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)
xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng # Compact -> Full Relax NG
xmllint --noout --relaxng recipe.rng recipe.xml
Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature called CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML
Speech
AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!

Verse
Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!
<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear, why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>
Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].
The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.
Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
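To make the role of namespaces more tangible, the following sketch parses a single document that mixes two XML applications and groups its elements by the IRI of their namespace. It assumes the parser of the Python standard library, which reports each element name as {IRI}local-name; the sample document and the helper names are our own illustration.

```python
import xml.etree.ElementTree as ET

# Namespace IRIs of the two XML applications mixed in one document.
XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

document = f"""<html xmlns="{XHTML}">
  <body>
    <p>A circle:</p>
    <svg xmlns="{SVG}" width="20" height="20">
      <circle cx="10" cy="10" r="8"/>
    </svg>
  </body>
</html>"""


def split_name(tag: str) -> tuple[str, str]:
    """Split '{iri}local' into (iri, local); the IRI is empty if absent."""
    if tag.startswith("{"):
        iri, local = tag[1:].split("}", 1)
        return iri, local
    return "", tag


# Group the local element names by the namespace they belong to.
by_namespace: dict[str, list[str]] = {}
for element in ET.fromstring(document).iter():
    iri, local = split_name(element.tag)
    by_namespace.setdefault(iri, []).append(local)

for iri, names in by_namespace.items():
    print(iri, names)
```

The parser only reports the IRIs; it has no way of locating a schema for either namespace, which is exactly the limitation described above.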
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
2.2 Markup on the World Wide Web
2.2.1 The Hypertext Markup Language
In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely popular graphical Web browser, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In
22 MARKUP ON THE WORLD WIDE WEB 29
JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.
Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for the presentation markup.
During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
2.2.2 The Extensible Hypertext Markup Language
Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as
<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>
Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.
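The difference between a forgiving HTML processor and a strict XML parser can be observed on the overlapping markup from Figure 2.7. The sketch below is only an illustration under stated assumptions: it uses the html.parser tokenizer of the Python standard library, which reports tags as it meets them but does not implement the tree construction rules of the HTML5 specification, and the class name TagLogger is our own.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The overlapping markup from the first line of Figure 2.7.
markup = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Record start and end tags in the order in which they are met."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


# The lenient HTML tokenizer reads the whole input without complaint.
logger = TagLogger()
logger.feed(markup)
logger.close()
print(logger.events)

# The XML parser refuses the very same input as not well-formed.
try:
    ET.fromstring(markup)
    xml_accepts = True
except ET.ParseError:
    xml_accepts = False
print(xml_accepts)
```

Recovering a sensible element tree from the recorded events is the part that makes real HTML parsers complex; the sketch stops short of it on purpose.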
<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>
Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.
<style>
body {
  font: large Verdana;
  font-size: large;
}
h1 {
  font-size: x-large;
  text-transform: uppercase;
}
abbr {
  font-style: italic;
}
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>
The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2 Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources and to reduce the number of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly in web documents through XML namespaces.
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
2.2.3 The Semantic Web and Linked Data
The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
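The two readings of a triplet can be spelled out in a few lines of code. The following sketch is a plain in-memory model and not an RDF library; the Triplet class, its field names, and the describe function are our own assumptions, while the IRIs are borrowed from the examples in this section.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    """A triplet (p, s, o); the object is either a resource or a literal."""
    predicate: str
    subject: str
    obj: str
    object_is_resource: bool


def describe(t: Triplet) -> str:
    """Spell out the interpretation of a triplet as an English sentence."""
    if t.object_is_resource:
        return f"<{t.subject}> is in the relation <{t.predicate}> with <{t.obj}>"
    return f'<{t.subject}> has the property <{t.predicate}> with the value "{t.obj}"'


triplets = [
    # The object is a resource: a relation between two resources.
    Triplet("http://purl.org/dc/terms/creator",
            "http://example.org/document.html",
            "http://example.org/john-smith", True),
    # The object is a literal: a property of a single resource.
    Triplet("http://xmlns.com/foaf/0.1/name",
            "http://example.org/john-smith",
            "John Smith", False),
]

for t in triplets:
    print(describe(t))
```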
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page, and beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
2.3 Document Preparation Systems
Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be
23 DOCUMENT PREPARATION SYSTEMS 33
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/terms/"
  xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
    rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
      rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
    rdf:about="http://example.org/john-smith">
    <rdf:type
      rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html>
  <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html>
  <http://purl.org/dc/terms/creator>
  <http://example.org/john-smith> .
<http://example.org/john-smith>
  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
  <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith>
  <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .
Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom)
<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>
Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.
<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>
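[Graph: the node http://example.org/document.html is connected to the literal "John's Web page"@en by an edge labeled dc:title and to the node http://example.org/john-smith by an edge labeled dc:creator. The node http://example.org/john-smith is connected to the node foaf:Person by an edge labeled rdf:type and to the literal "John Smith" by an edge labeled foaf:name.]
Figure 2.11: A graph of the RDF document in Figure 2.9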
categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also known as What You See Is What You Get, WYSIWYG), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems
Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The
The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
The Cask of Amontillado
by
Edgar Allen Poe

The thousand injuries of Fortunato I had borne as I best could; but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that gave utterance to a threat. At length I would be avenged; this was a point definitely settled--but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-
.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allen Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could; but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allen Poe's The Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
    \dimexpr\hsize-3.28em 3.28em
    \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could; but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX
24 LIGHTWEIGHT MARKUP LANGUAGES 39
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right)
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
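To illustrate what such a conversion tool does, here is a deliberately tiny sketch of a converter from a Markdown-like notation to HTML. The notation is a made-up subset that we assume for the sake of the example (paragraphs separated by blank lines, a leading "# " for a heading, and asterisks for emphasis); it does not conform to any of the languages named above, and the function name to_html is our own.

```python
import html
import re


def to_html(text: str) -> str:
    """Convert a tiny Markdown-like subset to HTML, block by block."""
    blocks = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        line = " ".join(block.split())         # join wrapped lines
        line = html.escape(line, quote=False)  # protect <, >, and &
        line = re.sub(r"\*(.+?)\*", r"<em>\1</em>", line)
        if line.startswith("# "):
            blocks.append(f"<h1>{line[2:]}</h1>")
        else:
            blocks.append(f"<p>{line}</p>")
    return "\n".join(blocks)


source = """# Palatschinken

Whisk until you have a *smooth* batter,
then fry until golden brown."""

print(to_html(source))
```

Real converters handle nesting, lists, links, and many edge cases; the point of the sketch is only that punctuation and blank lines carry the whole structure.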
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page or 40–50 characters long on a multi-column page and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
32 STRUCTURAL ELEMENTS 43
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”
1
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin{%
  \fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek{%
  \fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}
newunicodecharraisebox8ex
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=gr] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive and to think, and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
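The arithmetic behind these figures is simple enough to sketch in Python. The function name and its parameters are hypothetical, and the twenty to forty-five percent bounds are just the guideline quoted above, not a fixed rule.

```python
def leading_range(type_size_pt, min_pct=20, max_pct=45):
    """Return the (smallest, largest) recommended line height in points.

    The extra lead is a percentage of the typeface size; the line height
    is the typeface size plus that lead.
    """
    min_lead = type_size_pt * min_pct / 100
    max_lead = type_size_pt * max_pct / 100
    return type_size_pt + min_lead, type_size_pt + max_lead


if __name__ == "__main__":
    low, high = leading_range(10)
    print(f"10 pt type: line height between {low} and {high} pt")
    # Half of the extra lead falls above each line and half below it.
    print(f"lead per side: {(low - 10) / 2} to {(high - 10) / 2} pt")
```

For 10 pt type this gives a line height between 12 and 14.5 pt, that is, 1 to 2.25 pt of lead on each side of a line.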
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger
          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8½ × 11            1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a side-note. Sidenotes enliven the page and are easy for the reader to find.
variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
William Shakespeare, King Lear
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout
The page consists of a text block surrounded by margins. The width of the text block is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
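Choosing a text height that is a whole multiple of the line height can be sketched as follows. This is only an illustration: the function name is hypothetical, and rounding down to the nearest whole line is an assumption, since a designer may just as well round to the nearest line or adjust the margins instead.

```python
def snap_text_height(target_height_pt, line_height_pt):
    """Return (number_of_lines, text_height) for a desired text height.

    The text height is rounded down to the largest whole multiple of the
    line height that still fits into the target height.
    """
    lines = int(target_height_pt // line_height_pt)
    return lines, lines * line_height_pt


if __name__ == "__main__":
    # A hypothetical 500 pt text block set with 12 pt leading.
    lines, height = snap_text_height(500, 12)
    print(f"{lines} lines, text height {height} pt")
```

With a target of 500 pt and 12 pt leading, the text block holds 41 full lines and is 492 pt tall.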
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to use colored typefaces only for emphasis, not for the body text, and on backgrounds that are
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
(ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
Bibliography
[1] Mary Brandel lsquolsquo1963 The debut of asci irsquorsquo InComputerworld(July 1999) url httpeditioncnncomTECHcomputing9907061963idg (visited on 09062015) (cit on p 5)
[2] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1963 10 East 40th Street New York 16 nyusa the American Standard Association June 1963 urlhttp worldpowersystems com J codes X3 4 - 1963
(visited on 01282015) (cit on p 5)[3] i so tc97sc2 Information technology ndash iso 7-bit coded character
set for information interchange i so 6461972 Geneva Switzer-land the International Organization for Standardization1972 (cit on pp 5 7)
[4] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1986 10 East 40th Street New York 16 ny usathe American Standard Association June 1986 (cit on p 6)
[5] Unicode Consortium the Unicode Standard Version 10 Vol 1Reading ma usa Addison-Wesley Developers Press Oct1991 isbn 0-201-56788-1 (cit on p 8)
[6] Unicode Consortium the Unicode Standard Version 10 Vol 2Reading ma usa Addison-Wesley Developers Press June1992 isbn 0-201-60845-6 (cit on p 8)
[7] isoiec jtc1sc2 Information technology ndash the Universalmultiple-octet coded Character Set (ucs) ndash Part 1 Architectureand Basic Multilingual Plane isoiec 10646-11993 Geneva
Switzerland the International Organization for Standard-ization May 1993 (cit on p 8)
[8] i soiec jtc1sc2 Transformation Format for 16 planes of group00 (utf-16) isoiec 10646-11993Amd 11996 GenevaSwitzerland the International Organization for Standard-ization Oct 1996 (cit on p 8)
[9] isoiec jtc1sc2 ucs Transformation Format 8 (utf-8)isoiec 10646-11993Amd 21996 Geneva Switzerlandthe International Organization for Standardization Oct1996 (cit on p 8)
[10] Unicode Consortium the Unicode Standard Version 90 ndash CoreSpecification Tech rep Mountain View ca usa July 2016url httpwwwunicodeorgversionsUnicode900UnicodeStandard-90pdf (visited on 09172015) (cit onpp 8ndash10)
[11] Q-Success Usage of character encodings for websites urlhttpw3techscomtechnologiesoverviewcharacter_
encodingall (visited on 09102015) (cit on p 9)[12] Unicode Consortium Unicode Technical Standard 10 Version
900 Unicode Collation Algorithm Tech rep May 2016 urlhttpwwwunicodeorgreportstr10tr10-34html
(visited on 09172016) (cit on p 10)[13] Unicode Consortium Unicode cldr Project Tech rep url
httpcldrunicodeorg (visited on 09172016) (cit onp 10)
[14] iso tc171sc2 Document management ndash Portable documentformat iso 320002008 Geneva Switzerland the Interna-tional Organization for Standardization July 2008 (cit onp 13)
[15] isoiec jtc1sc34 Document description and processing lan-guages ndash Office Open XML File Formats isoiec 295002012Geneva Switzerland the International Organization forStandardization Oct 2012 (cit on p 13)
[16] isoiec jtc1sc34 Information technology ndash Open DocumentFormat for Office Applications (OpenDocument) v10 isoiec263002006 Geneva Switzerland the International Organi-zation for Standardization Dec 2006 (cit on p 13)
[17] Noam Chomsky. “Three models for the description of language.” In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).
[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).
[20] Unicode Consortium Unicode Technical Standard 18 Version17 Unicode Regular Expressions Tech rep Nov 2013 urlhttpwwwunicodeorgreportstr18tr18-17html
(visited on 09262015) (cit on p 16)[21] Dale Dougherty and Arnold Robbins Sed amp awk Second
Edition OrsquoReilly Media 1997 i sbn 1565922255 url http docstore mik ua orelly unix sedawk (visited on09262015) (cit on p 16)
[22] Ben Collins-Sussman Brian W Fitzpatrick and C MichaelPilato Version Control with Subversion OrsquoReilly 2002 urlhttpsvnbookred-beancom (visited on 09262015)(cit on p 17)
[23] Charles F Goldfarb lsquolsquothe Roots of sgml ndash A Personal Rec-ollectionrsquorsquo In (1996) url httpwwwsgmlsourcecomhistoryrootshtm (visited on 07292015) (cit on p 22)
[24] Charles F Goldfarb lsquolsquosgml The Reason Why and the FirstPublishedHintrsquorsquo In Journal of the American Society for Informa-tion Science 48 (7 July 1997) url httpwwwsgmlsourcecomhistoryjasishtm (visited on 07292015) (cit onp 22)
[25] Charles F Goldfarb lsquolsquoIntroduction to Generalized MarkuprsquorsquoIn (1981) url http www sgmlsource com history AnnexAhtm (visited on 07292015) (cit on p 22)
[26] i soiecjtc1sc34 Information processing ndash Text and office sys-tems ndash Standard Generalized Markup Language (sgml) i soiec88791986 Geneva Switzerland the International Organi-zation for Standardization Oct 1986 (cit on p 22)
[27] Charles F Goldfarb the sgml Handbook New York NY USAOxford University Press Inc 1990 i sbn 978-0-198-53737-3(cit on p 22)
[28] Jean Paoli Tim Bray and Michael Sperberg-McQueen Ex-tensible Markup Language (xml) 10 w3c Recommendationw3c Feb 1998 url httpwwww3orgTR1998REC-xml-19980210 (visited on 07312015) (cit on pp 23 31)
[29] isoiec jtc1sc18wg8 Proposed TC for Web sgml Adap-tations for sgml isoiec N1929 the International Organi-zation for Standardization June 1997 url httpxmlcoverpagesorgwg8-n1929-ghtml (visited on 07312015)(cit on p 23)
[30] Haringkon Wium Lie and Bert Bos Cascading Style Sheets level1 Recommendation w3c Dec 1996 url httpwwww3orgTRREC-CSS1-961217 (visited on 07312015) (cit onpp 23 29)
[31] C M Sperberg-McQueen and Claus Huitfeldt lsquolsquogoddagA Data Structure for Overlapping Hierarchiesrsquorsquo In DigitalDocuments Systems and Principles 8th International Confer-ence on Digital Documents and Electronic Publishing DDEP2000 5th International Workshop on the Principles of DigitalDocument Processing PODDP 2000 Munich Germany Sep-tember 13-15 2000 Revised Papers Ed by Peter King andEthan V Munson Berlin Heidelberg Springer Berlin Hei-delberg 2004 pp 139ndash160 isbn 978-3-540-39916-2 doi101007978-3-540-39916-2_12 (cit on p 27)
[32] TimBray DaveHollander andAndrewLaymanNamespacesin xml w3c Recommendation w3c Jan 1999 url httpwwww3orgTR1999REC-xml-names-19990114 (visitedon 08212015) (cit on p 27)
[33] M Duerst the Internationalized Resource Identifiers (iris) rfc3987 rfc Editor Jan 2005 url httptoolsietforghtmlrfc3987 (visited on 08312015) (cit on p 27)
[34] Norman Walsh DocBook 5 The Definitive Guide Apr 2010url httpwwwdocbookorgtdgenhtmldocbookhtml(visited on 08182015) (cit on p 28)
[35] Tim Berners-Lee Information Management A Proposal Techrep Mar 1989 url httpwwww3orgHistory1989proposalhtml (visited on 08312015) (cit on p 28)
[36] T Berners-Lee Hypertext Markup Language ndash 20 rfc 1866rfc Editor Nov 1995 url httptoolsietforghtmlrfc1866 (visited on 07312015) (cit on p 28)
[37] Jon Postel DoD standard Transmission Control Protocol rfc761 rfc Editor Jan 1980 url httptoolsietforghtmlrfc761 (visited on 09162016) (cit on p 28)
[38] Ian Hickson et al html5 A vocabulary and associated apisfor html and xhtml Recommendation w3c Oct 2014 urlhttpwwww3orgTR2014REC-html5-20141028 (visitedon 07312015) (cit on p 29)
[39] ecma International Standard ecma-262 - ecmaScript LanguageSpecification Tech rep June 1997 url httpwwwecma-internationalorgpublicationsfilesECMA-ST-ARCH
ECMA-262201st20edition20June201997pdf (visitedon 07312015) (cit on p 29)
[40] Netscape Communications Netscape and Sun announce Java-Script the open cross-platform object scripting language for en-terprise networks and the Internet Dec 1995 url httpwpnetscapecomnewsrefprnewsrelease67html (visited on02132008) (cit on p 29)
[41] Dave Raggett et al Reformulating html in xml w3c Recom-mendation w3c Dec 1998 url httpwwww3orgTR1998WD-html-in-xml-19981205 (visited on 08202015)(cit on p 31)
[42] Steven Pemberton et al xhtmltrade 10 The Extensible HyperTextMarkup Language w3c Recommendation w3c Jan 2000url httpwwww3orgTR2000REC-xhtml1-20000126(visited on 08202015) (cit on p 31)
[43] T Berners-Lee Linked Data Tech rep 2006 url httpswwww3orgDesignIssuesLinkedDatahtml (visited on09172016) (cit on p 31)
[44] Ora Lassila and Ralph R Swick Resource Description Frame-work (rdf) Model and Syntax Specification w3c Recommen-dation w3c Feb 1999 url httpwwww3orgTR1999REC-rdf-syntax-19990222 (visited on 08182015) (cit onpp 31 32)
[45] Dan Brickley and R V Guha rdf Vocabulary DescriptionLanguage 10 rdf Schema w3c Recommendation w3c Feb2004 url httpwwww3orgTR2004REC-rdf-schema-20040210 (visited on 08182015) (cit on p 32)
[46] Deborah L McGuinness and Frank van Harmelen owl WebOntology Language w3c Recommendation w3c Feb 2004url httpwwww3orgTR2004REC-owl-features-20040210 (visited on 08182015) (cit on p 32)
[47] Dan Brickley and R V Guha json-ld 10 A JSON-basedSerialization for Linked Data w3c Recommendation w3cJan 2014 url httpwwww3orgTR2014REC-json-ld-20140116 (visited on 08192015) (cit on p 32)
[48] David Beckett et al rdf 11 Turtle w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-turtle-20140225 (visited on 08292015) (cit on p 32)
[49] David Beckett rdf 11 N-Triples w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-n-triples-20140225 (visited on 08192015) (cit on p 32)
[50] Ben Adida et al rdfa in xhtml Syntax and Processing w3cRecommendation w3c Oct 2008 url httpwwww3org TR 2008 REC - rdfa - syntax - 20081014 (visited on08192015) (cit on p 32)
[51] Peter Schaffter What exactly is mom 2015 url httpwwwschafftercamommom-01html (visited on 09162016)(cit on p 37)
[52] Donald Ervin Knuth Digital Typography The Center for theStudy of Language and Information Publications 1998 i sbn978-0-387-98269-4 (cit on p 36)
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MATHML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFA  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.
proprietary TextEdit, Microsoft Word, Scribus, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.
1.1.5 Regular Expressions
The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.
Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) [18, part 1, ch. 9], described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.
More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for the Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (U+200B), zero width non-joiner (U+200C), zero width joiner (U+200D), and zero width no-break space (U+FEFF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at
BRE regex, description, and matches:

we\{1,2\}p
    The repetition expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
    Matches: weeps, wept
ene
    Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
    Matches: never, enemy, Kleene
\(⟨regex⟩\)
    A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
    Matches: ⟨regex⟩
^ar
    At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
    Matches: argument, arrow keys
ore$
    At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
    Matches: iron ore, dumbledore
be
    A period (.) matches any single character.
    Matches: or not to be
be[ea]
    A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity.
    Matches: beehive, grizzly bear, glass beads
be[^ea]
    A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
    Matches: obeah, bend, libela
\^\$
    Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
    Matches: ^$
()1
    A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
    Matches: ara ararauna, dardanelles, nationality

ERE regex, description, and matches:

we{1,2}p
    Unlike in BREs, braces aren't escaped.
    Matches: weeps, wept
pe+rl
    The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
    Matches: persona, peer, speech, perl
(⟨regex⟩)
    Unlike in BREs, parentheses aren't escaped.
    Matches: ⟨regex⟩
(on|t)
    Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
    Matches: one, two, trophy, truth
()1
    EREs do not support backreferences.
    Matches: ⟨undefined⟩

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below).
Regex: \x{⟨n⟩}
  Description: Matches the UCS character with code point ⟨n⟩ in hexadecimal.
Regex: \N{⟨n⟩}
  Description: Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.
Regex: \p{⟨p⟩}
  Description: Matches any UCS character with property ⟨p⟩.
Regex: \P{⟨p⟩}
  Description: Matches any UCS character without property ⟨p⟩.

Property: Letter
  Description: This property is satisfied by any letter.
Property: Punctuation
  Description: This property is satisfied by any punctuation.
Property: Symbol
  Description: This property is satisfied by any symbol.
Property: Mark
  Description: This property is satisfied by any mark.
Property: Number
  Description: This property is satisfied by any number.
Property: Separator
  Description: This property is satisfied by any separator.
Property: Other
  Description: This property is satisfied by any UCS character that doesn't belong to any of the above-listed categories.
Property: Block=⟨b⟩
  Description: This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.
Property: Script=⟨s⟩
  Description: This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.
Property: Numeric Value=⟨n⟩
  Description: This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.
The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.
least partially implement the Unicode standard for Regular Expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
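The pitfalls described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the helper names normalize_for_matching and contains are hypothetical, and the approach (strip the zero-width characters, normalize to NFC, then apply Unicode case folding) is one plausible way of making literal matching more robust, not a complete implementation of the Unicode standard for regular expressions.

```python
import re
import unicodedata

# The zero-width characters named in the text. They are invisible, but a
# naive matcher still treats them as ordinary characters.
ZERO_WIDTH = "\u200b\u200c\u200d\ufeff"


def normalize_for_matching(text: str) -> str:
    """Return a form of `text` suited to case-insensitive literal matching."""
    # Remove the invisible zero-width characters.
    text = text.translate({ord(ch): None for ch in ZERO_WIDTH})
    # NFC composes sequences such as "E" + combining acute into one character.
    text = unicodedata.normalize("NFC", text)
    # Case folding is the Unicode-aware counterpart of lowercasing.
    text = text.casefold()
    # Folding can occasionally produce decomposed text, so normalize again.
    return unicodedata.normalize("NFC", text)


def contains(pattern: str, text: str) -> bool:
    """Search for a literal (hypothetical helper, not a regex engine)."""
    needle = re.escape(normalize_for_matching(pattern))
    return re.search(needle, normalize_for_matching(text)) is not None


# A precomposed "é" in the pattern, a decomposed "E" + U+0301 in the text.
print(contains("caf\u00e9", "CAFE\u0301"))
# The same search with plain case-insensitive matching.
print(re.search("caf\u00e9", "CAFE\u0301", re.IGNORECASE))
```

The last line shows the false negative the text warns about: without normalization, the decomposed spelling of the accented letter is not recognized as the same character.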
The most elementary text-processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool will present lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming
The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
language, albeit a more advanced one than that of sed. Originally developed for Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.
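The default behavior of grep described above, presenting the lines that contain one or more matches, can be sketched in a few lines of Python. This is a rough illustration rather than a reimplementation of grep: the function name and the sample lines are made up for the example, and Python's re syntax is neither BRE nor ERE, although the patterns used here happen to read the same way in ERE.

```python
import re
from typing import Iterable, Iterator


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield every line that contains at least one match of `pattern`."""
    regex = re.compile(pattern)
    for line in lines:
        # search() looks for a match anywhere in the line, like grep does.
        if regex.search(line):
            yield line


# Hypothetical sample input, one string per line of a text file.
text = ["she weeps", "he wept", "we sleep", "a grizzly bear", "a bend"]

print(list(grep(r"we{1,2}p", text)))  # ['she weeps', 'he wept']
print(list(grep(r"be[ea]", text)))    # ['a grizzly bear']
print(list(grep(r"^a", text)))        # ['a grizzly bear', 'a bend']
```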
1.2 Version Control
When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.
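The rudimentary functionality described above (recording changes with a description and an author, viewing them, and reverting them) can be sketched as a toy in-memory history. The class and method names are hypothetical and chosen only for this illustration; real VCS store changes far more efficiently and persistently.

```python
import difflib
from dataclasses import dataclass, field
from typing import List


@dataclass
class Change:
    author: str
    description: str
    content: str  # full text of the document after this change


@dataclass
class History:
    changes: List[Change] = field(default_factory=list)

    def commit(self, author: str, description: str, content: str) -> None:
        """Record a new version together with its authorship information."""
        self.changes.append(Change(author, description, content))

    def diff(self, old: int, new: int) -> List[str]:
        """View the difference between two recorded versions."""
        a = self.changes[old].content.splitlines()
        b = self.changes[new].content.splitlines()
        return list(difflib.unified_diff(a, b, lineterm=""))

    def revert(self, author: str, version: int) -> None:
        """Restore an earlier version by recording it as a new change."""
        self.commit(author, f"Revert to version {version}",
                    self.changes[version].content)


history = History()
history.commit("alice", "Initial draft", "Chapter 1\nWriting")
history.commit("bob", "Rename the chapter", "Chapter 1\nMarkup")
print("\n".join(history.diff(0, 1)))
history.revert("alice", 0)
```

Recording a revert as a new change, rather than deleting history, keeps the undesirable version available for later inspection.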
VCS can be dichotomized based on their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally and its operation is fully dependent on the availability of the server. An example of a centralized VCS is Subversion (SVN).
By comparison, there is no designated server in decentralized VCS, and the users can upload and download new versions directly to and from one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage size requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.
Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.
An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.
Figure 1.8: The basic SVN workflow (svnadmin create, svn checkout, svn update, svn commit). After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.
Figure 1.9: The diagram above depicts the basic Git workflow (git init, git clone, git pull, git push, git reset, git commit). The diagram below depicts the use of the Git program with an SVN repository (svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit); this bears all the advantages and disadvantages associated with decentralized VCS. After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.
Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).
Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.
Chapter 2
Markup
A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to compliance with the orthographic rules (such as correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.
It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.
Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result,
More information about the project can be found within the Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].
The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.
the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.
2.1 Meta Markup Languages
2.1.1 The Generalized Markup Language
The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.
This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].
SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.
A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
Although the described structure is shared by all SGML documents, the actual syntax as well as the restrictions regarding the contents and the attributes of individual elements are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.
2.1.2 The Extensible Markup Language
Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.
This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML, such as W3C XML Schema, RELAX NG, or Schematron, that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
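The first of the two levels of correctness can be checked with nothing more than an XML parser. The following Python sketch uses the xml.etree.ElementTree module from the standard library; the function name is_well_formed is hypothetical. Note that this module does not validate against a DTD or a schema, so the sketch only covers well-formedness; checking validity requires a validating tool.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True when `document` parses as well-formed XML."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        # The parser refuses anything that is not well-formed.
        return False
    return True


# Properly nested elements: well-formed.
print(is_well_formed("<recipe><name>Palatschinken</name></recipe>"))
# The <name> element is never closed: not well-formed.
print(is_well_formed("<recipe><name>Palatschinken</recipe>"))
# Well-formed, although no recipe schema would accept a <color> element;
# the parser has no notion of validity.
print(is_well_formed("<recipe><color>blue</color></recipe>"))
```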
Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish.</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>
Figure 2.1: An example XML document (recipe.xml)
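To give a feel for the retrieval of data from XML documents, the following Python sketch queries a shortened copy of the recipe from Figure 2.1. It relies on the xml.etree.ElementTree module of the standard library, which supports only a limited subset of XPath; full XPath or XQuery support would require a dedicated processor. The variable names are chosen for this example only.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the recipe document, embedded for self-containment.
RECIPE = """\
<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>
"""

root = ET.fromstring(RECIPE)

# The text content of a child element.
name = root.findtext("name")
# The value of an attribute, converted to a number.
serves = int(root.find("ingredientList").get("serves"))
# A path expression selecting every ingredient in the list.
ingredients = {
    item.text: item.get("amount")
    for item in root.iterfind("ingredientList/ingredient")
}
# A predicate selecting an element by the value of its attribute.
milk = root.find(".//ingredient[@amount='300ml']").text

print(name, serves, ingredients, milk)
```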
<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>
Figure 2.2: An example DTD. DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.
element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}
Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
      name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>
Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)
xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng  # Compact -> Full RELAX NG
xmllint --noout --relaxng recipe.rng recipe.xml
Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature, CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML
Speech
AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!

Verse
Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear, why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>
Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].
The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.
Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
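How namespaces keep the elements of different XML applications apart can be observed with a short Python sketch based on the xml.etree.ElementTree module. The embedded document is a made-up example that mixes XHTML and MathML; the parser expands every element name into the form {IRI}name, so the prefix chosen in the document itself is irrelevant.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"

# A hypothetical document combining two XML applications.
DOCUMENT = f"""\
<html xmlns="{XHTML}" xmlns:m="{MATHML}">
  <body>
    <p>Energy:</p>
    <m:math><m:mi>E</m:mi></m:math>
  </body>
</html>
"""

root = ET.fromstring(DOCUMENT)

# Every element name is qualified by the IRI of its namespace.
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)

# Queries map a prefix of our own choosing to the same IRI; it does not
# have to match the "m" prefix used inside the document.
identifier = root.find(".//math:mi", {"math": MATHML})
print(identifier.text)
```

The parser happily reads both vocabularies, but, as noted above, it has no way of locating a schema for either of them from the IRI alone.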
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
2.2 Markup on the World Wide Web
2.2.1 The Hypertext Markup Language
In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely used graphical Web browser, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In
JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.
Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties of any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.
During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of its time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
2.2.2 The Extensible Hypertext Markup Language
Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as
<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>
Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.
<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>
Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.
<style>
  body {
    font: large Verdana;
    font-size: large;
  }
  h1 {
    font-size: x-large;
    text-transform: uppercase;
  }
  abbr {
    font-style: italic;
  }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>
The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2 Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly in web documents through XML namespaces.
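The difference in strictness can be observed with the two parsers shipped in the Python standard library. The sketch below feeds the same overlapping markup to both. It is only an approximation of what browsers do: html.parser is a lenient tokenizer that reports tags as it meets them and does not implement the error recovery documented in the HTML5 specification, and the class and function names are invented for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Overlapping elements: <b> is closed before the <i> opened inside it.
OVERLAPPING = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Record every start and end tag the lenient HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def xml_accepts(document: str) -> bool:
    """Return True when the strict XML parser accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


logger = TagLogger()
logger.feed(OVERLAPPING)
logger.close()

print(logger.events)             # the HTML parser processes the whole input
print(xml_accepts(OVERLAPPING))  # the XML parser refuses it
```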
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
2.2.3 The Semantic Web and Linked Data
The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
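The two readings of a triplet can be made concrete with a small Python sketch. The types IRI, Literal, and Triple and the function describe are hypothetical names introduced for this illustration; they are not part of any RDF library. The IRIs are those used by the example RDF document in this chapter.

```python
from typing import NamedTuple, Optional, Union


class IRI(NamedTuple):
    """A resource identified by an IRI."""
    value: str


class Literal(NamedTuple):
    """A literal value with an optional language tag."""
    value: str
    language: Optional[str] = None


class Triple(NamedTuple):
    subject: IRI
    predicate: IRI
    obj: Union[IRI, Literal]


def describe(triple: Triple) -> str:
    """Interpret a triplet according to the kind of its object."""
    s, p, o = triple
    if isinstance(o, IRI):
        # Resource object: the subject is in a relation with the object.
        return f"{s.value} is in relation {p.value} with {o.value}"
    # Literal object: the subject has a property with the given value.
    return f"{s.value} has property {p.value} with value {o.value!r}"


john = IRI("http://example.org/john-smith")
page = IRI("http://example.org/document.html")

print(describe(Triple(page, IRI("http://purl.org/dc/terms/creator"), john)))
print(describe(Triple(john, IRI("http://xmlns.com/foaf/0.1/name"),
                      Literal("John Smith"))))
```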
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, aside from the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually replace the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in the documents on the Web, a technique known as microformatting.
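As a rough illustration of one of these representations, the following Python sketch writes a small description of a Web page and its author as JSON-LD using only the json module of the standard library. The structure (the @context, @id, @type, @value, and @language keywords) follows JSON-LD conventions, but the document itself is an assumption made for this example, and no JSON-LD processor is involved, so the sketch only demonstrates the surface syntax.

```python
import json

# A hypothetical JSON-LD description of a Web page and its creator.
document = {
    "@context": {
        # Prefixes that abbreviate the IRIs of the two ontologies.
        "dc": "http://purl.org/dc/terms/",
        "foaf": "http://xmlns.com/foaf/0.1/",
    },
    "@id": "http://example.org/document.html",
    # A literal object with a language tag.
    "dc:title": {"@value": "John's Web page", "@language": "en"},
    # A resource object that is itself described by nested triplets.
    "dc:creator": {
        "@id": "http://example.org/john-smith",
        "@type": "foaf:Person",
        "foaf:name": "John Smith",
    },
}

serialized = json.dumps(document, indent=2)
print(serialized)

# Reading the text back yields the same data structure.
restored = json.loads(serialized)
```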
2.3 Document Preparation Systems
Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/terms/"
  xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
    rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
      rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
    rdf:about="http://example.org/john-smith">
    <rdf:type
      rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .
Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom)
<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>
Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.
<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>
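Figure 2.11: A graph of the RDF document in Figure 2.9. The node http://example.org/document.html is connected to the literal "John's Web page"@en by dc:title and to the node http://example.org/john-smith by dc:creator. The node http://example.org/john-smith is connected to the node foaf:Person by rdf:type and to the literal "John Smith" by foaf:name.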
categorized into the batch-oriented which process text files intoprintable output documents on demand and the interactive (alsoWhat You See Is What You Get (wysiwyg)) which allow the user todirectly edit an approximation of the output document througha visual editor The price for the mild learning curve of interac-tive dpses are the more primitive typesetting algorithms whichneed to be sufficiently fast to enable real-time user interactionand the reduced flexibility stemming from the usage of a Graphi-cal User Interface (gui) which although often intuitive for simpletasks seldom matches the power of the markup languages usedby batch-oriented dpses
2.3.1 Batch-oriented Systems
Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

[Sidenote: The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].]
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by Donald Knuth, an American professor of computer science, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text, with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled -- but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-
.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's The Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}%
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}\hskip-0.3ex}%
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
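To make the idea of conversion concrete, the following Python sketch turns a tiny, made-up subset of a Markdown-like syntax into HTML. The function name tiny_markup_to_html and the supported subset (headings introduced by # signs, emphasis between asterisks, paragraphs separated by blank lines) are assumptions chosen for illustration. Real converters handle far more syntax and many more edge cases.

```python
import html
import re


def tiny_markup_to_html(text):
    """Convert a tiny Markdown-like subset to HTML (illustrative only)."""
    # Blocks are separated by one or more blank lines.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    out = []
    for block in blocks:
        # Join wrapped lines and escape characters special to HTML.
        body = html.escape(" ".join(block.split()), quote=False)
        # *word* marks emphasis.
        body = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", body)
        # One to six leading # signs mark a heading of that level.
        heading = re.match(r"(#{1,6}) (.*)", body)
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            out.append(f"<p>{body}</p>")
    return "\n".join(out)


sample = "# Design\n\nLegibility is of *foremost*\nconcern."
print(tiny_markup_to_html(sample))
```

The input stays readable as plain text, while the output carries the same structure in a more general markup language.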
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements

3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
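The thresholds above can be collected into a small Python check. This is only a sketch of the guideline as stated in this section. The function name classify_measure and the wording of the returned messages are made up for illustration.

```python
def classify_measure(chars_per_line, columns=1):
    """Classify a line length against the guideline in this section."""
    # Recommended range: 45-75 characters for a single column,
    # 40-50 characters for multiple columns.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "too narrow to justify; set ragged"
    if chars_per_line > 80:
        return "too wide; strains the eye"
    if low <= chars_per_line <= high:
        return "within the recommended range"
    return "acceptable, but outside the recommended range"


print(classify_measure(66))
print(classify_measure(35))
print(classify_measure(60, columns=2))
```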
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὀργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

1
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
\newunicodechar{·}{\raisebox{.8ex}{.}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι
δὲ ὀργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=el] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="el">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὀργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
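The arithmetic behind these figures can be reproduced with a few lines of Python. The sketch below assumes the twenty to forty-five percent guideline stated above. The function names leading_range and lead_per_side are hypothetical.

```python
def leading_range(type_size_pt, min_pct=20, max_pct=45):
    """Return the (smallest, largest) recommended line height in points."""
    return (type_size_pt * (100 + min_pct) / 100,
            type_size_pt * (100 + max_pct) / 100)


def lead_per_side(type_size_pt, leading_pt):
    """Return the lead added above and below each line, in points."""
    return (leading_pt - type_size_pt) / 2


low, high = leading_range(10)
print(low, high)                 # 12.0 14.5
print(lead_per_side(10, low))    # 1.0
print(lead_per_side(10, high))   # 2.25
```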
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

        Sizes in inches   Page proportions
A4      8.27 × 11.7       2 : √2       1.41421
B5      6.93 × 9.84       1 : √2       0.707
Letter  8 1/2 × 11        1 : 1.294    1.2941
Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

[Sidenote: This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.]
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].

3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the number of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3]:

    This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size (as described in Section 3.2.1), as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design (and wherever else the page height is fixed), we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
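The requirement that the text height be a multiple of the line height can be sketched in Python as follows. The function name fit_text_height is hypothetical, and the example values are assumptions chosen for illustration rather than the dimensions of any particular book.

```python
def fit_text_height(target_height_pt, line_height_pt):
    """Round a desired text height down to a whole number of lines.

    Returns the number of lines and the resulting text height in points.
    """
    lines = int(target_height_pt // line_height_pt)
    return lines, lines * line_height_pt


# A desired text height of 500 pt with 12 pt leading:
print(fit_text_height(500, 12))   # (41, 492)
# A desired height that is already a whole multiple stays unchanged:
print(fit_text_height(540, 12))   # (45, 540)
```

Rounding down rather than up keeps the text block within the height the designer had in mind. The leftover space goes to the vertical margins.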
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to use colored typefaces only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
Bibliography
[1] Mary Brandel lsquolsquo1963 The debut of asci irsquorsquo InComputerworld(July 1999) url httpeditioncnncomTECHcomputing9907061963idg (visited on 09062015) (cit on p 5)
[2] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1963 10 East 40th Street New York 16 nyusa the American Standard Association June 1963 urlhttp worldpowersystems com J codes X3 4 - 1963
(visited on 01282015) (cit on p 5)[3] i so tc97sc2 Information technology ndash iso 7-bit coded character
set for information interchange i so 6461972 Geneva Switzer-land the International Organization for Standardization1972 (cit on pp 5 7)
[4] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1986 10 East 40th Street New York 16 ny usathe American Standard Association June 1986 (cit on p 6)
[5] Unicode Consortium the Unicode Standard Version 10 Vol 1Reading ma usa Addison-Wesley Developers Press Oct1991 isbn 0-201-56788-1 (cit on p 8)
[6] Unicode Consortium the Unicode Standard Version 10 Vol 2Reading ma usa Addison-Wesley Developers Press June1992 isbn 0-201-60845-6 (cit on p 8)
[7] isoiec jtc1sc2 Information technology ndash the Universalmultiple-octet coded Character Set (ucs) ndash Part 1 Architectureand Basic Multilingual Plane isoiec 10646-11993 Geneva
Switzerland the International Organization for Standard-ization May 1993 (cit on p 8)
[8] i soiec jtc1sc2 Transformation Format for 16 planes of group00 (utf-16) isoiec 10646-11993Amd 11996 GenevaSwitzerland the International Organization for Standard-ization Oct 1996 (cit on p 8)
[9] isoiec jtc1sc2 ucs Transformation Format 8 (utf-8)isoiec 10646-11993Amd 21996 Geneva Switzerlandthe International Organization for Standardization Oct1996 (cit on p 8)
[10] Unicode Consortium the Unicode Standard Version 90 ndash CoreSpecification Tech rep Mountain View ca usa July 2016url httpwwwunicodeorgversionsUnicode900UnicodeStandard-90pdf (visited on 09172015) (cit onpp 8ndash10)
[11] Q-Success Usage of character encodings for websites urlhttpw3techscomtechnologiesoverviewcharacter_
encodingall (visited on 09102015) (cit on p 9)[12] Unicode Consortium Unicode Technical Standard 10 Version
900 Unicode Collation Algorithm Tech rep May 2016 urlhttpwwwunicodeorgreportstr10tr10-34html
(visited on 09172016) (cit on p 10)[13] Unicode Consortium Unicode cldr Project Tech rep url
httpcldrunicodeorg (visited on 09172016) (cit onp 10)
[14] iso tc171sc2 Document management ndash Portable documentformat iso 320002008 Geneva Switzerland the Interna-tional Organization for Standardization July 2008 (cit onp 13)
[15] isoiec jtc1sc34 Document description and processing lan-guages ndash Office Open XML File Formats isoiec 295002012Geneva Switzerland the International Organization forStandardization Oct 2012 (cit on p 13)
[16] isoiec jtc1sc34 Information technology ndash Open DocumentFormat for Office Applications (OpenDocument) v10 isoiec263002006 Geneva Switzerland the International Organi-zation for Standardization Dec 2006 (cit on p 13)
[17] Noam Chomsky lsquolsquoThree models for the description of lan-guagersquorsquo In Information Theory IEEE Transactions on 23 (1956)pp 113ndash124 (cit on p 14)
[18] isoiec jtc1sc22 Information technology ndash the Portable Op-erating System Interface ndash Part 2 Shell and Utilities isoiec9945-21993 Geneva Switzerland the International Organi-zation for Standardization Dec 1993 (cit on p 14)
[19] Jeffrey E F Friedl Mastering Regular Expressions 3rd edOrsquoReilly Media 2006 p 544 isbn 978-0-596-52812-6 (citon p 14)
[20] Unicode Consortium Unicode Technical Standard 18 Version17 Unicode Regular Expressions Tech rep Nov 2013 urlhttpwwwunicodeorgreportstr18tr18-17html
(visited on 09262015) (cit on p 16)[21] Dale Dougherty and Arnold Robbins Sed amp awk Second
Edition OrsquoReilly Media 1997 i sbn 1565922255 url http docstore mik ua orelly unix sedawk (visited on09262015) (cit on p 16)
[22] Ben Collins-Sussman Brian W Fitzpatrick and C MichaelPilato Version Control with Subversion OrsquoReilly 2002 urlhttpsvnbookred-beancom (visited on 09262015)(cit on p 17)
[23] Charles F Goldfarb lsquolsquothe Roots of sgml ndash A Personal Rec-ollectionrsquorsquo In (1996) url httpwwwsgmlsourcecomhistoryrootshtm (visited on 07292015) (cit on p 22)
[24] Charles F Goldfarb lsquolsquosgml The Reason Why and the FirstPublishedHintrsquorsquo In Journal of the American Society for Informa-tion Science 48 (7 July 1997) url httpwwwsgmlsourcecomhistoryjasishtm (visited on 07292015) (cit onp 22)
[25] Charles F Goldfarb lsquolsquoIntroduction to Generalized MarkuprsquorsquoIn (1981) url http www sgmlsource com history AnnexAhtm (visited on 07292015) (cit on p 22)
[26] i soiecjtc1sc34 Information processing ndash Text and office sys-tems ndash Standard Generalized Markup Language (sgml) i soiec88791986 Geneva Switzerland the International Organi-zation for Standardization Oct 1986 (cit on p 22)
[27] Charles F Goldfarb the sgml Handbook New York NY USAOxford University Press Inc 1990 i sbn 978-0-198-53737-3(cit on p 22)
[28] Jean Paoli Tim Bray and Michael Sperberg-McQueen Ex-tensible Markup Language (xml) 10 w3c Recommendationw3c Feb 1998 url httpwwww3orgTR1998REC-xml-19980210 (visited on 07312015) (cit on pp 23 31)
[29] isoiec jtc1sc18wg8 Proposed TC for Web sgml Adap-tations for sgml isoiec N1929 the International Organi-zation for Standardization June 1997 url httpxmlcoverpagesorgwg8-n1929-ghtml (visited on 07312015)(cit on p 23)
[30] Haringkon Wium Lie and Bert Bos Cascading Style Sheets level1 Recommendation w3c Dec 1996 url httpwwww3orgTRREC-CSS1-961217 (visited on 07312015) (cit onpp 23 29)
[31] C M Sperberg-McQueen and Claus Huitfeldt lsquolsquogoddagA Data Structure for Overlapping Hierarchiesrsquorsquo In DigitalDocuments Systems and Principles 8th International Confer-ence on Digital Documents and Electronic Publishing DDEP2000 5th International Workshop on the Principles of DigitalDocument Processing PODDP 2000 Munich Germany Sep-tember 13-15 2000 Revised Papers Ed by Peter King andEthan V Munson Berlin Heidelberg Springer Berlin Hei-delberg 2004 pp 139ndash160 isbn 978-3-540-39916-2 doi101007978-3-540-39916-2_12 (cit on p 27)
[32] TimBray DaveHollander andAndrewLaymanNamespacesin xml w3c Recommendation w3c Jan 1999 url httpwwww3orgTR1999REC-xml-names-19990114 (visitedon 08212015) (cit on p 27)
[33] M Duerst the Internationalized Resource Identifiers (iris) rfc3987 rfc Editor Jan 2005 url httptoolsietforghtmlrfc3987 (visited on 08312015) (cit on p 27)
[34] Norman Walsh DocBook 5 The Definitive Guide Apr 2010url httpwwwdocbookorgtdgenhtmldocbookhtml(visited on 08182015) (cit on p 28)
[35] Tim Berners-Lee Information Management A Proposal Techrep Mar 1989 url httpwwww3orgHistory1989proposalhtml (visited on 08312015) (cit on p 28)
[36] T Berners-Lee Hypertext Markup Language ndash 20 rfc 1866rfc Editor Nov 1995 url httptoolsietforghtmlrfc1866 (visited on 07312015) (cit on p 28)
[37] Jon Postel DoD standard Transmission Control Protocol rfc761 rfc Editor Jan 1980 url httptoolsietforghtmlrfc761 (visited on 09162016) (cit on p 28)
[38] Ian Hickson et al html5 A vocabulary and associated apisfor html and xhtml Recommendation w3c Oct 2014 urlhttpwwww3orgTR2014REC-html5-20141028 (visitedon 07312015) (cit on p 29)
[39] ecma International Standard ecma-262 - ecmaScript LanguageSpecification Tech rep June 1997 url httpwwwecma-internationalorgpublicationsfilesECMA-ST-ARCH
ECMA-262201st20edition20June201997pdf (visitedon 07312015) (cit on p 29)
[40] Netscape Communications Netscape and Sun announce Java-Script the open cross-platform object scripting language for en-terprise networks and the Internet Dec 1995 url httpwpnetscapecomnewsrefprnewsrelease67html (visited on02132008) (cit on p 29)
[41] Dave Raggett et al Reformulating html in xml w3c Recom-mendation w3c Dec 1998 url httpwwww3orgTR1998WD-html-in-xml-19981205 (visited on 08202015)(cit on p 31)
[42] Steven Pemberton et al xhtmltrade 10 The Extensible HyperTextMarkup Language w3c Recommendation w3c Jan 2000url httpwwww3orgTR2000REC-xhtml1-20000126(visited on 08202015) (cit on p 31)
[43] T Berners-Lee Linked Data Tech rep 2006 url httpswwww3orgDesignIssuesLinkedDatahtml (visited on09172016) (cit on p 31)
[44] Ora Lassila and Ralph R Swick Resource Description Frame-work (rdf) Model and Syntax Specification w3c Recommen-dation w3c Feb 1999 url httpwwww3orgTR1999REC-rdf-syntax-19990222 (visited on 08182015) (cit onpp 31 32)
[45] Dan Brickley and R V Guha rdf Vocabulary DescriptionLanguage 10 rdf Schema w3c Recommendation w3c Feb2004 url httpwwww3orgTR2004REC-rdf-schema-20040210 (visited on 08182015) (cit on p 32)
[46] Deborah L McGuinness and Frank van Harmelen owl WebOntology Language w3c Recommendation w3c Feb 2004url httpwwww3orgTR2004REC-owl-features-20040210 (visited on 08182015) (cit on p 32)
[47] Dan Brickley and R V Guha json-ld 10 A JSON-basedSerialization for Linked Data w3c Recommendation w3cJan 2014 url httpwwww3orgTR2014REC-json-ld-20140116 (visited on 08192015) (cit on p 32)
[48] David Beckett et al rdf 11 Turtle w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-turtle-20140225 (visited on 08292015) (cit on p 32)
[49] David Beckett rdf 11 N-Triples w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-n-triples-20140225 (visited on 08192015) (cit on p 32)
[50] Ben Adida et al rdfa in xhtml Syntax and Processing w3cRecommendation w3c Oct 2008 url httpwwww3org TR 2008 REC - rdfa - syntax - 20081014 (visited on08192015) (cit on p 32)
[51] Peter Schaffter What exactly is mom 2015 url httpwwwschafftercamommom-01html (visited on 09162016)(cit on p 37)
[52] Donald Ervin Knuth Digital Typography The Center for theStudy of Language and Information Publications 1998 i sbn978-0-387-98269-4 (cit on p 36)
[53] Albert Kapr Sto a jedna věta ke knižniacute uacutepravě Trans by An-toniacuten Rambousek Lacerta 1999 url httpwwwsazbacztypoglosytypo101pdf (visited on 10202015) (cit onpp 41 46 47)
BIBLIOGRAPHY 57
[54] Robert Bringhurst the Elements of Typographic Style PointRoberts andWashHartleyampMarks 1992 i sbn 0-88179-110-5(cit on pp 41 42 45ndash48)
[55] Matthew Butterick Butterickrsquos Practical Typography Line spac-ing url httppracticaltypographycomline-spacinghtml (visited on 11022015) (cit on p 42)
[56] Vladimiacuter Beran et al Aktualizovanyacute typografickyacute manuaacutel6th ed Kafka Design 2014 (cit on p 45)
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standards Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Declaration
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
Emacs  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
Pico  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML, New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard General Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
vi  The Visual Interactive editor
Vim  vi IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
Index
ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42
Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17
sect
Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15
bre 14ndash16bs 6bsd 13
sect
ca 52can 6cern 28
character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44
sect
dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35
dps 13 17 18 32 35 36 39dtd 23 25ndash27dtp 36
sect
ebcdic 5ecma 55Edgar Allen Poe 37
Elements of Style 3em 6Emacs 13endianity 10endnote 47enq 6eot 6erealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15
ere 14ndash16esc 6etb 6120576-TEX 38etx 6euc 5
sectF M Cornford 43ff 6foaf 32 33footnote 47formal grammar 14fortran 4From Religion to Philosophy A Study in
the Origins of Western Speculation 43fs 6fsm 35
sectGit 17gml 22gnuLinux 13nano 13
gnu 13 14 35Google Documents 18Google Pinyin 11grep 16 17groff see troffgs 6gui 13 35
sectHan Unification 9heading 45Henrik Ibsen 27ht 6
html 28ndash32 34 39 44 55sect
ibm 5 12 22iconv 10iec 7 10 51ndash54ime 12ir i 27 28 31 32 54iso 7 10 51ndash54
sectJavaScript 29Jeffrey E F Friedl 14j is 5joe 13JScript 29json 32json-ld 32 56jtc 51ndash54justification see alignment
sectKing Lear 48
sectLATEX 36 43Latin Vulgate Bible 49ld 31 32 55leading see line spacingLeafpad 13lf 6lightweight markup language 39line height 45list 46
sectma 51MakeDoc 39Markdown 39markuplogical 21 29 30 35 36presentation 21 29 30 35 36
mathml 28 31Mercurial 17microformatting 32Microsoft Word 14 20 39
sectN-Triples 32 33nak 6Noam Chomskyhierarchy 14
Noam Chomsky 14note 46Notepad++ 13Notepad 13
nroff see troffnul 6ny 51
sectocr 12odf 13ooxml 13owl 32 56
sectparagraphblock 47indented 45outdented 45
paragraph 42paragraphsblock 45
pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8
sectQuarkXPress 14quotationblock 47run-in 47
sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31
rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6
sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22
sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-
lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6
secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff
man 36
me 36mom 36
troff 35tron 9Turtle 32 33typeface 41
sectucsblock 8ucs-4 8
ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10
us 6usa 51 52utf
utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8
utf 6 8ndash10 52sect
VBScript 29vcscentralized 17decentralized 17
vcs 17ndash20version control 13vi 13vim 13
vt 6sect
w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4
wysiwyg 35sect
XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23
xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23
BRE regex, description, and example matches:

we\{1,2\}p
    The interval expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
    Matches: weeps, wept

e*ne*
    Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
    Matches: never, enemy, Kleene

\(⟨regex⟩\)
    A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
    Matches: ⟨regex⟩

^ar
    At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
    Matches: argument, arrow keys

ore$
    At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
    Matches: iron ore, dumbledore

.be
    A period (.) matches any single character.
    Matches: or not to be

be[ea]
    A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity.
    Matches: beehive, grizzly bear, glass beads

be[^ea]
    A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
    Matches: obeah bend, libela

\^\$
    Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
    Matches: ^$

\(..\).*\1
    A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
    Matches: ara ararauna, dardanelles, nationality

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below).

ERE regex, description, and example matches:

we{1,2}p
    Unlike in BREs, braces aren't escaped.
    Matches: weeps, wept

pe+rl?
    The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
    Matches: persona, peer speech, perl

(⟨regex⟩)
    Unlike in BREs, parentheses aren't escaped.
    Matches: ⟨regex⟩

(on|t)
    Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
    Matches: one two, trophy truth

(..).*\1
    EREs do not support backreferences.
    Matches: ⟨undefined⟩
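The ERE operators described in the table can be tried out interactively. The following is only a sketch: it uses Python's `re` module, whose syntax resembles the ERE syntax but is not POSIX-conformant (for instance, it does support backreferences). The helper name `find_all` and the sample strings are illustrative assumptions, not part of the original table.

```python
import re


def find_all(pattern: str, text: str) -> list[str]:
    """Return every non-overlapping match of pattern in text, left to right."""
    return [m.group(0) for m in re.finditer(pattern, text)]


# Interval expression: the letter "e" repeated once or twice.
interval = find_all(r"we{1,2}p", "weeps wept")

# Plus and question mark: one or more "e", an "r", and an optional "l".
repetition = find_all(r"pe+rl?", "persona peer speech perl")

# Alternation: either "on" or "t".
alternation = find_all(r"on|t", "one two")

# Backreference: any character immediately followed by itself.
# Python accepts this; a strict POSIX ERE implementation need not.
doubled = find_all(r"(.)\1", "beehive")

print(interval, repetition, alternation, doubled)
```

A BRE tool such as grep without the `-E` option would need the braces and parentheses escaped with backslashes instead.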
Regex and description:

\x{⟨n⟩}
    Matches the UCS character with code point ⟨n⟩ in hexadecimal.
\N{⟨n⟩}
    Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.
\p{⟨p⟩}
    Matches any UCS character with property ⟨p⟩.
\P{⟨p⟩}
    Matches any UCS character without property ⟨p⟩.

Property and description:

Letter
    This property is satisfied by any letter.
Punctuation
    This property is satisfied by any punctuation.
Symbol
    This property is satisfied by any symbol.
Mark
    This property is satisfied by any mark.
Number
    This property is satisfied by any number.
Separator
    This property is satisfied by any separator.
Other
    This property is satisfied by any UCS character that doesn't belong to any of the categories listed above.
Block=⟨b⟩
    This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.
Script=⟨s⟩
    This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.
Numeric Value=⟨n⟩
    This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.
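The idea behind property-based matching can be approximated even without a regex engine that understands `\p{…}`. The sketch below uses Python's standard `unicodedata` module, which exposes the general category, name, and numeric value of a character; the function names `matches_category` and `filter_by_category` are hypothetical helpers introduced only for illustration.

```python
import unicodedata


def matches_category(ch: str, prefix: str) -> bool:
    """Mimic \\p{...} for general categories: 'L' is Letter, 'N' is Number, etc."""
    return unicodedata.category(ch).startswith(prefix)


def filter_by_category(text: str, prefix: str) -> list[str]:
    """Return the characters of text whose general category starts with prefix."""
    return [ch for ch in text if matches_category(ch, prefix)]


sample = "a1 β!"
letters = filter_by_category(sample, "L")      # letters of any script
numbers = filter_by_category(sample, "N")      # digits and other numbers
punctuation = filter_by_category(sample, "P")  # punctuation marks

# Lookup by character name, similar in spirit to \N{...}.
alpha = "\N{GREEK SMALL LETTER ALPHA}"

# The numeric value property of a character.
half = unicodedata.numeric("½")

print(letters, numbers, punctuation, alpha, half)
```

The standard `re` module of Python does not implement `\p{…}` itself, which is why the properties are queried directly here.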
The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.
least partially implement the Unicode standard for Regular Expressions [20] (such as those of Perl 5.2 or Java 7) are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
The most elementary text-processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool presents the user with the lines that contain one or more matches. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
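What grep and sed do at their core can be sketched in a few lines. The following is not a reimplementation of either tool, only an illustration of line-oriented searching and substitution; the function names `grep` and `sed_substitute` are hypothetical, and Python's `re` syntax is assumed in place of the BRE syntax the real tools default to.

```python
import re


def grep(pattern: str, lines: list[str]) -> list[str]:
    """Return the lines that contain at least one match, as grep prints them."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


def sed_substitute(pattern: str, replacement: str, lines: list[str]) -> list[str]:
    """Replace every match on every line, roughly like sed 's/pattern/replacement/g'."""
    regex = re.compile(pattern)
    return [regex.sub(replacement, line) for line in lines]


lines = ["iron ore", "ore deposit", "dumbledore"]
found = grep(r"ore$", lines)
changed = sed_substitute(r"colou?r", "color", ["colour and color"])

print(found, changed)
```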
1.2 Version Control

When writing a text document, it is often useful to keep a backup of the previous versions of files so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.
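The rudimentary behavior described above (recording versions with a description and an author, viewing changes, and reverting them) can be sketched as a toy in-memory repository. This is an illustrative model only and does not reflect how SVN, Git, or any other real VCS stores data; the names `Revision` and `TinyRepository` are hypothetical.

```python
import difflib
from dataclasses import dataclass


@dataclass
class Revision:
    author: str
    message: str
    text: str


class TinyRepository:
    """A toy VCS that keeps every version of a single text document in memory."""

    def __init__(self) -> None:
        self.revisions: list[Revision] = []

    def commit(self, author: str, message: str, text: str) -> int:
        """Record a new version and return its revision number."""
        self.revisions.append(Revision(author, message, text))
        return len(self.revisions) - 1

    def log(self) -> list[str]:
        """List the authorship and description of every revision."""
        return [f"r{n} {r.author}: {r.message}" for n, r in enumerate(self.revisions)]

    def diff(self, old: int, new: int) -> list[str]:
        """Show the line-based changes between two revisions."""
        a = self.revisions[old].text.splitlines()
        b = self.revisions[new].text.splitlines()
        return list(difflib.unified_diff(a, b, f"r{old}", f"r{new}", lineterm=""))

    def revert(self, author: str, number: int) -> int:
        """Undo later changes by committing the text of an older revision again."""
        return self.commit(author, f"Revert to r{number}", self.revisions[number].text)


repo = TinyRepository()
first = repo.commit("alice", "Initial draft", "Hello\nworld")
second = repo.commit("bob", "Address the readers", "Hello\nreaders")
changes = repo.diff(first, second)
third = repo.revert("alice", first)
print("\n".join(repo.log()))
```

Storing whole copies of every version keeps the sketch short; real systems typically store changes far more compactly.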
VCS can be divided by architecture into centralized and decentralized systems. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally, and its operation is fully dependent on the availability of the server. An example of a centralized VCS is SubVersioN (SVN).
By comparison, there is no designated server in decentralized VCS, and the users can upload and download new versions directly to and from one another. The client software is thick in the sense that every user has a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include a more complex workflow, greater storage requirements, and the increased opportunity for users not to share their local changes frequently enough, which raises the chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.
Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

An example would be the graphical SVN client Tortoise SVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.

    svnadmin create
    svn checkout
    svn update
    svn commit

Figure 1.8: The basic SVN workflow.
After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.

    git init
    git clone
    git pull
    git push
    git reset
    git commit

Figure 1.9: The diagram above depicts the basic Git workflow. The diagram below depicts the use of the Git program with an SVN repository; this bears all the advantages and disadvantages associated with decentralized VCS.

    svnadmin create
    git svn clone
    git svn rebase
    git svn dcommit
    git reset
    git commit
Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).

Figure 1.11: Tortoise SVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.
Chapter 2
Markup
A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to compliance with the orthographic rules (such as correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.
It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.
Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

More information about the project can be found in The Roots of SGML: A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.
2.1 Meta Markup Languages

2.1.1 The General Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.
This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the General Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard General Markup Language (SGML) [26].
SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.
A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, are declared within a Document Type Declaration (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.
2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.
This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
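The lower of the two levels of correctness, well-formedness, can be checked by any XML parser. The sketch below assumes Python's standard `xml.etree.ElementTree` module, which is a non-validating parser: it reports well-formedness errors but does not check a document against a DTD or a schema, so validity still requires a validating tool. The function name `is_well_formed` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True when the parser accepts the document as well-formed XML."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Properly nested elements with a single root element.
good = is_well_formed("<recipe><name>Palatschinken</name></recipe>")

# The <name> element is never closed, so the document is not well-formed.
bad = is_well_formed("<recipe><name>Palatschinken</recipe>")

print(good, bad)
```

A well-formed document may still be invalid, for example when it omits an element that the DTD or schema requires.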
Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE recipe SYSTEM "recipe.dtd">
    <recipe>
      <name>Palatschinken</name>
      <description>A Slavic crêpe-like dish</description>
      <ingredientList serves="8">
        <ingredient amount="120g">Plain flour</ingredient>
        <ingredient amount="2">Egg</ingredient>
        <ingredient amount="300ml">Milk</ingredient>
        <ingredient amount="1 tblspn">Oil</ingredient>
        <ingredient amount="1 pinch">Salt</ingredient>
      </ingredientList>
      <stepList>
        <step>Combine the ingredients and whisk until
              you have a smooth batter.</step>
        <step>Heat oil on a pan, pour in a tablespoonful
              of the batter, fry until golden brown.</step>
        <step>Repeat until there is no batter left.</step>
        <step>Serve rolled and filled with jam.</step>
      </stepList>
    </recipe>

Figure 2.1: An example XML document (recipe.xml).
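To illustrate how a program might read the elements and attributes of such a document, the following sketch parses an abridged copy of the recipe with Python's standard `xml.etree.ElementTree` module. The abridgement (no XML declaration, no DOCTYPE, only two ingredients) and the variable names are assumptions made to keep the example short and self-contained.

```python
import xml.etree.ElementTree as ET

# An abridged copy of the recipe document, embedded for self-containment.
RECIPE = """\
<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# Element content is available as text; attributes are looked up by name.
name = root.findtext("name")
serves = int(root.find("ingredientList").get("serves"))
amounts = {item.text: item.get("amount") for item in root.iter("ingredient")}

print(name, serves, amounts)
```

Note that the parser hands every attribute over as a string; interpreting `serves` as an integer is left to the program, which mirrors the remark that a DTD declares syntax and leaves semantics to the processing application.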
DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd">

    <!DOCTYPE recipe SYSTEM "recipe.dtd">

    <!DOCTYPE recipe [
      <!ELEMENT recipe (name, description, ingredientList,
                        stepList)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
      <!ELEMENT ingredientList (ingredient+)>
      <!ATTLIST ingredientList serves CDATA #REQUIRED>
      <!ELEMENT ingredient (#PCDATA)>
      <!ATTLIST ingredient amount CDATA #REQUIRED>
      <!ELEMENT stepList (step+)>
      <!ELEMENT step (#PCDATA)> ]>

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd" [
      <!-- Omitted for brevity --> ]>

    <!DOCTYPE recipe SYSTEM "recipe.dtd" [
      <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD.
    element recipe {
      element name { text },
      element description { text },
      element ingredientList {
        attribute serves { xsd:positiveInteger },
        element ingredient {
          attribute amount { text }, text
        }+
      },
      element stepList {
        element step { text }+
      }
    }

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
      <element name="recipe"><complexType><all>
        <element name="name" type="string" minOccurs="1"/>
        <element name="description" type="string"
                 minOccurs="1"/>
        <element
          name="ingredientList"><complexType><sequence>
          <element name="ingredient" minOccurs="1"
                   maxOccurs="unbounded">
            <complexType><simpleContent>
              <extension base="string">
                <attribute name="amount" type="string"/>
              </extension>
            </simpleContent></complexType>
          </element></sequence>
          <attribute name="serves" type="positiveInteger"
                     use="required"/>
        </complexType></element>
        <element name="stepList"><complexType><sequence>
          <element name="step" type="string" minOccurs="1"
                   maxOccurs="unbounded"/>
        </sequence></complexType></element>
      </all></complexType></element>
    </schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).
    xmllint --noout --dtdvalid recipe.dtd recipe.xml
    xmllint --noout --schema recipe.xsd recipe.xml
    trang recipe.rnc recipe.rng  # Compact -> Full Relax NG
    xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature, CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML
Speech

    AASE: Peer, you're lying!
    PEER: No, I'm not!
    AASE: Well then, swear to me it's true!
    PEER: Swear? Why should I?
    AASE: See, you dare not! Every word of it's a lie.

Verse

    Peer, you're lying! No, I'm not!
    Well then, swear to me it's true!
    Swear? Why should I? See, you dare not!
    Every word of it's a lie.

    <(V)line>
    <(S)speech who="Aase">Peer, you're lying!</(S)speech>
    <(S)speech who="Peer">No, I'm not!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Aase">Well then,
    swear to me it's true!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Peer">Swear? why should I?</(S)speech>
    <(S)speech who="Aase">See, you dare not!
    </(V)line><(V)line>
    Every word of it's a lie.</(S)speech>
    </(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].
The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook, and its source code is publicly available at http://docbook.org.

Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
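How namespaces keep the vocabularies of different XML applications apart can be seen by parsing a small mixed document. In the sketch below, the recipe namespace IRI `http://www.example.com/recipe` is a hypothetical identifier invented for the example, whereas `http://purl.org/dc/elements/1.1/` is the namespace of the Dublin Core element set; the parser is Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

RECIPE_NS = "http://www.example.com/recipe"  # hypothetical IRI for illustration
DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set

DOCUMENT = f"""\
<recipe xmlns="{RECIPE_NS}" xmlns:dc="{DC_NS}">
  <dc:creator>Jane Doe</dc:creator>
  <name>Palatschinken</name>
</recipe>"""

root = ET.fromstring(DOCUMENT)

# The parser expands every element name to the form {IRI}local-name.
root_tag = root.tag

# Prefixes used in queries are local shorthands; only the IRIs matter.
prefixes = {"r": RECIPE_NS, "dc": DC_NS}
creator = root.findtext("dc:creator", namespaces=prefixes)
name = root.findtext("r:name", namespaces=prefixes)

print(root_tag, creator, name)
```

The parser treats the IRIs purely as opaque identifiers and never attempts to fetch a schema from them, which is consistent with the observation that no general mechanism maps IRIs to schemata.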
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
Initially, HTML only comprised a mixture of logical and presentation markup with a fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language made it possible to specify the visual properties of any HTML element, which separated document markup from design and effectively eliminated the need for presentation markup.
During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading Web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into Web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as
ltbgtBold ltigtbold and italicltbgt italicltigt
ltbgtBold ltbgtltigtltbgtbold and italicltbgt italicltigt
Figure 27 The first line contains overlapping elements and assuch canrsquot be a part of a valid html document Neverthelessbrowsers should handle it identically to the second line
30 CHAPTER 2 MARKUP
ltfont face=Verdana size=4gt
ltfont size=+2gtltbgtSO WHAT IS THIS ABOUTltbgtltfontgt
ltbrgtltbrgtThere is a continuing need to show the power of
ltigtCSSltigt The Zen Garden aims to excite inspire
and encourage participation To begin view some of the
existing designs in the list Clicking on any one will
load the style sheet into this very page The ltigtHTML
ltigt remains the same the only thing that has changed
is the external ltigtCSSltigt file Yes really
ltfontgt
Figure 28 An excerpt from the Web site of the css Zen Zardenlocated at httpcsszengardencom The document above wascreated using the html presentation markup The document be-low achieves the same appearance by the combination of logicalmarkup and css
ltstylegt
body
font large Verdana
font-size large
h1
font-size x-large
text-transform uppercase
abbr
font-style italic
ltstylegt
lth1gtSo what is this aboutlth1gt
ltpgtThere is a continuing need to show the power of
ltabbrgtCSSltabbrgt The Zen Garden aims to excite inspire
and encourage participation To begin view some of the
existing designs in the list Clicking on any one will
load the style sheet into this very page The
ltabbrgtHTMLltabbrgt remains the same the only thing that
has changed is the external ltabbrgtCSSltabbrgt file Yes
reallyltpgt
22 MARKUP ON THE WORLD WIDE WEB 31
The idea of a net-work of machine-readable data wasdescribed by TimBerners-Lee in 2006in the article LinkedData [43]
exemplified by the working draft of Reformulating html in xml [41]Unlike html parsers whose acceptance of malformed contentmakes them complex xml parsers are required to strictly refusexml documents that arenrsquot well-formed [28 Section 12 Termi-nology] leading to architectural simplicity and decreased com-putational requirements As a result reformulating html in xmlwas suggested as a way to bring the Web to mobile embeddedand other devices limited in their computational resources andto reduce the amount of malformed documents on the Web ingeneral Other perceived advantages included the ability to usexml tools for web documents and to include instances of otherxml applicationsmdashsuch as mathml and svgmdashdirectly into webdocuments through xml namespaces
The idea was brought to fruition in the xml application of theeXtensible HyperText Markup Language (xhtml) [42] However thesupposed benefits proved to be too marginal to warrant migrationfrom html The speed advantages of the simplified processingwere largely offset by the lack of support for incremental renderingsince it is impossible to validate and render partially downloadedxhtml documents and the advances in the area of mobile devicesmadehtmlprocessing sufficiently fast The lack ofways to providealternative content for browsers that would not support the xmlapplications instantiated in the xhtml documents also reducedthe usefulness of the xml namespaces in xhtml considerably Asa result xhtml has yet to succeed in replacing html and remainsa minority markup language on the Web
223 The Semantic Web and Linked DataTheWeb is based on the idea of a distributed and globally availablenetwork of human knowledge The languages ofhtml xhtml cssand JavaScript form the foundation of the human-readable partsof the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents Drawingfrom the research in the field of knowledge representation w3ccreated the Resource Description Framework (rdf) [44] in 1999mdashalanguage for the description of resources on the Web
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
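To make the triplet model concrete, the following Python sketch stores three of the triplets from Figure 2.9 as plain tuples and writes them out in the N-Triples syntax. It is an illustration under stated assumptions rather than a reference implementation: the classes Resource and Literal and the functions term, to_ntriples, and interpret are hypothetical names, only the backslash, double quote, and newline characters are escaped in literals, and no RDF library is involved. Note that N-Triples lists the subject first, whereas the notation (p, s, o) used in the text lists the predicate first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    """A resource identified by an IRI."""
    iri: str


@dataclass(frozen=True)
class Literal:
    """A literal value with an optional language tag."""
    value: str
    lang: Optional[str] = None


def term(node):
    """Serialize a single node in the N-Triples syntax."""
    if isinstance(node, Resource):
        return f"<{node.iri}>"
    escaped = (node.value.replace("\\", "\\\\")
               .replace('"', '\\"')
               .replace("\n", "\\n"))
    tag = f"@{node.lang}" if node.lang else ""
    return f'"{escaped}"{tag}'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


def interpret(subject, predicate, obj):
    """Spell out the two readings of a triplet given in the text."""
    if isinstance(obj, Resource):
        return f"{subject.iri} is in relation {predicate.iri} with {obj.iri}"
    return f"{subject.iri} has property {predicate.iri} with value {obj.value}"


DOC = Resource("http://example.org/document.html")
JOHN = Resource("http://example.org/john-smith")
TRIPLES = [
    (DOC, Resource("http://purl.org/dc/terms/title"),
     Literal("John's Web page", "en")),
    (DOC, Resource("http://purl.org/dc/terms/creator"), JOHN),
    (JOHN, Resource("http://xmlns.com/foaf/0.1/name"), Literal("John Smith")),
]
```

Calling to_ntriples(TRIPLES) yields one line per triplet in the style of the middle listing of Figure 2.9, and interpret distinguishes a relation between two resources from a property with a literal value.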
    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/terms/"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description
          rdf:about="http://example.org/document.html">
        <dc:title xml:lang="en">John's Web page</dc:title>
        <dc:creator
            rdf:resource="http://example.org/john-smith"/>
      </rdf:Description>
      <rdf:Description
          rdf:about="http://example.org/john-smith">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
        <foaf:name>John Smith</foaf:name>
      </rdf:Description>
    </rdf:RDF>

    <http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
    <http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
    <http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
    <http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dc: <http://purl.org/dc/terms/> .
    <http://example.org/document.html>
      dc:title "John's Web page"@en ;
      dc:creator <http://example.org/john-smith> .
    <http://example.org/john-smith>
      a foaf:Person ;
      foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).

    <!DOCTYPE html>
    <html lang="en">
      <head>
        <link rel="meta" type="application/rdf+xml"
              href="john.rdf">
        <link rel="meta" type="text/turtle" href="john.ttl">
        <link rel="meta" type="application/n-triples"
              href="john.nt">
        <title>John's Web page</title>
      </head>
      <body>
        Hi, I'm John Smith.
      </body>
    </html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

    <!DOCTYPE html>
    <html lang="en">
      <head vocab="http://purl.org/dc/terms/"
            about="http://example.org/document.html">
        <title property="title" lang="en">John's Web page</title>
        <meta property="creator"
              href="http://example.org/john-smith">
      </head>
      <body vocab="http://xmlns.com/foaf/0.1/"
            about="http://example.org/john-smith"
            typeof="Person">
        Hi, I'm <span property="name">John Smith</span>.
      </body>
    </html>

[Graph with the nodes http://example.org/document.html, "John's Web page"@en, http://example.org/john-smith, foaf:Person, and "John Smith", and with the edge labels dc:title, rdf:type, foaf:name, and foaf:creator.]
Figure 2.11: A graph of the RDF document in Figure 2.9.

2.3 Document Preparation Systems
Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is twofold: more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems
Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph, The Art of Computer Programming, and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by Desktop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
[Typeset output of the source below: the title "The Cask of Amontillado", the byline "by Edgar Allan Poe", the opening paragraph set with a three-line drop cap T, and the page number "-1-".]

    .TITLE "The Cask of Amontillado"
    .AUTHOR "Edgar Allan Poe"
    .PRINTSTYLE TYPESET
    .PAGE 6i 9i .75i .75i .75i .75i
    .START
    .PP
    .DROPCAP T 3
    he thousand injuries of Fortunato I had borne as I best
    could, but when he ventured upon insult, I vowed revenge.
    You, who so well know the nature of my soul, will not
    suppose, however, that I gave utterance to a threat.
    \*[IT]At length\*[PREV] I would be avenged; this was a
    point definitely settled\[em]but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].
    % Page geometry
    \pdfpagewidth=6in \pdfpageheight=9in
    % Page dimensions
    \hsize=\dimexpr\pdfpagewidth-1.5in
    \vsize=\dimexpr\pdfpageheight-1.5in
    \baselineskip=16.8pt
    \hoffset=-.25in \voffset=-.25in
    % Fonts
    \font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
    \font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
    % Logical markup definition
    \def\title#1{{\bigbf\centerline{#1}}}
    \def\author#1{{\it\centerline{by}\centerline{#1}}
      \vskip 3.9em}
    \def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
      \hbox{\llap{\dropcap#1}\hskip-0.3ex}}
      \parshape=4 3em\dimexpr\hsize-3em 3.28em
        \dimexpr\hsize-3.28em 3.28em
        \dimexpr\hsize-3.28em 0em\hsize}
    % The document
    \title{The Cask of Amontillado}
    \author{Edgar Allan Poe}
    \chapter The thousand injuries of Fortunato I had borne
    as I best could, but when he ventured upon insult, I vowed
    revenge. You, who so well know the nature of my soul,
    will not suppose, however, that I gave utterance to a
    threat. {\it At length} I would be avenged; this was a
    point definitely settled---but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.

Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
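The conversion step mentioned above can be illustrated with a deliberately tiny converter. The sketch below is written in Python and handles a toy dialect invented for this example, not Markdown or any other language named in the text: a block that starts with "# " becomes a top-level heading, blocks separated by blank lines become paragraphs, and text between single asterisks becomes emphasized. The function name convert is likewise hypothetical.

```python
import html
import re


def convert(source):
    """Convert the toy lightweight markup described above into HTML."""
    # Blocks are separated by one or more blank lines.
    blocks = [b for b in re.split(r"\n\s*\n", source.strip()) if b.strip()]
    output = []
    for block in blocks:
        # Collapse line breaks and escape the characters special to HTML.
        text = html.escape(" ".join(block.split()), quote=False)
        # Punctuation doubles as markup: *word* marks emphasis.
        text = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)
        if text.startswith("# "):
            output.append(f"<h1>{text[2:]}</h1>")
        else:
            output.append(f"<p>{text}</p>")
    return "\n".join(output)
```

Real converters cover far more constructs, but the principle is the same: the punctuation and the blank lines that make the source legible to a human are mapped onto the elements of a more general markup language.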
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other, the design and the positioning of the structural elements of the document (such as headings, tables, figures, and lists), and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
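Combining typefaces for a multilingual manuscript amounts to deciding, character by character, which font supplies the glyph. The Python sketch below mimics that decision using the code point ranges listed in the unicode-range descriptors of Figure 3.2. It is a simplified illustration: the names FONT_RANGES, font_for, and font_runs are hypothetical, and an actual browser or TeX engine also checks whether the chosen font really contains the glyph.

```python
# Code point ranges copied from the unicode-range descriptors of Figure 3.2.
FONT_RANGES = {
    "Alegreya Sans": [(0x00, 0x24F), (0x1E00, 0x1EFF), (0x2000, 0x206F),
                      (0x2C60, 0x2C7F), (0xA720, 0xA7FF), (0xFB00, 0xFB4F)],
    "GFS Neohellenic": [(0x2C80, 0x2CFF), (0x370, 0x3FF), (0x1F00, 0x1FFF),
                        (0x102E0, 0x102FF)],
}
FALLBACK = "sans-serif"  # the generic family that closes the list in Figure 3.2


def font_for(char):
    """Return the first font family whose ranges cover the character."""
    code_point = ord(char)
    for family, ranges in FONT_RANGES.items():
        if any(low <= code_point <= high for low, high in ranges):
            return family
    return FALLBACK


def font_runs(text):
    """Split text into consecutive runs that share a font family."""
    runs = []
    for char in text:
        family = font_for(char)
        if runs and runs[-1][0] == family:
            runs[-1] = (family, runs[-1][1] + char)
        else:
            runs.append((family, char))
    return runs
```

Applied to a sentence that switches from English to Greek, font_runs returns one run per script, which is the same split that the transitions in Figure 3.1 and the font-face rules in Figure 3.2 set up declaratively.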
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
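The line length guideline is easy to check mechanically. The following Python sketch encodes the two ranges given above and wraps a text to a chosen measure using the standard-library textwrap module. The function names and the default measure of 60 characters (simply the midpoint of the single-column range) are assumptions made for this example, and counting characters only approximates the width of a line set in a proportional typeface.

```python
import textwrap


def measure_ok(line_length, columns=1):
    """Check a line length against the ranges recommended in the text."""
    low, high = (45, 75) if columns == 1 else (40, 50)
    return low <= line_length <= high


def wrap_to_measure(text, measure=60, columns=1):
    """Wrap text to the given measure, refusing measures out of range."""
    if not measure_ok(measure, columns):
        raise ValueError(f"a measure of {measure} characters is out of range")
    return textwrap.wrap(text, width=measure)
```

A measure of 60 characters passes for a single-column page but is rejected for a multi-column page, where the recommended range narrows to 40–50 characters.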
[Typeset output of the source below: the English sentences set in Alegreya Sans and the Greek quotation set in GFS Neohellenic, followed by the page number 1.]

    \documentclass[11pt]{article}
    \usepackage{fontspec, leading, newunicodechar}
    \usepackage[Latin, Greek]{ucharclasses}
    \setTransitionsForLatin
      {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
    \setTransitionsForGreek
      {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
       Ligatures=TeX]}{}
    \newunicodechar{…}{\raisebox{.8ex}{…}}
    \frenchspacing
    \leading{14pt}
    \begin{document}
    The second function of Soul -- knowing -- was not at
    first distinguished from motion. Aristotle says: φαμὲν
    γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
    δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
    δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
    αὐτὴν κινεῖσθαι
    ``The soul is said to feel pain and joy, confidence and
    fear, and again to be angry, to perceive, and to think;
    and all these states are held to be movements, which
    might lead one to suppose that soul itself is moved.''
    \end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.

    <style>
      @font-face {
        font-family: "Alegreya Sans";
        src: url(AlegreyaSans-Regular.ttf)
             format("truetype");
        unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                       U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
      }
      @font-face {
        font-family: "GFS Neohellenic";
        src: url(GFSNeohellenic.otf) format("opentype");
        unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                       U+102E0-102FF;
      }
      p {
        font-family: "Alegreya Sans", "GFS Neohellenic",
                     sans-serif;
        line-height: 14pt;
      }
      [lang=en] {
        font-size: 11pt;
      }
      [lang=grc] {
        font-size: 13.2pt;
      }
    </style>
    <p><span lang="en">The second function of Soul – knowing
    – was not at first distinguished from motion. Aristotle
    says: </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
    λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
    τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
    κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
    κινεῖσθαι </span><span lang="en">“The soul is said to
    feel pain and joy, confidence and fear, and again to be
    angry, to perceive, and to think; and all these states
    are held to be movements, which might lead one to suppose
    that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
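The arithmetic behind the leading guideline can be written down in a few lines of Python. This is only a worked example of the percentages quoted above: the function names are hypothetical, and the percentages are kept as whole numbers so that the results come out exact.

```python
def leading_range(font_size_pt, low_percent=20, high_percent=45):
    """Return the recommended (minimum, maximum) line height in points."""
    return (font_size_pt * (100 + low_percent) / 100,
            font_size_pt * (100 + high_percent) / 100)


def lead_per_side(font_size_pt, line_height_pt):
    """Return the lead added above and below each line, in points."""
    return (line_height_pt - font_size_pt) / 2
```

For a 10 pt typeface, leading_range(10) gives a line height between 12 and 14.5 pt, and lead_per_side shows that the two extremes correspond to 1 pt and 2.25 pt of lead on either side of a line, matching the figures in the text.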
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

            Sizes in inches    Page proportions
    A4      8.27 × 11.7        2 : √2       1.41421
    B5      6.93 × 9.84        1 : √2       0.707
    Letter  8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
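The page proportions in Table 3.1 follow directly from the listed sizes. As a small worked example in Python, the sketch below divides the height of each sheet by its width and compares the result with the square root of two. The dictionary PAPER_SIZES_IN and the function proportion are names chosen for this illustration, and the sizes are the rounded values from the table, so the ISO formats only approximate the exact ratio.

```python
import math

# Width and height in inches, as listed in Table 3.1.
PAPER_SIZES_IN = {
    "A4": (8.27, 11.7),
    "B5": (6.93, 9.84),
    "Letter": (8.5, 11.0),
}


def proportion(width, height):
    """Return the height of a page expressed as a multiple of its width."""
    return height / width


def is_iso_proportion(width, height, tolerance=0.01):
    """Check whether a page is approximately 1 : sqrt(2)."""
    return abs(proportion(width, height) - math.sqrt(2)) < tolerance
```

With the sizes above, A4 and B5 both come out within a hundredth of 1.41421, whereas Letter yields roughly 1.2941 and therefore is not a 1 : √2 format.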
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing. A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
3.2.5 Quotations
Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

    This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
    (William Shakespeare, King Lear)

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
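The requirement that heights be whole multiples of the body text line height applies both to the text block and, as noted in Section 3.2.2, to headings. The Python sketch below shows one way the rounding might be done. The function names, the rule of rounding the text block down and a heading up, and the proportion of 1.5 used in the usage note are assumptions made for this example rather than recommendations from the cited literature.

```python
import math


def text_height_from_width(text_width_pt, proportion, line_height_pt):
    """Derive a text height from the text width and snap it to whole lines.

    Returns the number of lines and the resulting text height in points.
    The height is rounded down so that the block never exceeds the
    height implied by the desired proportion.
    """
    lines = math.floor(text_width_pt * proportion / line_height_pt)
    return lines, lines * line_height_pt


def heading_block(heading_height_pt, line_height_pt):
    """Pad a heading so that it occupies a whole number of body lines.

    Returns the number of lines occupied, the padded height, and the
    extra space to distribute above and below the heading, in points.
    """
    lines = math.ceil(heading_height_pt / line_height_pt)
    padded = lines * line_height_pt
    return lines, padded, padded - heading_height_pt
```

For a text width of 324 pt (4.5 in), an assumed proportion of 1.5, and a leading of 12 pt, the text block holds 40 lines and is 480 pt high. A heading that is 16 pt tall would occupy two body lines, with 8 pt of extra space distributed around it.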
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
Bibliography
[1] Mary Brandel. "1963: The debut of ASCII". In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015) (cit. on p. 5).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015) (cit. on p. 5).
[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).
[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).
[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).
[17] Noam Chomsky. "Three models for the description of language". In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).
[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015) (cit. on p. 16).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).
[23] Charles F. Goldfarb. "The Roots of SGML – A Personal Recollection". In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).
[24] Charles F. Goldfarb. "SGML: The Reason Why and the First Published Hint". In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).
[25] Charles F. Goldfarb. "Introduction to Generalized Markup". In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. "GODDAG: A Data Structure for Overlapping Hierarchies". In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
Index
ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42
Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17
sect
Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15
bre 14ndash16bs 6bsd 13
sect
ca 52can 6cern 28
character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44
sect
dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35
dps 13 17 18 32 35 36 39dtd 23 25ndash27dtp 36
sect
ebcdic 5ecma 55Edgar Allen Poe 37
Elements of Style 3em 6Emacs 13endianity 10endnote 47enq 6eot 6erealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15
ere 14ndash16esc 6etb 6120576-TEX 38etx 6euc 5
sectF M Cornford 43ff 6foaf 32 33footnote 47formal grammar 14fortran 4From Religion to Philosophy A Study in
the Origins of Western Speculation 43fs 6fsm 35
sectGit 17gml 22gnuLinux 13nano 13
gnu 13 14 35Google Documents 18Google Pinyin 11grep 16 17groff see troffgs 6gui 13 35
sectHan Unification 9heading 45Henrik Ibsen 27ht 6
html 28ndash32 34 39 44 55sect
ibm 5 12 22iconv 10iec 7 10 51ndash54ime 12ir i 27 28 31 32 54iso 7 10 51ndash54
sectJavaScript 29Jeffrey E F Friedl 14j is 5joe 13JScript 29json 32json-ld 32 56jtc 51ndash54justification see alignment
sectKing Lear 48
sectLATEX 36 43Latin Vulgate Bible 49ld 31 32 55leading see line spacingLeafpad 13lf 6lightweight markup language 39line height 45list 46
sectma 51MakeDoc 39Markdown 39markuplogical 21 29 30 35 36presentation 21 29 30 35 36
mathml 28 31Mercurial 17microformatting 32Microsoft Word 14 20 39
sectN-Triples 32 33nak 6Noam Chomskyhierarchy 14
Noam Chomsky 14note 46Notepad++ 13Notepad 13
nroff see troffnul 6ny 51
sectocr 12odf 13ooxml 13owl 32 56
sectparagraphblock 47indented 45outdented 45
paragraph 42paragraphsblock 45
pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8
sectQuarkXPress 14quotationblock 47run-in 47
sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31
rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6
sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22
sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-
lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6
secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff
man 36
me 36mom 36
troff 35tron 9Turtle 32 33typeface 41
sectucsblock 8ucs-4 8
ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10
us 6usa 51 52utf
utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8
utf 6 8ndash10 52sect
VBScript 29vcscentralized 17decentralized 17
vcs 17ndash20version control 13vi 13vim 13
vt 6sect
w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4
wysiwyg 35sect
XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23
xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23
16 CHAPTER 1 WRITING
Regex      Description
\x{⟨n⟩}    Matches the UCS character with code point ⟨n⟩ in hexadecimal
\N{⟨n⟩}    Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩
\p{⟨p⟩}    Matches any UCS character with property ⟨p⟩
\P{⟨p⟩}    Matches any UCS character without property ⟨p⟩

Property             Description
Letter               This property is satisfied by any letter
Punctuation          This property is satisfied by any punctuation
Symbol               This property is satisfied by any symbol
Mark                 This property is satisfied by any mark
Number               This property is satisfied by any number
Separator            This property is satisfied by any separator
Other                This property is satisfied by any UCS character that doesn't belong to any of the categories listed above
Block=⟨b⟩            This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.
Script=⟨s⟩           This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.
Numeric Value=⟨n⟩    This property is satisfied by any UCS character with the numeric value ⟨n⟩

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.
The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.
least partially implement the Unicode standard for Regular Expressions [20] (such as those of Perl 5.2 or Java 7) are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
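Matching by Unicode property can be illustrated with a minimal sketch. Python's built-in re module does not implement the \p{…} syntax of Table 1.5, so the sketch below emulates the general-category properties with the standard unicodedata module; the function name is a hypothetical helper chosen for this illustration.

```python
import unicodedata
from typing import List


def characters_with_category(text: str, prefix: str) -> List[str]:
    """Return the characters of text whose Unicode general category starts
    with prefix: 'L' Letter, 'P' Punctuation, 'S' Symbol, 'M' Mark,
    'N' Number, 'Z' Separator, 'C' Other."""
    return [c for c in text if unicodedata.category(c).startswith(prefix)]


sample = "ab, 4 €!"
print(characters_with_category(sample, "L"))  # ['a', 'b']
print(characters_with_category(sample, "P"))  # [',', '!']
print(characters_with_category(sample, "S"))  # ['€']
print(characters_with_category(sample, "N"))  # ['4']

# The numeric value of a character is exposed as well.
print(unicodedata.numeric("½"))  # 0.5
```

Regex engines that implement the Unicode standard perform the same category lookup internally when they evaluate an expression such as \p{Letter}.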
The most elementary text-processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool will present lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for the Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
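The line-oriented behaviour of grep and sed described above can be approximated in a few lines. This is only a sketch: the function names are hypothetical, and Python's regex dialect differs in details from the BRE and ERE syntaxes that the actual utilities use.

```python
import re
from typing import Iterable, List


def grep(pattern: str, lines: Iterable[str]) -> List[str]:
    """Return the lines that contain at least one match of pattern,
    roughly what `grep -E pattern` prints by default."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


def substitute(pattern: str, replacement: str, lines: Iterable[str]) -> List[str]:
    """Replace every match on every line, roughly what
    `sed -E 's/pattern/replacement/g'` does."""
    regex = re.compile(pattern)
    return [regex.sub(replacement, line) for line in lines]


text = ["colour chart", "color wheel", "grayscale"]
print(grep(r"colou?r", text))                  # ['colour chart', 'color wheel']
print(substitute(r"colou?r", "color", text))   # ['color chart', 'color wheel', 'grayscale']
```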
1.2 Version Control
When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.
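The rudimentary model described above (changes recorded with a description and an author, then viewed and reverted) can be sketched as a toy data structure. The class and method names are hypothetical, and the sketch stores whole versions; real VCS such as SVN or Git use far more elaborate storage and also handle multiple files and concurrent contributors.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Revision:
    number: int
    author: str
    message: str
    content: str


@dataclass
class History:
    revisions: List[Revision] = field(default_factory=list)

    def commit(self, author: str, message: str, content: str) -> Revision:
        """Record a new version together with its description and author."""
        revision = Revision(len(self.revisions) + 1, author, message, content)
        self.revisions.append(revision)
        return revision

    def log(self) -> List[str]:
        """View the recorded changes, oldest first."""
        return [f"r{r.number} {r.author}: {r.message}" for r in self.revisions]

    def revert_to(self, number: int, author: str) -> Revision:
        """Undo later changes by recording the old content as a new version."""
        old = self.revisions[number - 1]
        return self.commit(author, f"Revert to r{number}", old.content)


history = History()
history.commit("alice", "Initial draft", "Hello, world.")
history.commit("bob", "Reword greeting", "Helo, world.")
history.revert_to(1, "alice")
print(history.log())
print(history.revisions[-1].content)  # Hello, world.
```

Recording the revert as a new revision, rather than deleting the unwanted one, keeps the full authorship trail intact.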
VCS can be divided by their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally, and its operation is fully dependent on the availability of the server. An example of a centralized VCS is SubVersioN (SVN).
By comparison, there is no designated server in decentralized VCS, and the users can upload and download new versions directly to and from one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage size requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.
Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.

svnadmin create
svn checkout
svn update
svn commit

Figure 1.8: The basic SVN workflow
After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.

git init
git clone
git pull
git push
git reset
git commit

Figure 1.9: The diagram above depicts the basic Git workflow. The diagram below depicts the use of the Git program with an SVN repository; this bears all the advantages and disadvantages associated with decentralized VCS.

svnadmin create
git svn clone
git svn rebase
git svn dcommit
git reset
git commit
Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom)

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.
Chapter 2
Markup
A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.
It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.
Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The General Markup Language
The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.
This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the General Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.

The authoritative resource on SGML is The SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.
A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema
Although the described structure is shared by all SGML documents, the actual syntax as well as the restrictions regarding the contents and the attributes of individual elements are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.
2.1.2 The Extensible Markup Language
Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.
This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
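The first of the two levels of correctness can be checked with any XML parser. The sketch below uses the parser from Python's standard library, which reports well-formedness errors but does not validate against a DTD or a schema; the function name is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the document parses as XML, False otherwise.
    This checks well-formedness only, not validity."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Properly nested tags: well-formed.
print(is_well_formed("<recipe><name>Palatschinken</name></recipe>"))  # True
# The name element is never closed: not well-formed.
print(is_well_formed("<recipe><name>Palatschinken</recipe>"))         # False
```

A well-formed document may still be invalid: a recipe element without a name would pass this check, yet fail validation against a DTD that requires one.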
Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish.</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>

Figure 2.1: An example XML document (recipe.xml)
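To show how a program sees the elements and attributes of such a document, the following sketch reads an abridged copy of the recipe with the parser from Python's standard library. The abridgement (two ingredients, no steps, no DOCTYPE) and the variable names are assumptions made to keep the example short.

```python
import xml.etree.ElementTree as ET

# An abridged copy of the recipe document, embedded as a string.
RECIPE = """\
<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
  </ingredientList>
</recipe>
"""

root = ET.fromstring(RECIPE)

# Element content is exposed as text, attributes as a mapping.
name = root.findtext("name")
serves = int(root.find("ingredientList").get("serves"))
ingredients = [(i.get("amount"), i.text) for i in root.iter("ingredient")]

print(name)         # Palatschinken
print(serves)       # 8
print(ingredients)  # [('120g', 'Plain flour'), ('2', 'Egg')]
```

The parser only exposes the structure; interpreting serves as a number is a decision of the processing program, since attribute values are plain strings unless a schema says otherwise.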
DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD
element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+ },
  element stepList {
    element step { text }+ } }
Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
      minOccurs="1"/>
    <element
      name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
        maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
        use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
        maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>
Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)
xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng  # Compact -> Full Relax NG
xmllint --noout --relaxng recipe.rng recipe.xml
Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
A notable feature of XML that is unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of CONCUR, a more expressive SGML feature that makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML
Speech
AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!

Verse
Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!
<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>
Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].
The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.
Postel's law states that one should be conservative in what one sends but liberal in what one accepts [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
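The way namespaces keep elements from different XML applications apart can be sketched in Python with the standard-library xml.etree.ElementTree module. The sample document, which mixes XHTML with SVG, and the prefixes h and s used in the queries are our own assumptions; the prefixes in a query need not match those in the document, since only the namespace IRIs are compared.

```python
import xml.etree.ElementTree as ET

# A hypothetical XHTML document that embeds an SVG drawing.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <body>
    <p>A circle:</p>
    <svg:svg width="20" height="20">
      <svg:circle cx="10" cy="10" r="8"/>
    </svg:svg>
  </body>
</html>"""

root = ET.fromstring(DOC)

# Query prefixes are mapped to namespace IRIs independently of the document.
ns = {"h": "http://www.w3.org/1999/xhtml",
      "s": "http://www.w3.org/2000/svg"}

circle = root.find(".//s:circle", ns)
paragraphs = root.findall(".//h:p", ns)

# ElementTree stores each name as "{namespace IRI}local-name".
print(circle.tag, circle.get("r"), len(paragraphs))
```

Note that the parser only checks well-formedness here; as the text explains, validating either vocabulary would require a schema that the parser knows about.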
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
2.2 Markup on the World Wide Web
2.2.1 The Hypertext Markup Language
In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely popular graphical Web browser, which paved the way for the Web browsers of today. In 1994, Berners-Lee founded the W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In
JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
an attempt to unify the way malformed HTML documents were rendered across Web browsers, the W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.
Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by the W3C in 1996. The language made it possible to specify the visual properties of any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.
During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
2.2.2 The Extensible Hypertext Markup Language
Ever since the release of XML in 1998, the W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as
<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>
Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.
<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>
Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.
<style>
  body {
    font: large Verdana;
    font-size: large; }
  h1 {
    font-size: x-large;
    text-transform: uppercase; }
  abbr {
    font-style: italic; }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>
The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the number of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly in web documents through XML namespaces.
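The difference in strictness can be observed with the Python standard library. In the sketch below, the XML parser from xml.etree.ElementTree refuses the overlapping markup from Figure 2.7, whereas the tokenizer from html.parser reports the tags without complaint. Note that html.parser is only a lenient tokenizer and does not implement the error recovery of the HTML5 specification; the helper names is_well_formed and TagLogger are our own.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The overlapping markup from the first line of Figure 2.7.
MALFORMED = "<p><b>Bold <i>bold and italic</b> italic</i></p>"

def is_well_formed(markup):
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

class TagLogger(HTMLParser):
    """Record the start and end tags in the order they are seen."""
    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.events.append("</" + tag + ">")

logger = TagLogger()
logger.feed(MALFORMED)
logger.close()

print(is_well_formed(MALFORMED))  # the XML parser rejects the input
print(logger.events)              # the HTML tokenizer carries on
```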
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
2.2.3 The Semantic Web and Linked Data
The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, the W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
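The two readings of a triplet can be sketched in a few lines of Python. The Triple type, its obj_is_resource flag, and the describe function are hypothetical names introduced only for this illustration; real RDF libraries model resources and literals with richer types. The sample data follows the (p, s, o) ordering used above.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """A triplet (p, s, o); the flag tells resources from literals."""
    predicate: str
    subject: str
    obj: str
    obj_is_resource: bool

DC = "http://purl.org/dc/terms/"
DOCUMENT = "http://example.org/document.html"
JOHN = "http://example.org/john-smith"

triples = [
    Triple(DC + "creator", DOCUMENT, JOHN, True),
    Triple(DC + "title", DOCUMENT, "John's Web page", False),
]

def describe(t):
    """Render a triplet according to the two interpretations."""
    if t.obj_is_resource:
        return (f"<{t.subject}> is in relation <{t.predicate}> "
                f"with <{t.obj}>")
    return (f"<{t.subject}> has property <{t.predicate}> "
            f"with the value {t.obj!r}")

for t in triples:
    print(describe(t))
```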
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, apart from the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in Attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually supersede the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in the documents on the Web, a technique known as microformatting.
2.3 Document Preparation Systems
Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .
Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom)
<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
      href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
      href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>
Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.
<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <link property="creator"
      href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>
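[Graph: the node http://example.org/document.html has a dc:title edge to the literal "John's Web page"@en and a dc:creator edge to the node http://example.org/john-smith, which in turn has an rdf:type edge to foaf:Person and a foaf:name edge to the literal "John Smith".]
Figure 2.11: A graph of the RDF document in Figure 2.9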
categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also known as What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is twofold: more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems
Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system in the early 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project and first released in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The
The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, a de facto standard markup language for academic and technical documents.
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled--but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-
.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's The Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right)
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
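To give a flavor of such conversion tools, the following Python sketch translates a tiny subset of Markdown (headings introduced by number signs, emphasis between single asterisks, strong emphasis between double asterisks, and paragraphs separated by blank lines) into HTML. It is a toy under these stated assumptions and not a conforming implementation of any Markdown flavor; the function name to_html is our own.

```python
import html
import re

def to_html(source):
    """Convert a tiny Markdown subset to HTML, block by block."""
    blocks = []
    for block in re.split(r"\n\s*\n", source.strip()):
        # Join wrapped lines and escape characters special to HTML.
        text = html.escape(" ".join(block.split()), quote=False)
        # Strong emphasis must be handled before plain emphasis.
        text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
        text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
        heading = re.match(r"(#{1,6}) (.*)", text)
        if heading:
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            blocks.append(f"<p>{text}</p>")
    return "\n".join(blocks)

print(to_html("# Design\n\nSome *plain* and **bold** text\nwith 1 < 2."))
```

The source remains legible as plain text, which is the property that sets lightweight markup apart from the tag-heavy result.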
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab serifs or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
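The recommended line lengths can be checked mechanically. The Python sketch below classifies a measure against the ranges quoted above and wraps a sample text to 66 characters, a width we picked ourselves from within the single-column range. The function name, the labels, and the sample text are our own assumptions; monospaced character counts are only a rough stand-in for a typeset measure.

```python
import textwrap

# Character ranges taken from the paragraph above.
COMFORTABLE = {1: (45, 75), 2: (40, 50)}

def classify_measure(chars_per_line, columns=1):
    """Classify a line length for a single- or multi-column page."""
    low, high = COMFORTABLE[1 if columns == 1 else 2]
    if chars_per_line < low:
        return "too narrow"
    if chars_per_line > high:
        return "too wide"
    return "comfortable"

sample = ("As the base units of linguistic thought in prose, paragraphs "
          "split the text into coherent portions ready for consumption. ") * 3

# Wrap the sample and measure the longest resulting line.
lines = textwrap.wrap(sample, width=66)
longest = max(len(line) for line in lines)
print(longest, classify_measure(longest))
```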
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

1
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin{%
  \fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek{%
  \fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}
newunicodecharraisebox8ex
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
      format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
      U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F; }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
      U+102E0-102FF; }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
      sans-serif;
    line-height: 14pt; }
  [lang=en] {
    font-size: 11pt; }
  [lang=grc] {
    font-size: 13.2pt; }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with a leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
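The arithmetic behind these figures is simple enough to spell out. The Python sketch below reproduces the numbers for a 10 pt typeface; the function names and the percentage parameters are our own, with the defaults taken from the twenty to forty-five percent guideline.

```python
def leading_range(type_size_pt, min_percent=20, max_percent=45):
    """Return the (minimum, maximum) line height in points."""
    return (type_size_pt * (100 + min_percent) / 100,
            type_size_pt * (100 + max_percent) / 100)

def lead_per_side(type_size_pt, line_height_pt):
    """Return the lead added above and below each line in points."""
    return (line_height_pt - type_size_pt) / 2

low, high = leading_range(10)
print(low, high)                                    # line heights
print(lead_per_side(10, low), lead_per_side(10, high))  # lead per side
```

For the 10 pt body text of the example, the line height thus ranges from 12 to 14.5 pt, and the lead on either side of a line from 1 to 2.25 pt.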
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

         Sizes in inches   Page proportions
A4       8.27 × 11.7       2 : √2       1.41421
B5       6.93 × 9.84       1 : √2       0.707
Letter   8 1/2 × 11        1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing

Sidenote: This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
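The whole-multiple rule for headings amounts to rounding up. The sketch below is only an illustration under assumed numbers (a heading that needs 16 pt over body text with 12 pt leading); the function names are hypothetical.

```python
import math


def heading_block_height(heading_needs_pt: float, body_line_height_pt: float) -> float:
    """Smallest whole multiple of the body line height that can hold the heading."""
    lines = math.ceil(heading_needs_pt / body_line_height_pt)
    return lines * body_line_height_pt


def space_to_distribute(heading_needs_pt: float, body_line_height_pt: float) -> float:
    """Vertical space left over to place above and below the heading."""
    return heading_block_height(heading_needs_pt, body_line_height_pt) - heading_needs_pt


# An assumed heading that needs 16 pt, set over body text with 12 pt leading.
print(heading_block_height(16, 12))  # 24
print(space_to_distribute(16, 12))   # 8
```

The heading then occupies exactly two body lines, so the lines that follow it stay aligned with those on the facing page.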
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the number of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks, in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer’s viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
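Snapping the text height to the line height is a rounding-down step. The following sketch uses assumed figures (605 pt of available height, 12 pt leading) and a hypothetical function name; it is an illustration, not a layout algorithm.

```python
def fit_text_block(available_height_pt: int, line_height_pt: int) -> tuple[int, int]:
    """Return (number of lines, text height) for the tallest block that fits.

    The text height is rounded down to a whole multiple of the line height,
    so the block can be filled completely with lines of body text.
    """
    lines = available_height_pt // line_height_pt
    return lines, lines * line_height_pt


# Assumed example: 605 pt available between the margins, 12 pt leading.
print(fit_text_block(605, 12))  # (50, 600)
```

The 5 pt that remain would go to the vertical margins rather than leaving a partial line at the bottom of the text block.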
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to use colored typefaces only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487
Bibliography
[1] Mary Brandel lsquolsquo1963 The debut of asci irsquorsquo InComputerworld(July 1999) url httpeditioncnncomTECHcomputing9907061963idg (visited on 09062015) (cit on p 5)
[2] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1963 10 East 40th Street New York 16 nyusa the American Standard Association June 1963 urlhttp worldpowersystems com J codes X3 4 - 1963
(visited on 01282015) (cit on p 5)[3] i so tc97sc2 Information technology ndash iso 7-bit coded character
set for information interchange i so 6461972 Geneva Switzer-land the International Organization for Standardization1972 (cit on pp 5 7)
[4] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1986 10 East 40th Street New York 16 ny usathe American Standard Association June 1986 (cit on p 6)
[5] Unicode Consortium the Unicode Standard Version 10 Vol 1Reading ma usa Addison-Wesley Developers Press Oct1991 isbn 0-201-56788-1 (cit on p 8)
[6] Unicode Consortium the Unicode Standard Version 10 Vol 2Reading ma usa Addison-Wesley Developers Press June1992 isbn 0-201-60845-6 (cit on p 8)
[7] isoiec jtc1sc2 Information technology ndash the Universalmultiple-octet coded Character Set (ucs) ndash Part 1 Architectureand Basic Multilingual Plane isoiec 10646-11993 Geneva
Switzerland the International Organization for Standard-ization May 1993 (cit on p 8)
[8] i soiec jtc1sc2 Transformation Format for 16 planes of group00 (utf-16) isoiec 10646-11993Amd 11996 GenevaSwitzerland the International Organization for Standard-ization Oct 1996 (cit on p 8)
[9] isoiec jtc1sc2 ucs Transformation Format 8 (utf-8)isoiec 10646-11993Amd 21996 Geneva Switzerlandthe International Organization for Standardization Oct1996 (cit on p 8)
[10] Unicode Consortium the Unicode Standard Version 90 ndash CoreSpecification Tech rep Mountain View ca usa July 2016url httpwwwunicodeorgversionsUnicode900UnicodeStandard-90pdf (visited on 09172015) (cit onpp 8ndash10)
[11] Q-Success Usage of character encodings for websites urlhttpw3techscomtechnologiesoverviewcharacter_
encodingall (visited on 09102015) (cit on p 9)[12] Unicode Consortium Unicode Technical Standard 10 Version
900 Unicode Collation Algorithm Tech rep May 2016 urlhttpwwwunicodeorgreportstr10tr10-34html
(visited on 09172016) (cit on p 10)[13] Unicode Consortium Unicode cldr Project Tech rep url
httpcldrunicodeorg (visited on 09172016) (cit onp 10)
[14] iso tc171sc2 Document management ndash Portable documentformat iso 320002008 Geneva Switzerland the Interna-tional Organization for Standardization July 2008 (cit onp 13)
[15] isoiec jtc1sc34 Document description and processing lan-guages ndash Office Open XML File Formats isoiec 295002012Geneva Switzerland the International Organization forStandardization Oct 2012 (cit on p 13)
[16] isoiec jtc1sc34 Information technology ndash Open DocumentFormat for Office Applications (OpenDocument) v10 isoiec263002006 Geneva Switzerland the International Organi-zation for Standardization Dec 2006 (cit on p 13)
[17] Noam Chomsky lsquolsquoThree models for the description of lan-guagersquorsquo In Information Theory IEEE Transactions on 23 (1956)pp 113ndash124 (cit on p 14)
[18] isoiec jtc1sc22 Information technology ndash the Portable Op-erating System Interface ndash Part 2 Shell and Utilities isoiec9945-21993 Geneva Switzerland the International Organi-zation for Standardization Dec 1993 (cit on p 14)
[19] Jeffrey E F Friedl Mastering Regular Expressions 3rd edOrsquoReilly Media 2006 p 544 isbn 978-0-596-52812-6 (citon p 14)
[20] Unicode Consortium Unicode Technical Standard 18 Version17 Unicode Regular Expressions Tech rep Nov 2013 urlhttpwwwunicodeorgreportstr18tr18-17html
(visited on 09262015) (cit on p 16)[21] Dale Dougherty and Arnold Robbins Sed amp awk Second
Edition OrsquoReilly Media 1997 i sbn 1565922255 url http docstore mik ua orelly unix sedawk (visited on09262015) (cit on p 16)
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).
[31] C M Sperberg-McQueen and Claus Huitfeldt lsquolsquogoddagA Data Structure for Overlapping Hierarchiesrsquorsquo In DigitalDocuments Systems and Principles 8th International Confer-ence on Digital Documents and Electronic Publishing DDEP2000 5th International Workshop on the Principles of DigitalDocument Processing PODDP 2000 Munich Germany Sep-tember 13-15 2000 Revised Papers Ed by Peter King andEthan V Munson Berlin Heidelberg Springer Berlin Hei-delberg 2004 pp 139ndash160 isbn 978-3-540-39916-2 doi101007978-3-540-39916-2_12 (cit on p 27)
[32] TimBray DaveHollander andAndrewLaymanNamespacesin xml w3c Recommendation w3c Jan 1999 url httpwwww3orgTR1999REC-xml-names-19990114 (visitedon 08212015) (cit on p 27)
[33] M Duerst the Internationalized Resource Identifiers (iris) rfc3987 rfc Editor Jan 2005 url httptoolsietforghtmlrfc3987 (visited on 08312015) (cit on p 27)
[34] Norman Walsh DocBook 5 The Definitive Guide Apr 2010url httpwwwdocbookorgtdgenhtmldocbookhtml(visited on 08182015) (cit on p 28)
[35] Tim Berners-Lee Information Management A Proposal Techrep Mar 1989 url httpwwww3orgHistory1989proposalhtml (visited on 08312015) (cit on p 28)
[36] T Berners-Lee Hypertext Markup Language ndash 20 rfc 1866rfc Editor Nov 1995 url httptoolsietforghtmlrfc1866 (visited on 07312015) (cit on p 28)
[37] Jon Postel DoD standard Transmission Control Protocol rfc761 rfc Editor Jan 1980 url httptoolsietforghtmlrfc761 (visited on 09162016) (cit on p 28)
[38] Ian Hickson et al html5 A vocabulary and associated apisfor html and xhtml Recommendation w3c Oct 2014 urlhttpwwww3orgTR2014REC-html5-20141028 (visitedon 07312015) (cit on p 29)
[39] ecma International Standard ecma-262 - ecmaScript LanguageSpecification Tech rep June 1997 url httpwwwecma-internationalorgpublicationsfilesECMA-ST-ARCH
ECMA-262201st20edition20June201997pdf (visitedon 07312015) (cit on p 29)
[40] Netscape Communications Netscape and Sun announce Java-Script the open cross-platform object scripting language for en-terprise networks and the Internet Dec 1995 url httpwpnetscapecomnewsrefprnewsrelease67html (visited on02132008) (cit on p 29)
[41] Dave Raggett et al Reformulating html in xml w3c Recom-mendation w3c Dec 1998 url httpwwww3orgTR1998WD-html-in-xml-19981205 (visited on 08202015)(cit on p 31)
[42] Steven Pemberton et al xhtmltrade 10 The Extensible HyperTextMarkup Language w3c Recommendation w3c Jan 2000url httpwwww3orgTR2000REC-xhtml1-20000126(visited on 08202015) (cit on p 31)
[43] T Berners-Lee Linked Data Tech rep 2006 url httpswwww3orgDesignIssuesLinkedDatahtml (visited on09172016) (cit on p 31)
[44] Ora Lassila and Ralph R Swick Resource Description Frame-work (rdf) Model and Syntax Specification w3c Recommen-dation w3c Feb 1999 url httpwwww3orgTR1999REC-rdf-syntax-19990222 (visited on 08182015) (cit onpp 31 32)
[45] Dan Brickley and R V Guha rdf Vocabulary DescriptionLanguage 10 rdf Schema w3c Recommendation w3c Feb2004 url httpwwww3orgTR2004REC-rdf-schema-20040210 (visited on 08182015) (cit on p 32)
[46] Deborah L McGuinness and Frank van Harmelen owl WebOntology Language w3c Recommendation w3c Feb 2004url httpwwww3orgTR2004REC-owl-features-20040210 (visited on 08182015) (cit on p 32)
[47] Dan Brickley and R V Guha json-ld 10 A JSON-basedSerialization for Linked Data w3c Recommendation w3cJan 2014 url httpwwww3orgTR2014REC-json-ld-20140116 (visited on 08192015) (cit on p 32)
[48] David Beckett et al rdf 11 Turtle w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-turtle-20140225 (visited on 08292015) (cit on p 32)
[49] David Beckett rdf 11 N-Triples w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-n-triples-20140225 (visited on 08192015) (cit on p 32)
[50] Ben Adida et al rdfa in xhtml Syntax and Processing w3cRecommendation w3c Oct 2008 url httpwwww3org TR 2008 REC - rdfa - syntax - 20081014 (visited on08192015) (cit on p 32)
[51] Peter Schaffter What exactly is mom 2015 url httpwwwschafftercamommom-01html (visited on 09162016)(cit on p 37)
[52] Donald Ervin Knuth Digital Typography The Center for theStudy of Language and Information Publications 1998 i sbn978-0-387-98269-4 (cit on p 36)
[53] Albert Kapr Sto a jedna věta ke knižní úpravě Trans by Antonín Rambousek Lacerta 1999 url httpwwwsazbacztypoglosytypo101pdf (visited on 10202015) (cit onpp 41 46 47)
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standard Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
EMACS: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: Joe’s Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MATHML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
PICO: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFA: RDF in attributes
RELAX NG: The REgular LAnguage for XML, New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
VI: The Visual Interactive editor
VIM: VI IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language
language, albeit a more advanced one than that of sed. Originally developed for the Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.
1.2 Version Control
When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.
VCS can be divided by architecture into centralized and decentralized systems. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally, and its operation is fully dependent on the availability of the server. An example of a centralized VCS is SubVersioN (SVN).
Sidenote: The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
By comparison, there is no designated server in decentralized VCS, and the users can upload and download new versions directly from one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.
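Whatever the architecture, the bookkeeping described at the start of this section (changes recorded with a description and an author, then viewed or reverted) can be modeled in a few lines. The toy classes below are hypothetical and store whole file contents; they are only meant to make the idea concrete and do not reflect how SVN or Git work internally.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    author: str
    description: str
    content: str  # full text of the document after the change


@dataclass
class History:
    changes: list = field(default_factory=list)

    def commit(self, author: str, description: str, content: str) -> None:
        """Record a new version together with its author and description."""
        self.changes.append(Change(author, description, content))

    def log(self) -> list:
        """List the recorded changes, oldest first."""
        return [f"{i}: {c.author}: {c.description}" for i, c in enumerate(self.changes)]

    def current(self) -> str:
        return self.changes[-1].content

    def revert(self, author: str) -> None:
        """Undo the latest change by recording the previous content again."""
        if len(self.changes) < 2:
            raise ValueError("nothing to revert to")
        latest, previous = self.changes[-1], self.changes[-2]
        self.commit(author, f"Revert '{latest.description}'", previous.content)


history = History()
history.commit("alice", "First draft", "Hello")
history.commit("bob", "Rewrite greeting", "Goodbye")
history.revert("alice")
print("\n".join(history.log()))
print(history.current())
```

Note that the revert is itself recorded as a change, so the history never loses information about what happened and who did it.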
Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.
Sidenote: An example would be the graphical SVN client Tortoise SVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.
Figure 1.8: The basic SVN workflow (svnadmin create, svn checkout, svn update, svn commit)
Sidenote: After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.
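Why text files lend themselves to version control can be seen by comparing two versions line by line. The sketch below uses the difflib module from the Python standard library; the file names and the sample text are invented for the example.

```python
import difflib

# Two versions of a short text file, one list item per line.
old = ["Version control is useful.\n", "It records changes.\n"]
new = ["Version control is useful.\n", "It records changes and their authors.\n"]

# unified_diff yields the header lines followed by removed (-) and added (+) lines.
diff = list(difflib.unified_diff(old, new, fromfile="draft-v1.txt", tofile="draft-v2.txt"))
print("".join(diff))
```

A binary output format offers no such line structure, which is why a DPS has to supply its own comparison interface.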
Figure 1.9: The upper diagram depicts the basic Git workflow (git init, git clone, git pull, git push, git reset, git commit). The lower diagram depicts the use of the Git program with an SVN repository (svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit); this bears all the advantages and disadvantages associated with decentralized VCS.
Sidenote: After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.
Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom)
Figure 1.11: Tortoise SVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file
Chapter 2
Markup
A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to compliance with the orthographic rules, such as the correct spelling, capitalization, word breaks, and punctuation, that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.
It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.
Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.
2.1 Meta Markup Languages
2.1.1 The General Markup Language
The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.
This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the General Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].
Sidenote: More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].
Sidenote: The authoritative resource on SGML is The SGML Handbook [27], which includes the full text of the standard with extensive annotations.
SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.
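The node kinds described above can be inspected programmatically. The following is a minimal sketch that assumes an XML document rather than an SGML one (the Python standard library ships no SGML parser, but XML shares the same document structure); the sample document, the `render` processing instruction, and the `list_nodes` helper are made up for illustration.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# A made-up sample containing an element with an attribute,
# a processing instruction, and a human-readable comment.
SAMPLE = """<?xml version="1.0"?>
<recipe>
  <?render columns="2"?>
  <!-- A human-readable comment -->
  <ingredient amount="2">Egg</ingredient>
</recipe>"""

NODE_KINDS = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}

def list_nodes(node, found=None):
    """Walk the tree depth-first and collect (kind, name, attributes)."""
    if found is None:
        found = []
    for child in node.childNodes:
        kind = NODE_KINDS.get(child.nodeType, "other")
        if kind == "element":
            found.append((kind, child.tagName, dict(child.attributes.items())))
            list_nodes(child, found)
        elif kind == "text":
            if child.data.strip():  # skip whitespace-only text nodes
                found.append((kind, child.data.strip(), {}))
        else:
            found.append((kind, child.nodeName, {}))
    return found

nodes = list_nodes(minidom.parseString(SAMPLE))
for kind, name, attributes in nodes:
    print(kind, name, attributes)
```

Entities are left out of the sketch to keep it short.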
Sidenote: A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.
2.1.2 The Extensible Markup Language
Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.
This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
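The first of the two levels of correctness can be checked with any XML parser. The sketch below uses the non-validating parser from the Python standard library, so it only tests well-formedness; checking validity requires a validating tool and a DTD or schema. The helper name `is_well_formed` and both sample strings are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True when the text parses as well-formed XML.

    This says nothing about validity: no DTD or schema is consulted.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Properly nested elements.
good = "<recipe><name>Palatschinken</name></recipe>"
# Overlapping elements: the end tags come in the wrong order.
bad = "<recipe><name>Palatschinken</recipe></name>"

print(is_well_formed(good), is_well_formed(bad))
```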
Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
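To give a flavor of data retrieval with XPath, here is a short sketch using the limited XPath subset supported by the ElementTree module of the Python standard library. The embedded document is a shortened, assumed variant of a recipe document; a full XPath or XQuery processor offers considerably more than what is shown here.

```python
import xml.etree.ElementTree as ET

# A shortened recipe document used only for this illustration.
RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# Path expression: every ingredient inside the ingredient list.
names = [e.text for e in root.findall("./ingredientList/ingredient")]

# Predicate on an attribute value: the ingredient whose amount is "2".
counted = root.find(".//ingredient[@amount='2']").text

# Attribute access on the element selected by a path.
serves = int(root.find("ingredientList").get("serves"))

print(names, counted, serves)
```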
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish.</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>
Figure 2.1: An example XML document (recipe.xml).
Sidenote: DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.
<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>
Figure 2.2: An example DTD.
element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}
Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
      name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>
Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).
xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng  # Compact -> Full RELAX NG
xmllint --noout --relaxng recipe.rng recipe.xml
Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature called CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.

Speech
  AASE: Peer, you're lying!
  PEER: No, I'm not!
  AASE: Well then, swear to me it's true!
  PEER: Swear? Why should I?
  AASE: See, you dare not! Every word of it's a lie!

Verse
  Peer, you're lying! No, I'm not!
  Well then, swear to me it's true!
  Swear? Why should I? See, you dare not!
  Every word of it's a lie!

<(V)line>
  <(S)speech who="Aase">Peer, you're lying!</(S)speech>
  <(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
  <(S)speech who="Aase">Well then,
    swear to me it's true!</(S)speech>
</(V)line><(V)line>
  <(S)speech who="Peer">Swear? why should I?</(S)speech>
  <(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
    Every word of it's a lie!</(S)speech>
</(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

Sidenote: The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook, and its source code is publicly available at http://docbook.org/.

Sidenote: Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
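As an illustration of the XML namespaces described above, the following sketch parses a small document that mixes elements from two XML applications and lists the namespace IRIs it encounters. The document itself is a made-up example; the two IRIs are the ones commonly used for XHTML and SVG. Note that the parser only reads the IRIs as opaque names and makes no attempt to locate a schema for them.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

# A made-up document mixing two XML applications.
DOC = f"""<html xmlns="{XHTML}" xmlns:svg="{SVG}">
  <body>
    <p>A circle:</p>
    <svg:svg width="20" height="20">
      <svg:circle cx="10" cy="10" r="8"/>
    </svg:svg>
  </body>
</html>"""

root = ET.fromstring(DOC)

# ElementTree stores qualified names as "{namespace IRI}local-name".
namespaces = sorted({el.tag[1:].split("}")[0] for el in root.iter()})

# Prefixes used in queries are local to the query, not to the document.
prefixes = {"h": XHTML, "s": SVG}
circle = root.find("h:body/s:svg/s:circle", prefixes)

print(namespaces)
print(circle.tag, circle.get("r"))
```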
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
2.2 Markup on the World Wide Web
2.2.1 The Hypertext Markup Language
In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first widely used graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Sidenote: JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language made it possible to specify the visual properties of any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.
During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
2.2.2 The Extensible Hypertext Markup Language
Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as
<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>
Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.
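The lenient treatment of the overlapping elements from Figure 2.7 can be observed with the HTML tokenizer from the Python standard library, which reports the tags in the order in which they appear instead of rejecting the input. This is only a sketch: `html.parser` is a tokenizer and does not build the repaired document tree that the HTML5 specification prescribes for browsers, and the `TagLogger` class is a hypothetical helper.

```python
from html.parser import HTMLParser

class TagLogger(HTMLParser):
    """Record every start tag, end tag, and run of text that is seen."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))

logger = TagLogger()
# The first line of Figure 2.7: <b> is closed while <i> is still open.
logger.feed("<b>Bold <i>bold and italic</b> italic</i>")
logger.close()

for event in logger.events:
    print(event)
```

No exception is raised for the misnested end tags; deciding how to repair the structure is left to whatever consumes the events.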
<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>
Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com/. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.
<style>
  body {
    font: large Verdana;
    font-size: large;
  }
  h1 {
    font-size: x-large;
    text-transform: uppercase;
  }
  abbr {
    font-style: italic;
  }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>
Sidenote: The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications (such as MathML and SVG) directly into web documents through XML namespaces.
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
2.2.3 The Semantic Web and Linked Data
The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.

Sidenote: A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
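The two interpretations of a triplet (p, s, o) given above can be written down as a small data structure. This is a minimal sketch, not an RDF library: the `IRI` and `Triplet` types and the `interpret` function are hypothetical names introduced here, and the field order follows the (predicate, subject, object) order used in the text.

```python
from typing import NamedTuple, Union

class IRI(str):
    """A resource named by an IRI, as opposed to a literal value."""

class Triplet(NamedTuple):
    predicate: IRI
    subject: IRI
    obj: Union[IRI, str]  # either another resource or a literal

def interpret(t: Triplet) -> str:
    """Describe the triplet as a relation or as a property."""
    if isinstance(t.obj, IRI):
        return f"{t.subject} is in relation {t.predicate} with {t.obj}"
    return f"{t.subject} has property {t.predicate} with value {t.obj!r}"

PAGE = IRI("http://example.org/document.html")
JOHN = IRI("http://example.org/john-smith")
CREATOR = IRI("http://purl.org/dc/terms/creator")
NAME = IRI("http://xmlns.com/foaf/0.1/name")

# The object is a resource: the triplet states a relation.
relation = interpret(Triplet(CREATOR, PAGE, JOHN))
# The object is a literal: the triplet states a property.
prop = interpret(Triplet(NAME, JOHN, "John Smith"))

print(relation)
print(prop)
```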
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually replace the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
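Of the representations listed above, N-Triples is simple enough to produce by hand: one statement per line, written in the order subject, predicate, object and terminated by a period. The sketch below assumes a hypothetical in-memory format in which an object is either `("iri", value)` or `("literal", value, language)`; it only escapes the characters that must be escaped in a literal and is not a complete implementation of the format.

```python
def escape_literal(text):
    """Escape backslashes, double quotes, and line breaks in a literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))

def to_ntriples(statements):
    """Serialize (subject, predicate, object) statements, one per line."""
    lines = []
    for subject, predicate, obj in statements:
        if obj[0] == "iri":
            term = f"<{obj[1]}>"
        else:
            term = f'"{escape_literal(obj[1])}"'
            if obj[2]:  # optional language tag
                term += f"@{obj[2]}"
        lines.append(f"<{subject}> <{predicate}> {term} .")
    return "\n".join(lines)

EXAMPLE = [
    ("http://example.org/document.html",
     "http://purl.org/dc/terms/title",
     ("literal", "John's Web page", "en")),
    ("http://example.org/document.html",
     "http://purl.org/dc/terms/creator",
     ("iri", "http://example.org/john-smith")),
]

output = to_ntriples(EXAMPLE)
print(output)
```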
2.3 Document Preparation Systems
Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type
        rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html>
  <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html>
  <http://purl.org/dc/terms/creator>
  <http://example.org/john-smith> .
<http://example.org/john-smith>
  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
  <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith>
  <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .
Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).
<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>
Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.
<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>
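[Graph with four labeled edges:
  http://example.org/document.html --dc:title--> "John's Web page"@en
  http://example.org/document.html --dc:creator--> http://example.org/john-smith
  http://example.org/john-smith --rdf:type--> foaf:Person
  http://example.org/john-smith --foaf:name--> "John Smith"]
Figure 2.11: A graph of the RDF document in Figure 2.9.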
categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is twofold: the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems
Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Sidenote: The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph, The Art of Computer Programming, and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, a de facto standard markup language for academic and technical documents.
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by Desktop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled -- but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-
.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
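The conversion from a lightweight markup language to HTML can be sketched in a few lines. The converter below handles only a tiny Markdown-like subset assumed for this illustration (blank-line separated paragraphs, `#` headings, `*emphasis*`, and `**strong emphasis**`); it does not implement Markdown or any other real language, and the function name `convert` is hypothetical.

```python
import html
import re

def convert(text):
    """Convert a tiny Markdown-like subset to HTML."""
    blocks = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        # Join wrapped lines and escape characters special to HTML.
        block = html.escape(" ".join(block.split()), quote=False)
        # Strong emphasis must be replaced before plain emphasis.
        block = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", block)
        block = re.sub(r"\*(.+?)\*", r"<em>\1</em>", block)
        heading = re.match(r"(#{1,6}) (.*)", block)
        if heading:
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            blocks.append(f"<p>{block}</p>")
    return "\n".join(blocks)

source = "# Title\n\nSome *emphasized* and\n**strong** text."
result = convert(source)
print(result)
```

The input reads naturally as plain text, which is the property that distinguishes lightweight markup from the tag-based markup it is converted to.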
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
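The line-length guidelines above can be condensed into a small helper. The thresholds come straight from the text; the function name `assess_measure`, its parameters, and the wording of the returned messages are assumptions made for this sketch.

```python
def assess_measure(chars_per_line, columns=1):
    """Judge a line length (in characters) against the guidelines above."""
    # Recommended range: 45-75 for one column, 40-50 for several.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "too narrow to justify; set ragged"
    if chars_per_line > 80:
        return "too wide; strains the eye"
    if low <= chars_per_line <= high:
        return "within the recommended range"
    return "acceptable, but outside the recommended range"

print(assess_measure(66))             # a single-column page
print(assess_measure(35, columns=2))  # a narrow column or a sidenote
print(assess_measure(90))             # an overly wide text block
```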
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin{%
  \fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek{%
  \fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
\newunicodechar{}{\raisebox{.8ex}{}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.

<style>
@font-face {
  font-family: "Alegreya Sans";
  src: url("AlegreyaSans-Regular.ttf")
       format("truetype");
  unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                 U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F; }
@font-face {
  font-family: "GFS Neohellenic";
  src: url("GFSNeohellenic.otf") format("opentype");
  unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                 U+102E0-102FF; }
p {
  font-family: "Alegreya Sans", "GFS Neohellenic",
               sans-serif;
  line-height: 14pt; }
[lang=en] {
  font-size: 11pt; }
[lang=gr] {
  font-size: 13.2pt; }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with a leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
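The arithmetic behind the leading guideline can be checked with a few lines of Python. This is a minimal sketch: the function names leading_range and extra_lead are assumptions made for this illustration, and the 20 % and 45 % bounds are the figures quoted in the text.

```python
def leading_range(type_size_pt, min_extra=0.20, max_extra=0.45):
    """Return the (smallest, largest) line height in points for a type size."""
    low = round(type_size_pt * (1 + min_extra), 3)
    high = round(type_size_pt * (1 + max_extra), 3)
    return low, high


def extra_lead(type_size_pt, line_height_pt):
    """Fraction of the type size that is added between the lines."""
    return (line_height_pt - type_size_pt) / type_size_pt


print(leading_range(10))             # (12.0, 14.5)
print(round(extra_lead(10, 12), 3))  # 0.2
print(round(extra_lead(11, 14), 3))  # 0.273
```

The second call corresponds to the 10 pt type on 12 pt leading used for the body text of this book, which sits at the lower bound. The third corresponds to the 11 pt type on 14 pt leading used in Figure 3.1, which adds roughly 27 %.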
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8½ × 11            1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
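One way to satisfy the whole-multiple rule is to pad each heading with just enough vertical space to reach the next full body-text line. The Python sketch below shows the arithmetic; the function name heading_slot and the example sizes are assumptions made for this illustration, not values taken from the cited sources.

```python
import math


def heading_slot(heading_height_pt, body_line_height_pt):
    """Return (lines, padding) so that the heading fills whole body-text lines.

    heading_height_pt is the height of the heading itself; padding is the
    extra vertical space (in points) to distribute above and below it.
    """
    lines = math.ceil(heading_height_pt / body_line_height_pt)
    padding = lines * body_line_height_pt - heading_height_pt
    return lines, padding


print(heading_slot(18, 12))  # (2, 6)
print(heading_slot(24, 12))  # (2, 0)
```

With a 12 pt body line height, an 18 pt heading occupies two lines and needs 6 pt of additional space, whereas a 24 pt heading already fits the grid.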
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface as the surrounding text, treat the columns of tables the same way you treat columns in the text, and keep rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
    This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star.
    William Shakespeare, King Lear

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
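The requirement that the text height be a multiple of the line height amounts to rounding the available height down to a whole number of lines. The Python sketch below illustrates this; the function name text_block and the 650 pt example are assumptions made here, not figures from the cited sources.

```python
def text_block(available_height_pt, line_height_pt):
    """Return (lines, height) of the tallest text block that fits.

    The height is always a whole multiple of the body text line height,
    so the block can be filled completely with lines of text.
    """
    lines = int(available_height_pt // line_height_pt)
    return lines, lines * line_height_pt


print(text_block(650, 12))  # (54, 648)
```

In this example, 2 pt of the available 650 pt are given back to the vertical margins so that exactly 54 lines of 12 pt leading fill the block.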
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to use colored typefaces only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015) (cit. on p. 5).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015) (cit. on p. 5).
[3] ISO/TC 97/SC 2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972 (cit. on pp. 5, 7).
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1986 (cit. on p. 6).
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).
[7] ISO/IEC JTC 1/SC 2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993 (cit. on p. 8).
[8] ISO/IEC JTC 1/SC 2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[9] ISO/IEC JTC 1/SC 2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).
[12] Unicode Consortium. Unicode Technical Standard 10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).
[14] ISO/TC 171/SC 2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008 (cit. on p. 13).
[15] ISO/IEC JTC 1/SC 34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012 (cit. on p. 13).
[16] ISO/IEC JTC 1/SC 34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006 (cit. on p. 13).
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).
[18] ISO/IEC JTC 1/SC 22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993 (cit. on p. 14).
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).
[20] Unicode Consortium. Unicode Technical Standard 18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015) (cit. on p. 16).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).
[26] ISO/IEC JTC 1/SC 34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986 (cit. on p. 22).
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).
[29] ISO/IEC JTC 1/SC 18/WG 8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typo/glosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms

ACK The ACKnowledgement character
API Application Programming Interface
ASA The American Standard Association
ASCII The American Standard Code for Information Interchange
AT&T The American Telephone and Telegraph corporation
BEL The BELl character
BMP The Basic Multilingual Plane
BRE The Basic Regular Expressions
BS The BackSpace character
BSD The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA California
CAN The CANcel character
CERN The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR The Common Locale Data Repository
CLI Command Line Interface
COBOL The COmmon Business-Oriented Language
CR The Carriage Return character
CSS The Cascading Style Sheets language
DC The Dublin Core
DC1 The Device Control character No. 1
DC2 The Device Control character No. 2
DC3 The Device Control character No. 3
DC4 The Device Control character No. 4
DEL The DELete character
DLE The Data Link Escape character
DPS Document Preparation System
DTD Document Type Definition
DTP DeskTop Publishing
EBCDIC The Extended Binary Coded Decimal Interchange Code
ECMA The European Computer Manufacturers Association
EM The End of Medium
EMACS The Eventually Munches All Computer Storage editor
ENQ The ENQuiry character
EOT The End Of Transmission
ERE The Extended Regular Expressions
ESC The ESCape character
ETB The End of Transmission Block
ETX The End of TeXt
EUC The Extended Unix Code
FF The Form Feed character
FOAF Friend Of A Friend
FORTRAN The FORmula TRANslator
FS The File Separator
FSM The Free Software Movement
GML The General Markup Language
GNU GNU is Not Unix
GS The Group Separator
GUI Graphical User Interface
HT The Horizontal Tab
HTML The HyperText Markup Language
IBM The International Business Machines Corporation
IEC The International Electrotechnical Commission
IME Input Method Editor
IRI The Internationalized Resource Identifier
ISO The International Organization for Standardization
JIS The Japanese Industrial Standards encoding
JOE Joe's Own Editor
JSON The JavaScript Object Notation
JSON-LD JSON for LD
JTC A Joint TC
LD Linked Data
LF The Line Feed
MA Massachusetts
MATHML The Mathematical Markup Language
NAK The Negative-AcKnowledgement character
NUL The NULl character
NY New York
OCR Optical Character Recognition
ODF The Open Document Format for office applications
OOXML The Office Open XML format
OWL The Web Ontology Language
PC The IBM Personal Computer
PDF The Portable Document Format
PICO The PIne COmposer
POSIX The Portable Operating System Interface
RDF The Resource Description Framework
RDFA RDF in attributes
RELAX NG The REgular LAnguage for XML New Generation
RFC A Request For Comments
RS The Record Separator
SC A SubCommittee
SGML The Standard Generalized Markup Language
SI The Shift In character
SO The Shift Out character
SOH The Start of Heading
SR Sound Recognition
STX The Start of Text
SUB The SUBstitute character
SVG The Scalable Vector Graphics language
SVN SubVersioN
SYN The SYNchronous Idle character
TC A Technical Committee
TEI The Text Encoding Initiative
TRON The Real-time Operating system Nucleus
UCS The Universal multiple-octet coded Character Set
US The Unit Separator
USA The United States of America
UTF The UCS Transformation Format
VCS Version Control System
VI The Visual Interactive editor
VIM VI IMproved
VT The Vertical Tab
W3C The World Wide Web Consortium
WG A Working Group
WYSIWYG What You See Is What You Get
XHTML The eXtensible HyperText Markup Language
XML The eXtensible Markup Language
Chapter 1 Writing
After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.
svnadmin create
svn checkout
svn update
svn commit
Figure 1.8: The basic SVN workflow.
ality that can record changes directly into output files. Other DPSes provide an interface for external VCSes to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.
An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.
After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.
git init
git clone
git pull
git push
git reset
git commit
Figure 1.9: The diagram above depicts the basic Git workflow. The diagram below depicts the use of the Git program with an SVN repository; this bears all the advantages and disadvantages associated with decentralized VCS.
svnadmin create
git svn clone
git svn rebase
git svn dcommit
git reset
git commit
Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).
Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.
Chapter 2
Markup
A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.
It is not unusual for an author to write and mark up of theirmanuscript at the same time Nevertheless each of the two activi-ties represents a distinct conceptWriting is the process of breakingideas down into raw sequences of words To mark up these wordsthen is to take and reassemble them back into meaningful units oflinguistic thought
Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The Generalized Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

Margin note: More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

Margin note: The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.
SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.
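The kinds of nodes listed above can be made concrete with a short sketch. Python's standard library has no SGML parser, so the example below assumes an XML document instead, whose node model is the same in the respects that matter here. The sample document, the `render-hint` processing instruction, and the function names are made up for illustration.

```python
from xml.dom import minidom, Node

# A hypothetical document with one node of each kind discussed above.
SAMPLE = """<?xml version="1.0"?>
<?render-hint columns="2"?>
<note priority="high"><!-- a comment -->Buy milk</note>"""

# Human-readable names for the DOM node types we want to report.
NODE_KINDS = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}


def describe(node, found):
    """Walk the tree depth-first and collect (kind, name) pairs."""
    kind = NODE_KINDS.get(node.nodeType)
    if kind is not None:
        found.append((kind, node.nodeName))
    if node.nodeType == Node.ELEMENT_NODE:
        # Attributes are not children in the DOM; list them explicitly.
        for name in node.attributes.keys():
            found.append(("attribute", name))
    for child in node.childNodes:
        describe(child, found)
    return found


def list_nodes(source):
    """Parse the text and return every node found in it."""
    return describe(minidom.parseString(source), [])


for kind, name in list_nodes(SAMPLE):
    print(f"{kind}: {name}")
```

Entities are left out of the sketch, since the parser expands them into ordinary text before the tree is built.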
Margin note: A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
Although the described structure is shared by all SGML documents, the actual syntax as well as the restrictions regarding the contents and the attributes of individual elements are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.
This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
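The two levels of correctness can be illustrated with a minimal sketch. Python's standard library checks well-formedness but does not validate against a DTD, so the validity check below is a hand-written stand-in: the rule in `REQUIRED_CHILDREN` and both function names are assumptions made for this example, not a real schema language.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a schema: the children a recipe must have,
# in this exact order. A real DTD or schema would express much more.
REQUIRED_CHILDREN = ["name", "description", "ingredientList", "stepList"]


def is_well_formed(source):
    """Return True when the parser accepts the text as XML at all."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


def is_valid_recipe(source):
    """Return True when the text is well-formed AND obeys the toy rule."""
    if not is_well_formed(source):
        return False
    root = ET.fromstring(source)
    children = [child.tag for child in root]
    return root.tag == "recipe" and children == REQUIRED_CHILDREN


# Overlapping tags: not even well-formed.
MALFORMED = "<recipe><name>Palatschinken</recipe></name>"
# Properly nested, but the required children are missing.
WELL_FORMED_ONLY = "<recipe><name>Palatschinken</name></recipe>"
# Properly nested and structured as the toy rule demands.
VALID = (
    "<recipe><name>Palatschinken</name>"
    "<description>A dish</description>"
    "<ingredientList/><stepList/></recipe>"
)

for text in (MALFORMED, WELL_FORMED_ONLY, VALID):
    print(is_well_formed(text), is_valid_recipe(text))
```

Every valid document is well-formed, but not the other way around, which is why the second function calls the first before looking at the structure.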
Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
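As a rough illustration of data retrieval with XPath, the sketch below queries a shortened version of the recipe document. It relies on the limited XPath subset built into Python's `xml.etree.ElementTree` module rather than on a full XPath engine; the variable names are chosen for this example only.

```python
import xml.etree.ElementTree as ET

# A shortened recipe, assumed here only to keep the example small.
RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# Every ingredient element below ingredientList, in document order.
ingredients = [e.text for e in root.findall("./ingredientList/ingredient")]

# The first ingredient, anywhere in the tree, whose amount equals "2".
egg = root.find(".//ingredient[@amount='2']")

# The value of an attribute on the first matching element.
serves = root.find("ingredientList").get("serves")

print(ingredients, egg.text, serves)
```

Queries that need functions, axes, or arithmetic fall outside this subset and require a dedicated XPath or XQuery processor.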
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE recipe SYSTEM "recipe.dtd">
    <recipe>
      <name>Palatschinken</name>
      <description>A Slavic crêpe-like dish</description>
      <ingredientList serves="8">
        <ingredient amount="120g">Plain flour</ingredient>
        <ingredient amount="2">Egg</ingredient>
        <ingredient amount="300ml">Milk</ingredient>
        <ingredient amount="1 tblspn">Oil</ingredient>
        <ingredient amount="1 pinch">Salt</ingredient>
      </ingredientList>
      <stepList>
        <step>Combine the ingredients and whisk until
          you have a smooth batter.</step>
        <step>Heat oil on a pan, pour in a tablespoonful
          of the batter, fry until golden brown.</step>
        <step>Repeat until there is no batter left.</step>
        <step>Serve rolled and filled with jam.</step>
      </stepList>
    </recipe>

Figure 2.1: An example XML document (recipe.xml).
Margin note: DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd">
    <!DOCTYPE recipe SYSTEM "recipe.dtd">

    <!DOCTYPE recipe [
      <!ELEMENT recipe (name, description, ingredientList,
                        stepList)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
      <!ELEMENT ingredientList (ingredient+)>
      <!ATTLIST ingredientList serves CDATA #REQUIRED>
      <!ELEMENT ingredient (#PCDATA) >
      <!ATTLIST ingredient amount CDATA #REQUIRED>
      <!ELEMENT stepList (step+) >
      <!ELEMENT step (#PCDATA)> ]>

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd" [
      <!-- Omitted for brevity --> ]>
    <!DOCTYPE recipe SYSTEM "recipe.dtd" [
      <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD.
    element recipe {
      element name { text },
      element description { text },
      element ingredientList {
        attribute serves { xsd:positiveInteger },
        element ingredient {
          attribute amount { text }, text
        }+
      },
      element stepList {
        element step { text }+
      }
    }

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
      <element name="recipe"><complexType><all>
        <element name="name" type="string" minOccurs="1"/>
        <element name="description" type="string"
          minOccurs="1"/>
        <element
          name="ingredientList"><complexType><sequence>
          <element name="ingredient" minOccurs="1"
            maxOccurs="unbounded">
            <complexType><simpleContent>
              <extension base="string">
                <attribute name="amount" type="string"/>
              </extension>
            </simpleContent></complexType>
          </element></sequence>
          <attribute name="serves" type="positiveInteger"
            use="required"/>
        </complexType></element>
        <element name="stepList"><complexType><sequence>
          <element name="step" type="string" minOccurs="1"
            maxOccurs="unbounded"/>
        </sequence></complexType></element>
      </all></complexType></element>
    </schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).
    xmllint -noout --dtdvalid recipe.dtd recipe.xml
    xmllint -noout --schema recipe.xsd recipe.xml
    trang recipe.rnc recipe.rng  # Compact -> Full RELAX NG
    xmllint -noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
A notable feature of XML that is unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature called CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.

Speech

    AASE: Peer, you're lying!
    PEER: No, I'm not!
    AASE: Well then, swear to me it's true!
    PEER: Swear? Why should I?
    AASE: See, you dare not! Every word of it's a lie!

Verse

    Peer, you're lying! No, I'm not!
    Well then, swear to me it's true!
    Swear? Why should I? See, you dare not!
    Every word of it's a lie!

    <(V)line>
    <(S)speech who="Aase">Peer, you're lying!</(S)speech>
    <(S)speech who="Peer">No, I'm not!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Aase">Well then,
    swear to me it's true!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Peer">Swear? why should I?</(S)speech>
    <(S)speech who="Aase">See, you dare not!
    </(V)line><(V)line>
    Every word of it's a lie!</(S)speech>
    </(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

Margin note: The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely used graphical Web browser, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Margin note: Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

Margin note: JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
    <b>Bold <i>bold and italic</b> italic</i>
    <b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

    <font face="Verdana" size="4">
    <font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
    <br><br>There is a continuing need to show the power of
    <i>CSS</i>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The <i>HTML
    </i> remains the same, the only thing that has changed
    is the external <i>CSS</i> file. Yes, really.
    </font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

    <style>
      body {
        font: large Verdana;
        font-size: large;
      }
      h1 {
        font-size: x-large;
        text-transform: uppercase;
      }
      abbr {
        font-style: italic;
      }
    </style>
    <h1>So what is this about?</h1>
    <p>There is a continuing need to show the power of
    <abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The
    <abbr>HTML</abbr> remains the same, the only thing that
    has changed is the external <abbr>CSS</abbr> file. Yes,
    really.</p>

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2 Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications (such as MathML and SVG) directly into web documents through XML namespaces.

Margin note: The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
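To make the role of namespaces in XHTML more tangible, the following sketch parses a small XHTML document with an embedded SVG drawing and reports which elements belong to which XML application. The document itself and the helper `split_name` are assumptions made for this example; the two namespace IRIs are the ones assigned to XHTML and SVG by W3C.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# A hypothetical XHTML document with an SVG drawing embedded in it.
DOCUMENT = f"""<html xmlns="{XHTML_NS}">
  <body>
    <p>A circle:</p>
    <svg xmlns="{SVG_NS}" width="20" height="20">
      <circle cx="10" cy="10" r="8"/>
    </svg>
  </body>
</html>"""

root = ET.fromstring(DOCUMENT)


def split_name(tag):
    """Split ElementTree's '{iri}local' notation into (iri, local)."""
    iri, _, local = tag[1:].partition("}")
    return iri, local


# Every element in document order, paired with its namespace IRI.
names = [split_name(element.tag) for element in root.iter()]

# Only the elements that belong to the SVG application.
svg_elements = [local for iri, local in names if iri == SVG_NS]

print(svg_elements)
```

The parser only tells the two applications apart; whether a browser can do anything useful with the SVG part is a separate question, which is the limitation described above.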
2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

Margin note: A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
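The relationship between the abstract triplets and one of their textual representations can be sketched in a few lines. The classes and functions below are hypothetical helpers written for this example, not part of any RDF library, and the serializer is deliberately incomplete: it escapes only backslashes and double quotes, whereas the N-Triples specification covers more cases. The fields of each triplet are named, so their order in the code is immaterial.

```python
from typing import NamedTuple, Union


class IRI(NamedTuple):
    """A resource identified by an IRI."""
    value: str


class Literal(NamedTuple):
    """A literal value with an optional language tag."""
    value: str
    language: str = ""


class Triple(NamedTuple):
    subject: IRI
    predicate: IRI
    object: Union[IRI, Literal]


def term(node):
    """Write a single IRI or literal in N-Triples notation."""
    if isinstance(node, IRI):
        return f"<{node.value}>"
    escaped = node.value.replace("\\", "\\\\").replace('"', '\\"')
    suffix = f"@{node.language}" if node.language else ""
    return f'"{escaped}"{suffix}'


def to_ntriples(triples):
    """Write one triplet per line, each terminated by a full stop."""
    return "\n".join(
        f"{term(t.subject)} {term(t.predicate)} {term(t.object)} ."
        for t in triples
    )


def interpretation(triple):
    """Relation when the object is a resource, property when a literal."""
    return "relation" if isinstance(triple.object, IRI) else "property"


JOHN = IRI("http://example.org/john-smith")
PAGE = IRI("http://example.org/document.html")
GRAPH = [
    Triple(PAGE, IRI("http://purl.org/dc/terms/title"),
           Literal("John's Web page", "en")),
    Triple(PAGE, IRI("http://purl.org/dc/terms/creator"), JOHN),
    Triple(JOHN, IRI("http://xmlns.com/foaf/0.1/name"),
           Literal("John Smith")),
]

print(to_ntriples(GRAPH))
```

Because the data live in the triplets rather than in any one syntax, the same list could be written out as Turtle or RDF/XML by swapping the serializer.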
    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dc="http://purl.org/dc/terms/"
        xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description
          rdf:about="http://example.org/document.html">
        <dc:title xml:lang="en">John's Web page</dc:title>
        <dc:creator
            rdf:resource="http://example.org/john-smith"/>
      </rdf:Description>
      <rdf:Description
          rdf:about="http://example.org/john-smith">
        <rdf:type
            rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
        <foaf:name>John Smith</foaf:name>
      </rdf:Description>
    </rdf:RDF>

    <http://example.org/document.html>
      <http://purl.org/dc/terms/title> "John's Web page"@en .
    <http://example.org/document.html>
      <http://purl.org/dc/terms/creator>
      <http://example.org/john-smith> .
    <http://example.org/john-smith>
      <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
      <http://xmlns.com/foaf/0.1/Person> .
    <http://example.org/john-smith>
      <http://xmlns.com/foaf/0.1/name> "John Smith" .

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dc: <http://purl.org/dc/terms/> .
    <http://example.org/document.html>
      dc:title "John's Web page"@en ;
      dc:creator <http://example.org/john-smith> .
    <http://example.org/john-smith>
      a foaf:Person ;
      foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).

    <!DOCTYPE html>
    <html lang="en">
      <head>
        <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
        <link rel="meta" type="text/turtle" href="john.ttl">
        <link rel="meta" type="application/n-triples"
          href="john.nt">
        <title>John's Web page</title>
      </head>
      <body>
        Hi, I'm John Smith.
      </body>
    </html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

    <!DOCTYPE html>
    <html lang="en">
      <head vocab="http://purl.org/dc/terms/"
          about="http://example.org/document.html">
        <title property="title" lang="en">John's Web
          page</title>
        <meta property="creator"
          href="http://example.org/john-smith">
      </head>
      <body vocab="http://xmlns.com/foaf/0.1/"
          about="http://example.org/john-smith"
          typeof="Person">
        Hi, I'm <span property="name">John Smith</span>.
      </body>
    </html>

Figure 2.11: A graph of the RDF document in Figure 2.9. The node http://example.org/document.html is connected to the literal "John's Web page"@en by dc:title and to the node http://example.org/john-smith by dc:creator; the node http://example.org/john-smith is connected to foaf:Person by rdf:type and to the literal "John Smith" by foaf:name.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is twofold: more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

Margin note: The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
    The Cask of Amontillado
    by
    Edgar Allen Poe

    The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that gave utterance to a threat. At length I would be avenged; this was a point definitely settled--but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

    -1-

    .TITLE "The Cask of Amontillado"
    .AUTHOR "Edgar Allen Poe"
    .PRINTSTYLE TYPESET
    .PAGE 6i 9i .75i .75i .75i .75i
    .START
    .PP
    .DROPCAP T 3
    he thousand injuries of Fortunato I had borne as I best
    could, but when he ventured upon insult, I vowed revenge.
    You, who so well know the nature of my soul, will not
    suppose, however, that gave utterance to a threat.
    \*[IT]At length\*[PREV] I would be avenged; this was a
    point definitely settled\[em]but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].
    % Page geometry
    \pdfpagewidth=6in \pdfpageheight=9in
    % Page dimensions
    \hsize=\dimexpr\pdfpagewidth-1.5in
    \vsize=\dimexpr\pdfpageheight-1.5in
    \baselineskip=16.8pt
    \hoffset=-.25in \voffset=-.25in
    % Fonts
    \font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
    \font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
    % Logical markup definition
    \def\title#1{{\bigbf\centerline{#1}}}
    \def\author#1{{\it\centerline{by}\centerline{#1}}
      \vskip 3.9em}
    \def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
      \hbox{\llap{\dropcap#1\hskip-0.3ex}}}
      \parshape=4 3em\dimexpr\hsize-3em 3.28em
      \dimexpr\hsize-3.28em 3.28em
      \dimexpr\hsize-3.28em 0em\hsize}
    % The document
    \title{The Cask of Amontillado}
    \author{Edgar Allen Poe}
    \chapter The thousand injuries of Fortunato I had borne
    as I best could, but when he ventured upon insult, I vowed
    revenge. You, who so well know the nature of my soul,
    will not suppose, however, that gave utterance to a
    threat. {\it At length} I would be avenged; this was a
    point definitely settled---but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
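What such a conversion tool does can be sketched with a toy converter. The input syntax below is a made-up, Markdown-like subset (headings introduced by number signs, emphasis between asterisks, paragraphs separated by blank lines); it is an assumption for the sake of illustration and does not follow any of the languages named above in detail.

```python
import html
import re


def inline(text):
    """Escape HTML metacharacters, then turn *word* into emphasis."""
    escaped = html.escape(text, quote=False)
    return re.sub(r"\*([^*]+)\*", r"<em>\1</em>", escaped)


def convert(text):
    """Convert the toy lightweight markup into HTML, block by block."""
    blocks = []
    # Blank lines separate blocks; line breaks inside a block are spaces.
    for block in re.split(r"\n\s*\n", text.strip()):
        line = " ".join(block.split())
        if not line:
            continue
        heading = re.match(r"(#{1,6}) (.*)", line)
        if heading:
            # The number of leading number signs gives the heading level.
            level = len(heading.group(1))
            content = inline(heading.group(2))
            blocks.append(f"<h{level}>{content}</h{level}>")
        else:
            blocks.append(f"<p>{inline(line)}</p>")
    return "\n".join(blocks)


SOURCE = """# Design

Legibility comes *first*.
Everything else follows."""

print(convert(SOURCE))
```

The punctuation carries the logical markup, so the source stays readable as plain text while the converter produces the more general HTML.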
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
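The line-length guidelines above can be restated as a small decision helper. The following Python sketch is only an illustration: the function name, the returned strings, and the treatment of values that fall between the recommended range and the hard limits are assumptions, not rules taken from the cited sources.

```python
def advise_measure(chars_per_line: int, columns: int = 1) -> str:
    """Give rough advice on a line length, measured in characters per line."""
    if chars_per_line > 80:
        # Long lines strain the eye of the reader.
        return "too wide: shorten the line"
    if chars_per_line < 40:
        # Justifying very narrow lines makes the word spacing too loose.
        return "too narrow to justify: set ragged"
    # Recommended ranges for single-column and multi-column pages.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if low <= chars_per_line <= high:
        return "comfortable: justify"
    return "acceptable, but outside the recommended range"


print(advise_measure(66))             # single column
print(advise_measure(35, columns=2))  # narrow column, e.g. a sidenote
```

In practice the number of characters per line would be estimated from the column width and the average character width of the chosen typeface.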
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says, φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin{%
  \fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek{%
  \fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}
newunicodecharraisebox8ex
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says, φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url(AlegreyaSans-Regular.ttf)
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url(GFSNeohellenic.otf) format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=el] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says, </span><span lang="el">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
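The arithmetic behind these figures is simple enough to check mechanically. The Python sketch below assumes that the twenty to forty-five percent guideline is applied as a fraction of the type size and that the extra space is split evenly above and below each line; the function names are made up for illustration.

```python
def leading_range(type_size_pt, low=0.20, high=0.45):
    """Return the (minimum, maximum) line height in points.

    low and high are the extra space between lines, expressed as
    fractions of the type size (twenty to forty-five percent).
    """
    return (type_size_pt + type_size_pt * low,
            type_size_pt + type_size_pt * high)


def lead_per_side(type_size_pt, line_height_pt):
    """Lead above and below each line: half of the extra space on each side."""
    return (line_height_pt - type_size_pt) / 2


tightest, loosest = leading_range(10)
print(f"line height: {tightest:g} to {loosest:g} pt")
print(f"lead per side: {lead_per_side(10, tightest):g} "
      f"to {lead_per_side(10, loosest):g} pt")
```

For 10 pt type, this reproduces the 12 to 14.5 pt line height and the 1 to 2.25 pt of lead per side quoted in the text.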
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger
          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 ∶ √2       1.41421
B5        6.93 × 9.84        1 ∶ √2       0.707
Letter    8 1/2 × 11         1 ∶ 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
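One way to satisfy the whole-multiple rule is to round the space a heading occupies up to the next full body-text line and treat the remainder as padding. The Python sketch below illustrates this under that assumption; the function name and the example sizes are hypothetical.

```python
import math


def heading_block(heading_height_pt, body_line_height_pt):
    """Fit a heading into a whole number of body-text lines.

    Returns (lines, block_height_pt, padding_pt), where padding_pt is the
    extra vertical space to distribute above and below the heading.
    """
    lines = math.ceil(heading_height_pt / body_line_height_pt)
    block_height = lines * body_line_height_pt
    return lines, block_height, block_height - heading_height_pt


# An 18 pt heading on a 12 pt body-text grid occupies two lines
# and leaves 6 pt of padding to distribute.
print(heading_block(18, 12))
```

How the padding is split above and below the heading is a design decision; placing more of it above the heading ties the heading visually to the text that follows.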
3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations

Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3]:
This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
– William Shakespeare, King Lear
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we need to also decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
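As a rough sketch of this derivation, the Python function below takes a text width and a desired height-to-width proportion and snaps the resulting height to the nearest whole number of lines. The function name, the rounding strategy, and the example values (a 324 pt wide text block, a proportion of 1.618, and a 12 pt line height) are assumptions made for illustration.

```python
def text_height(text_width_pt, height_to_width, line_height_pt):
    """Derive a text height that is a whole multiple of the line height.

    Returns (lines, height_pt): the number of body-text lines and the
    resulting text height in points.
    """
    target = text_width_pt * height_to_width
    # Snap the target height to the nearest whole number of lines.
    lines = max(1, round(target / line_height_pt))
    return lines, lines * line_height_pt


# The unrounded target is about 524 pt; 44 lines of 12 pt give 528 pt.
print(text_height(324, 1.618, 12))
```

The small deviation from the target proportion is usually invisible, whereas a text block that cannot be filled with whole lines is not.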
3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
(ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015) (cit. on p. 5).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standards Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015) (cit. on p. 5).
[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972 (cit. on pp. 5, 7).
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standards Association, June 1986 (cit. on p. 6).
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).
[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993 (cit. on p. 8).
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).
[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008 (cit. on p. 13).
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012 (cit. on p. 13).
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006 (cit. on p. 13).
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).
[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993 (cit. on p. 14).
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015) (cit. on p. 16).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986 (cit. on p. 22).
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standards Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
EMACS: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The Generalized Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
PICO: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
VI: The Visual Interactive editor
VIM: VI IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language
After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.
[Diagram labels, top: git init, git clone, git pull, git push, git reset, git commit]
Figure 1.9: The diagram above depicts the basic Git workflow. The diagram below depicts the use of the Git program with an SVN repository; this bears all the advantages and disadvantages associated with decentralized VCS.
[Diagram labels, bottom: svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit]
Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).
Figure 1.11: Tortoise SVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.
Chapter 2
Markup
A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should be already met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.
It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.
Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result,
[Margin note] More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].
[Margin note] The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.
the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.
2.1 Meta Markup Languages
2.1.1 The Generalized Markup Language
The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.
This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].
SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.
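The node types listed above can be observed with a short Python sketch. Python's standard library has no SGML parser, so the sketch uses an XML document instead, which shares the same vocabulary of elements, attributes, comments, processing instructions, and entities. The sample document, the `pub` entity, and the `describe` helper are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the entity "pub" and every element
# name are invented for this illustration.
DOCUMENT = """<!DOCTYPE memo [
  <!ENTITY pub "Example Publishing House">
]>
<memo priority="high">
  <!-- A human-readable comment -->
  <?typesetter break-page?>
  <to>&pub;</to>
  <body>Thank you, &pub;, for the proofs.</body>
</memo>"""

# Keep comments and processing instructions in the tree; the default
# tree builder would silently discard them (requires Python 3.8+).
builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
root = ET.fromstring(DOCUMENT, parser=ET.XMLParser(target=builder))


def describe(node):
    """Classify one node of the parsed tree."""
    if node.tag is ET.Comment:
        return ("comment", node.text.strip())
    if node.tag is ET.ProcessingInstruction:
        return ("instruction", node.text)
    return ("element", node.tag, dict(node.attrib))


nodes = [describe(node) for node in root.iter()]
for node in nodes:
    print(node)

# The entity reference &pub; has been replaced by its declared text.
print(root.find("body").text)
```

The entity is declared once in the document type declaration and expanded wherever it is referenced, which is the substitution mechanism the paragraph describes.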
[Margin note] A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
Although the described structure is shared by all SGML documents, the actual syntax as well as the restrictions regarding the contents and the attributes of individual elements are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.
2.1.2 The Extensible Markup Language
Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.
This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
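The difference between the two levels of correctness can be sketched in a few lines of Python. The parser in the standard library checks well-formedness only; it does not validate against a DTD or a schema, so the sketch below can only demonstrate the first level. The helper name `is_well_formed` and the sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True when `text` parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


GOOD = "<recipe><name>Palatschinken</name></recipe>"
# The end tags are closed in the wrong order.
BAD_NESTING = "<recipe><name>Palatschinken</recipe></name>"
# Attribute values must be quoted in XML.
UNQUOTED = "<ingredientList serves=8/>"
# Well-formed, yet unlikely to be valid against any recipe schema:
# well-formedness says nothing about the permitted vocabulary.
ODD = "<recipe><colour>blue</colour></recipe>"

for sample in (GOOD, BAD_NESTING, UNQUOTED, ODD):
    print(is_well_formed(sample), sample)
```

The last sample shows why validity is a separate question: a document can satisfy the syntax of XML while violating every constraint of the application it claims to belong to.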
Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish.</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>
Figure 2.1: An example XML document (recipe.xml)
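Because the meaning of the elements is left to the program that processes the document, it may help to see what such a program could look like. The following Python sketch reads an abbreviated copy of the recipe document above and builds a shopping list from it; the variable names and the decision to treat `serves` as an integer are assumptions of this example, not part of the document format.

```python
import xml.etree.ElementTree as ET

# An abbreviated copy of the recipe document, embedded so that the
# sketch is self-contained.
RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# Element content is read through .text, attributes through .get().
name = root.find("name").text
serves = int(root.find("ingredientList").get("serves"))
shopping_list = [
    (ingredient.get("amount"), ingredient.text)
    for ingredient in root.iter("ingredient")
]

print(f"{name} (serves {serves})")
for amount, item in shopping_list:
    print(f"  {amount} {item}")
```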
[Margin note] DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.
<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>
Figure 2.2: An example DTD
element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}
Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
      name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>
Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)
xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng # Compact -> Full Relax NG
xmllint --noout --relaxng recipe.rng recipe.xml
Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature of CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML
Speech
AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!

Verse
Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>
Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].
[Margin note] The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook, and its source code is publicly available at http://docbook.org/.
[Margin note] Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
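To make the role of the IRIs more tangible, the following Python sketch parses a document that mixes elements from two XML applications. The sample document is made up for illustration; the two namespace IRIs are the ones assigned to XHTML and SVG. The standard-library parser reports each element name together with its namespace in the `{IRI}name` notation, but it does not try to fetch or apply any schema, in line with the limitation described above.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

# A made-up document combining two XML applications. XHTML is the
# default namespace; SVG elements carry the svg: prefix.
DOCUMENT = f"""<html xmlns="{XHTML}" xmlns:svg="{SVG}">
  <body>
    <p>A circle:</p>
    <svg:svg width="20" height="20">
      <svg:circle cx="10" cy="10" r="8"/>
    </svg:svg>
  </body>
</html>"""

root = ET.fromstring(DOCUMENT)

# Every element name is expanded to "{namespace IRI}local-name".
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)

# Lookups must use the expanded name; the prefix is only shorthand.
circle = root.find(f".//{{{SVG}}}circle")
print(circle.get("r"))
```

The prefix `svg:` never reaches the program: two documents may use different prefixes for the same IRI and still denote the same elements.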
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
2.2 Markup on the World Wide Web
2.2.1 The Hypertext Markup Language
In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In
[Margin note] JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.
Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for the presentation markup.
During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the contemporary leading web browsers and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
2.2.2 The Extensible Hypertext Markup Language
Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as
<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>
Figure 2.7: The first line contains overlapping elements and, as such, can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.
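The contrast between a liberal HTML processor and a strict XML processor can be reproduced with the Python standard library. The sketch below feeds the overlapping markup from the first line of the figure to both. Note the assumption: `html.parser` is only a tolerant tokenizer, not an implementation of the HTML5 tree construction rules, so it demonstrates the liberal acceptance of the input but not the canonical reinterpretation shown in the second line. The `TagLogger` class is a helper invented for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

MISNESTED = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Record every tag and text run the tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


# The HTML tokenizer accepts the overlapping elements without complaint.
logger = TagLogger()
logger.feed(MISNESTED)
logger.close()
for event in logger.events:
    print(event)

# An XML parser must refuse the very same input.
try:
    ET.fromstring(MISNESTED)
    xml_accepts = True
except ET.ParseError as error:
    xml_accepts = False
    print("XML parser refused the input:", error)
```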
<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>
Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com/. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.
<style>
body {
  font: large Verdana;
  font-size: large; }
h1 {
  font-size: x-large;
  text-transform: uppercase; }
abbr {
  font-style: italic; }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>
[Margin note] The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications (such as MathML and SVG) directly into web documents through XML namespaces.
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
2.2.3 The Semantic Web and Linked Data
The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.
[Margin note] A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
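The two readings of a triplet can be written down as a small Python sketch. The classes `IRI`, `Literal`, and `Triplet` and the function `interpret` are hypothetical names chosen for this illustration; they are not part of any RDF library. The field order follows the (p, s, o) convention used in the text.

```python
from typing import NamedTuple, Union


class IRI(str):
    """A resource, identified by an IRI."""


class Literal(str):
    """A literal value, such as a name or a title."""


class Triplet(NamedTuple):
    predicate: IRI
    subject: IRI
    obj: Union[IRI, Literal]


def interpret(triplet):
    """Render a triplet in one of the two readings described above."""
    p, s, o = triplet
    if isinstance(o, IRI):
        return f"<{s}> is in relation <{p}> with <{o}>"
    return f'<{s}> has property <{p}> with value "{o}"'


creator = Triplet(
    IRI("http://purl.org/dc/terms/creator"),
    IRI("http://example.org/document.html"),
    IRI("http://example.org/john-smith"),
)
name = Triplet(
    IRI("http://xmlns.com/foaf/0.1/name"),
    IRI("http://example.org/john-smith"),
    Literal("John Smith"),
)

print(interpret(creator))
print(interpret(name))
```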
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
2.3 Document Preparation Systems
Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/terms/"
  xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
    rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
      rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
    rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .
Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom)
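Of the three representations, N-Triples is simple enough to be produced by hand. The following Python sketch serializes a list of triplets into that format, under stated assumptions: IRIs are passed as plain strings, literals as `(text, language)` pairs, and only the most common characters are escaped. The function names are invented for this example, and the triplets are written in the subject, predicate, object order that N-Triples itself uses.

```python
def escape(text):
    """Escape the characters that may not appear raw in a literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def term(value):
    """Serialize one term of a triplet."""
    if isinstance(value, tuple):   # a literal: (text, language or None)
        text, language = value
        suffix = f"@{language}" if language else ""
        return f'"{escape(text)}"{suffix}'
    return f"<{value}>"            # anything else is treated as an IRI


def to_ntriples(triplets):
    """Write one 'subject predicate object .' statement per line."""
    return "".join(
        f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in triplets
    )


DOCUMENT = "http://example.org/document.html"
JOHN = "http://example.org/john-smith"

triplets = [
    (DOCUMENT, "http://purl.org/dc/terms/title", ("John's Web page", "en")),
    (DOCUMENT, "http://purl.org/dc/terms/creator", JOHN),
    (JOHN, "http://xmlns.com/foaf/0.1/name", ("John Smith", None)),
]

print(to_ntriples(triplets), end="")
```

Each statement occupies exactly one line and ends with a full stop, which is what makes the format easy to process with line-oriented tools.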
<!DOCTYPE html>
<html lang="en">
<head>
<link rel="meta" type="application/rdf+xml"
  href="john.rdf">
<link rel="meta" type="text/turtle" href="john.ttl">
<link rel="meta" type="application/n-triples"
  href="john.nt">
<title>John's Web page</title>
</head>
<body>
Hi, I'm John Smith.
</body>
</html>
Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.
<!DOCTYPE html>
<html lang="en">
<head vocab="http://purl.org/dc/terms/"
  about="http://example.org/document.html">
<title property="title" lang="en">John's Web
  page</title>
<meta property="creator"
  href="http://example.org/john-smith">
</head>
<body vocab="http://xmlns.com/foaf/0.1/"
  about="http://example.org/john-smith"
  typeof="Person">
Hi, I'm <span property="name">John Smith</span>.
</body>
</html>
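[Graph: the node http://example.org/document.html is connected by the edge dc:title to the literal "John's Web page"@en and by the edge dc:creator to the node http://example.org/john-smith; the node http://example.org/john-smith is connected by the edge rdf:type to foaf:Person and by the edge foaf:name to the literal "John Smith"]
Figure 2.11: A graph of the RDF document in Figure 2.9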
categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems
Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The
[Margin note] The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by Desktop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that gave utterance to a threat. At length I would be avenged; this was a point definitely settled, but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-
.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].
38 CHAPTER 2 MARKUP
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1\hskip-0.3ex}}}
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.
Figure 2.14: Logical markup in the interactive DPSs of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
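Whether a typeface covers a manuscript can be checked mechanically before the design is settled. The following Python sketch is only an illustration: the code point ranges are copied from the unicode-range descriptors shown in Figure 3.2 rather than read from the font files, and the function name `uncovered` and the sample string are hypothetical.

```python
import unicodedata

# Ranges copied from the unicode-range descriptors in Figure 3.2;
# they are an assumption about coverage, not data read from the fonts.
LATIN_RANGES = [(0x00, 0x24F), (0x1E00, 0x1EFF), (0x2000, 0x206F),
                (0x2C60, 0x2C7F), (0xA720, 0xA7FF), (0xFB00, 0xFB4F)]
GREEK_RANGES = [(0x2C80, 0x2CFF), (0x370, 0x3FF), (0x1F00, 0x1FFF),
                (0x102E0, 0x102FF)]


def uncovered(text, ranges):
    """Return the sorted characters of `text` that lie outside `ranges`."""
    # Normalize to precomposed forms so accented letters are single code points.
    text = unicodedata.normalize("NFC", text)
    missing = {ch for ch in text
               if not ch.isspace()
               and not any(low <= ord(ch) <= high for low, high in ranges)}
    return sorted(missing)


sample = "soul ψυχή"
print(uncovered(sample, LATIN_RANGES))                 # the Greek letters
print(uncovered(sample, LATIN_RANGES + GREEK_RANGES))  # []
```

An empty result for the combined ranges suggests that the two typefaces together cover the sample; any characters that remain would have to be borrowed from a third font family.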
Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
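The line-length guidelines above can be condensed into a small check. This is a minimal Python sketch under the thresholds quoted in the text; the function name `assess_measure` and its return convention are hypothetical.

```python
def assess_measure(chars_per_line, multi_column=False):
    """Return (within_recommended_range, suggested_alignment)."""
    # Recommended ranges: 45-75 characters for a single column,
    # 40-50 characters for multiple columns.
    low, high = (40, 50) if multi_column else (45, 75)
    within = low <= chars_per_line <= high
    # Lines too narrow for 40 characters loosen the word spacing
    # when justified, so a ragged setting is suggested instead.
    alignment = "ragged" if chars_per_line < 40 else "justified"
    return within, alignment


print(assess_measure(66))                     # (True, 'justified')
print(assess_measure(35, multi_column=True))  # (False, 'ragged')
```

The average number of characters per line has to be estimated separately, for instance by counting the characters in a few typeset lines.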
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says, φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”
1
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
   Ligatures=TeX]}{}
newunicodecharraisebox8ex
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says, φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=gr] {
    font-size: 13.2pt;
  }
</style>
<p><span lang=en>The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says, </span><span lang=gr>φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang=en>“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
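The arithmetic behind the leading guideline can be written out as a short Python sketch. The function name `leading_range` and its percentage parameters are hypothetical; the default bounds of 20 and 45 percent are the ones quoted above.

```python
def leading_range(type_size_pt, low_pct=20, high_pct=45):
    """Return the (minimum, maximum) line height for a typeface size."""
    # The line height is the typeface size plus 20 to 45 percent of it.
    minimum = type_size_pt * (100 + low_pct) / 100
    maximum = type_size_pt * (100 + high_pct) / 100
    return minimum, maximum


size = 10
minimum, maximum = leading_range(size)
print(minimum, maximum)  # 12.0 14.5
# Half of the extra space falls above each line and half below it.
print((minimum - size) / 2, (maximum - size) / 2)  # 1.0 2.25
```

For a 10 pt typeface this reproduces the figures in the text: a line height of 12 to 14.5 pt, that is, 1 to 2.25 pt of lead on either side of each line.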
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8½ × 11            1 : 1.294    1.2941
Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
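One way to meet the whole-multiple requirement is to round the space taken by a heading up to the next multiple of the body text line height. The Python sketch below illustrates this; the function name `heading_block_height` and the sample measurements are hypothetical, and splitting the leftover space above and below the heading is left to the designer.

```python
import math


def heading_block_height(heading_leading_pt, body_leading_pt,
                         min_space_pt=0.0):
    """Return the smallest whole multiple of the body line height that
    fits the heading line plus a minimum amount of surrounding space."""
    needed = heading_leading_pt + min_space_pt
    lines = math.ceil(needed / body_leading_pt)
    return lines * body_leading_pt


# An 18 pt heading line with at least 4 pt of space around it,
# set into body text with 12 pt leading, occupies two body lines.
total = heading_block_height(18, 12, min_space_pt=4)
print(total)       # 24
print(total - 18)  # 6 pt of space left to distribute around the heading
```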
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs, and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
William Shakespeare, King Lear
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size (as described in Section 3.2.1), as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
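The requirement that the text height be a multiple of the line height can be turned into a short calculation. The Python sketch below assumes a hypothetical page that is 9 in (648 pt) tall with 0.75 in (54 pt) vertical margins; the function name `text_block` is likewise an assumption made for illustration.

```python
def text_block(page_height_pt, top_margin_pt, bottom_margin_pt, leading_pt):
    """Return (number_of_lines, text_height_pt) for a fixed page height."""
    available = page_height_pt - top_margin_pt - bottom_margin_pt
    # Only whole lines fit; the remainder is returned to the margins.
    lines = int(available // leading_pt)
    return lines, lines * leading_pt


print(text_block(648, 54, 54, 12))  # (45, 540)
print(text_block(648, 54, 54, 14))  # (38, 532)
```

With 14 pt leading, 8 pt of the available height cannot hold another line and would have to be added to the vertical margins.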
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
Bibliography
[1] Mary Brandel lsquolsquo1963 The debut of asci irsquorsquo InComputerworld(July 1999) url httpeditioncnncomTECHcomputing9907061963idg (visited on 09062015) (cit on p 5)
[2] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1963 10 East 40th Street New York 16 nyusa the American Standard Association June 1963 urlhttp worldpowersystems com J codes X3 4 - 1963
(visited on 01282015) (cit on p 5)[3] i so tc97sc2 Information technology ndash iso 7-bit coded character
set for information interchange i so 6461972 Geneva Switzer-land the International Organization for Standardization1972 (cit on pp 5 7)
[4] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1986 10 East 40th Street New York 16 ny usathe American Standard Association June 1986 (cit on p 6)
[5] Unicode Consortium the Unicode Standard Version 10 Vol 1Reading ma usa Addison-Wesley Developers Press Oct1991 isbn 0-201-56788-1 (cit on p 8)
[6] Unicode Consortium the Unicode Standard Version 10 Vol 2Reading ma usa Addison-Wesley Developers Press June1992 isbn 0-201-60845-6 (cit on p 8)
[7] isoiec jtc1sc2 Information technology ndash the Universalmultiple-octet coded Character Set (ucs) ndash Part 1 Architectureand Basic Multilingual Plane isoiec 10646-11993 Geneva
Switzerland the International Organization for Standard-ization May 1993 (cit on p 8)
[8] i soiec jtc1sc2 Transformation Format for 16 planes of group00 (utf-16) isoiec 10646-11993Amd 11996 GenevaSwitzerland the International Organization for Standard-ization Oct 1996 (cit on p 8)
[9] isoiec jtc1sc2 ucs Transformation Format 8 (utf-8)isoiec 10646-11993Amd 21996 Geneva Switzerlandthe International Organization for Standardization Oct1996 (cit on p 8)
[10] Unicode Consortium the Unicode Standard Version 90 ndash CoreSpecification Tech rep Mountain View ca usa July 2016url httpwwwunicodeorgversionsUnicode900UnicodeStandard-90pdf (visited on 09172015) (cit onpp 8ndash10)
[11] Q-Success Usage of character encodings for websites urlhttpw3techscomtechnologiesoverviewcharacter_
encodingall (visited on 09102015) (cit on p 9)[12] Unicode Consortium Unicode Technical Standard 10 Version
900 Unicode Collation Algorithm Tech rep May 2016 urlhttpwwwunicodeorgreportstr10tr10-34html
(visited on 09172016) (cit on p 10)[13] Unicode Consortium Unicode cldr Project Tech rep url
httpcldrunicodeorg (visited on 09172016) (cit onp 10)
[14] iso tc171sc2 Document management ndash Portable documentformat iso 320002008 Geneva Switzerland the Interna-tional Organization for Standardization July 2008 (cit onp 13)
[15] isoiec jtc1sc34 Document description and processing lan-guages ndash Office Open XML File Formats isoiec 295002012Geneva Switzerland the International Organization forStandardization Oct 2012 (cit on p 13)
[16] isoiec jtc1sc34 Information technology ndash Open DocumentFormat for Office Applications (OpenDocument) v10 isoiec263002006 Geneva Switzerland the International Organi-zation for Standardization Dec 2006 (cit on p 13)
[17] Noam Chomsky lsquolsquoThree models for the description of lan-guagersquorsquo In Information Theory IEEE Transactions on 23 (1956)pp 113ndash124 (cit on p 14)
[18] isoiec jtc1sc22 Information technology ndash the Portable Op-erating System Interface ndash Part 2 Shell and Utilities isoiec9945-21993 Geneva Switzerland the International Organi-zation for Standardization Dec 1993 (cit on p 14)
[19] Jeffrey E F Friedl Mastering Regular Expressions 3rd edOrsquoReilly Media 2006 p 544 isbn 978-0-596-52812-6 (citon p 14)
[20] Unicode Consortium Unicode Technical Standard 18 Version17 Unicode Regular Expressions Tech rep Nov 2013 urlhttpwwwunicodeorgreportstr18tr18-17html
(visited on 09262015) (cit on p 16)[21] Dale Dougherty and Arnold Robbins Sed amp awk Second
Edition OrsquoReilly Media 1997 i sbn 1565922255 url http docstore mik ua orelly unix sedawk (visited on09262015) (cit on p 16)
[22] Ben Collins-Sussman Brian W Fitzpatrick and C MichaelPilato Version Control with Subversion OrsquoReilly 2002 urlhttpsvnbookred-beancom (visited on 09262015)(cit on p 17)
[23] Charles F Goldfarb lsquolsquothe Roots of sgml ndash A Personal Rec-ollectionrsquorsquo In (1996) url httpwwwsgmlsourcecomhistoryrootshtm (visited on 07292015) (cit on p 22)
[24] Charles F Goldfarb lsquolsquosgml The Reason Why and the FirstPublishedHintrsquorsquo In Journal of the American Society for Informa-tion Science 48 (7 July 1997) url httpwwwsgmlsourcecomhistoryjasishtm (visited on 07292015) (cit onp 22)
[25] Charles F Goldfarb lsquolsquoIntroduction to Generalized MarkuprsquorsquoIn (1981) url http www sgmlsource com history AnnexAhtm (visited on 07292015) (cit on p 22)
[26] i soiecjtc1sc34 Information processing ndash Text and office sys-tems ndash Standard Generalized Markup Language (sgml) i soiec88791986 Geneva Switzerland the International Organi-zation for Standardization Oct 1986 (cit on p 22)
[27] Charles F Goldfarb the sgml Handbook New York NY USAOxford University Press Inc 1990 i sbn 978-0-198-53737-3(cit on p 22)
[28] Jean Paoli Tim Bray and Michael Sperberg-McQueen Ex-tensible Markup Language (xml) 10 w3c Recommendationw3c Feb 1998 url httpwwww3orgTR1998REC-xml-19980210 (visited on 07312015) (cit on pp 23 31)
[29] isoiec jtc1sc18wg8 Proposed TC for Web sgml Adap-tations for sgml isoiec N1929 the International Organi-zation for Standardization June 1997 url httpxmlcoverpagesorgwg8-n1929-ghtml (visited on 07312015)(cit on p 23)
[30] Haringkon Wium Lie and Bert Bos Cascading Style Sheets level1 Recommendation w3c Dec 1996 url httpwwww3orgTRREC-CSS1-961217 (visited on 07312015) (cit onpp 23 29)
[31] C M Sperberg-McQueen and Claus Huitfeldt lsquolsquogoddagA Data Structure for Overlapping Hierarchiesrsquorsquo In DigitalDocuments Systems and Principles 8th International Confer-ence on Digital Documents and Electronic Publishing DDEP2000 5th International Workshop on the Principles of DigitalDocument Processing PODDP 2000 Munich Germany Sep-tember 13-15 2000 Revised Papers Ed by Peter King andEthan V Munson Berlin Heidelberg Springer Berlin Hei-delberg 2004 pp 139ndash160 isbn 978-3-540-39916-2 doi101007978-3-540-39916-2_12 (cit on p 27)
[32] TimBray DaveHollander andAndrewLaymanNamespacesin xml w3c Recommendation w3c Jan 1999 url httpwwww3orgTR1999REC-xml-names-19990114 (visitedon 08212015) (cit on p 27)
[33] M Duerst the Internationalized Resource Identifiers (iris) rfc3987 rfc Editor Jan 2005 url httptoolsietforghtmlrfc3987 (visited on 08312015) (cit on p 27)
[34] Norman Walsh DocBook 5 The Definitive Guide Apr 2010url httpwwwdocbookorgtdgenhtmldocbookhtml(visited on 08182015) (cit on p 28)
[35] Tim Berners-Lee Information Management A Proposal Techrep Mar 1989 url httpwwww3orgHistory1989proposalhtml (visited on 08312015) (cit on p 28)
[36] T Berners-Lee Hypertext Markup Language ndash 20 rfc 1866rfc Editor Nov 1995 url httptoolsietforghtmlrfc1866 (visited on 07312015) (cit on p 28)
[37] Jon Postel DoD standard Transmission Control Protocol rfc761 rfc Editor Jan 1980 url httptoolsietforghtmlrfc761 (visited on 09162016) (cit on p 28)
[38] Ian Hickson et al html5 A vocabulary and associated apisfor html and xhtml Recommendation w3c Oct 2014 urlhttpwwww3orgTR2014REC-html5-20141028 (visitedon 07312015) (cit on p 29)
[39] ecma International Standard ecma-262 - ecmaScript LanguageSpecification Tech rep June 1997 url httpwwwecma-internationalorgpublicationsfilesECMA-ST-ARCH
ECMA-262201st20edition20June201997pdf (visitedon 07312015) (cit on p 29)
[40] Netscape Communications Netscape and Sun announce Java-Script the open cross-platform object scripting language for en-terprise networks and the Internet Dec 1995 url httpwpnetscapecomnewsrefprnewsrelease67html (visited on02132008) (cit on p 29)
[41] Dave Raggett et al Reformulating html in xml w3c Recom-mendation w3c Dec 1998 url httpwwww3orgTR1998WD-html-in-xml-19981205 (visited on 08202015)(cit on p 31)
[42] Steven Pemberton et al xhtmltrade 10 The Extensible HyperTextMarkup Language w3c Recommendation w3c Jan 2000url httpwwww3orgTR2000REC-xhtml1-20000126(visited on 08202015) (cit on p 31)
[43] T Berners-Lee Linked Data Tech rep 2006 url httpswwww3orgDesignIssuesLinkedDatahtml (visited on09172016) (cit on p 31)
[44] Ora Lassila and Ralph R Swick Resource Description Frame-work (rdf) Model and Syntax Specification w3c Recommen-dation w3c Feb 1999 url httpwwww3orgTR1999REC-rdf-syntax-19990222 (visited on 08182015) (cit onpp 31 32)
[45] Dan Brickley and R V Guha rdf Vocabulary DescriptionLanguage 10 rdf Schema w3c Recommendation w3c Feb2004 url httpwwww3orgTR2004REC-rdf-schema-20040210 (visited on 08182015) (cit on p 32)
[46] Deborah L McGuinness and Frank van Harmelen owl WebOntology Language w3c Recommendation w3c Feb 2004url httpwwww3orgTR2004REC-owl-features-20040210 (visited on 08182015) (cit on p 32)
[47] Dan Brickley and R V Guha json-ld 10 A JSON-basedSerialization for Linked Data w3c Recommendation w3cJan 2014 url httpwwww3orgTR2014REC-json-ld-20140116 (visited on 08192015) (cit on p 32)
[48] David Beckett et al rdf 11 Turtle w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-turtle-20140225 (visited on 08292015) (cit on p 32)
[49] David Beckett rdf 11 N-Triples w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-n-triples-20140225 (visited on 08192015) (cit on p 32)
[50] Ben Adida et al rdfa in xhtml Syntax and Processing w3cRecommendation w3c Oct 2008 url httpwwww3org TR 2008 REC - rdfa - syntax - 20081014 (visited on08192015) (cit on p 32)
[51] Peter Schaffter. What exactly is mom. 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK The ACKnowledgement character
API Application Programming Interface
ASA The American Standard Association
ASCII The American Standard Code for Information Interchange
AT&T The American Telephone and Telegraph corporation
BEL The BELl character
BMP The Basic Multilingual Plane
BRE The Basic Regular Expressions
BS The BackSpace character
BSD The Berkeley Software Distribution. Also known as the Berkeley Unix
CA California
CAN The CANcel character
CERN The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR The Common Locale Data Repository
CLI Command Line Interface
COBOL The COmmon Business-Oriented Language
CR The Carriage Return character
CSS The Cascading Style Sheets language
DC The Dublin Core
DC1 The Device Control character No. 1
DC2 The Device Control character No. 2
DC3 The Device Control character No. 3
DC4 The Device Control character No. 4
DEL The DELete character
DLE The Data Link Escape character
DPS Document Preparation System
DTD Document Type Declaration
DTP DeskTop Publishing
EBCDIC The Extended Binary Coded Decimal Interchange Code
ECMA The European Computer Manufacturers Association
EM The End of Medium
EMACS The Eventually Munches All Computer Storage editor
ENQ The ENQuiry character
EOT The End Of Transmission
ERE The Extended Regular Expressions
ESC The ESCape character
ETB The End of Transmission Block
ETX The End of TeXt
EUC The Extended Unix Code
FF The Form Feed character
FOAF Friend Of A Friend
FORTRAN The FORmula TRANslator
FS The File Separator
FSM The Free Software Movement
GML The General Markup Language
GNU GNU is Not Unix
GS The Group Separator
GUI Graphical User Interface
HT The Horizontal Tab
HTML The HyperText Markup Language
IBM The International Business Machines Corporation
IEC The International Electrotechnical Commission
IME Input Method Editor
IRI The Internationalized Resource Identifier
ISO The International Organization for Standardization
JIS The Japanese Industrial Standards encoding
JOE The Joe's Own Editor
JSON The JavaScript Object Notation
JSON-LD JSON for LD
JTC A Joint TC
LD Linked Data
LF The Line Feed
MA Massachusetts
MATHML The Mathematical Markup Language
NAK The Negative-AcKnowledgement character
NUL The NULl character
NY New York
OCR Optical Character Recognition
ODF The Open Document Format for office applications
OOXML The Office Open XML format
OWL The Web Ontology Language
PC The IBM Personal Computer
PDF The Portable Document Format
PICO The PIne COmposer
POSIX The Portable Operating System Interface
RDF The Resource Description Framework
RDFA RDF in attributes
RELAX NG The REgular LAnguage for XML New Generation
RFC A Request For Comments
RS The Record Separator
SC A SubCommittee
SGML The Standard Generalized Markup Language
SI The Shift In character
SO The Shift Out character
SOH The Start of Heading
SR Sound Recognition
STX The Start of Text
SUB The SUBstitute character
SVG The Scalable Vector Graphics language
SVN SubVersioN
SYN The SYNchronous Idle character
TC A Technical Committee
TEI The Text Encoding Initiative
TRON The Real-time Operating system Nucleus
UCS The Universal multiple-octet coded Character Set
US The Unit Separator
USA The United States of America
UTF The UCS Transformation Format
VCS Version Control Systems
VI The Visual Interactive editor
VIM VI IMproved
VT The Vertical Tab
W3C The World Wide Web Consortium
WG A Working Group
WYSIWYG What You See Is What You Get
XHTML The eXtensible HyperText Markup Language
XML The eXtensible Markup Language
Index
ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42
Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17
sect
Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15
bre 14ndash16bs 6bsd 13
sect
ca 52can 6cern 28
character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44
sect
dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35
dps 13 17 18 32 35 36 39dtd 23 25ndash27dtp 36
sect
ebcdic 5ecma 55Edgar Allen Poe 37
Elements of Style 3em 6Emacs 13endianity 10endnote 47enq 6eot 6erealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15
ere 14ndash16esc 6etb 6120576-TEX 38etx 6euc 5
sectF M Cornford 43ff 6foaf 32 33footnote 47formal grammar 14fortran 4From Religion to Philosophy A Study in
the Origins of Western Speculation 43fs 6fsm 35
sectGit 17gml 22gnuLinux 13nano 13
gnu 13 14 35Google Documents 18Google Pinyin 11grep 16 17groff see troffgs 6gui 13 35
sectHan Unification 9heading 45Henrik Ibsen 27ht 6
html 28ndash32 34 39 44 55sect
ibm 5 12 22iconv 10iec 7 10 51ndash54ime 12ir i 27 28 31 32 54iso 7 10 51ndash54
sectJavaScript 29Jeffrey E F Friedl 14j is 5joe 13JScript 29json 32json-ld 32 56jtc 51ndash54justification see alignment
sectKing Lear 48
sectLATEX 36 43Latin Vulgate Bible 49ld 31 32 55leading see line spacingLeafpad 13lf 6lightweight markup language 39line height 45list 46
sectma 51MakeDoc 39Markdown 39markuplogical 21 29 30 35 36presentation 21 29 30 35 36
mathml 28 31Mercurial 17microformatting 32Microsoft Word 14 20 39
sectN-Triples 32 33nak 6Noam Chomskyhierarchy 14
Noam Chomsky 14note 46Notepad++ 13Notepad 13
INDEX 65
nroff see troffnul 6ny 51
sectocr 12odf 13ooxml 13owl 32 56
sectparagraphblock 47indented 45outdented 45
paragraph 42paragraphsblock 45
pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8
sectQuarkXPress 14quotationblock 47run-in 47
sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31
rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6
sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22
sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-
lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6
secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff
man 36
66 INDEX
me 36mom 36
troff 35tron 9Turtle 32 33typeface 41
sectucsblock 8ucs-4 8
ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10
us 6usa 51 52utf
utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8
utf 6 8ndash10 52sect
VBScript 29vcscentralized 17decentralized 17
vcs 17ndash20version control 13vi 13vim 13
vt 6sect
w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4
wysiwyg 35sect
XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23
xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23
Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).
Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.
Chapter 2
Markup
A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to compliance with the orthographic rules (such as correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.
It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.
Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.
2.1 Meta Markup Languages
2.1.1 The Generalized Markup Language
The situation in digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.
This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].
Margin note: More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].
Margin note: The authoritative resource on SGML is The SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.
SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.
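The building blocks named above (elements, attributes, and entities) can be observed with a few lines of Python. This is only a minimal sketch: it uses XML as a convenient stand-in for SGML, relies on the standard `xml.etree.ElementTree` module, and the `memo` document and the `company` entity are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical document: one internal entity, nested elements, one attribute.
# XML is used here only as a stand-in for SGML.
DOCUMENT = """<!DOCTYPE memo [
  <!ENTITY company "Example Corporation">
]>
<memo priority="high">
  <to>All staff</to>
  <body>&company; will be closed on Friday.</body>
</memo>"""

root = ET.fromstring(DOCUMENT)

# Elements are delimited by tags; attributes carry additional information.
print(root.tag, root.attrib)

# The parser replaces the entity reference with the declared string.
for child in root:
    print(child.tag, repr(child.text))
```

The entity is declared once and expanded wherever it is referenced, which is the reuse mechanism the paragraph describes.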
Margin note: A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema
Although the described structure is shared by all SGML documents, the actual syntax as well as the restrictions regarding the contents and the attributes of individual elements are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.
2.1.2 The Extensible Markup Language
Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.
This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also called a language or a format).
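The first of the two levels, well-formedness, can be checked with any conforming XML parser. The following minimal sketch uses Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` is made up for illustration. Note that this module does not validate against a DTD or a schema, so the second level (validity) would need a separate tool.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML, i.e. if it is well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


good = "<recipe><name>Palatschinken</name></recipe>"
bad = "<recipe><name>Palatschinken</recipe></name>"  # overlapping elements

print(is_well_formed(good))
print(is_well_formed(bad))
```

A well-formed document may still be invalid: `good` parses without errors even though it lacks most of what a recipe schema would require.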
Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
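To give a flavor of data retrieval with XPath, here is a small sketch in Python. It assumes the standard `xml.etree.ElementTree` module, which implements only a limited subset of XPath; the embedded recipe is a shortened, hypothetical variant of the one used in the figures of this chapter.

```python
import xml.etree.ElementTree as ET

# A shortened recipe document, embedded so that the example is self-contained.
RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# A path expression: every <ingredient> inside <ingredientList>.
names = [e.text for e in root.findall("./ingredientList/ingredient")]

# A predicate: ingredients anywhere in the tree whose amount is exactly "2".
eggs = root.findall(".//ingredient[@amount='2']")

# Attribute access on a single matched element.
serves = root.find("ingredientList").get("serves")

print(names, [e.text for e in eggs], serves)
```

Full XPath, XPointer, and XQuery are considerably more expressive than the subset shown here.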
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish.</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>
Figure 2.1: An example XML document (recipe.xml).
Margin note: DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.
<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA) >
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+) >
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>
Figure 2.2: An example DTD.
element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}
Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
        name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>
Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).
xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng  # Compact -> Full RELAX NG
xmllint --noout --relaxng recipe.rng recipe.xml
Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
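To make concrete what a validator checks, the constraints of the recipe DTD from Figure 2.2 can be spelled out by hand. The sketch below is not how xmllint works internally and is no substitute for it; the function `validate_recipe` is hypothetical, and it assumes that the DTD requires the four children of the recipe element in the listed order.

```python
import xml.etree.ElementTree as ET

# Expected children of <recipe>, in order (assumed from the DTD content model).
RECIPE_CHILDREN = ["name", "description", "ingredientList", "stepList"]


def validate_recipe(root: ET.Element) -> list:
    """Return a list of constraint violations; an empty list means valid."""
    if root.tag != "recipe":
        return [f"root element is <{root.tag}>, expected <recipe>"]
    errors = []
    if [child.tag for child in root] != RECIPE_CHILDREN:
        errors.append("children of <recipe> do not match the content model")
    ingredient_list = root.find("ingredientList")
    if ingredient_list is not None:
        if "serves" not in ingredient_list.attrib:
            errors.append("<ingredientList> lacks the required 'serves' attribute")
        ingredients = ingredient_list.findall("ingredient")
        if not ingredients:
            errors.append("<ingredientList> needs at least one <ingredient>")
        for ingredient in ingredients:
            if "amount" not in ingredient.attrib:
                errors.append("<ingredient> lacks the required 'amount' attribute")
    step_list = root.find("stepList")
    if step_list is not None and not step_list.findall("step"):
        errors.append("<stepList> needs at least one <step>")
    return errors


valid = ET.fromstring(
    "<recipe><name>P</name><description>D</description>"
    "<ingredientList serves='8'><ingredient amount='2'>Egg</ingredient>"
    "</ingredientList><stepList><step>Mix</step></stepList></recipe>"
)
invalid = ET.fromstring(
    "<recipe><name>P</name><ingredientList><ingredient>Egg</ingredient>"
    "</ingredientList><stepList/></recipe>"
)
print(validate_recipe(valid))
print(validate_recipe(invalid))
```

Both documents are well-formed; only the first one is valid. A schema language lets the same constraints be stated declaratively instead of being hard-coded as they are here.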
Speech view:
  AASE: Peer, you're lying!
  PEER: No, I'm not!
  AASE: Well then, swear to me it's true!
  PEER: Swear? Why should I?
  AASE: See, you dare not! Every word of it's a lie!

Verse view:
  Peer, you're lying! No, I'm not!
  Well then, swear to me it's true!
  Swear? Why should I? See, you dare not!
  Every word of it's a lie!

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>
Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].
Margin note: The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org
Margin note: Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature, CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
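How a parser treats namespaces can be sketched as follows. The example assumes Python's standard `xml.etree.ElementTree` module; the small XHTML document with an embedded SVG drawing is made up for illustration, and the prefixes `h` and `s` used in the query are arbitrary local names.

```python
import xml.etree.ElementTree as ET

# Elements from two XML applications (XHTML and SVG) in a single document.
DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <body>
    <p>A circle:</p>
    <svg:svg width="20" height="20">
      <svg:circle cx="10" cy="10" r="8"/>
    </svg:svg>
  </body>
</html>"""

# The prefixes chosen here need not match those used in the document;
# only the identifiers they stand for are significant.
NS = {"h": "http://www.w3.org/1999/xhtml", "s": "http://www.w3.org/2000/svg"}

root = ET.fromstring(DOCUMENT)

# The parser expands every element name to the form {identifier}local-name.
print(root.tag)

circle = root.find("h:body/s:svg/s:circle", NS)
print(circle.tag, circle.get("r"))
```

The parser only uses the identifiers to tell the element names apart. It does not fetch anything from them, which is in line with the observation that there is no general way to get from an identifier to a schema.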
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
2.2 Markup on the World Wide Web
2.2.1 The Hypertext Markup Language
In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely popular graphical Web browser, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.
Initially, HTML only comprised a mixture of logical and presentation markup with a fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.
During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
Margin note: JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>
Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>
Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.
<style>
  body {
    font: large Verdana;
    font-size: large; }
  h1 {
    font-size: x-large;
    text-transform: uppercase; }
  abbr {
    font-style: italic; }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>
Margin note: The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
2.2.2 The Extensible Hypertext Markup Language
Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications (such as MathML and SVG) directly into web documents through XML namespaces.
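The difference in strictness between the two families of parsers can be sketched with Python's standard library. The class `EventCollector` is made up for illustration. Also note that `html.parser` is only a tolerant tokenizer: it reports tags as they come and does not implement the error recovery prescribed by the HTML5 specification, so the sketch shows tolerance in general rather than the canonical interpretation.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Record start and end tags in the order the tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


# Overlapping elements, as in the first line of Figure 2.7.
MALFORMED = "<b>Bold <i>bold and italic</b> italic</i>"

# The HTML tokenizer accepts the input without complaint.
collector = EventCollector()
collector.feed(MALFORMED)
collector.close()
print(collector.events)

# The XML parser refuses the same markup as not well-formed.
try:
    ET.fromstring(f"<p>{MALFORMED}</p>")
    xml_accepts = True
except ET.ParseError:
    xml_accepts = False
print(xml_accepts)
```

The XML parser can stop at the first error, whereas an HTML parser has to decide what the author probably meant, which is where the additional complexity comes from.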
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
2.2.3 The Semantic Web and Linked Data
The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
Margin note: A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually supersede microformatting, a technique that relies on the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in the documents on the Web.
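Because an RDF document is just a set of triplets, the data model is independent of any particular representation. The sketch below models triplets in plain Python and writes them out in the N-Triples syntax. All class and function names are made up for illustration, the triplets are stored in subject, predicate, object order to match N-Triples, and the literal escaping covers only backslashes and double quotes rather than the full N-Triples grammar.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    iri: str


@dataclass(frozen=True)
class Literal:
    value: str
    lang: Optional[str] = None


def term(node) -> str:
    """Serialize a single resource or literal."""
    if isinstance(node, Resource):
        return f"<{node.iri}>"
    escaped = node.value.replace("\\", "\\\\").replace('"', '\\"')
    suffix = f"@{node.lang}" if node.lang else ""
    return f'"{escaped}"{suffix}'


def to_ntriples(triplets) -> str:
    """Serialize (subject, predicate, object) triplets, one per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triplets)


DC = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
page = Resource("http://example.org/document.html")
john = Resource("http://example.org/john-smith")

TRIPLETS = [
    (page, Resource(DC + "title"), Literal("John's Web page", "en")),
    (page, Resource(DC + "creator"), john),
    (john, Resource(RDF_TYPE), Resource(FOAF + "Person")),
    (john, Resource(FOAF + "name"), Literal("John Smith")),
]

output = to_ntriples(TRIPLETS)
print(output)
```

The first two triplets have the page as their subject; the second one has a resource as its object and therefore links the page to the person described by the remaining two.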
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html>
  <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html>
  <http://purl.org/dc/terms/creator>
  <http://example.org/john-smith> .
<http://example.org/john-smith>
  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
  <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith>
  <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .
Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom). The N-Triples statements are wrapped here for readability; in an actual N-Triples file, each statement occupies a single line.

<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>
Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.
<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>

[Graph: the node http://example.org/document.html is connected by the edge dc:title to the literal "John's Web page"@en and by the edge dc:creator to the node http://example.org/john-smith, which is in turn connected by the edge rdf:type to the node foaf:Person and by the edge foaf:name to the literal "John Smith".]
Figure 2.11: A graph of the RDF document in Figure 2.9.
2.3 Document Preparation Systems
Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The mild learning curve of interactive DPSes is paid for by more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and by the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems
Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, a de facto standard markup language for academic and technical documents.
Margin note: The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
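The decoupling described above can be illustrated with a toy model in Python. Everything in it is hypothetical: the document is a list of text blocks tagged with a logical class, a style is a mapping from classes to formatting functions, and the "design" is reduced to plain-text transformations.

```python
# Each block of text carries a logical class rather than a visual property.
DOCUMENT = [
    ("title", "The Cask of Amontillado"),
    ("author", "Edgar Allan Poe"),
    ("paragraph", "The thousand injuries of Fortunato I had borne as I best could."),
]

# The design of each class is defined in one place ...
PLAIN = {
    "title": str.upper,
    "author": lambda text: f"by {text}",
    "paragraph": lambda text: text,
}

# ... so a redesign touches the style, not the document.
FANCY = {**PLAIN, "title": lambda text: f"*** {text} ***"}


def render(document, style) -> str:
    """Apply the design of each logical class to the matching blocks."""
    return "\n".join(style[cls](text) for cls, text in document)


print(render(DOCUMENT, PLAIN))
print(render(DOCUMENT, FANCY))
```

With presentation markup, the uppercase title would be typed directly into the document, and every later change of design would require editing each affected block by hand.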
The Cask of Amontillado
by
Edgar Allen Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that gave utterance to a threat. At length I would be avenged; this was a point definitely settled – but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.
.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allen Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's The Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em \dimexpr\hsize-3em 3.28em
    \dimexpr\hsize-3.28em 3.28em
    \dimexpr\hsize-3.28em 0em \hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that reflect their use cases.
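To make the idea of conversion more concrete, the following Python sketch turns a tiny, made-up subset of lightweight markup (a `# ` heading, `*emphasis*`, and paragraphs separated by blank lines) into HTML. The function name `to_html` and the subset itself are assumptions chosen for illustration; the sketch does not implement Markdown or any other language named above.

```python
import html
import re


def to_html(text: str) -> str:
    """Convert a toy lightweight markup to HTML (illustrative subset only)."""
    blocks = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        # Join the lines of a block, then escape HTML special characters.
        line = html.escape(" ".join(block.split()), quote=False)
        # Text between single asterisks becomes emphasis.
        line = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", line)
        if line.startswith("# "):
            blocks.append(f"<h1>{line[2:]}</h1>")
        else:
            blocks.append(f"<p>{line}</p>")
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_html("# Title\n\nSome *plain*\ntext & more."))
```

The punctuation carries the structure, so the input stays readable as plain text, while the output uses a more general markup language.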
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab serifs or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
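Whether a typeface covers a manuscript can be checked mechanically. The Python sketch below assumes that the coverage of a font is known as a list of inclusive code point ranges; the helper `missing_characters` and the `LATIN_ONLY` ranges are hypothetical and only illustrate the idea, they do not read a real font file.

```python
from __future__ import annotations


def missing_characters(text: str, ranges: list[tuple[int, int]]) -> set[str]:
    """Return the non-whitespace characters of `text` outside every range."""

    def covered(ch: str) -> bool:
        return any(low <= ord(ch) <= high for low, high in ranges)

    return {ch for ch in text if not ch.isspace() and not covered(ch)}


# Hypothetical coverage of a Latin-only font: U+0000 to U+024F.
LATIN_ONLY = [(0x0000, 0x024F)]

if __name__ == "__main__":
    # A Latin sentence followed by a Greek word (psi, upsilon, chi, eta).
    sample = "Aristotle says \u03c8\u03c5\u03c7\u03ae"
    print(sorted(missing_characters(sample, LATIN_ONLY)))
```

A non-empty result means that a second typeface is needed for the listed characters.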
Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
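The line-length guidelines above can be summarized as a small decision rule. The following Python sketch uses the thresholds quoted in the paragraph; the function name `assess_measure` and the wording of the returned labels are assumptions made for illustration.

```python
def assess_measure(chars_per_line: int, columns: int = 1) -> str:
    """Classify a line length against the ranges quoted in the text."""
    # 45-75 characters on a single-column page, 40-50 on a multi-column page.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "too narrow to justify: set ragged"
    if chars_per_line > 80:
        return "too wide: strains the eye"
    if low <= chars_per_line <= high:
        return "within the recommended range"
    return "acceptable, but outside the recommended range"


if __name__ == "__main__":
    for width, cols in [(66, 1), (35, 2), (90, 1), (60, 2)]:
        print(width, cols, assess_measure(width, cols))
```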
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says, φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}
newunicodecharraisebox8ex
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says, φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
@font-face {
  font-family: "Alegreya Sans";
  src: url("AlegreyaSans-Regular.ttf")
       format("truetype");
  unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                 U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
}
@font-face {
  font-family: "GFS Neohellenic";
  src: url("GFSNeohellenic.otf") format("opentype");
  unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                 U+102E0-102FF;
}
p {
  font-family: "Alegreya Sans", "GFS Neohellenic",
               sans-serif;
  line-height: 14pt;
}
/* "el" is the language tag for Greek */
[lang=en] {
  font-size: 11pt;
}
[lang=el] {
  font-size: 13.2pt;
}
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says, </span><span lang="el">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with a leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
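The arithmetic behind these figures is simple enough to sketch in Python. The function names `line_height_range` and `lead_per_side` are made up for this illustration; the percentages are the twenty to forty-five percent quoted above.

```python
def line_height_range(type_size_pt, min_pct=20, max_pct=45):
    """Line heights that add min_pct to max_pct of the type size."""
    low = type_size_pt * (100 + min_pct) / 100
    high = type_size_pt * (100 + max_pct) / 100
    return low, high


def lead_per_side(type_size_pt, line_height_pt):
    """Lead above and below each line for a given line height."""
    return (line_height_pt - type_size_pt) / 2


if __name__ == "__main__":
    print(line_height_range(10))    # (12.0, 14.5)
    print(lead_per_side(10, 12))    # 1.0
    print(lead_per_side(10, 14.5))  # 2.25
```

For a 10 pt typeface this reproduces the range of 12 to 14.5 pt and the 1 to 2.25 pt of lead on either side of a line.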
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger
         Sizes in inches    Page proportions
A4       8.27 × 11.7        2 : √2       1.41421
B5       6.93 × 9.84        1 : √2       0.707
Letter   8 1/2 × 11         1 : 1.294    1.2941
Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
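One way to satisfy the whole-multiple rule is to pad each heading with vertical space up to the next multiple of the body text line height. The Python sketch below shows the arithmetic; the function name `heading_padding` and the example values are assumptions, not figures from this book.

```python
import math


def heading_padding(heading_height_pt, line_height_pt):
    """Extra space that rounds a heading up to whole body-text lines."""
    lines = math.ceil(heading_height_pt / line_height_pt)
    return lines * line_height_pt - heading_height_pt


if __name__ == "__main__":
    # An 18 pt heading over a 12 pt line height occupies two lines.
    print(heading_padding(18, 12))  # 6
```

How the padding is split above and below the heading is a design decision that the rule itself leaves open.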
3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface as the surrounding text, treat the columns of tables the same way you treat columns in the text, and keep the number of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations

Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding vertical space above and below the block paragraphs and, optionally, also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3]:
    This is the excellent foppery of the world, that when we are sick in fortune (often the surfeit of our own behavior) we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    – William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing. A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
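The multiple-of-line-height rule can be turned into a short calculation. The Python sketch below assumes that all lengths are given in points and that the combined top and bottom margins are already known; the function name `text_height` and the example page are hypothetical.

```python
def text_height(page_height_pt, vertical_margins_pt, line_height_pt):
    """Largest whole number of lines that fits, and the resulting height."""
    available = page_height_pt - vertical_margins_pt
    lines = int(available // line_height_pt)
    return lines, lines * line_height_pt


if __name__ == "__main__":
    # Assumed example: a 9 in page at 72 pt per inch (648 pt) with
    # 1.5 in (108 pt) of combined vertical margins and 12 pt line height.
    print(text_height(648, 108, 12))  # (45, 540)
```

Any space left over after rounding down is added to the vertical margins.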
3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII.” In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standards Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015).
[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972.
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standards Association, June 1986.
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1.
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6.
[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993.
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996.
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996.
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015).
[12] Unicode Consortium. Unicode Technical Standard 10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016).
[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008.
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012.
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006.
[17] Noam Chomsky. “Three models for the description of language.” In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124.
[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993.
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6.
[20] Unicode Consortium. Unicode Technical Standard 18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection.” 1996. URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint.” In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup.” 1981. URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986.
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3.
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies.” In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13–15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12.
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4.
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: httpwwwsazbacztypoglosytypo101pdf (visited on 10/20/2015).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5.
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014.
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standards Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MATHML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFA  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
62 ACRONYMS
xml The eXtensible Markup Language
Chapter 2
Markup
A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to compliance with the orthographic rules (such as correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.
Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The Generalized Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

Note: More information about the project can be found within The Roots of SGML - A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

Note: The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.
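The kinds of nodes listed above can be made concrete with a short sketch. It uses Python's standard html.parser module, which is an HTML tokenizer rather than a full SGML parser, so it only approximates SGML; the sample fragment, its element and attribute names, and the helper list_nodes are all made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical SGML-like fragment; element, attribute and entity names are invented.
SAMPLE = '<memo priority="high"><!-- draft --><?render fast>Call &client; today.</memo>'


class NodeLister(HTMLParser):
    """Records the kind of every node the tokenizer reports."""

    def __init__(self):
        # convert_charrefs=False keeps entity references as separate nodes.
        super().__init__(convert_charrefs=False)
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        self.nodes.append(("start tag", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.nodes.append(("end tag", tag))

    def handle_data(self, data):
        self.nodes.append(("text", data))

    def handle_comment(self, data):
        self.nodes.append(("comment", data.strip()))

    def handle_pi(self, data):
        self.nodes.append(("processing instruction", data))

    def handle_entityref(self, name):
        self.nodes.append(("entity reference", name))


def list_nodes(markup):
    """Return the nodes of the markup in document order."""
    parser = NodeLister()
    parser.feed(markup)
    parser.close()
    return parser.nodes


for node in list_nodes(SAMPLE):
    print(node)
```

The two tags delimit one element (memo) that carries one attribute (priority); the comment, the processing instruction, the text, and the entity reference are the remaining node kinds named in the paragraph.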
Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).

Note: A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
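The two levels of correctness can be illustrated with a minimal sketch. Python's standard library checks well-formedness but does not validate against a DTD, so the function is_valid_recipe below is a hand-written stand-in that hard-codes the constraints of the recipe DTD shown in Figure 2.2; both function names and the three sample strings are assumptions made for this example, not part of any real validator.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True when the text parses as XML at all."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def is_valid_recipe(text):
    """Hand-written stand-in for validation against the recipe DTD."""
    if not is_well_formed(text):
        return False  # validity presupposes well-formedness
    root = ET.fromstring(text)
    if root.tag != "recipe":
        return False
    # recipe must contain exactly these children, in this order
    expected = ["name", "description", "ingredientList", "stepList"]
    if [child.tag for child in root] != expected:
        return False
    ingredients = root.find("ingredientList")
    steps = root.find("stepList")
    return (
        "serves" in ingredients.attrib
        and len(ingredients) >= 1
        and all(i.tag == "ingredient" and "amount" in i.attrib for i in ingredients)
        and len(steps) >= 1
        and all(s.tag == "step" for s in steps)
    )


GOOD = (
    "<recipe><name>P</name><description>D</description>"
    "<ingredientList serves='8'><ingredient amount='2'>Egg</ingredient></ingredientList>"
    "<stepList><step>Mix</step></stepList></recipe>"
)
NOT_VALID = "<recipe><name>P</name></recipe>"   # well-formed, but violates the DTD
BROKEN = "<recipe><name>P</recipe>"             # mismatched tags: not even well-formed

for sample in (GOOD, NOT_VALID, BROKEN):
    print(is_well_formed(sample), is_valid_recipe(sample))
```

In practice a validating parser or a tool such as xmllint would read the constraints from the DTD or schema itself instead of having them spelled out in code.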
Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish.</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>

Figure 2.1: An example XML document (recipe.xml)
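As a rough illustration of retrieving data from such a document, the following sketch queries an abbreviated copy of the recipe from Figure 2.1. It relies on the limited XPath subset built into Python's xml.etree.ElementTree rather than a full XPath or XQuery engine, and the variable names are chosen for this example only.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the recipe document from Figure 2.1.
RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# All ingredient elements anywhere below the root.
names = [e.text for e in root.findall(".//ingredient")]

# The ingredient whose amount attribute equals "2".
egg = root.find(".//ingredient[@amount='2']").text

# An attribute of a specific child element.
serves = root.find("ingredientList").get("serves")

print(names, egg, serves)
```

Expressions beyond this subset, such as XPath functions or axes other than child and descendant, would need a dedicated XPath implementation.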
<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA) >
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+) >
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD

Note: DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.
element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
        name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)
xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng   # Compact -> Full RELAX NG
xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature, CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.

Speech

AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie.

Verse

Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie.

<(V)line>
  <(S)speech who="Aase">Peer, you're lying!</(S)speech>
  <(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
  <(S)speech who="Aase">Well then,
    swear to me it's true!</(S)speech>
</(V)line><(V)line>
  <(S)speech who="Peer">Swear? why should I?</(S)speech>
  <(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
    Every word of it's a lie.</(S)speech>
</(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

Note: The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.

Note: Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
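To see how namespaces keep the elements of different XML applications apart within one document, consider the following sketch. The document is a made-up example that embeds an SVG drawing in an XHTML page; the two IRIs are the namespace names commonly used for XHTML and SVG, and the grouping code is only an illustration, not a validator.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

# Hypothetical page mixing two XML applications in a single document.
DOC = f"""<html xmlns="{XHTML}" xmlns:svg="{SVG}">
  <body>
    <p>A circle:</p>
    <svg:svg width="20" height="20">
      <svg:circle cx="10" cy="10" r="8"/>
    </svg:svg>
  </body>
</html>"""

root = ET.fromstring(DOC)

# ElementTree expands every element name to "{namespace IRI}local-name".
by_namespace = {}
for element in root.iter():
    namespace, _, local = element.tag[1:].partition("}")
    by_namespace.setdefault(namespace, []).append(local)

for namespace, names in by_namespace.items():
    print(namespace, names)
```

The parser only sees the IRIs as opaque identifiers; as the text notes, nothing in the document tells it where the schema of either application can be found.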
2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely popular graphical Web browser, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers, faced with a malformed document, would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

Note: JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and, as such, can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

<style>
  body {
    font: large Verdana;
    font-size: large;
  }
  h1 {
    font-size: x-large;
    text-transform: uppercase;
  }
  abbr {
    font-style: italic;
  }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>

Note: The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2 Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications (such as MathML and SVG) directly into web documents through XML namespaces.
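The difference in strictness can be observed on the overlapping markup from the first line of Figure 2.7. The sketch below feeds it to two parsers from the Python standard library. Note that html.parser is only a lenient tokenizer and does not implement the error recovery of the HTML5 specification, so it merely shows that HTML tooling carries on where an XML parser must stop; the helper names are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# First line of Figure 2.7: the b and i elements overlap.
OVERLAPPING = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Lenient tokenizer: reports tags without checking their nesting."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.events.append("</" + tag + ">")


def html_events(markup):
    """Return the tags an HTML tokenizer reports for the markup."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


def xml_accepts(markup):
    """True when an XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(html_events(OVERLAPPING))   # the tokenizer reports all four tags
print(xml_accepts(OVERLAPPING))   # the XML parser refuses the document
```

A browser goes one step further than the tokenizer and rebuilds a properly nested tree, which is where the extra complexity of HTML parsers comes from.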
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend Of A Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

Note: A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
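The two readings of a triplet described above can be sketched in a few lines. The classes IRI and Triplet and the function describe are hypothetical names introduced for this illustration; the field order follows the (p, s, o) notation used in the text, and the sample IRIs come from the Dublin Core and FOAF vocabularies.

```python
from typing import NamedTuple, Union


class IRI(str):
    """A resource, identified by an IRI."""


class Triplet(NamedTuple):
    predicate: IRI
    subject: IRI
    obj: Union[IRI, str]  # either a resource or a literal value


def describe(t):
    """Spell out the interpretation of a triplet in plain words."""
    if isinstance(t.obj, IRI):
        return f"<{t.subject}> is in relation <{t.predicate}> with <{t.obj}>"
    return f"<{t.subject}> has property <{t.predicate}> with value {t.obj!r}"


creator = Triplet(
    IRI("http://purl.org/dc/terms/creator"),
    IRI("http://example.org/document.html"),
    IRI("http://example.org/john-smith"),
)
name = Triplet(
    IRI("http://xmlns.com/foaf/0.1/name"),
    IRI("http://example.org/john-smith"),
    "John Smith",
)

print(describe(creator))
print(describe(name))
```

Whether an object is a resource or a literal is decided here purely by its Python type; the IRIs themselves are compared as opaque strings and are never dereferenced.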
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually render obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
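Of the representations named above, N-Triples is simple enough to produce by hand. The following sketch serializes three triples from the running example about John's Web page. The function names quote and to_ntriples are made up, the triples are written in the subject, predicate, object order that N-Triples uses, and treating every string that starts with "http://" as an IRI is a simplifying assumption of this example, not a rule of the format.

```python
def quote(text):
    """Escape backslashes and double quotes inside a literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one triple per line."""
    lines = []
    for subject, predicate, obj in triples:
        if isinstance(obj, tuple):          # (text, language tag)
            rendered = quote(obj[0]) + "@" + obj[1]
        elif obj.startswith("http://"):     # simplifying assumption: an IRI
            rendered = "<" + obj + ">"
        else:                               # plain literal
            rendered = quote(obj)
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


TRIPLES = [
    ("http://example.org/document.html",
     "http://purl.org/dc/terms/title", ("John's Web page", "en")),
    ("http://example.org/document.html",
     "http://purl.org/dc/terms/creator", "http://example.org/john-smith"),
    ("http://example.org/john-smith",
     "http://xmlns.com/foaf/0.1/name", "John Smith"),
]

print(to_ntriples(TRIPLES))
```

A production serializer would also escape control characters and non-ASCII text as the N-Triples grammar requires; a dedicated RDF library is the safer choice for real data.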
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html>
  <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html>
  <http://purl.org/dc/terms/creator>
  <http://example.org/john-smith> .
<http://example.org/john-smith>
  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
  <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith>
  <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom)

<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>

[Figure 2.11: A graph of the RDF document in Figure 2.9. The node http://example.org/document.html is connected by the edge dc:title to the literal "John's Web page"@en and by the edge dc:creator to the node http://example.org/john-smith, which is in turn connected by the edge rdf:type to the node foaf:Person and by the edge foaf:name to the literal "John Smith".]

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU is Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

Note: The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
232 Interactive SystemsInteractive dpses come in two distinct flavors Word processors arethe digital progeny of the typewriter machine whose output docu-ments served as manuscripts to be typeset by a typographer Withthe advent of personal computing and the Web self-publishingbecame more affordable to the general public and modern wordprocessors can be used not only to write but also to design andtypeset documents although the offered functionally is typicallylimited to ensure ease of use This concern is not shared by Desk-Top Publishing (dtp) software which provides refined control overthe resulting page layout and the typesetting at the expense of asteeper learning curve
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
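The decoupling of markup from design can be illustrated with a small sketch. It is not taken from any particular DPS: the class names, the STYLES table, and the render function are all hypothetical, and inline HTML styles merely stand in for whatever design mechanism a real system offers.

```python
# A hypothetical document: each section of text carries a logical class,
# not a concrete design.
DOCUMENT = [
    ("title", "The Cask of Amontillado"),
    ("author", "Edgar Allen Poe"),
    ("body", "The thousand injuries of Fortunato I had borne ..."),
]

# The design of each class is set up separately (assumed values).
STYLES = {
    "title": {"font-size": "16pt", "font-weight": "bold"},
    "author": {"font-size": "12.5pt", "font-style": "italic"},
    "body": {"font-size": "12.5pt"},
}


def render(document, styles):
    """Apply the design of each class to the text that carries it."""
    lines = []
    for cls, text in document:
        css = "; ".join(f"{prop}: {value}" for prop, value in styles[cls].items())
        lines.append(f'<p class="{cls}" style="{css}">{text}</p>')
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(DOCUMENT, STYLES))
```

Changing a single entry of the table, for instance the font size of the body class, restyles every section of that class without touching the text itself.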
23 DOCUMENT PREPARATION SYSTEMS 37
The Cask of Amontillado
by
Edgar Allen Poe
The thousand injuries of Fortunato I had borne as I best could; but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled – but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.
-1-
.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allen Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could; but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allen Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
              \dimexpr\hsize-3.28em 3.28em
              \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could; but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.
24 LIGHTWEIGHT MARKUP LANGUAGES 39
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
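To give a feel for such conversion tools, here is a minimal sketch of a converter for a toy, Markdown-like notation. It is an assumption made for illustration and implements none of the languages named above: blank lines separate paragraphs, and text between single asterisks is emphasized. The function name to_html is hypothetical.

```python
import html
import re


def to_html(source):
    """Convert a toy lightweight notation to HTML paragraphs."""
    # Blank lines separate paragraphs.
    paragraphs = [p for p in re.split(r"\n\s*\n", source) if p.strip()]
    output = []
    for paragraph in paragraphs:
        # Collapse line breaks and runs of spaces inside a paragraph.
        text = " ".join(paragraph.split())
        # Escape characters that have a special meaning in HTML.
        text = html.escape(text)
        # *emphasis* becomes <em>emphasis</em>.
        text = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)
        output.append(f"<p>{text}</p>")
    return "\n".join(output)


if __name__ == "__main__":
    print(to_html("A *wrong* is\nunredressed.\n\nNext & last."))
```

Real converters handle far more (headings, lists, links, nesting); the sketch only shows how punctuation in the source is mapped to elements of a more general markup language.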
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other, the design and the positioning of the structural elements of the document (such as headings, tables, figures, and lists), and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
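Combining typefaces by writing system can be sketched as a lookup from characters to fonts. The code point ranges below are copied from the unicode-range values of the CSS example later in this chapter; the function names font_for and uncovered are hypothetical, and a real check would read the coverage from the font files themselves rather than from a hand-written table.

```python
# Code point ranges per typeface, taken from the CSS example in this chapter.
FONTS = [
    ("Alegreya Sans", [(0x00, 0x24F), (0x1E00, 0x1EFF), (0x2000, 0x206F),
                       (0x2C60, 0x2C7F), (0xA720, 0xA7FF), (0xFB00, 0xFB4F)]),
    ("GFS Neohellenic", [(0x2C80, 0x2CFF), (0x370, 0x3FF), (0x1F00, 0x1FFF),
                         (0x102E0, 0x102FF)]),
]


def font_for(char, fonts=FONTS):
    """Return the first typeface whose ranges cover the character."""
    code_point = ord(char)
    for name, ranges in fonts:
        if any(low <= code_point <= high for low, high in ranges):
            return name
    return None


def uncovered(text, fonts=FONTS):
    """List the characters of the text that no typeface covers."""
    return sorted({char for char in text if font_for(char, fonts) is None})


if __name__ == "__main__":
    for char in "aφὴ":
        print(char, font_for(char))
    print(uncovered("Soul ψυχὴν 漢"))
```

Characters reported by uncovered are the ones for which another font family would have to be brought in.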
Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
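The thresholds in this paragraph can be turned into a simple check. The sketch below only encodes the numbers given in the text; the function name advise_measure and the wording of its messages are assumptions made for illustration.

```python
def advise_measure(chars_per_line, columns=1):
    """Return a short recommendation for a given line length."""
    # 45-75 characters on a single-column page, 40-50 on a multi-column page.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line > 80:
        return "too wide: extended passages strain the eye"
    if chars_per_line < 40:
        return "too narrow to justify: set ragged"
    if low <= chars_per_line <= high:
        return "within the recommended range: justify"
    return "outside the recommended range: consider adjusting"


if __name__ == "__main__":
    print(advise_measure(66))
    print(advise_measure(30, columns=2))
    print(advise_measure(90))
```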
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
32 STRUCTURAL ELEMENTS 43
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”
1
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
   Ligatures=TeX]}{}
newunicodecharraisebox8ex
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url(AlegreyaSans-Regular.ttf)
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F; }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url(GFSNeohellenic.otf) format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF; }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt; }
  [lang=en] {
    font-size: 11pt; }
  [lang=gr] {
    font-size: 13.2pt; }
</style>
<p><span lang=en>The second function of Soul &ndash; knowing
&ndash; was not at first distinguished from motion. Aristotle
says </span><span lang=gr>φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang=en>&ldquo;The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.&rdquo;</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
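The arithmetic behind these figures is simple enough to sketch. The function names below are hypothetical; the percentages are the twenty to forty-five percent quoted in the text.

```python
def leading_range(type_size_pt, low_pct=20, high_pct=45):
    """Return the smallest and largest recommended line height in points."""
    return (type_size_pt * (100 + low_pct) / 100,
            type_size_pt * (100 + high_pct) / 100)


def lead_per_side(type_size_pt, leading_pt):
    """Return the lead added above and below each line, in points."""
    return (leading_pt - type_size_pt) / 2


if __name__ == "__main__":
    low, high = leading_range(10)
    print(low, high)
    print(lead_per_side(10, low), lead_per_side(10, high))
```

For a 10 pt typeface, this reproduces the range of 12 to 14.5 pt and the 1 to 2.25 pt of lead on either side of the line.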
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger
          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941
Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
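The proportions of a page can be recomputed from its dimensions. The following sketch divides the height by the width for the sizes listed in the table; the dictionary and the function name are assumptions, and the rounded inch values make the results for A4 and B5 only approximately equal to the square root of two.

```python
import math

# Width and height in inches, as listed in the table above.
PAPER_SIZES_IN = {
    "A4": (8.27, 11.7),
    "B5": (6.93, 9.84),
    "Letter": (8.5, 11.0),
}


def proportion(width, height):
    """Return the height of the page relative to its width."""
    return height / width


if __name__ == "__main__":
    for name, (width, height) in PAPER_SIZES_IN.items():
        print(f"{name}: 1 : {proportion(width, height):.4f}")
    print(f"square root of two: {math.sqrt(2):.5f}")
```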
This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
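One way to meet the whole-multiple rule is to pad the heading with vertical space until its height lands on the next multiple of the body text line height. The sketch below assumes that approach and that all heights are given in points; the function name heading_padding is hypothetical.

```python
import math


def heading_padding(heading_height_pt, line_height_pt):
    """Return the extra space that rounds a heading up to a whole
    multiple of the body text line height."""
    lines = math.ceil(heading_height_pt / line_height_pt)
    return lines * line_height_pt - heading_height_pt


if __name__ == "__main__":
    # A 30 pt heading over a 12 pt line height occupies three lines.
    print(heading_padding(30, 12))
    # A 24 pt heading already fits two lines exactly.
    print(heading_padding(24, 12))
```

How the padding is split between the space above and below the heading is a design decision that the sketch leaves open.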
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the number of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3]:
This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
– William Shakespeare, King Lear
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
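The requirement that the text height be a multiple of the line height can be sketched as a rounding step. The example assumes that the margins are chosen first and the text height is rounded down to fit, which is only one possible workflow; the function name and the sample page (9 in tall with 0.75 in margins, at 72 pt to the inch) are assumptions.

```python
def text_height(page_height_pt, top_margin_pt, bottom_margin_pt, line_height_pt):
    """Return the number of lines that fit between the margins and the
    resulting text height, a whole multiple of the line height."""
    available = page_height_pt - top_margin_pt - bottom_margin_pt
    lines = int(available // line_height_pt)
    return lines, lines * line_height_pt


if __name__ == "__main__":
    # 9 in = 648 pt and 0.75 in = 54 pt, assuming 72 pt to the inch.
    print(text_height(648, 54, 54, 12))
    print(text_height(648, 54, 54, 14))
```

Any space left over after rounding would be added to the vertical margins.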
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015) (cit. on p. 5).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015) (cit. on p. 5).
[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972 (cit. on pp. 5, 7).
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1986 (cit. on p. 6).
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).
[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993 (cit. on p. 8).
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).
[12] Unicode Consortium. Unicode Technical Standard 10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).
[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008 (cit. on p. 13).
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012 (cit. on p. 13).
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006 (cit. on p. 13).
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).
[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993 (cit. on p. 14).
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).
[20] Unicode Consortium. Unicode Technical Standard 18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015) (cit. on p. 16).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986 (cit. on p. 22).
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML, New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
CHAPTER 2 MARKUP
More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.
the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.
2.1 Meta Markup Languages

2.1.1 The Generalized Markup Language

The situation in digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.
A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
Although the described structure is shared by all SGML documents, the actual syntax as well as the restrictions regarding the contents and the attributes of individual elements are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML, such as W3C XML Schema, RELAX NG, or Schematron, that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
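The lower of the two levels of correctness can be checked with very little code. The following is a minimal sketch in Python, assuming the standard-library module xml.etree.ElementTree; the helper name is_well_formed and the sample strings are illustrative only. Note that this module only checks well-formedness and does not validate a document against a DTD or a schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True when the text parses as XML, False otherwise."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        # The parser refuses anything that breaks the XML syntax rules.
        return False
    return True


# Properly nested elements: well-formed.
good = "<recipe><name>Palatschinken</name></recipe>"
# Overlapping elements: not well-formed.
bad = "<recipe><name>Palatschinken</recipe></name>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

A well-formed document may still be invalid: the first sample parses, but whether it is a valid recipe depends on the DTD or schema it is checked against.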
Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE recipe SYSTEM "recipe.dtd">
    <recipe>
      <name>Palatschinken</name>
      <description>A Slavic crêpe-like dish.</description>
      <ingredientList serves="8">
        <ingredient amount="120g">Plain flour</ingredient>
        <ingredient amount="2">Egg</ingredient>
        <ingredient amount="300ml">Milk</ingredient>
        <ingredient amount="1 tblspn">Oil</ingredient>
        <ingredient amount="1 pinch">Salt</ingredient>
      </ingredientList>
      <stepList>
        <step>Combine the ingredients and whisk until
              you have a smooth batter.</step>
        <step>Heat oil on a pan, pour in a tablespoonful
              of the batter, fry until golden brown.</step>
        <step>Repeat until there is no batter left.</step>
        <step>Serve rolled and filled with jam.</step>
      </stepList>
    </recipe>

Figure 2.1: An example XML document (recipe.xml)
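To illustrate how a program might consume elements and attributes such as those in Figure 2.1, here is a small sketch in Python using the standard-library xml.etree.ElementTree module. The shortened recipe and the function name list_ingredients are assumptions made for the example; the interpretation of the serves and amount attributes is, as noted earlier, entirely up to the processing program.

```python
import xml.etree.ElementTree as ET

# A shortened version of the recipe, embedded so the sketch is self-contained.
RECIPE = """\
<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
  </ingredientList>
</recipe>"""


def list_ingredients(xml_text):
    """Return the number of servings and (amount, name) pairs."""
    root = ET.fromstring(xml_text)
    ingredient_list = root.find("ingredientList")
    # Attributes are read with get(); element content is in .text.
    serves = int(ingredient_list.get("serves"))
    items = [(item.get("amount"), item.text)
             for item in ingredient_list.iter("ingredient")]
    return serves, items


serves, items = list_ingredients(RECIPE)
print("Serves", serves)
for amount, name in items:
    print(amount, name)
```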
DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.
    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd">
    <!DOCTYPE recipe SYSTEM "recipe.dtd">

    <!DOCTYPE recipe [
      <!ELEMENT recipe (name, description, ingredientList,
                        stepList)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
      <!ELEMENT ingredientList (ingredient+)>
      <!ATTLIST ingredientList serves CDATA #REQUIRED>
      <!ELEMENT ingredient (#PCDATA) >
      <!ATTLIST ingredient amount CDATA #REQUIRED>
      <!ELEMENT stepList (step+) >
      <!ELEMENT step (#PCDATA)> ]>

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd" [
      <!-- Omitted for brevity --> ]>
    <!DOCTYPE recipe SYSTEM "recipe.dtd" [
      <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD
    element recipe {
      element name { text },
      element description { text },
      element ingredientList {
        attribute serves { xsd:positiveInteger },
        element ingredient {
          attribute amount { text }, text
        }+
      },
      element stepList {
        element step { text }+
      }
    }

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
      <element name="recipe"><complexType><all>
        <element name="name" type="string" minOccurs="1"/>
        <element name="description" type="string"
                 minOccurs="1"/>
        <element
          name="ingredientList"><complexType><sequence>
          <element name="ingredient" minOccurs="1"
                   maxOccurs="unbounded">
            <complexType><simpleContent>
              <extension base="string">
                <attribute name="amount" type="string"/>
              </extension>
            </simpleContent></complexType>
          </element></sequence>
          <attribute name="serves" type="positiveInteger"
                     use="required"/>
        </complexType></element>
        <element name="stepList"><complexType><sequence>
          <element name="step" type="string" minOccurs="1"
                   maxOccurs="unbounded"/>
        </sequence></complexType></element>
      </all></complexType></element>
    </schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)
    xmllint --noout --dtdvalid recipe.dtd recipe.xml
    xmllint --noout --schema recipe.xsd recipe.xml
    trang recipe.rnc recipe.rng  # Compact -> Full Relax NG
    xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
A notable feature of XML that is unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature called CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.

Speech

    AASE: Peer, you're lying!
    PEER: No, I'm not!
    AASE: Well then, swear to me it's true!
    PEER: Swear? Why should I?
    AASE: See, you dare not! Every word of it's a lie!

Verse

    Peer, you're lying! No, I'm not!
    Well then, swear to me it's true!
    Swear? Why should I? See, you dare not!
    Every word of it's a lie!

    <(V)line>
    <(S)speech who="Aase">Peer, you're lying!</(S)speech>
    <(S)speech who="Peer">No, I'm not!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Aase">Well then,
      swear to me it's true!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Peer">Swear? why should I?</(S)speech>
    <(S)speech who="Aase">See, you dare not!
    </(V)line><(V)line>
    Every word of it's a lie!</(S)speech>
    </(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.

Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
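The way XML namespaces let elements from several applications coexist in one document can be sketched in Python with the standard-library xml.etree.ElementTree module. The sample document below, which embeds an SVG circle in an XHTML page, and the helper namespace_of are assumptions made for illustration; the two namespace IRIs are the ones published by W3C for XHTML and SVG. ElementTree exposes each element name in the form {IRI}local-name, which shows that the IRI, rather than the prefix, identifies the application.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# One document mixing elements of two XML applications.
DOCUMENT = f"""\
<html xmlns="{XHTML_NS}" xmlns:svg="{SVG_NS}">
  <body>
    <p>A circle:</p>
    <svg:svg width="20" height="20">
      <svg:circle cx="10" cy="10" r="8"/>
    </svg:svg>
  </body>
</html>"""

root = ET.fromstring(DOCUMENT)


def namespace_of(element):
    """Extract the namespace IRI from ElementTree's '{iri}local' tag form."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


# Every namespace used in the document, and the SVG circles found in it.
namespaces = sorted({namespace_of(element) for element in root.iter()})
circles = root.findall(f".//{{{SVG_NS}}}circle")

print(namespaces)
print(len(circles), circles[0].get("r"))
```

The parser happily reports both namespaces, but nothing in the document tells it where to find a schema for either of them, which is the limitation described above.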
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties of any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading Web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into Web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
    <b>Bold <i>bold and italic</b> italic</i>
    <b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

    <font face="Verdana" size="4">
    <font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
    <br><br>There is a continuing need to show the power of
    <i>CSS</i>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The <i>HTML
    </i> remains the same, the only thing that has changed
    is the external <i>CSS</i> file. Yes, really.
    </font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

    <style>
      body {
        font: large Verdana;
        font-size: large; }
      h1 {
        font-size: x-large;
        text-transform: uppercase; }
      abbr {
        font-style: italic; }
    </style>
    <h1>So what is this about?</h1>
    <p>There is a continuing need to show the power of
    <abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The
    <abbr>HTML</abbr> remains the same, the only thing that
    has changed is the external <abbr>CSS</abbr> file. Yes,
    really.</p>

The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2 Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for Web documents and to include instances of other XML applications, such as MathML and SVG, directly in Web documents through XML namespaces.
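The contrast between lenient HTML processing and strict XML processing can be observed with two parsers from the Python standard library. This is only a sketch under stated assumptions: html.parser.HTMLParser is an event-based tokenizer that does not implement the error-recovery algorithm of the HTML5 specification, and the class name TagLogger is made up for the example. It is nevertheless enough to show that the HTML side keeps going on overlapping elements, whereas the XML side refuses the input outright.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Overlapping elements: <b> is closed before the <i> opened inside it.
MALFORMED = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Record start and end tags in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


# The HTML parser reports every tag without complaint.
logger = TagLogger()
logger.feed(MALFORMED)
logger.close()

# The XML parser must reject input that is not well-formed.
try:
    ET.fromstring(MALFORMED)
    xml_accepts = True
except ET.ParseError:
    xml_accepts = False

print(logger.events)
print(xml_accepts)
```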
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
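The two interpretations of a triplet can be made concrete with a small data model. The sketch below is written in Python and is an illustration only: the class names Resource, Literal, and Triplet and the functions interpret and to_ntriples are hypothetical, not a part of any RDF library, and the N-Triples output performs no escaping of special characters or language tagging.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    iri: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Triplet:
    predicate: Resource
    subject: Resource
    obj: Union[Resource, Literal]


def interpret(t: Triplet) -> str:
    """Describe a triplet as either a relation or a property."""
    if isinstance(t.obj, Resource):
        return f"{t.subject.iri} is in relation {t.predicate.iri} with {t.obj.iri}"
    return f"{t.subject.iri} has property {t.predicate.iri} with value {t.obj.value!r}"


def to_ntriples(t: Triplet) -> str:
    """Serialize a triplet as one N-Triples statement (no escaping)."""
    obj = f"<{t.obj.iri}>" if isinstance(t.obj, Resource) else f'"{t.obj.value}"'
    return f"<{t.subject.iri}> <{t.predicate.iri}> {obj} ."


john = Resource("http://example.org/john-smith")
page = Resource("http://example.org/document.html")
creator = Triplet(Resource("http://purl.org/dc/terms/creator"), page, john)
name = Triplet(Resource("http://xmlns.com/foaf/0.1/name"), john, Literal("John Smith"))

for triplet in (creator, name):
    print(interpret(triplet))
    print(to_ntriples(triplet))
```

Note that the serialized form lists the subject first and the predicate second, which is the order used by the N-Triples language.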
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing Web page and, besides the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for LD (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually make obsolete the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata in the documents on the Web, a technique known as microformatting.
    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/terms/"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description
          rdf:about="http://example.org/document.html">
        <dc:title xml:lang="en">John's Web page</dc:title>
        <dc:creator
            rdf:resource="http://example.org/john-smith"/>
      </rdf:Description>
      <rdf:Description
          rdf:about="http://example.org/john-smith">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
        <foaf:name>John Smith</foaf:name>
      </rdf:Description>
    </rdf:RDF>

    <http://example.org/document.html>
      <http://purl.org/dc/terms/title> "John's Web page"@en .
    <http://example.org/document.html>
      <http://purl.org/dc/terms/creator>
      <http://example.org/john-smith> .
    <http://example.org/john-smith>
      <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
      <http://xmlns.com/foaf/0.1/Person> .
    <http://example.org/john-smith>
      <http://xmlns.com/foaf/0.1/name> "John Smith" .

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dc: <http://purl.org/dc/terms/> .
    <http://example.org/document.html>
      dc:title "John's Web page"@en ;
      dc:creator <http://example.org/john-smith> .
    <http://example.org/john-smith>
      a foaf:Person ;
      foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom)

    <!DOCTYPE html>
    <html lang="en">
      <head>
        <link rel="meta" type="application/rdf+xml"
              href="john.rdf">
        <link rel="meta" type="text/turtle" href="john.ttl">
        <link rel="meta" type="application/n-triples"
              href="john.nt">
        <title>John's Web page</title>
      </head>
      <body>
        Hi, I'm John Smith.
      </body>
    </html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

    <!DOCTYPE html>
    <html lang="en">
      <head vocab="http://purl.org/dc/terms/"
            about="http://example.org/document.html">
        <title property="title" lang="en">John's Web
          page</title>
        <meta property="creator"
              href="http://example.org/john-smith">
      </head>
      <body vocab="http://xmlns.com/foaf/0.1/"
            about="http://example.org/john-smith"
            typeof="Person">
        Hi, I'm <span property="name">John Smith</span>.
      </body>
    </html>

[Graph: the node http://example.org/document.html is connected to "John's Web page"@en by dc:title and to http://example.org/john-smith by dc:creator; the node http://example.org/john-smith is connected to foaf:Person by rdf:type and to "John Smith" by foaf:name.]

Figure 2.11: A graph of the RDF document in Figure 2.9

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems

Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
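The decoupling of markup from design can be sketched in a few lines of Python. The data structures below are purely hypothetical and do not correspond to the internals of any particular DPS: the document only records which class each section of text belongs to, and a separate style sheet maps each class to its design.

```python
# Logical markup: each section of text is tagged with a class, not a design.
document = [
    ("heading", "Introduction"),
    ("body", "Typesetting has become available to virtually anyone."),
    ("heading", "Conclusion"),
]

# The design of each class lives in one place (illustrative values).
STYLESHEET = {
    "heading": {"font": "Verdana", "size": 16, "bold": True},
    "body": {"font": "Times", "size": 12, "bold": False},
}


def render(document, stylesheet):
    """Pair every section of text with the design of its class."""
    return [(text, stylesheet[cls]) for cls, text in document]


# Changing the design of all headings is a single edit to the style sheet;
# the document itself is left untouched.
restyled = {**STYLESHEET, "heading": {**STYLESHEET["heading"], "size": 20}}

for text, style in render(document, restyled):
    print(style["size"], text)
```

With presentation markup, the same change would have to be repeated by hand for every heading in the document.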
23 DOCUMENT PREPARATION SYSTEMS 37
The Cask of Amontilladoby
Edgar Allen Poe
T he thousand injuries of Fortunato I had borne as I bestcould but when he ventured upon insult I vowedrevenge You who so well know the nature of my soul
will not suppose however that gave utterance to a threat Atlength I would be avenged this was a point definitely settledmdashbut the very definitiveness with which it was resolved precludedthe idea of risk I must not only punish but punish withimpunity A wrong is unredressed when retribution overtakes itsredresser
-1-
.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allan Poe’s Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
    \dimexpr\hsize-3.28em 3.28em
    \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.

Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
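To give a feel for such conversion tools, here is a minimal Python sketch that turns a tiny Markdown-like subset into HTML. It is an illustration only: the function name `to_html` is made up for this example, and the sketch supports just first-level headings (`# `), emphasis (`*...*`), and paragraphs separated by blank lines, which is far less than any real Markdown implementation handles.

```python
import html
import re


def to_html(source: str) -> str:
    """Convert a tiny Markdown-like subset to HTML."""
    blocks = []
    # Blocks of text are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", source.strip()):
        # Join the lines of a block and escape the HTML special characters.
        text = html.escape(" ".join(block.split()), quote=False)
        # Punctuation acts as markup: *text* denotes emphasis.
        text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
        if text.startswith("# "):
            blocks.append(f"<h1>{text[2:]}</h1>")
        else:
            blocks.append(f"<p>{text}</p>")
    return "\n".join(blocks)


sample = "# Design\n\nLegibility should be of *foremost* concern."
converted = to_html(sample)
```

The input stays legible as plain text, while the output carries the same structure in a more general markup language.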
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
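The guideline above can be sketched as a small Python check. The function name and the wording of the returned advice are assumptions made for this example; only the numeric thresholds come from the text.

```python
def assess_measure(chars_per_line: int, columns: int = 1) -> str:
    """Classify a line length according to the guideline above."""
    # Recommended range: 45-75 characters on a single-column page,
    # 40-50 characters on a multi-column page.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        # Justification would make the word spacing too loose.
        return "too narrow to justify: set ragged"
    if chars_per_line > 80:
        return "too wide: strains the eye"
    if low <= chars_per_line <= high:
        return "within the recommended range: justify"
    return "outside the recommended range"


advice = assess_measure(66)
```

Line lengths between the recommended range and the hard limits are reported separately, since the text does not say how they should be treated.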
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

1
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}
newunicodecharraisebox8ex
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford’s From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
@font-face {
  font-family: "Alegreya Sans";
  src: url(AlegreyaSans-Regular.ttf)
       format("truetype");
  unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                 U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F; }
@font-face {
  font-family: "GFS Neohellenic";
  src: url(GFSNeohellenic.otf) format("opentype");
  unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                 U+102E0-102FF; }
p {
  font-family: "Alegreya Sans", "GFS Neohellenic",
               sans-serif;
  line-height: 14pt; }
[lang=en] {
  font-size: 11pt; }
[lang=el] {
  font-size: 13.2pt; }
</style>
<p><span lang=en>The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang=el>φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang=en>“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
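The arithmetic behind these figures can be reproduced with a short Python sketch. The function names are made up for this example; the percentages are the ones given in the text.

```python
def leading_range(size_pt, low_percent=20, high_percent=45):
    """Return the smallest and the largest recommended line height."""
    low = size_pt * (100 + low_percent) / 100
    high = size_pt * (100 + high_percent) / 100
    return low, high


def lead_per_side(size_pt, leading_pt):
    """Return the lead added above and below each line."""
    return (leading_pt - size_pt) / 2


low, high = leading_range(10)       # line height for a 10 pt typeface
tightest = lead_per_side(10, low)   # lead per side at the smallest leading
loosest = lead_per_side(10, high)   # lead per side at the largest leading
```

For a 10 pt typeface, the sketch arrives at the line height of 12 to 14.5 pt and the lead of 1 to 2.25 pt per side quoted above.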
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
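The page proportions of the paper sizes listed in Table 3.1 can be checked with a few lines of Python. This is only a sketch: the dictionary and the function name are assumptions of this example, and the proportion is taken here as the height divided by the width.

```python
import math

# Paper sizes in inches as (width, height), taken from Table 3.1.
PAPER_SIZES = {
    "A4": (8.27, 11.7),
    "B5": (6.93, 9.84),
    "Letter": (8.5, 11.0),
}


def proportion(width: float, height: float) -> float:
    """Return the height of a page divided by its width."""
    return height / width


ratios = {name: proportion(*size) for name, size in PAPER_SIZES.items()}

# The ISO sizes stay close to the square root of two, whereas the
# Letter size has a visibly squatter proportion.
deviation_a4 = abs(ratios["A4"] - math.sqrt(2))
deviation_b5 = abs(ratios["B5"] - math.sqrt(2))
```

The small deviations of A4 and B5 from the square root of two stem from the rounding of the sizes to inches.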
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
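One way to satisfy this requirement is to round the space reserved for a heading up to the nearest whole multiple of the body text line height. The Python sketch below shows the computation; the function name and the sample values are assumptions made for illustration.

```python
import math


def heading_height(natural_height_pt, leading_pt):
    """Round the height of a heading up to a whole multiple of the leading."""
    lines = math.ceil(natural_height_pt / leading_pt)
    return lines * leading_pt


# A heading that naturally occupies 16 pt on a page with 12 pt leading
# is given the space of two body text lines.
reserved = heading_height(16, 12)
```

The extra space can then be distributed above and below the heading without shifting the lines that follow off the common grid.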
3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer’s viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
This is the excellent foppery of the world, that, when we are sick in fortune—often the surfeit of our own behavior—we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

—William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
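The derivation of the text height can be sketched in Python as follows. The function name is hypothetical, and the sample values (a text width of 324 pt, the height-to-width proportion of 1.5 of a 6 in × 9 in page, and the leading of 12 pt) are assumptions chosen for illustration.

```python
import math


def text_block(text_width_pt, proportion, leading_pt):
    """Derive the text height from the text width.

    The ideal height (width times proportion) is rounded down to a whole
    number of body text lines, so that the text block can be completely
    filled with text.
    """
    lines = math.floor(text_width_pt * proportion / leading_pt)
    return lines, lines * leading_pt


lines, text_height_pt = text_block(324, 1.5, 12)
```

Rounding down rather than up is a design choice of this sketch; it keeps the text block from growing past the ideal height into the vertical margins.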
3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).
[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).
[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).
[12] Unicode Consortium. Unicode Technical Standard 10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).
[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).
[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O’Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).
[20] Unicode Consortium. Unicode Technical Standard 18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O’Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standard Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
EMACS: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe’s Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
PICO: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML, New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
VI: The Visual Interactive editor
VIM: VI IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language
A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, is declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.
2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.
This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also called a language or a format).
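The first of the two levels of correctness can be checked with very little code. The following is a minimal sketch in Python, assuming only the standard library; the helper name `is_well_formed` and the two sample strings are illustrative and do not come from the book. Note that the standard-library parser does not validate against a DTD or a schema, so the second level (validity) still needs an external tool such as xmllint.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True when the parser accepts the document as well-formed XML."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Properly nested elements.
GOOD = "<recipe><name>Palatschinken</name></recipe>"
# Overlapping elements: the end tags are closed in the wrong order.
BAD = "<recipe><name>Palatschinken</recipe></name>"

print(is_well_formed(GOOD))  # True
print(is_well_formed(BAD))   # False
```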
Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish.</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>

Figure 2.1: An example XML document (recipe.xml)
DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD
element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
      minOccurs="1"/>
    <element
      name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
        maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
        use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
        maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)
xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng # Compact -> Full Relax NG
xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
Speech
  AASE: Peer, you're lying!
  PEER: No, I'm not!
  AASE: Well then, swear to me it's true!
  PEER: Swear? Why should I?
  AASE: See, you dare not! Every word of it's a lie!

Verse
  Peer, you're lying! No, I'm not!
  Well then, swear to me it's true!
  Swear? Why should I? See, you dare not!
  Every word of it's a lie!

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? Why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.

Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

A notable feature of XML that is unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of CONCUR, a more expressive SGML feature that makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
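To make the namespace mechanism concrete, the sketch below parses a small document that mixes two XML applications and lists which IRI each element belongs to. It assumes Python's standard `xml.etree.ElementTree` module, which reports namespaced names in the form `{iri}local`; the sample document and the helper `split_name` are illustrative assumptions, not taken from the book.

```python
import xml.etree.ElementTree as ET

# An XHTML document that embeds a MathML fragment through a second namespace.
DOCUMENT = """\
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:m="http://www.w3.org/1998/Math/MathML">
  <body><p>Euler:</p><m:math><m:mi>e</m:mi></m:math></body>
</html>"""


def split_name(tag: str) -> tuple[str, str]:
    """Split ElementTree's '{iri}local' notation into (iri, local name)."""
    if tag.startswith("{"):
        iri, local = tag[1:].split("}", 1)
        return iri, local
    return "", tag


root = ET.fromstring(DOCUMENT)
names = [split_name(element.tag) for element in root.iter()]
for iri, local in names:
    print(f"{local:5} {iri}")
```

The parser only resolves prefixes to IRIs. As the text notes, nothing in the document tells it which schema, if any, belongs to each IRI.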
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely popular graphical Web browser, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.
JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.
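The lenient attitude towards malformed markup can be observed with Python's standard `html.parser` module. The sketch below is an illustration under stated assumptions: `TagLogger` is a hypothetical helper, and `HTMLParser` is only a tokenizer, so it shows that overlapping tags are accepted without an error but does not implement the repair rules of the HTML5 specification.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record start and end tags in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


logger = TagLogger()
# The <b> and <i> elements overlap, which an XML parser would refuse.
logger.feed("<b>Bold <i>bold and italic</b> italic</i>")
logger.close()
print(logger.events)
```

No exception is raised. Deciding how to turn the overlapping tags into a tree is left to the consumer of the events.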
Initially, HTML only comprised a mixture of logical and presentation markup with a fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties of any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

<style>
body {
  font: large Verdana;
  font-size: large; }
h1 {
  font-size: x-large;
  text-transform: uppercase; }
abbr {
  font-style: italic; }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2 Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly into web documents through XML namespaces.

The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
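The two readings of a triplet can be written down as a small data structure. The following Python sketch follows the (p, s, o) ordering used in the text; the class names `Resource`, `Literal`, and `Triplet` and the function `interpret` are assumptions made for illustration and are not part of any RDF library.

```python
from typing import NamedTuple, Union


class Resource(NamedTuple):
    iri: str


class Literal(NamedTuple):
    value: str


class Triplet(NamedTuple):
    predicate: Resource
    subject: Resource
    obj: Union[Resource, Literal]


def interpret(triplet: Triplet) -> str:
    """Read a triplet either as a relation or as a property, as in the text."""
    p, s, o = triplet
    if isinstance(o, Resource):
        return f"<{s.iri}> is in relation <{p.iri}> with <{o.iri}>"
    return f"<{s.iri}> has property <{p.iri}> with value {o.value!r}"


john = Resource("http://example.org/john-smith")
page = Resource("http://example.org/document.html")
creator = Triplet(Resource("http://purl.org/dc/terms/creator"), page, john)
name = Triplet(Resource("http://xmlns.com/foaf/0.1/name"), john, Literal("John Smith"))

print(interpret(creator))
print(interpret(name))
```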
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages, including XML [44], JSON for LD (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually make obsolete the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata in the documents on the Web, a technique known as microformatting.
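Of the representations listed above, N-Triples is simple enough to emit by hand. The sketch below is a hedged illustration rather than a complete serializer: the helpers `iri`, `literal`, and `statement` are hypothetical names, only the most common string escapes are handled, and the statements are written in the subject, predicate, object order that N-Triples uses (unlike the (p, s, o) ordering of the text).

```python
def iri(value: str) -> str:
    """Wrap an IRI in angle brackets."""
    return f"<{value}>"


def literal(value: str, lang: str = "") -> str:
    """Quote a literal, escaping backslashes, quotes and line breaks."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def statement(subject: str, predicate: str, obj: str) -> str:
    """Format one statement; N-Triples expects one statement per line."""
    return f"{subject} {predicate} {obj} ."


PAGE = iri("http://example.org/document.html")
lines = [
    statement(PAGE, iri("http://purl.org/dc/terms/title"),
              literal("John's Web page", "en")),
    statement(PAGE, iri("http://purl.org/dc/terms/creator"),
              iri("http://example.org/john-smith")),
]
print("\n".join(lines))
```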
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom)

<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
      href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
      href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
      about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
      href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
      about="http://example.org/john-smith"
      typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>

Figure 2.11: A graph of the RDF document in Figure 2.9

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the gentle learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems

Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU is Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by Donald Knuth, an American professor of computer science, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of the mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
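The decoupling described above can be modelled in a few lines. In the Python sketch below, the document stores only the class of each block of text, and the design is looked up separately; the class names, the style properties, and the function `apply_design` are all hypothetical and do not correspond to any particular DPS.

```python
# Logical markup: each block of text carries only the name of its class.
DOCUMENT = [
    ("title", "The Cask of Amontillado"),
    ("paragraph", "The thousand injuries of Fortunato I had borne as I best could."),
    ("paragraph", "At length I would be avenged."),
]

# Design: one entry per class; the properties are illustrative only.
STYLES = {
    "title": {"font_size_pt": 16, "bold": True},
    "paragraph": {"font_size_pt": 12.5, "bold": False},
}


def apply_design(document, styles):
    """Pair every block of text with the design of its class."""
    return [(text, styles[cls]) for cls, text in document]


# Changing one class definition restyles every paragraph at once.
larger = {**STYLES, "paragraph": {"font_size_pt": 14, "bold": False}}
sizes = [style["font_size_pt"] for _, style in apply_design(DOCUMENT, larger)]
print(sizes)  # [16, 14, 14]
```

With presentation markup, the same change would have to be repeated on every affected block of text.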
The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled – but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1\hskip-0.3ex}}}
  \parshape=4 3em \dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em \hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right)

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
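The conversion step mentioned above can be illustrated with a toy converter. The Python sketch below handles only a tiny, Markdown-like subset (headings introduced by `#` and emphasis between asterisks); the function `to_html` is an assumption made for illustration and is far from a conforming implementation of any real lightweight markup language.

```python
import html
import re


def to_html(source: str) -> str:
    """Convert a tiny, Markdown-like subset to HTML."""
    blocks = []
    # Blank lines separate blocks; line breaks inside a block are just spaces.
    for block in re.split(r"\n\s*\n", source.strip()):
        text = html.escape(" ".join(block.split()), quote=False)
        text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
        heading = re.match(r"(#{1,6}) (.*)", text)
        if heading:
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            blocks.append(f"<p>{text}</p>")
    return "\n".join(blocks)


print(to_html("# Design\n\nChoose a *legible*\ntypeface."))
```

The input stays readable as plain text, which is the property the paragraph above emphasizes, while the output is ordinary logical HTML markup.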
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
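The ranges quoted above lend themselves to a quick mechanical check. The following Python sketch classifies an average line length against them; the function name classify_measure and the wording of the verdicts are assumptions made for illustration, and the thresholds are simply the figures from the paragraph above rather than hard rules.

```python
def classify_measure(chars_per_line: int, columns: int = 1) -> str:
    """Classify a line length against the ranges quoted in the text."""
    # Recommended ranges: 45-75 (single column), 40-50 (multiple columns).
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "too narrow to justify; set ragged"
    if chars_per_line > 80:
        return "too wide; strains the eye"
    if low <= chars_per_line <= high:
        return "comfortable"
    return "acceptable, outside the recommended range"


print(classify_measure(66))             # single-column body text
print(classify_measure(35, columns=2))  # narrow column, e.g. a sidenote
```

A count of characters per line is only a rough proxy for the measure, since it ignores the widths of the individual letters, but it is easy to obtain from a draft.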
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
\newunicodechar{·}{\raisebox{.8ex}{.}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford’s From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=grc] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
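The arithmetic behind these figures can be reproduced with a few lines of Python. This is only a sketch of the rule of thumb quoted above (twenty to forty-five percent of the typeface size); the function names leading_range and lead_per_side are hypothetical.

```python
def leading_range(type_size_pt: float,
                  low: float = 0.20, high: float = 0.45) -> tuple:
    """Return the (minimum, maximum) line height in points."""
    return (type_size_pt + type_size_pt * low,
            type_size_pt + type_size_pt * high)


def lead_per_side(type_size_pt: float, leading_pt: float) -> float:
    """Return the lead added above and below each line, in points."""
    return (leading_pt - type_size_pt) / 2


minimum, maximum = leading_range(10)
print(minimum, maximum)            # line height range for 10 pt type
print(lead_per_side(10, minimum))  # lead per side at the minimum
print(lead_per_side(10, maximum))  # lead per side at the maximum
```

For 10 pt type, this yields a line height between 12 and 14.5 pt, which corresponds to 1 to 2.25 pt of lead on either side of the line.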
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well.
Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes.
A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

         Sizes in inches   Page proportions
A4       8.27 × 11.7       2 : √2       1.41421
B5       6.93 × 9.84       1 : √2       0.707
Letter   8 1/2 × 11        1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a side-note. Sidenotes enliven the page and are easy for the reader to find.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
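One way to honor the whole-multiple rule is to round the space reserved for a heading up to the next multiple of the body text line height and to distribute the difference above and below the heading. The Python sketch below illustrates this under the assumption that all dimensions are given in points; the function name heading_block_height is hypothetical.

```python
import math


def heading_block_height(heading_height_pt: float,
                         body_leading_pt: float) -> float:
    """Round a heading's height up to a whole multiple of the leading."""
    lines = math.ceil(heading_height_pt / body_leading_pt)
    return lines * body_leading_pt


# A 14 pt heading over body text with 12 pt leading occupies two lines;
# the remaining space is distributed above and below the heading.
block = heading_block_height(14, 12)
print(block, block - 14)
```

With fractional dimensions, the division may suffer from floating-point rounding, so a production tool would likely work in integer units such as scaled points.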
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer’s viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3]:
This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars: as if we were villains by necessity; fools by heavenly compulsion; knaves, thieves, and treachers by spherical predominance; drunkards, liars, and adulterers by an enforced obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star.
William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
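As a worked illustration of the multiple-of-line-height constraint, the following Python sketch fits the largest whole number of lines between the vertical margins. The function name text_block and the sample dimensions (a page 842 pt tall with 72 pt margins) are assumptions chosen for the example, not values taken from this book.

```python
def text_block(page_height_pt: int, top_margin_pt: int,
               bottom_margin_pt: int, leading_pt: int) -> tuple:
    """Return (number of lines, text height) that fit on the page."""
    available = page_height_pt - top_margin_pt - bottom_margin_pt
    lines = available // leading_pt  # whole lines only
    return lines, lines * leading_pt


lines, height = text_block(842, 72, 72, 12)
print(lines, height)  # the leftover space goes back to the margins
```

Here 698 pt are available, which accommodates 58 lines at 12 pt leading for a text height of 696 pt; the remaining 2 pt are returned to the margins.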
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
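The text does not quantify what sufficient contrast means. One possible yardstick, introduced here purely as an assumption, is the contrast ratio defined by the Web Content Accessibility Guidelines (WCAG) 2.x, which recommend a ratio of at least 4.5 : 1 for normal-sized text. The Python sketch below computes that ratio for two sRGB colors; the function names are hypothetical.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple) -> float:
    """Relative luminance of an (R, G, B) color with 8-bit channels."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: tuple, background: tuple) -> float:
    """WCAG contrast ratio, from 1 (identical) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


print(contrast_ratio((0, 0, 0), (255, 255, 255)))    # black on white
print(contrast_ratio((255, 0, 0), (255, 255, 255)))  # pure red on white
```

Under this measure, pure red on a white background reaches a ratio of roughly 4 : 1, which falls just short of the 4.5 : 1 threshold and is consistent with the advice to reserve colored type for emphasis rather than for the body text.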
Bibliography
[1] Mary Brandel lsquolsquo1963 The debut of asci irsquorsquo InComputerworld(July 1999) url httpeditioncnncomTECHcomputing9907061963idg (visited on 09062015) (cit on p 5)
[2] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1963 10 East 40th Street New York 16 nyusa the American Standard Association June 1963 urlhttp worldpowersystems com J codes X3 4 - 1963
(visited on 01282015) (cit on p 5)[3] i so tc97sc2 Information technology ndash iso 7-bit coded character
set for information interchange i so 6461972 Geneva Switzer-land the International Organization for Standardization1972 (cit on pp 5 7)
[4] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1986 10 East 40th Street New York 16 ny usathe American Standard Association June 1986 (cit on p 6)
[5] Unicode Consortium the Unicode Standard Version 10 Vol 1Reading ma usa Addison-Wesley Developers Press Oct1991 isbn 0-201-56788-1 (cit on p 8)
[6] Unicode Consortium the Unicode Standard Version 10 Vol 2Reading ma usa Addison-Wesley Developers Press June1992 isbn 0-201-60845-6 (cit on p 8)
[7] isoiec jtc1sc2 Information technology ndash the Universalmultiple-octet coded Character Set (ucs) ndash Part 1 Architectureand Basic Multilingual Plane isoiec 10646-11993 Geneva
Switzerland the International Organization for Standard-ization May 1993 (cit on p 8)
[8] i soiec jtc1sc2 Transformation Format for 16 planes of group00 (utf-16) isoiec 10646-11993Amd 11996 GenevaSwitzerland the International Organization for Standard-ization Oct 1996 (cit on p 8)
[9] isoiec jtc1sc2 ucs Transformation Format 8 (utf-8)isoiec 10646-11993Amd 21996 Geneva Switzerlandthe International Organization for Standardization Oct1996 (cit on p 8)
[10] Unicode Consortium the Unicode Standard Version 90 ndash CoreSpecification Tech rep Mountain View ca usa July 2016url httpwwwunicodeorgversionsUnicode900UnicodeStandard-90pdf (visited on 09172015) (cit onpp 8ndash10)
[11] Q-Success Usage of character encodings for websites urlhttpw3techscomtechnologiesoverviewcharacter_
encodingall (visited on 09102015) (cit on p 9)[12] Unicode Consortium Unicode Technical Standard 10 Version
900 Unicode Collation Algorithm Tech rep May 2016 urlhttpwwwunicodeorgreportstr10tr10-34html
(visited on 09172016) (cit on p 10)[13] Unicode Consortium Unicode cldr Project Tech rep url
httpcldrunicodeorg (visited on 09172016) (cit onp 10)
[14] iso tc171sc2 Document management ndash Portable documentformat iso 320002008 Geneva Switzerland the Interna-tional Organization for Standardization July 2008 (cit onp 13)
[15] isoiec jtc1sc34 Document description and processing lan-guages ndash Office Open XML File Formats isoiec 295002012Geneva Switzerland the International Organization forStandardization Oct 2012 (cit on p 13)
[16] isoiec jtc1sc34 Information technology ndash Open DocumentFormat for Office Applications (OpenDocument) v10 isoiec263002006 Geneva Switzerland the International Organi-zation for Standardization Dec 2006 (cit on p 13)
[17] Noam Chomsky lsquolsquoThree models for the description of lan-guagersquorsquo In Information Theory IEEE Transactions on 23 (1956)pp 113ndash124 (cit on p 14)
[18] isoiec jtc1sc22 Information technology ndash the Portable Op-erating System Interface ndash Part 2 Shell and Utilities isoiec9945-21993 Geneva Switzerland the International Organi-zation for Standardization Dec 1993 (cit on p 14)
[19] Jeffrey E F Friedl Mastering Regular Expressions 3rd edOrsquoReilly Media 2006 p 544 isbn 978-0-596-52812-6 (citon p 14)
[20] Unicode Consortium Unicode Technical Standard 18 Version17 Unicode Regular Expressions Tech rep Nov 2013 urlhttpwwwunicodeorgreportstr18tr18-17html
(visited on 09262015) (cit on p 16)[21] Dale Dougherty and Arnold Robbins Sed amp awk Second
Edition OrsquoReilly Media 1997 i sbn 1565922255 url http docstore mik ua orelly unix sedawk (visited on09262015) (cit on p 16)
[22] Ben Collins-Sussman Brian W Fitzpatrick and C MichaelPilato Version Control with Subversion OrsquoReilly 2002 urlhttpsvnbookred-beancom (visited on 09262015)(cit on p 17)
[23] Charles F Goldfarb lsquolsquothe Roots of sgml ndash A Personal Rec-ollectionrsquorsquo In (1996) url httpwwwsgmlsourcecomhistoryrootshtm (visited on 07292015) (cit on p 22)
[24] Charles F Goldfarb lsquolsquosgml The Reason Why and the FirstPublishedHintrsquorsquo In Journal of the American Society for Informa-tion Science 48 (7 July 1997) url httpwwwsgmlsourcecomhistoryjasishtm (visited on 07292015) (cit onp 22)
[25] Charles F Goldfarb lsquolsquoIntroduction to Generalized MarkuprsquorsquoIn (1981) url http www sgmlsource com history AnnexAhtm (visited on 07292015) (cit on p 22)
[26] i soiecjtc1sc34 Information processing ndash Text and office sys-tems ndash Standard Generalized Markup Language (sgml) i soiec88791986 Geneva Switzerland the International Organi-zation for Standardization Oct 1986 (cit on p 22)
[27] Charles F Goldfarb the sgml Handbook New York NY USAOxford University Press Inc 1990 i sbn 978-0-198-53737-3(cit on p 22)
[28] Jean Paoli Tim Bray and Michael Sperberg-McQueen Ex-tensible Markup Language (xml) 10 w3c Recommendationw3c Feb 1998 url httpwwww3orgTR1998REC-xml-19980210 (visited on 07312015) (cit on pp 23 31)
[29] isoiec jtc1sc18wg8 Proposed TC for Web sgml Adap-tations for sgml isoiec N1929 the International Organi-zation for Standardization June 1997 url httpxmlcoverpagesorgwg8-n1929-ghtml (visited on 07312015)(cit on p 23)
[30] Haringkon Wium Lie and Bert Bos Cascading Style Sheets level1 Recommendation w3c Dec 1996 url httpwwww3orgTRREC-CSS1-961217 (visited on 07312015) (cit onpp 23 29)
[31] C M Sperberg-McQueen and Claus Huitfeldt lsquolsquogoddagA Data Structure for Overlapping Hierarchiesrsquorsquo In DigitalDocuments Systems and Principles 8th International Confer-ence on Digital Documents and Electronic Publishing DDEP2000 5th International Workshop on the Principles of DigitalDocument Processing PODDP 2000 Munich Germany Sep-tember 13-15 2000 Revised Papers Ed by Peter King andEthan V Munson Berlin Heidelberg Springer Berlin Hei-delberg 2004 pp 139ndash160 isbn 978-3-540-39916-2 doi101007978-3-540-39916-2_12 (cit on p 27)
[32] TimBray DaveHollander andAndrewLaymanNamespacesin xml w3c Recommendation w3c Jan 1999 url httpwwww3orgTR1999REC-xml-names-19990114 (visitedon 08212015) (cit on p 27)
[33] M Duerst the Internationalized Resource Identifiers (iris) rfc3987 rfc Editor Jan 2005 url httptoolsietforghtmlrfc3987 (visited on 08312015) (cit on p 27)
[34] Norman Walsh DocBook 5 The Definitive Guide Apr 2010url httpwwwdocbookorgtdgenhtmldocbookhtml(visited on 08182015) (cit on p 28)
[35] Tim Berners-Lee Information Management A Proposal Techrep Mar 1989 url httpwwww3orgHistory1989proposalhtml (visited on 08312015) (cit on p 28)
[36] T Berners-Lee Hypertext Markup Language ndash 20 rfc 1866rfc Editor Nov 1995 url httptoolsietforghtmlrfc1866 (visited on 07312015) (cit on p 28)
[37] Jon Postel DoD standard Transmission Control Protocol rfc761 rfc Editor Jan 1980 url httptoolsietforghtmlrfc761 (visited on 09162016) (cit on p 28)
[38] Ian Hickson et al html5 A vocabulary and associated apisfor html and xhtml Recommendation w3c Oct 2014 urlhttpwwww3orgTR2014REC-html5-20141028 (visitedon 07312015) (cit on p 29)
[39] ecma International Standard ecma-262 - ecmaScript LanguageSpecification Tech rep June 1997 url httpwwwecma-internationalorgpublicationsfilesECMA-ST-ARCH
ECMA-262201st20edition20June201997pdf (visitedon 07312015) (cit on p 29)
[40] Netscape Communications Netscape and Sun announce Java-Script the open cross-platform object scripting language for en-terprise networks and the Internet Dec 1995 url httpwpnetscapecomnewsrefprnewsrelease67html (visited on02132008) (cit on p 29)
[41] Dave Raggett et al Reformulating html in xml w3c Recom-mendation w3c Dec 1998 url httpwwww3orgTR1998WD-html-in-xml-19981205 (visited on 08202015)(cit on p 31)
[42] Steven Pemberton et al xhtmltrade 10 The Extensible HyperTextMarkup Language w3c Recommendation w3c Jan 2000url httpwwww3orgTR2000REC-xhtml1-20000126(visited on 08202015) (cit on p 31)
[43] T Berners-Lee Linked Data Tech rep 2006 url httpswwww3orgDesignIssuesLinkedDatahtml (visited on09172016) (cit on p 31)
[44] Ora Lassila and Ralph R Swick Resource Description Frame-work (rdf) Model and Syntax Specification w3c Recommen-dation w3c Feb 1999 url httpwwww3orgTR1999REC-rdf-syntax-19990222 (visited on 08182015) (cit onpp 31 32)
[45] Dan Brickley and R V Guha rdf Vocabulary DescriptionLanguage 10 rdf Schema w3c Recommendation w3c Feb2004 url httpwwww3orgTR2004REC-rdf-schema-20040210 (visited on 08182015) (cit on p 32)
[46] Deborah L McGuinness and Frank van Harmelen owl WebOntology Language w3c Recommendation w3c Feb 2004url httpwwww3orgTR2004REC-owl-features-20040210 (visited on 08182015) (cit on p 32)
[47] Dan Brickley and R V Guha json-ld 10 A JSON-basedSerialization for Linked Data w3c Recommendation w3cJan 2014 url httpwwww3orgTR2014REC-json-ld-20140116 (visited on 08192015) (cit on p 32)
[48] David Beckett et al rdf 11 Turtle w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-turtle-20140225 (visited on 08292015) (cit on p 32)
[49] David Beckett rdf 11 N-Triples w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-n-triples-20140225 (visited on 08192015) (cit on p 32)
[50] Ben Adida et al rdfa in xhtml Syntax and Processing w3cRecommendation w3c Oct 2008 url httpwwww3org TR 2008 REC - rdfa - syntax - 20081014 (visited on08192015) (cit on p 32)
[51] Peter Schaffter What exactly is mom 2015 url httpwwwschafftercamommom-01html (visited on 09162016)(cit on p 37)
[52] Donald Ervin Knuth Digital Typography The Center for theStudy of Language and Information Publications 1998 i sbn978-0-387-98269-4 (cit on p 36)
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: httpwwwsazbacztypoglosytypo101pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Declaration
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Or A Foe
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe’s Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MATHML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFA  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard General Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
Index
ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42
Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17
sect
Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15
bre 14ndash16bs 6bsd 13
sect
ca 52can 6cern 28
character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44
sect
dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35
dps 13 17 18 32 35 36 39dtd 23 25ndash27dtp 36
sect
ebcdic 5ecma 55Edgar Allen Poe 37
Elements of Style 3em 6Emacs 13endianity 10endnote 47enq 6eot 6erealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15
ere 14ndash16esc 6etb 6120576-TEX 38etx 6euc 5
sectF M Cornford 43ff 6foaf 32 33footnote 47formal grammar 14fortran 4From Religion to Philosophy A Study in
the Origins of Western Speculation 43fs 6fsm 35
sectGit 17gml 22gnuLinux 13nano 13
gnu 13 14 35Google Documents 18Google Pinyin 11grep 16 17groff see troffgs 6gui 13 35
sectHan Unification 9heading 45Henrik Ibsen 27ht 6
html 28ndash32 34 39 44 55sect
ibm 5 12 22iconv 10iec 7 10 51ndash54ime 12ir i 27 28 31 32 54iso 7 10 51ndash54
sectJavaScript 29Jeffrey E F Friedl 14j is 5joe 13JScript 29json 32json-ld 32 56jtc 51ndash54justification see alignment
sectKing Lear 48
sectLATEX 36 43Latin Vulgate Bible 49ld 31 32 55leading see line spacingLeafpad 13lf 6lightweight markup language 39line height 45list 46
sectma 51MakeDoc 39Markdown 39markuplogical 21 29 30 35 36presentation 21 29 30 35 36
mathml 28 31Mercurial 17microformatting 32Microsoft Word 14 20 39
sectN-Triples 32 33nak 6Noam Chomskyhierarchy 14
Noam Chomsky 14note 46Notepad++ 13Notepad 13
nroff see troffnul 6ny 51
sectocr 12odf 13ooxml 13owl 32 56
sectparagraphblock 47indented 45outdented 45
paragraph 42paragraphsblock 45
pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8
sectQuarkXPress 14quotationblock 47run-in 47
sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31
rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6
sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22
sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-
lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6
secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff
man 36
66 INDEX
me 36mom 36
troff 35tron 9Turtle 32 33typeface 41
sectucsblock 8ucs-4 8
ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10
us 6usa 51 52utf
utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8
utf 6 8ndash10 52sect
VBScript 29vcscentralized 17decentralized 17
vcs 17ndash20version control 13vi 13vim 13
vt 6sect
w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4
wysiwyg 35sect
XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23
xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish.</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>

Figure 2.1: An example XML document (recipe.xml)
DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.
<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA) >
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+) >
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD
element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
      name="ingredientList"><complexType><sequence>
        <element name="ingredient" minOccurs="1"
                 maxOccurs="unbounded">
          <complexType><simpleContent>
            <extension base="string">
              <attribute name="amount" type="string"/>
            </extension>
          </simpleContent></complexType>
        </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)
xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng # Compact -> Full Relax NG
xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor to CONCUR, a more expressive SGML feature that makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML
Speech

AASE: Peer, you’re lying!
PEER: No, I’m not!
AASE: Well then, swear to me it’s true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it’s a lie!

Verse

Peer, you’re lying! No, I’m not!
Well then, swear to me it’s true!
Swear? Why should I? See, you dare not!
Every word of it’s a lie!

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen’s Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].
The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.
Postel’s law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
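How a parser exposes namespaces can be sketched with Python's standard library. The document below is hypothetical: the recipe namespace IRI is an assumption made for illustration, and only the XHTML IRI is a real, registered one. The sketch shows that each element is identified by the pair of its namespace IRI and local name, regardless of the prefix used in the source.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a recipe that embeds one XHTML element.
# The recipe namespace IRI is an illustrative assumption.
DOC = """<recipe xmlns="http://www.example.com/recipe"
        xmlns:h="http://www.w3.org/1999/xhtml">
  <name>Palatschinken</name>
  <h:p>Serve warm.</h:p>
</recipe>"""


def split_tag(tag):
    """Split ElementTree's '{iri}local' tag into (iri, local)."""
    if tag.startswith("{"):
        iri, local = tag[1:].split("}", 1)
        return iri, local
    return None, tag


root = ET.fromstring(DOC)
# Walk the tree in document order and record (namespace IRI, local name).
names = [split_tag(element.tag) for element in root.iter()]
for iri, local in names:
    print(f"{local:8} belongs to {iri}")
```

Note that the parser only reports the IRIs; it has no way of fetching a schema from them, which is the limitation described above.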
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O’Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first graphical Web browser to achieve widespread popularity, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed the W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren’t valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel’s law and try to render the document despite its deficiencies. In
JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
an attempt to unify the way malformed HTML documents were rendered across the Web browsers, the W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.
Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by the W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for the presentation markup.
During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, the W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as
<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can’t be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.
<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.
<style>
body {
  font: large Verdana;
  font-size: large; }
h1 {
  font-size: x-large;
  text-transform: uppercase; }
abbr {
  font-style: italic; }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>
The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren’t well-formed [28, Section 1.2 Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly into web documents through XML namespaces.
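The difference in strictness can be observed with the two parsers that ship with Python. The following is a minimal sketch, not a model of a browser: `html.parser` is only a lenient tokenizer and does not implement the HTML5 tree-building rules, and the helper names `TagLogger` and `xml_accepts` are invented for this illustration. The input is the overlapping markup from the first line of Figure 2.7.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The overlapping markup from the first line of Figure 2.7.
MALFORMED = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Records every start and end tag the lenient HTML tokenizer sees."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def xml_accepts(text):
    """Return True if text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


logger = TagLogger()
logger.feed(MALFORMED)
logger.close()
print(logger.events)           # the HTML tokenizer reports all four tags
print(xml_accepts(MALFORMED))  # the XML parser refuses the same input
```

The HTML side processes the overlapping tags without complaint, while the XML side rejects the whole document at the first mismatched end tag.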
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, the W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
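The two readings of a triplet can be sketched as a small data structure. This is an illustration only, not an RDF library: the class names `Resource`, `Literal`, and `Triplet` and the function `interpret` are invented here, and the field order follows the (p, s, o) convention used above. The sample IRIs are taken from the running example of John Smith's web page.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    iri: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Triplet:
    predicate: Resource
    subject: Resource
    obj: Union[Resource, Literal]


def interpret(t: Triplet) -> str:
    """Render a triplet as a relation or as a property, depending on its object."""
    s, p = t.subject.iri, t.predicate.iri
    if isinstance(t.obj, Resource):
        return f"<{s}> is in relation <{p}> with <{t.obj.iri}>"
    return f"<{s}> has property <{p}> with value {t.obj.value!r}"


john = Resource("http://example.org/john-smith")
creator = Triplet(Resource("http://purl.org/dc/terms/creator"),
                  Resource("http://example.org/document.html"), john)
name = Triplet(Resource("http://xmlns.com/foaf/0.1/name"),
               john, Literal("John Smith"))

for triplet in (creator, name):
    print(interpret(triplet))
```

Keeping resources and literals as distinct types makes the distinction between a relation and a property a matter of a single type check.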
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in Attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually make obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type
        rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html>
  <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html>
  <http://purl.org/dc/terms/creator>
  <http://example.org/john-smith> .
<http://example.org/john-smith>
  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
  <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith>
  <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom)
<!DOCTYPE html>
<html lang="en">
<head>
  <link rel="meta" type="application/rdf+xml"
        href="john.rdf">
  <link rel="meta" type="text/turtle" href="john.ttl">
  <link rel="meta" type="application/n-triples"
        href="john.nt">
  <title>John's Web page</title>
</head>
<body>
  Hi, I'm John Smith.
</body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.
<!DOCTYPE html>
<html lang="en">
<head vocab="http://purl.org/dc/terms/"
      about="http://example.org/document.html">
  <title property="title" lang="en">John's Web
    page</title>
  <meta property="creator"
        href="http://example.org/john-smith">
</head>
<body vocab="http://xmlns.com/foaf/0.1/"
      about="http://example.org/john-smith"
      typeof="Person">
  Hi, I'm <span property="name">John Smith</span>.
</body>
</html>
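Figure 2.11: A graph of the RDF document in Figure 2.9. The node http://example.org/document.html is connected by dc:title to the literal "John's Web page"@en and by dc:creator to the node http://example.org/john-smith, which is in turn connected by rdf:type to foaf:Person and by foaf:name to the literal "John Smith".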
categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems

The archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The
The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled—but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-
.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe’s The Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}\hskip-0.3ex}
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
    \dimexpr\hsize-3.28em 3.28em
    \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right)
2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
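The conversion step can be illustrated with a deliberately tiny converter. The sketch below handles only an invented two-rule dialect (a leading "# " marks a heading and a pair of asterisks marks emphasis); it borrows the look of Markdown but does not implement Markdown or any other real lightweight markup language, and the function name `to_html` is an assumption of this example.

```python
import html
import re


def to_html(text):
    """Convert a toy two-rule markup dialect to HTML.

    Blocks are separated by blank lines. A block starting with "# "
    becomes a heading; everything else becomes a paragraph. Text
    between two asterisks becomes emphasized.
    """
    blocks = []
    for block in text.strip().split("\n\n"):
        # Join wrapped lines and collapse runs of whitespace.
        line = " ".join(block.split())
        # Escape characters that are special in HTML before adding tags.
        line = html.escape(line, quote=False)
        line = re.sub(r"\*(.+?)\*", r"<em>\1</em>", line)
        if line.startswith("# "):
            blocks.append(f"<h1>{line[2:]}</h1>")
        else:
            blocks.append(f"<p>{line}</p>")
    return "\n".join(blocks)


source = "# Title\n\nSome *light* text & more"
print(to_html(source))
```

Even this toy shows the trade-off described above: the source stays readable as plain text, but the markup can express only what its few punctuation rules allow.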
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
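These thresholds can be collected into a small helper. The sketch below is an interpretation rather than a rule from the cited source: the function name `assess_measure` and the verdict strings are invented, and the middle verdict "acceptable" for lengths that are neither ideal nor outside the 40 to 80 character limits is an assumption of this example.

```python
def assess_measure(chars_per_line, columns=1):
    """Classify a line length (in characters) using the limits given above."""
    if chars_per_line < 40:
        # Justification would make the word spacing too loose.
        return "set ragged"
    if chars_per_line > 80:
        # Extended passages this wide strain the eye.
        return "too wide"
    low, high = (45, 75) if columns == 1 else (40, 50)
    if low <= chars_per_line <= high:
        return "ideal"
    return "acceptable"


for chars, columns in [(66, 1), (42, 2), (35, 1), (90, 1), (78, 1)]:
    print(chars, columns, assess_measure(chars, columns))
```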
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

1
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
\newunicodechar{…}{\raisebox{.8ex}{…}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford’s From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
44 CHAPTER 3 DESIGN
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url(AlegreyaSans-Regular.ttf)
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url(GFSNeohellenic.otf) format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  /* grc is the language tag for Ancient Greek; gr is not a valid tag */
  [lang=grc] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
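The arithmetic behind these figures is simple enough to script. The sketch below is a hedged illustration in Python; the function names leading_range and lead_per_side are made up for this example, and the 20 % and 45 % defaults are taken from the guideline quoted above.

```python
def leading_range(size_pt: float, low: float = 0.20, high: float = 0.45):
    """Return the (minimum, maximum) line height in points for a typeface size."""
    return (round(size_pt * (1 + low), 2), round(size_pt * (1 + high), 2))


def lead_per_side(size_pt: float, line_height_pt: float) -> float:
    """Lead added above and below each line, assuming it is split evenly."""
    return round((line_height_pt - size_pt) / 2, 2)


minimum, maximum = leading_range(10)
print(minimum, maximum)               # line height range for 10 pt type
print(lead_per_side(10, minimum))     # lead per side at the lower bound
print(lead_per_side(10, maximum))     # lead per side at the upper bound
```

For the 10 pt body text of this book, the 12 pt leading sits at the lower bound of the computed range, which agrees with the remark about minimal leading.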
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger
         Sizes in inches   Page proportions
A4       8.27 × 11.7       2 : √2      1.41421
B5       6.93 × 9.84       1 : √2      0.707
Letter   8 1/2 × 11        1 : 1.294   1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
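One way to satisfy the whole-multiple rule is to pad the heading with vertical space until it occupies a whole number of body text lines. The Python sketch below illustrates the calculation; the function name heading_padding and the sample values are assumptions for this example only.

```python
import math


def heading_padding(heading_height_pt: float, line_height_pt: float) -> float:
    """Extra vertical space that rounds a heading up to a whole number of lines."""
    lines = math.ceil(heading_height_pt / line_height_pt)
    return round(lines * line_height_pt - heading_height_pt, 2)


# A heading that is 17 pt tall on a 12 pt line grid occupies two lines,
# so the remaining space has to be distributed above and below it.
print(heading_padding(17, 12))
# A heading that already fills two lines needs no padding.
print(heading_padding(24, 12))
```

How the padding is split above and below the heading is a design decision that the rule itself leaves open.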
3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and, optionally, also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

– William Shakespeare, King Lear
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
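The requirement that the text height be a multiple of the line height can be checked mechanically. In the Python sketch below, the function name text_block_height and the sample page dimensions are hypothetical; only the rounding rule comes from the text.

```python
def text_block_height(available_pt: float, line_height_pt: float):
    """Return (number of lines, text height) that fits the available space."""
    lines = int(available_pt // line_height_pt)
    return lines, lines * line_height_pt


# With 650 pt available between the vertical margins and a 12 pt line
# height, the text block is trimmed to the nearest whole number of lines.
lines, height = text_block_height(650, 12)
print(lines, height)
```

The leftover space (2 pt in this example) would be absorbed by the vertical margins.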
3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
(ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
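The text does not say how "sufficient contrast" should be measured. One common yardstick, used here purely as an assumption, is the contrast ratio defined by the W3C Web Content Accessibility Guidelines (WCAG) 2.x, which ranges from 1:1 to 21:1 and recommends at least 4.5:1 for normal-sized text. The Python sketch below implements that formula; the function names are made up for this example.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb) -> float:
    """Relative luminance of an (R, G, B) color with 8-bit channels."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background) -> float:
    """WCAG contrast ratio; the order of the two colors does not matter."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


# Black type on a white background yields the maximum ratio.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))
# Red type on a white background, as in rubricated print.
print(round(contrast_ratio((200, 0, 0), (255, 255, 255)), 2))
```

Note that this ratio only measures lightness contrast; it does not by itself tell whether two hues stay distinct for a color-blind reader.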
Bibliography
[1] Mary Brandel lsquolsquo1963 The debut of asci irsquorsquo InComputerworld(July 1999) url httpeditioncnncomTECHcomputing9907061963idg (visited on 09062015) (cit on p 5)
[2] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1963 10 East 40th Street New York 16 nyusa the American Standard Association June 1963 urlhttp worldpowersystems com J codes X3 4 - 1963
(visited on 01282015) (cit on p 5)[3] i so tc97sc2 Information technology ndash iso 7-bit coded character
set for information interchange i so 6461972 Geneva Switzer-land the International Organization for Standardization1972 (cit on pp 5 7)
[4] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1986 10 East 40th Street New York 16 ny usathe American Standard Association June 1986 (cit on p 6)
[5] Unicode Consortium the Unicode Standard Version 10 Vol 1Reading ma usa Addison-Wesley Developers Press Oct1991 isbn 0-201-56788-1 (cit on p 8)
[6] Unicode Consortium the Unicode Standard Version 10 Vol 2Reading ma usa Addison-Wesley Developers Press June1992 isbn 0-201-60845-6 (cit on p 8)
[7] isoiec jtc1sc2 Information technology ndash the Universalmultiple-octet coded Character Set (ucs) ndash Part 1 Architectureand Basic Multilingual Plane isoiec 10646-11993 Geneva
Switzerland the International Organization for Standard-ization May 1993 (cit on p 8)
[8] i soiec jtc1sc2 Transformation Format for 16 planes of group00 (utf-16) isoiec 10646-11993Amd 11996 GenevaSwitzerland the International Organization for Standard-ization Oct 1996 (cit on p 8)
[9] isoiec jtc1sc2 ucs Transformation Format 8 (utf-8)isoiec 10646-11993Amd 21996 Geneva Switzerlandthe International Organization for Standardization Oct1996 (cit on p 8)
[10] Unicode Consortium the Unicode Standard Version 90 ndash CoreSpecification Tech rep Mountain View ca usa July 2016url httpwwwunicodeorgversionsUnicode900UnicodeStandard-90pdf (visited on 09172015) (cit onpp 8ndash10)
[11] Q-Success Usage of character encodings for websites urlhttpw3techscomtechnologiesoverviewcharacter_
encodingall (visited on 09102015) (cit on p 9)[12] Unicode Consortium Unicode Technical Standard 10 Version
900 Unicode Collation Algorithm Tech rep May 2016 urlhttpwwwunicodeorgreportstr10tr10-34html
(visited on 09172016) (cit on p 10)[13] Unicode Consortium Unicode cldr Project Tech rep url
httpcldrunicodeorg (visited on 09172016) (cit onp 10)
[14] iso tc171sc2 Document management ndash Portable documentformat iso 320002008 Geneva Switzerland the Interna-tional Organization for Standardization July 2008 (cit onp 13)
[15] isoiec jtc1sc34 Document description and processing lan-guages ndash Office Open XML File Formats isoiec 295002012Geneva Switzerland the International Organization forStandardization Oct 2012 (cit on p 13)
[16] isoiec jtc1sc34 Information technology ndash Open DocumentFormat for Office Applications (OpenDocument) v10 isoiec263002006 Geneva Switzerland the International Organi-zation for Standardization Dec 2006 (cit on p 13)
[17] Noam Chomsky lsquolsquoThree models for the description of lan-guagersquorsquo In Information Theory IEEE Transactions on 23 (1956)pp 113ndash124 (cit on p 14)
[18] isoiec jtc1sc22 Information technology ndash the Portable Op-erating System Interface ndash Part 2 Shell and Utilities isoiec9945-21993 Geneva Switzerland the International Organi-zation for Standardization Dec 1993 (cit on p 14)
[19] Jeffrey E F Friedl Mastering Regular Expressions 3rd edOrsquoReilly Media 2006 p 544 isbn 978-0-596-52812-6 (citon p 14)
[20] Unicode Consortium Unicode Technical Standard 18 Version17 Unicode Regular Expressions Tech rep Nov 2013 urlhttpwwwunicodeorgreportstr18tr18-17html
(visited on 09262015) (cit on p 16)[21] Dale Dougherty and Arnold Robbins Sed amp awk Second
Edition OrsquoReilly Media 1997 i sbn 1565922255 url http docstore mik ua orelly unix sedawk (visited on09262015) (cit on p 16)
[22] Ben Collins-Sussman Brian W Fitzpatrick and C MichaelPilato Version Control with Subversion OrsquoReilly 2002 urlhttpsvnbookred-beancom (visited on 09262015)(cit on p 17)
[23] Charles F Goldfarb lsquolsquothe Roots of sgml ndash A Personal Rec-ollectionrsquorsquo In (1996) url httpwwwsgmlsourcecomhistoryrootshtm (visited on 07292015) (cit on p 22)
[24] Charles F Goldfarb lsquolsquosgml The Reason Why and the FirstPublishedHintrsquorsquo In Journal of the American Society for Informa-tion Science 48 (7 July 1997) url httpwwwsgmlsourcecomhistoryjasishtm (visited on 07292015) (cit onp 22)
[25] Charles F Goldfarb lsquolsquoIntroduction to Generalized MarkuprsquorsquoIn (1981) url http www sgmlsource com history AnnexAhtm (visited on 07292015) (cit on p 22)
[26] i soiecjtc1sc34 Information processing ndash Text and office sys-tems ndash Standard Generalized Markup Language (sgml) i soiec88791986 Geneva Switzerland the International Organi-zation for Standardization Oct 1986 (cit on p 22)
[27] Charles F Goldfarb the sgml Handbook New York NY USAOxford University Press Inc 1990 i sbn 978-0-198-53737-3(cit on p 22)
[28] Jean Paoli Tim Bray and Michael Sperberg-McQueen Ex-tensible Markup Language (xml) 10 w3c Recommendationw3c Feb 1998 url httpwwww3orgTR1998REC-xml-19980210 (visited on 07312015) (cit on pp 23 31)
[29] isoiec jtc1sc18wg8 Proposed TC for Web sgml Adap-tations for sgml isoiec N1929 the International Organi-zation for Standardization June 1997 url httpxmlcoverpagesorgwg8-n1929-ghtml (visited on 07312015)(cit on p 23)
[30] Haringkon Wium Lie and Bert Bos Cascading Style Sheets level1 Recommendation w3c Dec 1996 url httpwwww3orgTRREC-CSS1-961217 (visited on 07312015) (cit onpp 23 29)
[31] C M Sperberg-McQueen and Claus Huitfeldt lsquolsquogoddagA Data Structure for Overlapping Hierarchiesrsquorsquo In DigitalDocuments Systems and Principles 8th International Confer-ence on Digital Documents and Electronic Publishing DDEP2000 5th International Workshop on the Principles of DigitalDocument Processing PODDP 2000 Munich Germany Sep-tember 13-15 2000 Revised Papers Ed by Peter King andEthan V Munson Berlin Heidelberg Springer Berlin Hei-delberg 2004 pp 139ndash160 isbn 978-3-540-39916-2 doi101007978-3-540-39916-2_12 (cit on p 27)
[32] TimBray DaveHollander andAndrewLaymanNamespacesin xml w3c Recommendation w3c Jan 1999 url httpwwww3orgTR1999REC-xml-names-19990114 (visitedon 08212015) (cit on p 27)
[33] M Duerst the Internationalized Resource Identifiers (iris) rfc3987 rfc Editor Jan 2005 url httptoolsietforghtmlrfc3987 (visited on 08312015) (cit on p 27)
[34] Norman Walsh DocBook 5 The Definitive Guide Apr 2010url httpwwwdocbookorgtdgenhtmldocbookhtml(visited on 08182015) (cit on p 28)
[35] Tim Berners-Lee Information Management A Proposal Techrep Mar 1989 url httpwwww3orgHistory1989proposalhtml (visited on 08312015) (cit on p 28)
[36] T Berners-Lee Hypertext Markup Language ndash 20 rfc 1866rfc Editor Nov 1995 url httptoolsietforghtmlrfc1866 (visited on 07312015) (cit on p 28)
[37] Jon Postel DoD standard Transmission Control Protocol rfc761 rfc Editor Jan 1980 url httptoolsietforghtmlrfc761 (visited on 09162016) (cit on p 28)
[38] Ian Hickson et al html5 A vocabulary and associated apisfor html and xhtml Recommendation w3c Oct 2014 urlhttpwwww3orgTR2014REC-html5-20141028 (visitedon 07312015) (cit on p 29)
[39] ecma International Standard ecma-262 - ecmaScript LanguageSpecification Tech rep June 1997 url httpwwwecma-internationalorgpublicationsfilesECMA-ST-ARCH
ECMA-262201st20edition20June201997pdf (visitedon 07312015) (cit on p 29)
[40] Netscape Communications Netscape and Sun announce Java-Script the open cross-platform object scripting language for en-terprise networks and the Internet Dec 1995 url httpwpnetscapecomnewsrefprnewsrelease67html (visited on02132008) (cit on p 29)
[41] Dave Raggett et al Reformulating html in xml w3c Recom-mendation w3c Dec 1998 url httpwwww3orgTR1998WD-html-in-xml-19981205 (visited on 08202015)(cit on p 31)
[42] Steven Pemberton et al xhtmltrade 10 The Extensible HyperTextMarkup Language w3c Recommendation w3c Jan 2000url httpwwww3orgTR2000REC-xhtml1-20000126(visited on 08202015) (cit on p 31)
[43] T Berners-Lee Linked Data Tech rep 2006 url httpswwww3orgDesignIssuesLinkedDatahtml (visited on09172016) (cit on p 31)
[44] Ora Lassila and Ralph R Swick Resource Description Frame-work (rdf) Model and Syntax Specification w3c Recommen-dation w3c Feb 1999 url httpwwww3orgTR1999REC-rdf-syntax-19990222 (visited on 08182015) (cit onpp 31 32)
[45] Dan Brickley and R V Guha rdf Vocabulary DescriptionLanguage 10 rdf Schema w3c Recommendation w3c Feb2004 url httpwwww3orgTR2004REC-rdf-schema-20040210 (visited on 08182015) (cit on p 32)
[46] Deborah L McGuinness and Frank van Harmelen owl WebOntology Language w3c Recommendation w3c Feb 2004url httpwwww3orgTR2004REC-owl-features-20040210 (visited on 08182015) (cit on p 32)
[47] Dan Brickley and R V Guha json-ld 10 A JSON-basedSerialization for Linked Data w3c Recommendation w3cJan 2014 url httpwwww3orgTR2014REC-json-ld-20140116 (visited on 08192015) (cit on p 32)
[48] David Beckett et al rdf 11 Turtle w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-turtle-20140225 (visited on 08292015) (cit on p 32)
[49] David Beckett rdf 11 N-Triples w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-n-triples-20140225 (visited on 08192015) (cit on p 32)
[50] Ben Adida et al rdfa in xhtml Syntax and Processing w3cRecommendation w3c Oct 2008 url httpwwww3org TR 2008 REC - rdfa - syntax - 20081014 (visited on08192015) (cit on p 32)
[51] Peter Schaffter What exactly is mom 2015 url httpwwwschafftercamommom-01html (visited on 09162016)(cit on p 37)
[52] Donald Ervin Knuth Digital Typography The Center for theStudy of Language and Information Publications 1998 i sbn978-0-387-98269-4 (cit on p 36)
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: httpwwwsazbacztypoglosytypo101pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
Emacs  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
pico  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
vi  The Visual Interactive editor
vim  vi IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
2.1 Meta Markup Languages

DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.
<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">

<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>

<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>
Figure 2.2: An example DTD.
element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}
Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
26 CHAPTER 2 MARKUP
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
      name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>
Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).
xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng  # Compact -> Full Relax NG
xmllint --noout --relaxng recipe.rng recipe.xml
Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature, CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML
Speech

AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!

Verse

Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!
<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>
Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].
The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook, and its source code is publicly available at http://docbook.org.
Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
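The way namespaces let two XML applications coexist in one document can be illustrated with a short Python sketch. The sample document (an XHTML fragment embedding an SVG circle) and the helper name namespaces_used are assumptions made for illustration; only the standard xml.etree.ElementTree module is used, and no schema validation is attempted.

```python
import xml.etree.ElementTree as ET

# A hypothetical XHTML fragment that embeds an SVG drawing.
# Each XML application is identified by its namespace IRI.
DOCUMENT = """\
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <body>
    <p>A circle:</p>
    <svg:svg width="20" height="20">
      <svg:circle cx="10" cy="10" r="8"/>
    </svg:svg>
  </body>
</html>
"""


def namespaces_used(xml_text):
    """Return the set of namespace IRIs of all elements in the document."""
    root = ET.fromstring(xml_text)
    iris = set()
    for element in root.iter():
        # ElementTree stores qualified names as "{IRI}local-name".
        if element.tag.startswith("{"):
            iris.add(element.tag[1:].split("}", 1)[0])
    return iris


if __name__ == "__main__":
    for iri in sorted(namespaces_used(DOCUMENT)):
        print(iri)
```

The parser only checks that the document is well-formed; it has no way of finding a schema from an IRI alone, which is the limitation described above.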
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
2.2 Markup on the World Wide Web
2.2.1 The Hypertext Markup Language
In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely used graphical Web browser, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed the W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In
JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
an attempt to unify the way malformed HTML documents were rendered across Web browsers, the W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.
Initially, HTML only comprised a mixture of logical and presentation markup with a fixed visual interpretation. This changed with the specification of CSS, which was introduced by the W3C in 1996. The language made it possible to specify the visual properties of any HTML element, which separated document markup from design and effectively eliminated the need for presentation markup.
During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
2.2.2 The Extensible Hypertext Markup Language
Ever since the release of XML in 1998, the W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as
<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>
Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.
<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>
Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.
<style>
body {
  font: large Verdana;
  font-size: large; }
h1 {
  font-size: x-large;
  text-transform: uppercase; }
abbr {
  font-style: italic; }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>
The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources and to reduce the number of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly in web documents through XML namespaces.
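The difference between the two parsing philosophies can be sketched with the Python standard library. The example feeds the overlapping markup from Figure 2.7 to html.parser.HTMLParser, which merely tokenizes the input and is not an HTML5 tree builder, and to xml.etree.ElementTree, which rejects input that is not well-formed. The class and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The overlapping markup from the first line of Figure 2.7.
MALFORMED = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Records tag events without checking that the tags nest properly."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.events.append("</" + tag + ">")


def html_events(markup):
    """Tokenize the markup leniently and return the tag events seen."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


def is_well_formed_xml(markup):
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(html_events(MALFORMED))         # the lenient tokenizer does not complain
    print(is_well_formed_xml(MALFORMED))  # the strict parser refuses the input
```

A real browser goes further than this tokenizer and repairs the tree as prescribed by the HTML5 specification; the sketch only shows that leniency and strictness lead to different outcomes for the same input.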
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
2.2.3 The Semantic Web and Linked Data
The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but they are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, the W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
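The two readings of a triplet can be made concrete with a small Python sketch. The Triplet type, its field names, and the describe function are hypothetical and do not belong to any RDF library; the field order follows the (p, s, o) convention used above, and the sample IRIs are taken from Figure 2.9.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    """A (p, s, o) triplet; names are illustrative, not an RDF library API."""
    predicate: str            # IRI of the relation or property
    subject: str              # IRI of the described resource
    obj: str                  # IRI of another resource, or a literal value
    object_is_resource: bool  # True if obj is an IRI rather than a literal


def describe(t):
    """Spell out the interpretation of a triplet in plain English."""
    if t.object_is_resource:
        return (f"<{t.subject}> is in the relation "
                f"<{t.predicate}> with <{t.obj}>")
    return (f"<{t.subject}> has the property "
            f"<{t.predicate}> with the value {t.obj!r}")


if __name__ == "__main__":
    creator = Triplet("http://purl.org/dc/terms/creator",
                      "http://example.org/document.html",
                      "http://example.org/john-smith", True)
    name = Triplet("http://xmlns.com/foaf/0.1/name",
                   "http://example.org/john-smith",
                   "John Smith", False)
    print(describe(creator))
    print(describe(name))
```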
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in documents on the Web, a technique known as microformatting.
2.3 Document Preparation Systems
Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith" />
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type
        rdf:resource="http://xmlns.com/foaf/0.1/Person" />
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .
Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom)
<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>
Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.
<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>
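[Graph: the node http://example.org/document.html is connected to the literal "John's Web page"@en by an edge labeled dc:title and to the node http://example.org/john-smith by an edge labeled dc:creator; the node http://example.org/john-smith is connected to the node foaf:Person by an edge labeled rdf:type and to the literal "John Smith" by an edge labeled foaf:name.]
Figure 2.11: A graph of the RDF document in Figure 2.9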
categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the gentle learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems
Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The
The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
The Cask of Amontillado
by
Edgar Allen Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that gave utterance to a threat. At length I would be avenged; this was a point definitely settled--but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-
.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allen Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's The Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
    \dimexpr\hsize-3.28em 3.28em
    \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right)
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
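The conversion from a lightweight markup language to HTML can be sketched in a few lines of Python. The converter below handles only a tiny, hypothetical Markdown-like subset (a line starting with "# " becomes a heading, text between asterisks is emphasized, and blank lines separate paragraphs); it is not an implementation of Markdown or of any other language named above.

```python
import html
import re


def to_html(text):
    """Convert a tiny, hypothetical Markdown-like subset to HTML."""
    blocks = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        # Collapse line breaks and escape characters special to HTML.
        body = html.escape(" ".join(block.split()), quote=False)
        # *text* marks emphasis.
        body = re.sub(r"\*(.+?)\*", r"<em>\1</em>", body)
        if body.startswith("# "):
            blocks.append("<h1>" + body[2:] + "</h1>")
        else:
            blocks.append("<p>" + body + "</p>")
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_html("# Title\n\nSome *emphasized* text\nand <more>."))
```

The source text stays readable on a plain text terminal, while the punctuation carries just enough structure for a tool to recover the logical markup.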
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think, and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”
1
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin{%
  \fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek{%
  \fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}
newunicodecharraisebox8ex
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think,
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
@font-face {
  font-family: "Alegreya Sans";
  src: url(AlegreyaSans-Regular.ttf)
       format("truetype");
  unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                 U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F; }
@font-face {
  font-family: "GFS Neohellenic";
  src: url(GFSNeohellenic.otf) format("opentype");
  unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                 U+102E0-102FF; }
p {
  font-family: "Alegreya Sans", "GFS Neohellenic",
               sans-serif;
  line-height: 14pt; }
[lang=en] {
  font-size: 11pt; }
[lang=grc] {
  font-size: 13.2pt; }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think, and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with a leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
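The leading arithmetic above can be reproduced with a short Python sketch. The function names and the integer percentage parameters are assumptions made for illustration; the 20 and 45 percent bounds are the ones quoted in the text.

```python
def leading_range(size_pt, low_percent=20, high_percent=45):
    """Return the (minimum, maximum) line height in points.

    The line height is the typeface size plus 20 to 45 percent of it.
    """
    minimum = size_pt * (100 + low_percent) / 100
    maximum = size_pt * (100 + high_percent) / 100
    return minimum, maximum


def lead_per_side(size_pt, line_height_pt):
    """Return the lead added above and below each line, in points."""
    return (line_height_pt - size_pt) / 2


if __name__ == "__main__":
    low, high = leading_range(10)
    print(low, high)                 # line heights for a 10 pt typeface
    print(lead_per_side(10, low))    # lead per side at the tightest setting
    print(lead_per_side(10, high))   # lead per side at the loosest setting
```

For the body text of this book (10 pt on 12 pt), the same computation gives 1 pt of lead above and below each line, the lower bound of the recommended range.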
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well.
Paragraphs can also be separated by graphical symbols such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger
Format   Size in inches   Page proportions
A4       8.27 × 11.7      1 : √2 (≈ 1 : 1.414)
B5       6.93 × 9.84      1 : √2 (≈ 1 : 1.414)
Letter   8 1/2 × 11       1 : 1.294
Table 3.1: An overview of common paper sizes used for commercial and industrial printing
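The page proportions in Table 3.1 follow from the sizes in inches, as the Python sketch below shows. The dictionary and the function name are hypothetical; the dimensions are the rounded values from the table, so the ratios for A4 and B5 only approximate the square root of two.

```python
import math

# Width and height in inches, as listed in Table 3.1.
PAPER_SIZES_IN = {
    "A4": (8.27, 11.7),
    "B5": (6.93, 9.84),
    "Letter": (8.5, 11.0),
}


def proportion(width, height):
    """Return the height of a page relative to a width of 1."""
    return height / width


if __name__ == "__main__":
    print("sqrt(2) = %.4f" % math.sqrt(2))
    for name, (width, height) in PAPER_SIZES_IN.items():
        print("%-6s 1 : %.4f" % (name, proportion(width, height)))
```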
This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:
1 Run-in quotations are included directly into the paragraph andset off from the surrounding text using quotation marks in accor-dance with the orthographic rules on the use of punctuation inthe language of the paragraph ldquoJesters do oft prove prophetsrdquoFrom the designerrsquos viewpoint run-in quotations require no spe-cial treatment although it is crucial that the body text typefacecontains the required quotation marks
2 Block quotations are set as block paragraphs that are clearly sepa-rated from the surrounding text This involves adding a verticalspace above and below the block paragraphs and optionally alsochanging the typeface its size or the indentation of the para-graphs [54 sec 233]
This is the excellent foppery of the world, that, when we are sick in fortune (often the surfeit of our own behavior), we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

William Shakespeare, King Lear

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
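As a rough sketch of the paragraph above, the text height can be derived from the text width and a chosen proportion and then snapped to a whole number of lines. The function name, the 2:3 proportion, and the measurements below are assumptions made for illustration only.

```python
def text_block(text_width, height_to_width, line_height):
    """Return (lines, text_height) for a text block of the given width.

    text_width      -- width of the text block, in points
    height_to_width -- desired proportion of the text block (assumed value)
    line_height     -- line height of the body text, in points
    """
    # Ideal height according to the chosen proportion.
    ideal_height = text_width * height_to_width
    # Snap to the nearest whole number of body-text lines.
    lines = round(ideal_height / line_height)
    return lines, lines * line_height


# A 300 pt wide text block with 2:3 proportions and a 13 pt line height.
lines, height = text_block(300, 1.5, 13)
print(lines, height)
```

Here the ideal height of 450 pt is adjusted to 35 lines, or 455 pt, so that the block can be filled completely with text.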
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

The general guidelines are to use colored type only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the type color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
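The text does not say how to measure "sufficient contrast". One common way to quantify it on screen, assumed here and not taken from the sources cited in this chapter, is the contrast ratio defined in the W3C Web Content Accessibility Guidelines (WCAG) 2.0. The sketch below implements that formula for 8-bit sRGB colors; the function names are hypothetical.

```python
def _linearize(channel):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.0 definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (r, g, b) color with channels in 0..255."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG contrast ratio between two colors, from 1 (none) to 21 (maximum)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


black, white, red = (0, 0, 0), (255, 255, 255), (255, 0, 0)
print(contrast_ratio(black, white))
print(contrast_ratio(red, white))
```

Black on white yields the maximum ratio of 21. WCAG 2.0 recommends a ratio of at least 4.5 for ordinary text, which pure red on white falls just short of.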
Bibliography
[1] Mary Brandel. "1963: The debut of ASCII". In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standards Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972.

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standards Association, June 1986.

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1.

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6.

[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993.
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996.

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996.

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008.

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012.

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006.
[17] Noam Chomsky. "Three models for the description of language". In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124.

[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993.

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6.

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015).

[23] Charles F. Goldfarb. "The Roots of SGML – A Personal Recollection". 1996. URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015).

[24] Charles F. Goldfarb. "SGML: The Reason Why and the First Published Hint". In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015).

[25] Charles F. Goldfarb. "Introduction to Generalized Markup". 1981. URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986.
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3.

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. "GODDAG: A Data Structure for Overlapping Hierarchies". In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12.

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Working Draft. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015).

[42] Steven Pemberton et al. XHTML 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015).

[47] Manu Sporny, Gregg Kellogg, and Markus Lanthaler. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4.

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5.

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014.
Acronyms
ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standards Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
EMACS: The Editor MACroS editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The Generalized Markup Language
GNU: GNU's Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
PICO: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML, New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
VI: The Visual Interactive editor
VIM: VI IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
        name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)
xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng   # Compact -> Full Relax NG
xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint
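The commands in Figure 2.5 can also be driven from a script. The following is a minimal sketch that assumes xmllint is installed and on the search path; the helper names and the short schema-kind keys are hypothetical, while the command-line options are those shown in the figure.

```python
import shutil
import subprocess

# xmllint option for each kind of schema shown in Figure 2.5.
SCHEMA_FLAGS = {"dtd": "--dtdvalid", "xsd": "--schema", "rng": "--relaxng"}


def xmllint_command(kind, schema_path, document_path):
    """Build the xmllint command line that validates a document against a schema."""
    return ["xmllint", "--noout", SCHEMA_FLAGS[kind], schema_path, document_path]


def validate(kind, schema_path, document_path):
    """Run xmllint and return (is_valid, diagnostics)."""
    if shutil.which("xmllint") is None:
        raise RuntimeError("xmllint was not found on the search path")
    result = subprocess.run(
        xmllint_command(kind, schema_path, document_path),
        capture_output=True,
        text=True,
    )
    # xmllint exits with a non-zero status when parsing or validation fails.
    return result.returncode == 0, result.stderr


print(" ".join(xmllint_command("xsd", "recipe.xsd", "recipe.xml")))
```

Calling validate("xsd", "recipe.xsd", "recipe.xml") in a directory that contains both files would then report whether the recipe conforms to the schema from Figure 2.4.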
A notable feature of XML that is unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature called CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.

Speech
AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!

Verse
Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? Why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

Sidenote: The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.

Sidenote: Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
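To illustrate how XML namespaces let several of these applications share one document, the sketch below parses a small document that mixes XHTML and MathML elements with Python's standard xml.etree.ElementTree module. The sample document and the helper function are made up for this example; the two namespace IRIs are the ones published by the W3C.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"

# A made-up document: XHTML is the default namespace, MathML uses the prefix "m".
DOCUMENT = f"""\
<html xmlns="{XHTML}" xmlns:m="{MATHML}">
  <body>
    <p>The radius is denoted by
      <m:math><m:mi>r</m:mi></m:math>.
    </p>
  </body>
</html>"""


def split_tag(tag):
    """Split an ElementTree tag of the form '{iri}name' into (iri, name)."""
    iri, name = tag[1:].split("}", 1)
    return iri, name


root = ET.fromstring(DOCUMENT)

# The parser resolves prefixes, so every element carries the IRI of its application.
namespaces = sorted({split_tag(element.tag)[0] for element in root.iter()})
for iri in namespaces:
    print(iri)

# Elements are looked up by IRI; the prefix used in the query is arbitrary.
identifier = root.find(".//math:mi", {"math": MATHML})
print(identifier.text)
```

The prefix m in the document and the prefix math in the query refer to the same application because only the IRI identifies it.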
2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language
In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely popular graphical Web browser, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed the W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across Web browsers, the W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Sidenote: JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
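The difference between tolerant and strict processing of malformed markup can be sketched with Python's standard library. This is an illustration only: html.parser is a simple tokenizer and does not implement the error recovery defined by the HTML5 specification, and the class and function names below are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Overlapping elements, as in the first line of Figure 2.7.
MALFORMED = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Record start and end tags without checking that they nest properly."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def xml_accepts(markup):
    """Return True if a strict XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# The tolerant HTML tokenizer reports all four tags and raises no error.
logger = TagLogger()
logger.feed(MALFORMED)
logger.close()
print(logger.events)

# The strict XML parser rejects the same input outright.
print(xml_accepts(MALFORMED))
```

A browser goes one step further than the tokenizer above and repairs the tree, so that the overlapping input is rendered like its properly nested counterpart.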
Initially, HTML only comprised a mixture of logical and presentation markup with a fixed visual interpretation. This changed with the specification of CSS, which was introduced by the W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for the presentation markup.
During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
222 The Extensible Hypertext Markup LanguageEver since the release of xml in 1998 w3c entertained the idea ofturning html into an application of xml rather than of sgml as
ltbgtBold ltigtbold and italicltbgt italicltigt
ltbgtBold ltbgtltigtltbgtbold and italicltbgt italicltigt
Figure 27 The first line contains overlapping elements and assuch canrsquot be a part of a valid html document Neverthelessbrowsers should handle it identically to the second line
30 CHAPTER 2 MARKUP
ltfont face=Verdana size=4gt
ltfont size=+2gtltbgtSO WHAT IS THIS ABOUTltbgtltfontgt
ltbrgtltbrgtThere is a continuing need to show the power of
ltigtCSSltigt The Zen Garden aims to excite inspire
and encourage participation To begin view some of the
existing designs in the list Clicking on any one will
load the style sheet into this very page The ltigtHTML
ltigt remains the same the only thing that has changed
is the external ltigtCSSltigt file Yes really
ltfontgt
Figure 28 An excerpt from the Web site of the css Zen Zardenlocated at httpcsszengardencom The document above wascreated using the html presentation markup The document be-low achieves the same appearance by the combination of logicalmarkup and css
ltstylegt
body
font large Verdana
font-size large
h1
font-size x-large
text-transform uppercase
abbr
font-style italic
ltstylegt
lth1gtSo what is this aboutlth1gt
ltpgtThere is a continuing need to show the power of
ltabbrgtCSSltabbrgt The Zen Garden aims to excite inspire
and encourage participation To begin view some of the
existing designs in the list Clicking on any one will
load the style sheet into this very page The
ltabbrgtHTMLltabbrgt remains the same the only thing that
has changed is the external ltabbrgtCSSltabbrgt file Yes
reallyltpgt
22 MARKUP ON THE WORLD WIDE WEB 31
The idea of a net-work of machine-readable data wasdescribed by TimBerners-Lee in 2006in the article LinkedData [43]
exemplified by the working draft of Reformulating html in xml [41]Unlike html parsers whose acceptance of malformed contentmakes them complex xml parsers are required to strictly refusexml documents that arenrsquot well-formed [28 Section 12 Termi-nology] leading to architectural simplicity and decreased com-putational requirements As a result reformulating html in xmlwas suggested as a way to bring the Web to mobile embeddedand other devices limited in their computational resources andto reduce the amount of malformed documents on the Web ingeneral Other perceived advantages included the ability to usexml tools for web documents and to include instances of otherxml applicationsmdashsuch as mathml and svgmdashdirectly into webdocuments through xml namespaces
The idea was brought to fruition in the xml application of theeXtensible HyperText Markup Language (xhtml) [42] However thesupposed benefits proved to be too marginal to warrant migrationfrom html The speed advantages of the simplified processingwere largely offset by the lack of support for incremental renderingsince it is impossible to validate and render partially downloadedxhtml documents and the advances in the area of mobile devicesmadehtmlprocessing sufficiently fast The lack ofways to providealternative content for browsers that would not support the xmlapplications instantiated in the xhtml documents also reducedthe usefulness of the xml namespaces in xhtml considerably Asa result xhtml has yet to succeed in replacing html and remainsa minority markup language on the Web
2.2.3 The Semantic Web and Linked Data
The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
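The two readings of a triplet can be made concrete with a small sketch. The classes Resource, Literal, and Triple and the function interpret are hypothetical names chosen for this illustration; they do not come from any RDF library.

```python
from typing import NamedTuple, Union


class Resource(NamedTuple):
    """A resource identified by an IRI."""
    iri: str


class Literal(NamedTuple):
    """A literal value, such as a string."""
    value: str


class Triple(NamedTuple):
    """A triplet in the (p, s, o) order used in the text."""
    predicate: Resource
    subject: Resource
    obj: Union[Resource, Literal]


def interpret(triple: Triple) -> str:
    """Spell out the relation or the property that a triplet expresses."""
    p, s, o = triple
    if isinstance(o, Resource):
        return f"{s.iri} is in the relation {p.iri} with {o.iri}"
    return f"{s.iri} has the property {p.iri} with the value {o.value!r}"


john = Resource("http://example.org/john-smith")
page = Resource("http://example.org/document.html")
creator = Resource("http://purl.org/dc/terms/creator")
name = Resource("http://xmlns.com/foaf/0.1/name")

print(interpret(Triple(creator, page, john)))
print(interpret(Triple(name, john, Literal("John Smith"))))
```

The first triplet has a resource as its object and reads as a relation between two resources; the second has a literal object and reads as a property with a value.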
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
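Of the representations listed above, N-Triples is simple enough to emit by hand, as the following sketch suggests. It is a minimal illustration and not a conforming serializer: the function names and the tuple layout are assumptions made for this example, only a handful of characters are escaped, and IRIs are not validated. Note that N-Triples writes each statement in the subject, predicate, object order.

```python
def escape_literal(value: str) -> str:
    """Escape the characters that would break a quoted literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def term(kind: str, value: str, lang: str = None) -> str:
    """Format an object that is either an IRI or a literal."""
    if kind == "iri":
        return f"<{value}>"
    text = f'"{escape_literal(value)}"'
    return f"{text}@{lang}" if lang else text


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    lines = []
    for subject, predicate, obj in triples:
        lines.append(f"<{subject}> <{predicate}> {term(*obj)} .")
    return "\n".join(lines)


document = [
    ("http://example.org/document.html",
     "http://purl.org/dc/terms/title",
     ("literal", "John's Web page", "en")),
    ("http://example.org/document.html",
     "http://purl.org/dc/terms/creator",
     ("iri", "http://example.org/john-smith")),
    ("http://example.org/john-smith",
     "http://xmlns.com/foaf/0.1/name",
     ("literal", "John Smith")),
]

print(to_ntriples(document))
```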
2.3 Document Preparation Systems
Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>
<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
    dc:title "John's Web page"@en ;
    dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
    a foaf:Person ;
    foaf:name "John Smith" .
Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom)
<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>
Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.
<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>
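[Graph: the node http://example.org/document.html carries the literal "John's Web page"@en (edge dc:title) and points to the node http://example.org/john-smith (edge labelled foaf:creator in the figure), which has the rdf:type foaf:Person and the literal "John Smith" (edge foaf:name).]
Figure 2.11: A graph of the RDF document in Figure 2.9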
categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems
Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The
The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
The Cask of Amontillado
by
Edgar Allan Poe
The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled – but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.
-1-
.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's The Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
    \dimexpr\hsize-3.28em 3.28em
    \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right)
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. At low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
newunicodecharraisebox8ex
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=grc] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
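The arithmetic behind these figures is easy to reproduce. The sketch below assumes that the leading is the typeface size plus 20 to 45 percent, as stated above; the function names leading_range and lead_per_side are illustrative.

```python
def leading_range(size_pt: float) -> tuple:
    """Return the smallest and the largest recommended leading in points."""
    # The typeface size plus 20 to 45 percent of lead.
    return (size_pt * 120 / 100, size_pt * 145 / 100)


def lead_per_side(size_pt: float, leading_pt: float) -> float:
    """Return the lead added above and below each line in points."""
    return (leading_pt - size_pt) / 2


low, high = leading_range(10)
print(low, high)
print(lead_per_side(10, low), lead_per_side(10, high))
```

For a 10 pt typeface, this gives a leading of 12 to 14.5 pt, that is, 1 to 2.25 pt of lead on either side of a line.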
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger
        Sizes in inches   Page proportions
A4      8.27 × 11.7       2 : √2       1.41421
B5      6.93 × 9.84       1 : √2       0.707
Letter  8 1/2 × 11        1 : 1.294    1.2941
Table 3.1: An overview of common paper sizes used for commercial and industrial printing
This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
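One way to satisfy the whole-multiple rule is to round the natural height of a heading, including the space above and below it, up to the next multiple of the body text line height. The sketch below shows the arithmetic; the function name heading_box_height and the sample measurements are assumptions made for this illustration.

```python
import math


def heading_box_height(natural_height_pt: float,
                       line_height_pt: float) -> float:
    """Round a heading height up to a whole multiple of the line height."""
    lines = max(1, math.ceil(natural_height_pt / line_height_pt))
    return lines * line_height_pt


# A heading that naturally occupies 20 pt on a 12 pt grid
# is padded to two full lines, that is, 24 pt.
print(heading_box_height(20, 12))
```

The difference between the rounded and the natural height can then be distributed as extra space above and below the heading.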
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs, and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
– William Shakespeare, King Lear
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we need to also decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
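The rule that the text height is a multiple of the line height amounts to an integer division. The following sketch assumes that the vertical space left after subtracting the margins is already known; the function name text_height and the sample page (9 in tall with 0.75 in vertical margins, at 72 points to the inch) are illustrative assumptions.

```python
def text_height(available_pt: float, line_height_pt: float) -> tuple:
    """Return the number of lines that fit and the resulting text height."""
    lines = int(available_pt // line_height_pt)
    return lines, lines * line_height_pt


# 9 in page height minus two 0.75 in margins, at 72 pt per inch.
available = 9 * 72 - 2 * 0.75 * 72
print(text_height(available, 12))
```

Any space that remains after the last full line is best returned to the vertical margins, so that the text block itself can be filled completely.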
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487
(ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).
[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).
[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).
[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).
[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).
[27] Charles F Goldfarb the sgml Handbook New York NY USAOxford University Press Inc 1990 i sbn 978-0-198-53737-3(cit on p 22)
[28] Jean Paoli Tim Bray and Michael Sperberg-McQueen Ex-tensible Markup Language (xml) 10 w3c Recommendationw3c Feb 1998 url httpwwww3orgTR1998REC-xml-19980210 (visited on 07312015) (cit on pp 23 31)
[29] isoiec jtc1sc18wg8 Proposed TC for Web sgml Adap-tations for sgml isoiec N1929 the International Organi-zation for Standardization June 1997 url httpxmlcoverpagesorgwg8-n1929-ghtml (visited on 07312015)(cit on p 23)
[30] Haringkon Wium Lie and Bert Bos Cascading Style Sheets level1 Recommendation w3c Dec 1996 url httpwwww3orgTRREC-CSS1-961217 (visited on 07312015) (cit onpp 23 29)
[31] C M Sperberg-McQueen and Claus Huitfeldt lsquolsquogoddagA Data Structure for Overlapping Hierarchiesrsquorsquo In DigitalDocuments Systems and Principles 8th International Confer-ence on Digital Documents and Electronic Publishing DDEP2000 5th International Workshop on the Principles of DigitalDocument Processing PODDP 2000 Munich Germany Sep-tember 13-15 2000 Revised Papers Ed by Peter King andEthan V Munson Berlin Heidelberg Springer Berlin Hei-delberg 2004 pp 139ndash160 isbn 978-3-540-39916-2 doi101007978-3-540-39916-2_12 (cit on p 27)
[32] TimBray DaveHollander andAndrewLaymanNamespacesin xml w3c Recommendation w3c Jan 1999 url httpwwww3orgTR1999REC-xml-names-19990114 (visitedon 08212015) (cit on p 27)
[33] M Duerst the Internationalized Resource Identifiers (iris) rfc3987 rfc Editor Jan 2005 url httptoolsietforghtmlrfc3987 (visited on 08312015) (cit on p 27)
[34] Norman Walsh DocBook 5 The Definitive Guide Apr 2010url httpwwwdocbookorgtdgenhtmldocbookhtml(visited on 08182015) (cit on p 28)
[35] Tim Berners-Lee Information Management A Proposal Techrep Mar 1989 url httpwwww3orgHistory1989proposalhtml (visited on 08312015) (cit on p 28)
[36] T Berners-Lee Hypertext Markup Language ndash 20 rfc 1866rfc Editor Nov 1995 url httptoolsietforghtmlrfc1866 (visited on 07312015) (cit on p 28)
[37] Jon Postel DoD standard Transmission Control Protocol rfc761 rfc Editor Jan 1980 url httptoolsietforghtmlrfc761 (visited on 09162016) (cit on p 28)
[38] Ian Hickson et al html5 A vocabulary and associated apisfor html and xhtml Recommendation w3c Oct 2014 urlhttpwwww3orgTR2014REC-html5-20141028 (visitedon 07312015) (cit on p 29)
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).
[40] Netscape Communications Netscape and Sun announce Java-Script the open cross-platform object scripting language for en-terprise networks and the Internet Dec 1995 url httpwpnetscapecomnewsrefprnewsrelease67html (visited on02132008) (cit on p 29)
[41] Dave Raggett et al Reformulating html in xml w3c Recom-mendation w3c Dec 1998 url httpwwww3orgTR1998WD-html-in-xml-19981205 (visited on 08202015)(cit on p 31)
[42] Steven Pemberton et al xhtmltrade 10 The Extensible HyperTextMarkup Language w3c Recommendation w3c Jan 2000url httpwwww3orgTR2000REC-xhtml1-20000126(visited on 08202015) (cit on p 31)
[43] T Berners-Lee Linked Data Tech rep 2006 url httpswwww3orgDesignIssuesLinkedDatahtml (visited on09172016) (cit on p 31)
[44] Ora Lassila and Ralph R Swick Resource Description Frame-work (rdf) Model and Syntax Specification w3c Recommen-dation w3c Feb 1999 url httpwwww3orgTR1999REC-rdf-syntax-19990222 (visited on 08182015) (cit onpp 31 32)
[45] Dan Brickley and R V Guha rdf Vocabulary DescriptionLanguage 10 rdf Schema w3c Recommendation w3c Feb2004 url httpwwww3orgTR2004REC-rdf-schema-20040210 (visited on 08182015) (cit on p 32)
[46] Deborah L McGuinness and Frank van Harmelen owl WebOntology Language w3c Recommendation w3c Feb 2004url httpwwww3orgTR2004REC-owl-features-20040210 (visited on 08182015) (cit on p 32)
[47] Dan Brickley and R V Guha json-ld 10 A JSON-basedSerialization for Linked Data w3c Recommendation w3cJan 2014 url httpwwww3orgTR2014REC-json-ld-20140116 (visited on 08192015) (cit on p 32)
[48] David Beckett et al rdf 11 Turtle w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-turtle-20140225 (visited on 08292015) (cit on p 32)
[49] David Beckett rdf 11 N-Triples w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-n-triples-20140225 (visited on 08192015) (cit on p 32)
[50] Ben Adida et al rdfa in xhtml Syntax and Processing w3cRecommendation w3c Oct 2008 url httpwwww3org TR 2008 REC - rdfa - syntax - 20081014 (visited on08192015) (cit on p 32)
[51] Peter Schaffter What exactly is mom 2015 url httpwwwschafftercamommom-01html (visited on 09162016)(cit on p 37)
[52] Donald Ervin Knuth Digital Typography The Center for theStudy of Language and Information Publications 1998 i sbn978-0-387-98269-4 (cit on p 36)
[53] Albert Kapr Sto a jedna věta ke knižniacute uacutepravě Trans by An-toniacuten Rambousek Lacerta 1999 url httpwwwsazbacztypoglosytypo101pdf (visited on 10202015) (cit onpp 41 46 47)
[54] Robert Bringhurst the Elements of Typographic Style PointRoberts andWashHartleyampMarks 1992 i sbn 0-88179-110-5(cit on pp 41 42 45ndash48)
[55] Matthew Butterick Butterickrsquos Practical Typography Line spac-ing url httppracticaltypographycomline-spacinghtml (visited on 11022015) (cit on p 42)
[56] Vladimiacuter Beran et al Aktualizovanyacute typografickyacute manuaacutel6th ed Kafka Design 2014 (cit on p 45)
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standards Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
Index
ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42
Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17
sect
Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15
bre 14ndash16bs 6bsd 13
sect
ca 52can 6cern 28
character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44
sect
dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35
dps 13 17 18 32 35 36 39dtd 23 25ndash27dtp 36
sect
ebcdic 5ecma 55Edgar Allen Poe 37
Elements of Style 3em 6Emacs 13endianity 10endnote 47enq 6eot 6erealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15
ere 14ndash16esc 6etb 6120576-TEX 38etx 6euc 5
sectF M Cornford 43ff 6foaf 32 33footnote 47formal grammar 14fortran 4From Religion to Philosophy A Study in
the Origins of Western Speculation 43fs 6fsm 35
sectGit 17gml 22gnuLinux 13nano 13
gnu 13 14 35Google Documents 18Google Pinyin 11grep 16 17groff see troffgs 6gui 13 35
sectHan Unification 9heading 45Henrik Ibsen 27ht 6
html 28ndash32 34 39 44 55sect
ibm 5 12 22iconv 10iec 7 10 51ndash54ime 12ir i 27 28 31 32 54iso 7 10 51ndash54
sectJavaScript 29Jeffrey E F Friedl 14j is 5joe 13JScript 29json 32json-ld 32 56jtc 51ndash54justification see alignment
sectKing Lear 48
sectLATEX 36 43Latin Vulgate Bible 49ld 31 32 55leading see line spacingLeafpad 13lf 6lightweight markup language 39line height 45list 46
sectma 51MakeDoc 39Markdown 39markuplogical 21 29 30 35 36presentation 21 29 30 35 36
mathml 28 31Mercurial 17microformatting 32Microsoft Word 14 20 39
sectN-Triples 32 33nak 6Noam Chomskyhierarchy 14
Noam Chomsky 14note 46Notepad++ 13Notepad 13
nroff see troffnul 6ny 51
sectocr 12odf 13ooxml 13owl 32 56
sectparagraphblock 47indented 45outdented 45
paragraph 42paragraphsblock 45
pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8
sectQuarkXPress 14quotationblock 47run-in 47
sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31
rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6
sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22
sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-
lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6
secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff
man 36
me 36mom 36
troff 35tron 9Turtle 32 33typeface 41
sectucsblock 8ucs-4 8
ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10
us 6usa 51 52utf
utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8
utf 6 8ndash10 52sect
VBScript 29vcscentralized 17decentralized 17
vcs 17ndash20version control 13vi 13vim 13
vt 6sect
w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4
wysiwyg 35sect
XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23
xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23
A notable feature of XML that is unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified by an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor to CONCUR, a more expressive SGML feature that makes it possible to mark up several structural views of a single document. Unlike CONCUR, which ties each view to an SGML DTD, XML has no general mechanism for translating the IRIs to XML
Speech

AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!

Verse

Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? Why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].
The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.

Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
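The way namespaces tie each element to an IRI can be sketched with Python's standard xml.etree.ElementTree module. The document below is a hypothetical example (it does not come from the book) that mixes XHTML and MathML elements; the helper split_tag is likewise an illustrative name. ElementTree reports every tag in the form {IRI}local-name, so splitting that string recovers the application an element belongs to.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing two XML applications through namespaces.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:m="http://www.w3.org/1998/Math/MathML">
  <body>
    <p>Euler's number:</p>
    <m:math><m:mi>e</m:mi></m:math>
  </body>
</html>"""


def split_tag(tag):
    """Split ElementTree's '{iri}local' tag form into (iri, local)."""
    if tag.startswith("{"):
        iri, local = tag[1:].split("}", 1)
        return iri, local
    return "", tag


root = ET.fromstring(DOC)
# One (iri, local-name) pair per element, in document order.
names = [split_tag(element.tag) for element in root.iter()]

for iri, local in names:
    print(f"{local:5} belongs to {iri}")
```

Note that the parser only reports the IRIs; as the text says, it has no way of fetching or checking a schema for them on its own.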
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include: DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely used graphical Web browser, which paved the way for the Web browsers of today. In 1994, Berners-Lee formed the W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In
JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
an attempt to unify the way malformed HTML documents were rendered across Web browsers, the W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.
Initially, HTML only comprised a mixture of logical and presentation markup with a fixed visual interpretation. This changed with the specification of CSS, which was introduced by the W3C in 1996. The language made it possible to specify the visual properties of any HTML element, which separated document markup from design and effectively eliminated the need for presentation markup.
During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, the W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as
<b>Bold <i>bold and italic</b> italic</i>

<b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.
<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden, located at http://csszengarden.com. The document above was created using HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.
<style>
body {
  font: large Verdana;
  font-size: large; }
h1 {
  font-size: x-large;
  text-transform: uppercase; }
abbr {
  font-style: italic; }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>
The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2 Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly in web documents through XML namespaces.
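The difference in strictness can be observed with Python's standard library; the following is a minimal sketch rather than a model of a real browser. It feeds the overlapping markup of Figure 2.7 (wrapped in a paragraph element for illustration) to xml.etree.ElementTree, which must reject it, and to html.parser.HTMLParser, which only tokenizes the input and does not implement the HTML5 tree construction rules. The names TagLogger and xml_accepts are illustrative.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The overlapping elements of Figure 2.7 and their repaired counterpart.
MALFORMED = "<p><b>Bold <i>bold and italic</b> italic</i></p>"
REPAIRED = "<p><b>Bold </b><i><b>bold and italic</b> italic</i></p>"


class TagLogger(HTMLParser):
    """Record start and end tags without enforcing any nesting rules."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def xml_accepts(text):
    """Return True if the text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


logger = TagLogger()
logger.feed(MALFORMED)
logger.close()

print("XML parser accepts malformed input:", xml_accepts(MALFORMED))
print("XML parser accepts repaired input: ", xml_accepts(REPAIRED))
print("HTML tokenizer events:", logger.events)
```

The XML parser stops at the first mismatched end tag, whereas the lenient tokenizer reports all six tags and leaves the repair to whatever consumes them.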
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also considerably reduced the usefulness of XML namespaces in XHTML. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, the W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
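The two readings of a triplet can be sketched in a few lines of Python. The classes Resource, Literal, and Triplet and the function interpret are hypothetical names chosen for this illustration and are not part of any RDF library; the IRIs are those used in Figure 2.9.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    """A resource identified by an IRI."""
    iri: str


@dataclass(frozen=True)
class Literal:
    """A plain literal value."""
    value: str


@dataclass(frozen=True)
class Triplet:
    """A triplet (p, s, o) in the order used by the text."""
    predicate: Resource
    subject: Resource
    obj: Union[Resource, Literal]


def interpret(t: Triplet) -> str:
    """Describe a triplet as a relation or as a property."""
    if isinstance(t.obj, Resource):
        return (f"<{t.subject.iri}> is in relation <{t.predicate.iri}> "
                f"with <{t.obj.iri}>")
    return (f"<{t.subject.iri}> has property <{t.predicate.iri}> "
            f"with value {t.obj.value!r}")


DOC = Resource("http://example.org/document.html")
JOHN = Resource("http://example.org/john-smith")
CREATOR = Resource("http://purl.org/dc/terms/creator")
TITLE = Resource("http://purl.org/dc/terms/title")

relation = Triplet(CREATOR, DOC, JOHN)                  # object is a resource
prop = Triplet(TITLE, DOC, Literal("John's Web page"))  # object is a literal

print(interpret(relation))
print(interpret(prop))
```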
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, besides the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually make obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in the documents on the Web, a technique known as microformatting.
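Of the listed representations, N-Triples is simple enough to be produced by hand. The following sketch serializes the four statements of Figure 2.9, with each statement written as subject, predicate, and object followed by a full stop. The helper names iri, literal, and to_ntriples are assumptions made for this illustration, and the escaping covers only backslashes, double quotes, and line breaks, not the full N-Triples grammar.

```python
def iri(value):
    """Format an IRI reference as an N-Triples term."""
    return f"<{value}>"


def literal(text, lang=None):
    """Format a literal, with an optional language tag."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


DOC = "http://example.org/document.html"
JOHN = "http://example.org/john-smith"

triples = [
    (iri(DOC), iri("http://purl.org/dc/terms/title"),
     literal("John's Web page", "en")),
    (iri(DOC), iri("http://purl.org/dc/terms/creator"), iri(JOHN)),
    (iri(JOHN), iri("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
     iri("http://xmlns.com/foaf/0.1/Person")),
    (iri(JOHN), iri("http://xmlns.com/foaf/0.1/name"), literal("John Smith")),
]

output = to_ntriples(triples)
print(output)
```

For real documents, an established RDF library is a safer choice than hand-written serialization, since it also handles the remaining escaping rules and datatypes.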
2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith" />
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html>
  <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html>
  <http://purl.org/dc/terms/creator>
  <http://example.org/john-smith> .
<http://example.org/john-smith>
  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
  <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith>
  <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom). The N-Triples statements are wrapped here for display; in an actual N-Triples file, each statement occupies a single line.
<!DOCTYPE html>
<html lang="en">
<head>
<link rel="meta" type="application/rdf+xml"
      href="john.rdf">
<link rel="meta" type="text/turtle" href="john.ttl">
<link rel="meta" type="application/n-triples"
      href="john.nt">
<title>John's Web page</title>
</head>
<body>
Hi, I'm John Smith.
</body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
<head vocab="http://purl.org/dc/terms/"
      about="http://example.org/document.html">
<title property="title" lang="en">John's Web
page</title>
<meta property="creator"
      href="http://example.org/john-smith">
</head>
<body vocab="http://xmlns.com/foaf/0.1/"
      about="http://example.org/john-smith"
      typeof="Person">
Hi, I'm <span property="name">John Smith</span>.
</body>
</html>
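[Figure: a directed graph with the nodes http://example.org/document.html, "John's Web page"@en, http://example.org/john-smith, foaf:Person, and "John Smith", connected by edges labeled dc:title, dc:creator, rdf:type, and foaf:name.]

Figure 2.11: A graph of the RDF document in Figure 2.9.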
categorized into batch-oriented systems, which process text files into printable output documents on demand, and interactive systems (also known as What You See Is What You Get, WYSIWYG), which allow the user to directly edit an approximation of the output document through a visual editor. The price of the gentle learning curve of interactive DPSes is twofold: more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and reduced flexibility stemming from the use of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems

Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU is Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The
The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by Donald Knuth, an American professor of computer science, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of the mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, a de facto standard markup language for academic and technical documents.
2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive dpses will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
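The decoupling of logical markup from design can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names (title, author, body), the style attributes, and the render helper are hypothetical and are not taken from any of the interactive systems discussed here. The sketch assumes that a document is a list of (class, text) pairs and that a design is a mapping from a class to its presentation.

```python
# A document marked up logically: each span names a class, not a look.
DOCUMENT = [
    ("title", "The Cask of Amontillado"),
    ("author", "Edgar Allen Poe"),
    ("body", "The thousand injuries of Fortunato I had borne as I best could."),
]

# Two interchangeable designs; the attribute names are hypothetical.
PRINT_DESIGN = {
    "title": {"size_pt": 16, "weight": "bold", "align": "center"},
    "author": {"size_pt": 12.5, "style": "italic", "align": "center"},
    "body": {"size_pt": 12.5, "align": "justify"},
}
SCREEN_DESIGN = {
    "title": {"size_pt": 20, "weight": "bold", "align": "left"},
    "author": {"size_pt": 14, "style": "italic", "align": "left"},
    "body": {"size_pt": 14, "align": "left"},
}


def render(document, design):
    """Pair every span of text with the presentation its class maps to."""
    return [(text, design[cls]) for cls, text in document]


print_version = render(DOCUMENT, PRINT_DESIGN)
screen_version = render(DOCUMENT, SCREEN_DESIGN)
```

Swapping PRINT_DESIGN for SCREEN_DESIGN changes the look of every span of a given class at once, while the marked-up text itself stays untouched.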
The Cask of Amontillado
by
Edgar Allen Poe
The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled, but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.
-1-
.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allen Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allen Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
    \dimexpr\hsize-3.28em 3.28em
    \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.
Figure 2.14: Logical markup in the interactive dpses of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of sgml and xml, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as html. The more popular lightweight markup languages come in various flavors that represent their use cases.
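To make the conversion step concrete, the following Python sketch turns a tiny, made-up subset of a Markdown-like language into html. It is not the algorithm of any real converter, and the to_html function is a hypothetical name. The supported syntax is an assumption chosen for brevity: blocks are separated by blank lines, a block starting with "# " is a top-level heading, and *text* marks emphasis.

```python
import html
import re


def to_html(source: str) -> str:
    """Convert a tiny Markdown-like subset to html (illustration only)."""
    # Blocks are separated by one or more blank lines.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", source) if b.strip()]
    out = []
    for block in blocks:
        # Join wrapped lines and escape characters that are special in html.
        text = html.escape(" ".join(block.split()))
        # *text* becomes emphasis.
        text = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)
        if text.startswith("# "):
            out.append(f"<h1>{text[2:]}</h1>")
        else:
            out.append(f"<p>{text}</p>")
    return "\n".join(out)


sample = "# The Cask of Amontillado\n\n*At length* I would be avenged."
converted = to_html(sample)
```

The punctuation in the input stays readable as plain text, which is the point of a lightweight markup language; the html is produced only when a more general format is needed.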
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
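The idea of combining several typefaces can be sketched as a per-character fallback: each character is set in the first typeface that covers it. The Python below is a minimal sketch under assumptions. The code point ranges are a simplified subset of the unicode-range values used in Figure 3.2, real coverage would be read from the font files, and the face_for and split_runs helpers are hypothetical names.

```python
# Code point ranges each face is assumed to cover (a simplification).
FACES = [
    ("Alegreya Sans", [(0x0000, 0x024F), (0x1E00, 0x1EFF), (0x2000, 0x206F)]),
    ("GFS Neohellenic", [(0x0370, 0x03FF), (0x1F00, 0x1FFF)]),
]


def face_for(char, faces=FACES):
    """Return the name of the first face covering the character, or None."""
    code = ord(char)
    for name, ranges in faces:
        if any(low <= code <= high for low, high in ranges):
            return name
    return None


def split_runs(text, faces=FACES):
    """Group the text into (face, run) pairs of consecutive characters."""
    runs = []
    for char in text:
        face = face_for(char, faces)
        if runs and runs[-1][0] == face:
            runs[-1] = (face, runs[-1][1] + char)
        else:
            runs.append((face, char))
    return runs


# "Soul" followed by the Greek word for soul, written with escapes so that
# the precomposed code points are unambiguous.
runs = split_runs("Soul \u03c8\u03c5\u03c7\u1f74\u03bd")
```

A character for which face_for returns None is one that neither typeface covers, which is the signal that yet another font family has to be brought in.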
Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
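The line-length guideline above can be written down as a small check. The Python sketch below only restates the numbers from the paragraph; the measure_verdict function and the wording of its verdicts are hypothetical.

```python
def measure_verdict(chars_per_line, columns=1):
    """Classify a line length against the 45-75 / 40-50 character guideline."""
    # Recommended range depends on whether the page has one or more columns.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "too narrow to justify; set ragged"
    if chars_per_line > 80:
        return "too wide; strains the eye"
    if low <= chars_per_line <= high:
        return "within the recommended range"
    return "acceptable, but outside the recommended range"
```

For instance, measure_verdict(66) falls within the single-column range, whereas measure_verdict(35, columns=2) signals that the text should be set ragged.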
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”
1
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
\newunicodechar\raisebox8ex
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=gr] {
    font-size: 13.2pt;
  }
</style>
<p><span lang=en>The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang=gr>φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang=en>“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in html5 and css3.
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
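The arithmetic behind the leading guideline can be checked with a short Python sketch. The function names leading_range and lead_per_side are hypothetical; the percentages are the twenty to forty-five percent quoted from [55], and the lead is assumed to be split evenly above and below the line.

```python
def leading_range(type_size_pt, low_pct=20, high_pct=45):
    """Return the (tightest, loosest) line height in points for a type size."""
    return (type_size_pt * (100 + low_pct) / 100,
            type_size_pt * (100 + high_pct) / 100)


def lead_per_side(type_size_pt, line_height_pt):
    """Extra space above and below a line, assuming an even split."""
    return (line_height_pt - type_size_pt) / 2


# For 10 pt type the line height ranges from 12 pt to 14.5 pt.
tightest, loosest = leading_range(10)
```

With 10 pt type, lead_per_side(10, tightest) gives 1 pt and lead_per_side(10, loosest) gives 2.25 pt, which matches the figures in the text.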
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. ¶ Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
         Sizes in inches   Page proportions
A4       8.27 × 11.7       2 : √2      1.41421
B5       6.93 × 9.84       1 : √2      0.707
Letter   8 1/2 × 11        1 : 1.294   1.2941
Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
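The whole-multiple rule for headings amounts to rounding the height of a heading up to the next line of the body text grid and filling the remainder with vertical space. The Python sketch below illustrates this; the heading_on_grid function is a hypothetical name, and how the padding is divided above and below the heading is left to the designer.

```python
import math


def heading_on_grid(heading_height_pt, body_line_height_pt):
    """Return (lines, padding_pt): the number of body text lines a heading
    occupies and the extra vertical space needed to fill them exactly."""
    lines = math.ceil(heading_height_pt / body_line_height_pt)
    padding = lines * body_line_height_pt - heading_height_pt
    return lines, padding


# A heading that is 16 pt tall on a 12 pt body text grid.
lines, padding = heading_on_grid(16, 12)
```

In this example the heading occupies two lines of the grid (24 pt), so 8 pt of vertical space has to be added around it to keep the lines on facing pages aligned.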
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs, and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star.
William Shakespeare, King Lear
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size (as described in Section 3.2.1) as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
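Making the text height a multiple of the line height is a matter of rounding the available space down to a whole number of lines. The Python sketch below illustrates this; the text_block_height function is a hypothetical name, and the example page (9 in tall with 0.75 in margins, at an assumed 72 points per inch) is chosen for illustration only.

```python
def text_block_height(page_height_pt, top_margin_pt, bottom_margin_pt,
                      line_height_pt):
    """Return (lines, height_pt): the largest whole number of body text
    lines that fits between the margins and the resulting text height."""
    available = page_height_pt - top_margin_pt - bottom_margin_pt
    lines = int(available // line_height_pt)
    return lines, lines * line_height_pt


# A 648 pt page with 54 pt margins and a 12 pt body text line height.
lines_per_page, text_height = text_block_height(648, 54, 54, 12)
```

Here the 540 pt between the margins hold exactly 45 lines. With a 16.8 pt line height, only 32 lines fit, and the few points left over would be added to one of the margins.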
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
Bibliography
[1] Mary Brandel. “1963: The debut of ascii”. In: Computerworld (July 1999). url: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).
[2] asa Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, ny, usa: the American Standard Association, June 1963. url: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).
[3] iso/tc97/sc2. Information technology – iso 7-bit coded character set for information interchange. iso 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).
[4] asa Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, ny, usa: the American Standard Association, June 1986 (cit. on p. 6).
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, ma, usa: Addison-Wesley Developers Press, Oct. 1991. isbn: 0-201-56788-1 (cit. on p. 8).
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, ma, usa: Addison-Wesley Developers Press, June 1992. isbn: 0-201-60845-6 (cit. on p. 8).
[7] iso/iec jtc1/sc2. Information technology – the Universal multiple-octet coded Character Set (ucs) – Part 1: Architecture and Basic Multilingual Plane. iso/iec 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).
[8] iso/iec jtc1/sc2. Transformation Format for 16 planes of group 00 (utf-16). iso/iec 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[9] iso/iec jtc1/sc2. ucs Transformation Format 8 (utf-8). iso/iec 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, ca, usa, July 2016. url: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).
[11] Q-Success. Usage of character encodings for websites. url: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. url: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).
[13] Unicode Consortium. Unicode cldr Project. Tech. rep. url: http://cldr.unicode.org/ (visited on 09/17/2016) (cit. on p. 10).
[14] iso/tc171/sc2. Document management – Portable document format. iso 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).
[15] iso/iec jtc1/sc34. Document description and processing languages – Office Open XML File Formats. iso/iec 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).
[16] iso/iec jtc1/sc34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. iso/iec 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).
[18] iso/iec jtc1/sc22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. iso/iec 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. isbn: 978-0-596-52812-6 (cit. on p. 14).
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. url: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. isbn: 1565922255. url: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. url: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).
[23] Charles F. Goldfarb. “The Roots of sgml – A Personal Recollection”. In: (1996). url: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).
[24] Charles F. Goldfarb. “sgml: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). url: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). url: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).
[26] iso/iec jtc1/sc34. Information processing – Text and office systems – Standard Generalized Markup Language (sgml). iso/iec 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).
[27] Charles F. Goldfarb. The sgml Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. isbn: 978-0-198-53737-3 (cit. on p. 22).
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (xml) 1.0. w3c Recommendation. w3c, Feb. 1998. url: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).
[29] iso/iec jtc1/sc18/wg8. Proposed TC for Web sgml Adaptations for sgml. iso/iec N1929. the International Organization for Standardization, June 1997. url: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. w3c, Dec. 1996. url: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “goddag: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. isbn: 978-3-540-39916-2. doi: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in xml. w3c Recommendation. w3c, Jan. 1999. url: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).
[33] M. Duerst. The Internationalized Resource Identifiers (iris). rfc 3987. rfc Editor, Jan. 2005. url: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. url: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. url: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. rfc 1866. rfc Editor, Nov. 1995. url: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).
[37] Jon Postel. DoD standard Transmission Control Protocol. rfc 761. rfc Editor, Jan. 1980. url: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).
[38] Ian Hickson et al. html5: A vocabulary and associated apis for html and xhtml. Recommendation. w3c, Oct. 2014. url: http://www.w3.org/TR/2014/REC-html5-20141028/ (visited on 07/31/2015) (cit. on p. 29).
[39] ecma International. Standard ecma-262 - ecmaScript Language Specification. Tech. rep. June 1997. url: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. url: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).
[41] Dave Raggett et al. Reformulating html in xml. w3c Recommendation. w3c, Dec. 1998. url: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).
[42] Steven Pemberton et al. xhtml™ 1.0: The Extensible HyperText Markup Language. w3c Recommendation. w3c, Jan. 2000. url: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. url: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (rdf) Model and Syntax Specification. w3c Recommendation. w3c, Feb. 1999. url: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).
[45] Dan Brickley and R. V. Guha. rdf Vocabulary Description Language 1.0: rdf Schema. w3c Recommendation. w3c, Feb. 2004. url: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).
[46] Deborah L. McGuinness and Frank van Harmelen. owl Web Ontology Language. w3c Recommendation. w3c, Feb. 2004. url: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).
[47] Dan Brickley and R. V. Guha. json-ld 1.0: A JSON-based Serialization for Linked Data. w3c Recommendation. w3c, Jan. 2014. url: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).
[48] David Beckett et al. rdf 1.1 Turtle. w3c Recommendation. w3c, Feb. 2014. url: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).
[49] David Beckett. rdf 1.1 N-Triples. w3c Recommendation. w3c, Feb. 2014. url: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).
[50] Ben Adida et al. rdfa in xhtml: Syntax and Processing. w3c Recommendation. w3c, Oct. 2008. url: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).
[51] Peter Schaffter. What exactly is mom? 2015. url: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. isbn: 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. url: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. isbn: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. url: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ack  The ACKnowledgement character
api  Application Programming Interface
asa  The American Standard Association
ascii  The American Standard Code for Information Interchange
at&t  The American Telephone and Telegraph corporation
bel  The BELl character
bmp  The Basic Multilingual Plane
bre  The Basic Regular Expressions
bs  The BackSpace character
bsd  The Berkeley Software Distribution. Also known as the Berkeley Unix.
ca  California
can  The CANcel character
cern  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
cldr  The Common Locale Data Repository
cli  Command Line Interface
cobol  The COmmon Business-Oriented Language
cr  The Carriage Return character
css  The Cascading Style Sheets language
dc  The Dublin Core
dc1  The Device Control character No. 1
dc2  The Device Control character No. 2
dc3  The Device Control character No. 3
dc4  The Device Control character No. 4
del  The DELete character
dle  The Data Link Escape character
dps  Document Preparation System
dtd  Document Type Definition
dtp  DeskTop Publishing
ebcdic  The Extended Binary Coded Decimal Interchange Code
ecma  The European Computer Manufacturers Association
em  The End of Medium
emacs  The Eventually Munches All Computer Storage editor
enq  The ENQuiry character
eot  The End Of Transmission
ere  The Extended Regular Expressions
esc  The ESCape character
etb  The End of Transmission Block
etx  The End of TeXt
euc  The Extended Unix Code
ff  The Form Feed character
foaf  Friend Of A Friend
fortran  The FORmula TRANslator
fs  The File Separator
fsm  The Free Software Movement
gml  The General Markup Language
gnu  gnu is Not Unix
gs  The Group Separator
gui  Graphical User Interface
ht  The Horizontal Tab
html  The HyperText Markup Language
ibm  The International Business Machines Corporation
iec  The International Electrotechnical Commission
ime  Input Method Editor
iri  The Internationalized Resource Identifier
iso  The International Organization for Standardization
jis  The Japanese Industrial Standards encoding
joe  The Joe's Own Editor
json  The JavaScript Object Notation
json-ld  json for ld
jtc  A Joint tc
ld  Linked Data
lf  The Line Feed
ma  Massachusetts
mathml  The Mathematical Markup Language
nak  The Negative-AcKnowledgement character
nul  The NULl character
ny  New York
ocr  Optical Character Recognition
odf  The Open Document Format for office applications
ooxml  The Office Open XML format
owl  The Web Ontology Language
pc  The ibm Personal Computer
pdf  The Portable Document Format
pico  The PIne COmposer
posix  The Portable Operating System Interface
rdf  The Resource Description Framework
rdfa  rdf in attributes
relax ng  The REgular LAnguage for xml New Generation
rfc  A Request For Comments
rs  The Record Separator
sc  A SubCommittee
sgml  The Standard Generalized Markup Language
si  The Shift In character
so  The Shift Out character
soh  The Start of Heading
sr  Sound Recognition
stx  The Start of Text
sub  The SUBstitute character
svg  The Scalable Vector Graphics language
svn  SubVersioN
syn  The SYNchronous Idle character
tc  A Technical Committee
tei  The Text Encoding Initiative
tron  The Real-time Operating system Nucleus
ucs  The Universal multiple-octet coded Character Set
us  The Unit Separator
usa  The United States of America
utf  The ucs Transformation Format
vcs  Version Control Systems
vi  The Visual Interactive editor
vim  vi IMproved
vt  The Vertical Tab
w3c  The World Wide Web Consortium
wg  A Working Group
wysiwyg  What You See Is What You Get
xhtml  The eXtensible HyperText Markup Language
xml  The eXtensible Markup Language
28 CHAPTER 2 MARKUP
The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.
Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
2.2 Markup on the World Wide Web
2.2.1 The Hypertext Markup Language
In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely used graphical Web browser, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.
The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.
JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.
During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
2.2.2 The Extensible Hypertext Markup Language
Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as

<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.
<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

<style>
  body {
    font: large Verdana;
    font-size: large;
  }
  h1 {
    font-size: x-large;
    text-transform: uppercase;
  }
  abbr {
    font-style: italic;
  }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>
The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2 Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly into web documents through XML namespaces.
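The strictness of XML parsers can be observed with the two lines of Figure 2.7. The following is a minimal sketch in Python that uses the standard xml.etree.ElementTree module; the helper name is_well_formed and the wrapping <p> root element are assumptions made for illustration and are not part of any specification.

```python
import xml.etree.ElementTree as ET

# The two lines of Figure 2.7.
OVERLAPPING = "<b>Bold <i>bold and italic</b> italic</i>"
NESTED = "<b>Bold </b><i><b>bold and italic</b> italic</i>"


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML inside a single root element."""
    try:
        # An XML document needs exactly one root element, so the
        # fragment is wrapped in a hypothetical <p> element first.
        ET.fromstring(f"<p>{fragment}</p>")
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    for fragment in (OVERLAPPING, NESTED):
        verdict = "well-formed" if is_well_formed(fragment) else "rejected"
        print(f"{verdict}: {fragment}")
```

The XML parser rejects the overlapping elements outright and accepts only the properly nested line, whereas an HTML5 browser is expected to recover from both.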
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
2.2.3 The Semantic Web and Linked Data
The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
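The two interpretations of a triplet described above can be sketched in a few lines of Python. The class names Resource, Literal, and Triplet and the function interpret are hypothetical and exist only for this illustration; the IRIs are taken from the example in Figure 2.9.

```python
from typing import NamedTuple, Union


class Resource(NamedTuple):
    """A resource identified by an IRI."""
    iri: str


class Literal(NamedTuple):
    """A literal value, such as a character string."""
    value: str


class Triplet(NamedTuple):
    """A triplet in the (predicate, subject, object) order used in the text."""
    predicate: Resource
    subject: Resource
    obj: Union[Resource, Literal]


def interpret(triplet: Triplet) -> str:
    """Read a triplet as a relation or as a property, depending on its object."""
    p, s, o = triplet
    if isinstance(o, Resource):
        return f"<{s.iri}> is in relation <{p.iri}> with <{o.iri}>"
    return f"<{s.iri}> has property <{p.iri}> with value {o.value!r}"


DOCUMENT = Resource("http://example.org/document.html")
JOHN = Resource("http://example.org/john-smith")

TRIPLETS = [
    Triplet(Resource("http://purl.org/dc/terms/creator"), DOCUMENT, JOHN),
    Triplet(Resource("http://xmlns.com/foaf/0.1/name"), JOHN, Literal("John Smith")),
]

if __name__ == "__main__":
    for triplet in TRIPLETS:
        print(interpret(triplet))
```

The first triplet has a resource as its object and is read as a relation; the second has a literal as its object and is read as a property.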
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages, including XML [44], JSON for LD (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually replace microformatting, the loosely defined practice of using HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in documents on the Web.
2.3 Document Preparation Systems
Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith" />
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom)
<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>
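[Figure: a directed graph. The node http://example.org/document.html points to the literal "John's Web page"@en through dc:title and to the node http://example.org/john-smith through the creator property. That node points to foaf:Person through rdf:type and to the literal "John Smith" through foaf:name.]
Figure 2.11: A graph of the RDF document in Figure 2.9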
categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is twofold: more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems
Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU is Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.
The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
[Typeset output: the centered title "The Cask of Amontillado", the byline "by Edgar Allan Poe", the opening paragraph of the story set with a three-line drop cap T, and the page number 1.]

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's The Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].
Page geometry
pdfpagewidth=6in pdfpageheight=9in
Page dimensions
hsize=dimexprpdfpagewidth-15in
vsize=dimexprpdfpageheight-15in
baselineskip=168pt
hoffset=-25in voffset=-25in
Fonts
fontrm=ptmr8t at 125ptrm fontbigbf=ptmb8t at 16pt
fontdropcap=ptmr8t at 62pt fontit=ptmri8r at 125pt
Logical markup definition
deftitle1bigbfcenterline1
defauthor1itcenterlinebycenterline1
vskip 39em
defchapter1noindentsmashhskip01exlower58ex
hboxllapdropcap1hskip-03ex
parshape=4 3emdimexprhsize-3em 328em
dimexprhsize-328em 328em
dimexprhsize-328em 0emhsize
The document
titleThe Cask of Amontillado
authorEdgar Allen Poe
chapter The thousand injuries of Fortunato I had borne
as I best could but when he ventured upon insult I vowed
revenge You who so well know the nature of my soul
will not suppose however that gave utterance to a
threat it At length I would be avenged this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk I
must not only punish but punish with impunity A wrong is
unredressed when retribution overtakes its redresserbye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right)
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says, φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”
documentclass[11pt]article
usepackagefontspec leading newunicodechar
usepackage[Latin Greek]ucharclasses
setTransitionsForLatin
fontspecAlegreyaSans-Regularttf[Ligatures=TeX]
setTransitionsForGreek
fontspecGFSNeohellenicotf[Scale=12 WordSpace=05
Ligatures=TeX]
newunicodecharraisebox8ex
frenchspacing
leading14pt
begindocument
The second function of Soul -- knowing -- was not at
first distinguished from motion Aristotle says φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy confidence and
fear and again to be angry to perceive and to think
and all these states are held to be movements which
might lead one to suppose that soul itself is moved
enddocument
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: Alegreya Sans;
    src: url(AlegreyaSans-Regular.ttf)
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: GFS Neohellenic;
    src: url(GFSNeohellenic.otf) format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: Alegreya Sans, GFS Neohellenic,
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=gr] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says, </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
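The arithmetic behind these figures can be checked with a short Python sketch. The function names and the percentage parameters are assumptions introduced for illustration; only the 20 to 45 percent rule and the 10 pt example come from the text.

```python
def leading_range(type_size_pt, low_pct=20, high_pct=45):
    """Return the (minimum, maximum) line height in points.

    The line height is the type size plus 20 to 45 percent of
    the type size, following the rule of thumb in the text.
    """
    low = type_size_pt * (100 + low_pct) / 100
    high = type_size_pt * (100 + high_pct) / 100
    return (low, high)


def lead_per_side(type_size_pt, low_pct=20, high_pct=45):
    """Return the (minimum, maximum) lead added above and below a line."""
    low, high = leading_range(type_size_pt, low_pct, high_pct)
    return ((low - type_size_pt) / 2, (high - type_size_pt) / 2)


if __name__ == "__main__":
    print(leading_range(10))   # line height range for 10 pt type
    print(lead_per_side(10))   # lead on each side of a line
```

For 10 pt type, the sketch gives a line height between 12 and 14.5 pt, that is 1 to 2.25 pt of lead on each side of a line, in agreement with the text.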
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941
Table 3.1: An overview of common paper sizes used for commercial and industrial printing
This is a side-note. Sidenotes enliven the page and are easy for the reader to find.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
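The whole-multiple rule can be illustrated with a short Python sketch. The function names and the sample sizes are hypothetical; the sketch merely rounds the space reserved for a heading up to the next whole number of body text lines.

```python
import math


def heading_block_height(heading_height_pt, body_line_height_pt):
    """Smallest whole multiple of the body line height that fits the heading."""
    lines = math.ceil(heading_height_pt / body_line_height_pt)
    return lines * body_line_height_pt


def extra_space(heading_height_pt, body_line_height_pt):
    """Vertical space to distribute above and below the heading."""
    block = heading_block_height(heading_height_pt, body_line_height_pt)
    return block - heading_height_pt


if __name__ == "__main__":
    # An 18 pt heading over a 12 pt body line height occupies two lines.
    print(heading_block_height(18, 12))  # 24
    print(extra_space(18, 12))           # 6
```

How the extra space is split above and below the heading is a design decision that the sketch leaves open.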
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface as the surrounding text, treat the columns of tables the same way you treat columns in the text, and keep the number of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks, in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer’s viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3]:
    This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star.
    – William Shakespeare, King Lear
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing. A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
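Deriving the text height from the text width while keeping it a whole multiple of the line height can be sketched in Python as follows. The function name, the sample text width, and the chosen proportion are all assumptions made for the sake of the example, not values prescribed by the text.

```python
def text_height(text_width_pt, proportion, line_height_pt):
    """Derive the text height from the width, snapped to whole lines.

    Returns the number of lines and the resulting text height in points.
    """
    target = text_width_pt * proportion
    lines = max(1, round(target / line_height_pt))
    return lines, lines * line_height_pt


if __name__ == "__main__":
    # Hypothetical values: a 288 pt wide text block, a 1 : 1.618
    # proportion, and a 12 pt line height.
    lines, height = text_height(288, 1.618, 12)
    print(lines, height)  # 39 468
```

The snapped height deviates slightly from the exact proportion, which is usually an acceptable price for a text block that can be filled completely with lines of text.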
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487
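The text does not say how “sufficient contrast” should be measured. One common way to quantify it, which is brought in here purely as an assumption, is the contrast ratio defined in the WCAG 2.x accessibility guidelines. The following Python sketch computes that ratio for two sRGB colors; the function names are hypothetical.

```python
def _linearize(channel):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color with 8-bit channels."""
    r, g, b = (_linearize(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, ranging from 1 to 21."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))    # black on white
    print(round(contrast_ratio((255, 0, 0), (255, 255, 255)), 2))  # red on white
```

Black on white yields the maximum ratio, while pure red on white scores much lower, which is consistent with the advice to reserve colored type for emphasis.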
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015).
[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972.
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1986.
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1.
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6.
[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993.
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996.
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996.
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016).
[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008.
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012.
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006.
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124.
[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993.
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O’Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6.
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O’Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986.
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3.
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12.
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4.
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5.
[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014.
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Declaration
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
Emacs  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend of a Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe’s Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
Pico  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
vi  The Visual Interactive editor
Vim  vi IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.
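The kind of malformed markup discussed here can be detected mechanically. The following Python sketch uses the tokenizer from the standard html.parser module to report overlapping elements; it does not implement the error recovery algorithm of the HTML5 specification, and the class and function names are hypothetical. Void and implicitly closed elements, such as br or an unclosed p, are not handled and would be reported as false positives.

```python
from html.parser import HTMLParser


class OverlapDetector(HTMLParser):
    """Records end tags that do not match the most recently opened element."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.overlaps = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag, ignored in this sketch
        if self.open_tags[-1] == tag:
            self.open_tags.pop()
            return
        # The closed element overlaps with the innermost open element.
        self.overlaps.append((tag, self.open_tags[-1]))
        index = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
        del self.open_tags[index]


def find_overlaps(markup):
    """Return (closed tag, innermost open tag) pairs for overlapping elements."""
    detector = OverlapDetector()
    detector.feed(markup)
    detector.close()
    return detector.overlaps


if __name__ == "__main__":
    print(find_overlaps("<b>Bold <i>bold and italic</b> italic</i>"))
    print(find_overlaps("<b>Bold </b><i><b>bold and italic</b> italic</i>"))
```

The first call reports that the b element was closed while i was still open, whereas the properly nested second input produces an empty list.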
Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language made it possible to specify the visual properties of any HTML element, which allowed document markup to be separated from design and effectively eliminated the need for presentation markup.
During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.
<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>
Figure 2.7: The first line contains overlapping elements and, as such, can’t be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.
<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>
Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com/. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.
<style>
  body {
    font: large Verdana;
    font-size: large;
  }
  h1 {
    font-size: x-large;
    text-transform: uppercase;
  }
  abbr {
    font-style: italic;
  }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>
2.2.2 The Extensible Hypertext Markup Language
Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren’t well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly into web documents through XML namespaces.
The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
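The strictness of XML parsers can be observed with the xml.etree.ElementTree module from the Python standard library. The sketch below wraps the two lines of Figure 2.7 in a p element, which is an assumption made only to give each document a single root; the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if the XML parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    overlapping = "<p><b>Bold <i>bold and italic</b> italic</i></p>"
    nested = "<p><b>Bold </b><i><b>bold and italic</b> italic</i></p>"
    print(is_well_formed(overlapping))  # False
    print(is_well_formed(nested))       # True
```

Unlike an HTML parser, the XML parser makes no attempt to recover from the overlapping elements and rejects the whole document.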
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
2.2.3 The Semantic Web and Linked Data
The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
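The two interpretations of a triplet can be sketched with a few Python data types. The class names, the interpret function, and the example.org IRIs are hypothetical and serve only as an illustration; they are not part of any RDF library.

```python
from typing import NamedTuple, Union


class Resource(NamedTuple):
    iri: str


class Literal(NamedTuple):
    value: str


class Triplet(NamedTuple):
    predicate: Resource
    subject: Resource
    obj: Union[Resource, Literal]


def interpret(triplet):
    """Describe a triplet (p, s, o) according to the kind of its object."""
    s, p, o = triplet.subject.iri, triplet.predicate.iri, triplet.obj
    if isinstance(o, Resource):
        return f"<{s}> is in relation <{p}> with <{o.iri}>"
    return f"<{s}> has property <{p}> with value {o.value!r}"


if __name__ == "__main__":
    knows = Triplet(
        Resource("http://example.org/terms/knows"),
        Resource("http://example.org/people/alice"),
        Resource("http://example.org/people/bob"),
    )
    name = Triplet(
        Resource("http://example.org/terms/name"),
        Resource("http://example.org/people/alice"),
        Literal("Alice"),
    )
    print(interpret(knows))
    print(interpret(name))
```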
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete microformatting: the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in the documents on the Web.
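To make the N-Triples representation more concrete, the following Python sketch writes a couple of statements in that syntax, one per line. The helper names (iri, literal, to_ntriples) and the sample data are assumptions made for illustration. Only backslashes and double quotes are escaped, so this is not a complete N-Triples writer.

```python
def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str, lang: str = "") -> str:
    """Quote a literal and append an optional language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def to_ntriples(triples) -> str:
    """Serialize statements in subject, predicate, object order."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical sample data; note that N-Triples puts the subject first.
DC = "http://purl.org/dc/terms/"
doc = iri("http://example.org/document.html")
john = iri("http://example.org/john-smith")

triples = [
    (doc, iri(DC + "title"), literal("John's Web page", "en")),
    (doc, iri(DC + "creator"), john),
]
print(to_ntriples(triples))
```

Each statement occupies exactly one line and ends with a full stop, which is what makes the format easy to process with line-oriented tools.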
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).

<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>

[Graph: the node http://example.org/document.html has a dc:title edge to the literal "John's Web page"@en and a dc:creator edge to the node http://example.org/john-smith, which in turn has an rdf:type edge to foaf:Person and a foaf:name edge to the literal "John Smith".]
Figure 2.11: A graph of the RDF document in Figure 2.9.

2.3 Document Preparation Systems
Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems
Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.
Sidenote: The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
[Output document: the title "The Cask of Amontillado" and the byline "by Edgar Allan Poe" above the opening paragraph, which is set with a three-line drop cap "T"; the page number 1 is printed at the foot of the page.]

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could; but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}\hskip-0.3ex}}
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could; but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of e-TeX and pdfTeX.
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
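The line-length guideline above can be turned into a small check. The following Python sketch is an illustration only: the function name assess_measure and the wording of its verdicts are assumptions, and the "acceptable" verdict for lengths that fall between the recommended range and the hard limits is an interpretation rather than something the text states.

```python
def assess_measure(chars_per_line: int, columns: int = 1) -> str:
    """Classify a line length against the ranges quoted in the text."""
    if chars_per_line > 80:
        return "too wide"
    if chars_per_line < 40:
        return "too narrow to justify; set ragged"
    # Recommended range: 45-75 for one column, 40-50 for several.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if low <= chars_per_line <= high:
        return "comfortable"
    return "acceptable"


print(assess_measure(66))
print(assess_measure(35, columns=2))
```

Character counts of this kind are averages over a paragraph, so the result is best treated as a hint for choosing the column width rather than as a hard rule.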
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
[Output document: the English sentences set in Alegreya Sans with the Greek quotation set in GFS Neohellenic, followed by the page number 1.]

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
\newunicodechar{·}{\raisebox{.8ex}{.}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says, φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι
δὲ ὀργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url(AlegreyaSans-Regular.ttf)
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url(GFSNeohellenic.otf) format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=el] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says, </span><span lang="el">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὀργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
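The arithmetic behind these figures is simple enough to sketch in Python. The function names leading_range and lead_per_side are hypothetical, and the 20 and 45 percent bounds are taken directly from the guideline above.

```python
def leading_range(type_size_pt: float,
                  low: float = 0.20, high: float = 0.45):
    """Return the (minimum, maximum) line height in points."""
    return (type_size_pt * (1 + low), type_size_pt * (1 + high))


def lead_per_side(type_size_pt: float, line_height_pt: float) -> float:
    """Lead above and below a line when the extra space is split evenly."""
    return (line_height_pt - type_size_pt) / 2


tightest, loosest = leading_range(10)
print(tightest, loosest)
print(lead_per_side(10, tightest), lead_per_side(10, loosest))
```

For a 10 pt typeface, this reproduces the range of roughly 12 to 14.5 pt and the 1 to 2.25 pt of lead on either side of each line.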
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

          Sizes in inches    Page proportions
A4        8.27 × 11.7        1 : √2
B5        6.93 × 9.84        1 : √2
Letter    8 1/2 × 11         1 : 1.294

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
Sidenote: This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
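The whole-multiple rule for heading heights amounts to rounding up to the next line of the body text grid. A minimal Python sketch follows, assuming that the heading is described by a single height in points. The function names are hypothetical, and how the resulting padding is split above and below the heading is left to the designer.

```python
import math


def heading_block_height(heading_pt: float, line_height_pt: float) -> float:
    """Smallest whole multiple of the body line height that fits the heading."""
    lines = math.ceil(heading_pt / line_height_pt)
    return lines * line_height_pt


def heading_padding(heading_pt: float, line_height_pt: float) -> float:
    """Vertical space to distribute above and below the heading."""
    return heading_block_height(heading_pt, line_height_pt) - heading_pt


# An 18 pt heading on a 12 pt grid occupies two lines with 6 pt to spare.
print(heading_block_height(18, 12), heading_padding(18, 12))
```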
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
    This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity; fools by heavenly compulsion; knaves, thieves, and treachers by spherical predominance; drunkards, liars, and adulterers by an enforced obedience of planetary influence; and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
    William Shakespeare, King Lear
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing. A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
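As a rough illustration of the multiple-of-line-height rule, the following Python sketch computes how many body text lines fit between the vertical margins and the text height that results. The function name text_block, the margin values, and the conversion of 72 pt per inch are assumptions made for the example, not values prescribed by the text.

```python
def text_block(page_height_pt, top_margin_pt, bottom_margin_pt, line_height_pt):
    """Return (number of lines, text height) for a fixed page height."""
    available = page_height_pt - top_margin_pt - bottom_margin_pt
    lines = int(available // line_height_pt)
    return lines, lines * line_height_pt


# Hypothetical page: 9 in tall at an assumed 72 pt per inch, 54 pt margins.
lines, height = text_block(648, 54, 54, 12)
print(lines, height)
```

Any space left over after rounding down to a whole number of lines is best added to one of the vertical margins, so that the text block itself stays on the grid.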
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
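The text does not define what counts as sufficient contrast. One widely used yardstick, assumed here for illustration and not taken from the book, is the contrast ratio defined by the W3C Web Content Accessibility Guidelines (WCAG) 2.x, which ranges from 1 (identical colors) to 21 (black on white). The Python sketch below implements that formula for 8-bit sRGB colors; the function names are hypothetical.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb) -> float:
    """Relative luminance of an (r, g, b) color with 0-255 channels."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background) -> float:
    """WCAG contrast ratio; the lighter color always goes in the numerator."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical colors: a dark red typeface on a white background.
print(round(contrast_ratio((139, 0, 0), (255, 255, 255)), 2))
```

WCAG recommends a ratio of at least 4.5 to 1 for regular body text, which may serve as a starting point when a colored typeface or background is considered.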
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).
[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).
[5] Unicode Consortium. The Unicode Standard. Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).
[6] Unicode Consortium. The Unicode Standard. Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).
[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[10] Unicode Consortium. The Unicode Standard. Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).
[12] Unicode Consortium. Unicode Technical Standard #10. Version 9.0.0. Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016) (cit. on p. 10).
[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).
[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).
[20] Unicode Consortium. Unicode Technical Standard #18. Version 17. Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Working Draft. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).
[47] Manu Sporny et al. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standards Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
EMACS: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend of a Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU's Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
PICO: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control System
VI: The Visual Interactive editor
VIM: VI IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language
30 CHAPTER 2 MARKUP
<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com/. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

<style>
body {
  font: large Verdana;
  font-size: large;
}
h1 {
  font-size: x-large;
  text-transform: uppercase;
}
abbr {
  font-style: italic;
}
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>
22 MARKUP ON THE WORLD WIDE WEB 31
The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the number of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly in web documents through XML namespaces.
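The difference in strictness can be observed with the Python standard library. The following is a minimal sketch, not a model of any browser: the sample markup and the helper names (`is_well_formed`, `lenient_tags`) are illustrative assumptions, and `html.parser` is only a lenient tokenizer rather than a full HTML tree builder.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Illustrative input: the <b> element is never closed.
MALFORMED = "<p>An <b>unclosed bold element</p>"


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False if it refuses it."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Record every start tag the lenient HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def lenient_tags(markup: str) -> list:
    """Feed the markup to the HTML tokenizer, which does not reject it."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print(is_well_formed(MALFORMED))  # the XML parser refuses the input
    print(lenient_tags(MALFORMED))    # the HTML tokenizer carries on
```

The XML parser stops at the first well-formedness error, whereas the HTML tokenizer keeps reporting tags and leaves error recovery to the application, which is where the extra complexity of HTML processing comes from.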
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
2.2.3 The Semantic Web and Linked Data
The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
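The two readings of a triplet can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of any RDF library: the `IRI` and `Triple` types and the `interpret` function are hypothetical names, and the field order follows the (p, s, o) convention used above.

```python
from typing import NamedTuple, Union


class IRI(NamedTuple):
    """A resource, identified by its IRI (hypothetical helper type)."""
    value: str


class Triple(NamedTuple):
    """A triplet in the (p, s, o) order used in the text."""
    predicate: IRI
    subject: IRI
    obj: Union[IRI, str]  # a resource, or a literal value stored as a string


def interpret(triple: Triple) -> str:
    """Spell out the reading of a triplet described in the text."""
    p, s, o = triple
    if isinstance(o, IRI):
        # The object is a resource: s is in relation p with o.
        return f"{s.value} is in relation {p.value} with {o.value}"
    # The object is a literal: s has property p with the value o.
    return f"{s.value} has property {p.value} with value {o!r}"


creator = Triple(
    IRI("http://purl.org/dc/terms/creator"),
    IRI("http://example.org/document.html"),
    IRI("http://example.org/john-smith"),
)
name = Triple(
    IRI("http://xmlns.com/foaf/0.1/name"),
    IRI("http://example.org/john-smith"),
    "John Smith",
)

if __name__ == "__main__":
    print(interpret(creator))
    print(interpret(name))
```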
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, besides the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages, including XML [44], JSON for LD (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
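Of the listed representations, N-Triples is the simplest to produce by hand: one statement per line, written as subject, predicate, object, and a terminating period. The sketch below is a simplified illustration rather than a complete serializer; the function names are assumptions, and only IRIs and plain or language-tagged string literals are handled (no blank nodes, datatypes, or non-ASCII escaping).

```python
def escape_literal(text: str) -> str:
    """Escape the characters that must not appear raw in a quoted literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def ntriples_line(subject, predicate, obj, *, literal=False, lang=None):
    """Format one statement as a single N-Triples line."""
    if literal:
        rendered = f'"{escape_literal(obj)}"'
        if lang:
            rendered += f"@{lang}"  # optional language tag
    else:
        rendered = f"<{obj}>"       # the object is a resource
    return f"<{subject}> <{predicate}> {rendered} ."


if __name__ == "__main__":
    print(ntriples_line(
        "http://example.org/document.html",
        "http://purl.org/dc/terms/title",
        "John's Web page",
        literal=True,
        lang="en",
    ))
    print(ntriples_line(
        "http://example.org/document.html",
        "http://purl.org/dc/terms/creator",
        "http://example.org/john-smith",
    ))
```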
2.3 Document Preparation Systems
Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).
<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>
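To show how embedded attributes can be turned back into statements, the following sketch walks an RDFa-annotated page with the Python standard library and collects (subject, predicate, object) tuples. It is a deliberately simplified illustration and not a conforming RDFa processor: only the `vocab`, `about`, `property`, and `href` attributes are honored, `typeof` is ignored, and the class and function names are assumptions.

```python
from html.parser import HTMLParser

# Elements without an end tag; they must not be pushed on the context stack.
VOID_ELEMENTS = {"meta", "link", "br", "hr", "img", "input"}

PAGE = """<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web page</title>
    <meta property="creator" href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith" typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>"""


class PropertyCollector(HTMLParser):
    """Collect statements from `property` attributes (simplified)."""

    def __init__(self):
        super().__init__()
        self.contexts = [("", "")]  # stack of inherited (vocab, about) pairs
        self.pending = None         # (subject, predicate, text parts, depth)
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        vocab, about = self.contexts[-1]
        vocab = attributes.get("vocab", vocab)
        about = attributes.get("about", about)
        if "property" in attributes:
            predicate = vocab + attributes["property"]
            if "href" in attributes:
                # The object is a resource named by the href attribute.
                self.triples.append((about, predicate, attributes["href"]))
            elif tag not in VOID_ELEMENTS:
                # The object is the text content; wait for the end tag.
                self.pending = (about, predicate, [], len(self.contexts))
        if tag not in VOID_ELEMENTS:
            self.contexts.append((vocab, about))

    def handle_data(self, data):
        if self.pending is not None:
            self.pending[2].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if len(self.contexts) > 1:
            self.contexts.pop()
        if self.pending is not None and len(self.contexts) == self.pending[3]:
            subject, predicate, parts, _ = self.pending
            self.triples.append((subject, predicate, "".join(parts)))
            self.pending = None


def extract(markup: str) -> list:
    """Return the statements found in the markup, in document order."""
    collector = PropertyCollector()
    collector.feed(markup)
    collector.close()
    return collector.triples


if __name__ == "__main__":
    for triple in extract(PAGE):
        print(triple)
```

A production system would rely on a dedicated RDFa processor instead, since the full processing rules cover prefixes, datatypes, chaining, and type statements that this sketch leaves out.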
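Figure 2.11: A graph of the RDF document in Figure 2.9. The node http://example.org/document.html is connected by dc:title to the literal "John's Web page"@en and by dc:creator to the node http://example.org/john-smith, which is in turn connected by rdf:type to foaf:Person and by foaf:name to the literal "John Smith".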
categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems
Among the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled--but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's The Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
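The conversion step can be illustrated with a toy converter. The sketch below handles only a tiny Markdown-like subset chosen for this illustration (blank-line separated blocks, `#` headings, and `*emphasis*`); it does not implement Markdown or any other real lightweight markup language, and the function name `to_html` is an assumption.

```python
import html
import re


def to_html(source: str) -> str:
    """Convert a tiny Markdown-like subset to HTML (illustrative only)."""
    output = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", source.strip()):
        if not block.strip():
            continue
        # Join the lines of a block and escape characters special in HTML.
        text = html.escape(" ".join(block.split()), quote=False)
        # *emphasis* becomes logical markup rather than presentation markup.
        text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
        heading = re.match(r"(#{1,6}) (.*)", text)
        if heading:
            level = len(heading.group(1))
            output.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            output.append(f"<p>{text}</p>")
    return "\n".join(output)


if __name__ == "__main__":
    print(to_html("# Design\n\nLegibility comes *first*,\neven on small screens."))
```

The punctuation in the input stays readable as plain text, which is the point of lightweight markup, while the converter maps it onto the logical elements of the more general language.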
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements

3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page or 40–50 characters long on a multi-column page and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
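The thresholds above can be collected into a small decision helper. The following Python sketch only restates the numbers from the paragraph; the function names and the wording of the returned advice are assumptions made for this illustration.

```python
from typing import Tuple


def recommended_measure(multi_column: bool = False) -> Tuple[int, int]:
    """Recommended characters per line: 40-50 multi-column, 45-75 otherwise."""
    return (40, 50) if multi_column else (45, 75)


def advise_measure(chars_per_line: int, multi_column: bool = False) -> str:
    """Turn an average line length into the advice given in the text."""
    low, high = recommended_measure(multi_column)
    if chars_per_line < 40:
        # Justification would make the word spacing too loose.
        return "set ragged"
    if chars_per_line > 80:
        # Extended passages this wide strain the eye of the reader.
        return "too wide"
    if low <= chars_per_line <= high:
        return "justify"
    return "justify, outside the recommended range"


if __name__ == "__main__":
    for length in (35, 60, 78, 90):
        print(length, advise_measure(length))
```

For instance, a sidenote column that fits roughly 25 characters per line falls under the 40-character threshold, so the helper recommends ragged setting.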
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”
1
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
% Switch fonts automatically whenever the script of the text changes.
\setTransitionsForLatin{%
  \fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek{%
  \fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
% The arguments of the following definition did not survive extraction:
% \newunicodechar{...}{\raisebox{.8ex}{...}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  /* "grc" is the language tag for Ancient Greek; the original used "gr". */
  [lang=grc] {
    font-size: 13.2pt;
  }
</style>
<p><span lang=en>The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says </span><span lang=grc>φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang=en>“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
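The arithmetic behind these figures can be sketched in a few lines of Python. The function names are hypothetical, and the even split of the lead above and below the line is the assumption already made in the paragraph above.

```python
def line_height_range(type_size_pt, min_ratio=0.20, max_ratio=0.45):
    """Return the (smallest, largest) suggested line height in points."""
    return (type_size_pt + type_size_pt * min_ratio,
            type_size_pt + type_size_pt * max_ratio)


def lead_per_side(type_size_pt, line_height_pt):
    """Lead added above and below a line, assuming an even split."""
    return (line_height_pt - type_size_pt) / 2


low, high = line_height_range(10)   # 10 pt body text, as in the example above
print("line height:", low, "to", high, "pt")
print("lead per side:", lead_per_side(10, low), "to", lead_per_side(10, high), "pt")
```

For 10 pt type this reproduces the range given in the text; other sizes scale proportionally.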
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well.

Paragraphs can also be separated by graphical symbols such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger
        Sizes in inches   Page proportions
A4      8.27 × 11.7       2 : √2      1.41421
B5      6.93 × 9.84       1 : √2      0.707
Letter  8 1/2 × 11        1 : 1.294   1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing
This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
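The whole-multiple rule can be illustrated with a short Python sketch. It assumes that a heading which does not fit the grid is padded upward to the next multiple of the body line height; the text does not prescribe how the extra space is obtained, and the function name is hypothetical.

```python
import math


def heading_block_height(heading_height_pt, line_height_pt):
    """Round a heading's total height up to a whole number of body lines."""
    lines = math.ceil(heading_height_pt / line_height_pt)
    return lines * line_height_pt


# A heading that needs 20 pt on a 12 pt grid occupies two body lines.
print(heading_block_height(20, 12))
```

The difference between the returned height and the natural height of the heading would then be distributed as white space above and below the heading.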
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface as the surrounding text, treat the columns of tables the same way you treat columns in the text, and keep the number of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs, and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
    This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    – William Shakespeare, King Lear

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
  A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size (as described in Section 3.2.1), as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
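One possible way to derive the text height is sketched below in Python. It assumes that the text block should repeat the proportions of the page, which is only one of several options, and then snaps the result to the nearest whole number of lines. The function name and the sample dimensions are illustrative.

```python
def text_block_height(text_width_pt, page_width_pt, page_height_pt,
                      line_height_pt):
    """Return (lines, height) for a text block shaped like the page."""
    # Target height if the text block echoes the page proportions exactly.
    target = text_width_pt * page_height_pt / page_width_pt
    # Snap to a whole number of lines so the block can be filled completely.
    lines = max(1, round(target / line_height_pt))
    return lines, lines * line_height_pt


# A 360 pt wide text block on a 595 x 842 pt page with 12 pt line height.
print(text_block_height(360, 595, 842, 12))
```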
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015).
[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972.
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1986.
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1.
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6.
[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993.
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996.
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996.
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016).
[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008.
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012.
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006.
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124.
[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993.
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6.
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986.
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3.
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12.
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Working Draft. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015).
[47] Manu Sporny et al. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4.
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5.
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014.
Acronyms
ACK     The ACKnowledgement character
API     Application Programming Interface
ASA     The American Standard Association
ASCII   The American Standard Code for Information Interchange
AT&T    The American Telephone and Telegraph corporation
BEL     The BELl character
BMP     The Basic Multilingual Plane
BRE     The Basic Regular Expressions
BS      The BackSpace character
BSD     The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA      California
CAN     The CANcel character
CERN    The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR    The Common Locale Data Repository
CLI     Command Line Interface
COBOL   The COmmon Business-Oriented Language
CR      The Carriage Return character
CSS     The Cascading Style Sheets language
DC      The Dublin Core
DC1     The Device Control character No. 1
DC2     The Device Control character No. 2
DC3     The Device Control character No. 3
DC4     The Device Control character No. 4
DEL     The DELete character
DLE     The Data Link Escape character
DPS     Document Preparation System
DTD     Document Type Definition
DTP     DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA    The European Computer Manufacturers Association
EM      The End of Medium
Emacs   The Eventually Munches All Computer Storage editor
ENQ     The ENQuiry character
EOT     The End Of Transmission
ERE     The Extended Regular Expressions
ESC     The ESCape character
ETB     The End of Transmission Block
ETX     The End of TeXt
EUC     The Extended Unix Code
FF      The Form Feed character
FOAF    Friend of a Friend
FORTRAN The FORmula TRANslator
FS      The File Separator
FSM     The Free Software Movement
GML     The General Markup Language
GNU     GNU is Not Unix
GS      The Group Separator
GUI     Graphical User Interface
HT      The Horizontal Tab
HTML    The HyperText Markup Language
IBM     The International Business Machines Corporation
IEC     The International Electrotechnical Commission
IME     Input Method Editor
IRI     The Internationalized Resource Identifier
ISO     The International Organization for Standardization
JIS     The Japanese Industrial Standards encoding
JOE     The Joe's Own Editor
JSON    The JavaScript Object Notation
JSON-LD JSON for LD
JTC     A Joint TC
LD      Linked Data
LF      The Line Feed
MA      Massachusetts
MathML  The Mathematical Markup Language
NAK     The Negative-AcKnowledgement character
NUL     The NULl character
NY      New York
OCR     Optical Character Recognition
ODF     The Open Document Format for office applications
OOXML   The Office Open XML format
OWL     The Web Ontology Language
PC      The IBM Personal Computer
PDF     The Portable Document Format
Pico    The PIne COmposer
POSIX   The Portable Operating System Interface
RDF     The Resource Description Framework
RDFa    RDF in attributes
RELAX NG The REgular LAnguage for XML, New Generation
RFC     A Request For Comments
RS      The Record Separator
SC      A SubCommittee
SGML    The Standard Generalized Markup Language
SI      The Shift In character
SO      The Shift Out character
SOH     The Start of Heading
SR      Sound Recognition
STX     The Start of Text
SUB     The SUBstitute character
SVG     The Scalable Vector Graphics language
SVN     SubVersioN
SYN     The SYNchronous Idle character
TC      A Technical Committee
TEI     The Text Encoding Initiative
TRON    The Real-time Operating system Nucleus
UCS     The Universal multiple-octet coded Character Set
US      The Unit Separator
USA     The United States of America
UTF     The UCS Transformation Format
VCS     Version Control Systems
vi      The Visual Interactive editor
Vim     vi IMproved
VT      The Vertical Tab
W3C     The World Wide Web Consortium
WG      A Working Group
WYSIWYG What You See Is What You Get
XHTML   The eXtensible HyperText Markup Language
XML     The eXtensible Markup Language
22 MARKUP ON THE WORLD WIDE WEB 31
The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2 Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the number of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly into web documents through XML namespaces.
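The difference in strictness can be observed with the parsers in the Python standard library. The sketch below is an illustration only: html.parser is a lenient tokenizer rather than a full browser-grade HTML parser, and the sample markup and the helper names are made up for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# The <b> element is never closed, so this is not well-formed XML.
malformed = "<p>An unclosed <b>element</p>"


class TagCollector(HTMLParser):
    """Record the start tags that the lenient HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def xml_accepts(markup: str) -> bool:
    """Return True if the XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


collector = TagCollector()
collector.feed(malformed)
collector.close()
print("HTML tokenizer saw:", collector.tags)
print("XML parser accepts it:", xml_accepts(malformed))
```

The HTML tokenizer carries on past the error, whereas the XML parser rejects the whole document, which is the behavior the paragraph above describes.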
The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
2.2.3 The Semantic Web and Linked Data
The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, the W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.
An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
If the object of a triplet $(p, s, o)$ is also a resource, the triplet can be interpreted as a subject $s$ being in a relation $p$ with the object $o$. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject $s$ having a property $p$ with the value $o$.
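The two interpretations of a triplet can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names `IRI`, `Literal`, and `Triple` and the function `interpret` are hypothetical and do not come from any RDF library; the field order follows the (predicate, subject, object) order used in the text.

```python
from typing import NamedTuple, Union


class IRI(str):
    """A resource identified by an IRI (hypothetical helper type)."""


class Literal(NamedTuple):
    """A literal value with an optional language tag."""
    value: str
    lang: str = ""


class Triple(NamedTuple):
    """A triplet in the (predicate, subject, object) order used in the text."""
    predicate: IRI
    subject: IRI
    obj: Union[IRI, Literal]


def interpret(t: Triple) -> str:
    """Describe the triplet depending on whether its object is a resource."""
    if isinstance(t.obj, IRI):
        return f"{t.subject} is in relation {t.predicate} with {t.obj}"
    return f"{t.subject} has property {t.predicate} with value {t.obj.value!r}"


# Example data that reuses the IRIs of the example RDF document.
DOCUMENT = IRI("http://example.org/document.html")
JOHN = IRI("http://example.org/john-smith")
CREATOR = IRI("http://purl.org/dc/terms/creator")
TITLE = IRI("http://purl.org/dc/terms/title")

relation = Triple(CREATOR, DOCUMENT, JOHN)
attribute = Triple(TITLE, DOCUMENT, Literal("John's Web page", "en"))
```

Calling `interpret(relation)` describes a relation between two resources, whereas `interpret(attribute)` describes a property with a literal value.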
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in Attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith" />
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .

<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .

<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).
<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>
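Figure 2.11: A graph of the RDF document in Figure 2.9. The node http://example.org/document.html is connected to the literal "John's Web page"@en by a dc:title edge and to the node http://example.org/john-smith by a dc:creator edge. The node http://example.org/john-smith is connected to the node foaf:Person by an rdf:type edge and to the literal "John Smith" by a foaf:name edge.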
categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is twofold: the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The
The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
The Cask of Amontillado
by
Edgar Allen Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that gave utterance to a threat. At length I would be avenged; this was a point definitely settled -- but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allen Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allen Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}\hskip-0.3ex}
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
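The conversion from a lightweight markup language to HTML can be sketched in a few lines. The sketch below assumes a tiny, hypothetical Markdown-like subset (a leading `# ` for a heading, asterisks for emphasis, blank lines between blocks); it is not an implementation of Markdown or of any other real language, and the function name `to_html` is an assumption.

```python
import html
import re


def to_html(text: str) -> str:
    """Convert a tiny Markdown-like subset (an assumption) to HTML."""
    blocks = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        if not block:
            continue
        # Collapse line breaks inside a block and escape HTML special characters.
        escaped = html.escape(" ".join(block.split()), quote=False)
        # Asterisks around a span of text mark emphasis.
        escaped = re.sub(r"\*(.+?)\*", r"<em>\1</em>", escaped)
        if escaped.startswith("# "):
            # A leading number sign marks a top-level heading.
            blocks.append(f"<h1>{escaped[2:]}</h1>")
        else:
            blocks.append(f"<p>{escaped}</p>")
    return "\n".join(blocks)
```

The input stays legible as plain text, which is the defining property of lightweight markup, while the output is ordinary HTML.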
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
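The line length guidelines above can be summarized as a simple decision rule. The following is a minimal sketch; the function name `recommend_setting`, its return values, and the "justify with care" category for lengths that fall between the ideal range and the stated limits are assumptions made for the sake of illustration.

```python
def recommend_setting(chars_per_line: int, columns: int = 1) -> str:
    """Suggest how to set body text of the given average line length."""
    if chars_per_line > 80:
        # Lines wider than 80 characters strain the eye of the reader.
        return "too wide"
    if chars_per_line < 40:
        # Justification would make the word spacing too loose.
        return "set ragged"
    # Ideal ranges for single-column and multi-column pages.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if low <= chars_per_line <= high:
        return "justify"
    # Hypothetical category: permissible, but outside the ideal range.
    return "justify with care"
```

For example, a single-column page with 66 characters per line falls within the ideal range, whereas a narrow sidenote with 30 characters per line is better set ragged.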
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”
1

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}
newunicodecharraisebox8ex
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=grc] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by between one half (1 en) and three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger
          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8½ × 11            1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
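The requirement that a heading occupy a whole multiple of the body text line height can be sketched as follows. The function name `heading_block` and the optional `min_space_pt` parameter are assumptions; the sketch treats the height of the heading as equal to its typeface size, which is a simplification.

```python
import math


def heading_block(heading_size_pt, body_leading_pt, min_space_pt=0.0):
    """Fit a heading into a whole number of body text lines.

    Returns the number of body text lines the heading occupies and the
    vertical space that has to be distributed above and below it.
    """
    needed = heading_size_pt + min_space_pt
    # Round up to the next whole multiple of the body text line height.
    lines = max(1, math.ceil(needed / body_leading_pt))
    total = lines * body_leading_pt
    return lines, total - heading_size_pt
```

With a 16 pt heading and a 12 pt body text leading, `heading_block(16, 12)` reserves two lines and leaves 8 pt of space to distribute around the heading, so that the lines on the facing pages stay aligned.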
3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
This is the excellent foppery of the world, that when we are sick in fortune (often the surfeit of our own behavior) we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

– William Shakespeare, King Lear
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
(ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015).

[3] ISO/TC 97/SC 2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972.

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986.

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1.

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6.

[7] ISO/IEC JTC 1/SC 2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993.

[8] ISO/IEC JTC 1/SC 2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996.

[9] ISO/IEC JTC 1/SC 2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996.

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016).

[14] ISO/TC 171/SC 2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008.

[15] ISO/IEC JTC 1/SC 34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012.

[16] ISO/IEC JTC 1/SC 34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006.

[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124.

[18] ISO/IEC JTC 1/SC 22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993.

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6.

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015).

[26] ISO/IEC JTC 1/SC 34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986.

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).
[29] ISO/IEC JTC 1/SC 18/WG 8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).
[47] Manu Sporny et al. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK The ACKnowledgement character
API Application Programming Interface
ASA The American Standards Association
ASCII The American Standard Code for Information Interchange
AT&T The American Telephone and Telegraph corporation
BEL The BELl character
BMP The Basic Multilingual Plane
BRE The Basic Regular Expressions
BS The BackSpace character
BSD The Berkeley Software Distribution. Also known as the Berkeley Unix
CA California
CAN The CANcel character
CERN The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR The Common Locale Data Repository
CLI Command Line Interface
COBOL The COmmon Business-Oriented Language
CR The Carriage Return character
CSS The Cascading Style Sheets language
DC The Dublin Core
DC1 The Device Control character No. 1
DC2 The Device Control character No. 2
DC3 The Device Control character No. 3
DC4 The Device Control character No. 4
DEL The DELete character
DLE The Data Link Escape character
DPS Document Preparation System
DTD Document Type Definition
DTP DeskTop Publishing
EBCDIC The Extended Binary Coded Decimal Interchange Code
ECMA The European Computer Manufacturers Association
EM The End of Medium
EMACS The Eventually Munches All Computer Storage editor
ENQ The ENQuiry character
EOT The End Of Transmission
ERE The Extended Regular Expressions
ESC The ESCape character
ETB The End of Transmission Block
ETX The End of TeXt
EUC The Extended Unix Code
FF The Form Feed character
FOAF Friend Of A Friend
FORTRAN The FORmula TRANslator
FS The File Separator
FSM The Free Software Movement
GML The General Markup Language
GNU GNU is Not Unix
GS The Group Separator
GUI Graphical User Interface
HT The Horizontal Tab
HTML The HyperText Markup Language
IBM The International Business Machines Corporation
IEC The International Electrotechnical Commission
IME Input Method Editor
IRI The Internationalized Resource Identifier
ISO The International Organization for Standardization
JIS The Japanese Industrial Standards encoding
JOE The Joe's Own Editor
JSON The JavaScript Object Notation
JSON-LD JSON for LD
JTC A Joint TC
LD Linked Data
LF The Line Feed
MA Massachusetts
MATHML The Mathematical Markup Language
NAK The Negative-AcKnowledgement character
NUL The NULl character
NY New York
OCR Optical Character Recognition
ODF The Open Document Format for office applications
OOXML The Office Open XML format
OWL The Web Ontology Language
PC The IBM Personal Computer
PDF The Portable Document Format
PICO The PIne COmposer
POSIX The Portable Operating System Interface
RDF The Resource Description Framework
RDFA RDF in attributes
RELAX NG The REgular LAnguage for XML, New Generation
RFC A Request For Comments
RS The Record Separator
SC A SubCommittee
SGML The Standard Generalized Markup Language
SI The Shift In character
SO The Shift Out character
SOH The Start of Heading
SR Sound Recognition
STX The Start of Text
SUB The SUBstitute character
SVG The Scalable Vector Graphics language
SVN SubVersioN
SYN The SYNchronous Idle character
TC A Technical Committee
TEI The Text Encoding Initiative
TRON The Real-time Operating system Nucleus
UCS The Universal multiple-octet coded Character Set
US The Unit Separator
USA The United States of America
UTF The UCS Transformation Format
VCS Version Control Systems
VI The Visual Interactive editor
VIM VI IMproved
VT The Vertical Tab
W3C The World Wide Web Consortium
WG A Working Group
WYSIWYG What You See Is What You Get
XHTML The eXtensible HyperText Markup Language
XML The eXtensible Markup Language
Index
ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42
Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17
sect
Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15
bre 14ndash16bs 6bsd 13
sect
ca 52can 6cern 28
character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44
sect
dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35
dps 13 17 18 32 35 36 39dtd 23 25ndash27dtp 36
sect
ebcdic 5ecma 55Edgar Allen Poe 37
Elements of Style 3em 6Emacs 13endianity 10endnote 47enq 6eot 6erealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15
ere 14ndash16esc 6etb 6120576-TEX 38etx 6euc 5
sectF M Cornford 43ff 6foaf 32 33footnote 47formal grammar 14fortran 4From Religion to Philosophy A Study in
the Origins of Western Speculation 43fs 6fsm 35
sectGit 17gml 22gnuLinux 13nano 13
gnu 13 14 35Google Documents 18Google Pinyin 11grep 16 17groff see troffgs 6gui 13 35
sectHan Unification 9heading 45Henrik Ibsen 27ht 6
html 28ndash32 34 39 44 55sect
ibm 5 12 22iconv 10iec 7 10 51ndash54ime 12ir i 27 28 31 32 54iso 7 10 51ndash54
sectJavaScript 29Jeffrey E F Friedl 14j is 5joe 13JScript 29json 32json-ld 32 56jtc 51ndash54justification see alignment
sectKing Lear 48
sectLATEX 36 43Latin Vulgate Bible 49ld 31 32 55leading see line spacingLeafpad 13lf 6lightweight markup language 39line height 45list 46
sectma 51MakeDoc 39Markdown 39markuplogical 21 29 30 35 36presentation 21 29 30 35 36
mathml 28 31Mercurial 17microformatting 32Microsoft Word 14 20 39
sectN-Triples 32 33nak 6Noam Chomskyhierarchy 14
Noam Chomsky 14note 46Notepad++ 13Notepad 13
nroff see troffnul 6ny 51
sectocr 12odf 13ooxml 13owl 32 56
sectparagraphblock 47indented 45outdented 45
paragraph 42paragraphsblock 45
pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8
sectQuarkXPress 14quotationblock 47run-in 47
sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31
rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6
sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22
sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-
lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6
secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff
man 36
me 36mom 36
troff 35tron 9Turtle 32 33typeface 41
sectucsblock 8ucs-4 8
ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10
us 6usa 51 52utf
utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8
utf 6 8ndash10 52sect
VBScript 29vcscentralized 17decentralized 17
vcs 17ndash20version control 13vi 13vim 13
vt 6sect
w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4
wysiwyg 35sect
XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23
xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
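The two readings of a triplet can be sketched in a few lines of Python. The Resource and Literal classes and the interpret function are hypothetical names introduced only for this illustration; they do not belong to any RDF library, and the argument order (p, s, o) simply follows the notation used above.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    """A node identified by an IRI (hypothetical helper type)."""
    iri: str


@dataclass(frozen=True)
class Literal:
    """A plain literal value (hypothetical helper type)."""
    value: str


def interpret(p: Resource, s: Resource, o: Union[Resource, Literal]) -> str:
    """Return a sentence describing the triplet (p, s, o)."""
    if isinstance(o, Resource):
        # The object is a resource: s is in the relation p with o.
        return f"{s.iri} is in relation {p.iri} with {o.iri}"
    # The object is a literal: s has the property p with the value o.
    return f"{s.iri} has property {p.iri} with value {o.value!r}"
```

Calling interpret with a Resource as the third argument yields the relational reading, whereas a Literal yields the property reading.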
Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
RDF documents can be represented through many languages, including XML [44], JSON for LD (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually replace microformatting, that is, the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in documents on the Web.
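To make the idea of a serialization concrete, the following Python sketch writes a single triple as one line in the style of N-Triples. The function ntriples_line and its parameters are assumptions made for this illustration only; the sketch escapes just backslashes and double quotes in literals and does not validate IRIs, so it covers only a small part of the actual N-Triples grammar.

```python
from typing import Optional


def ntriples_line(subject: str, predicate: str, obj: str,
                  is_literal: bool = False,
                  lang: Optional[str] = None) -> str:
    """Format one triple as an N-Triples-style line (simplified sketch)."""
    if is_literal:
        # Escape the two characters that would break the quoted string.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        term = f'"{escaped}"'
        if lang:
            term += f"@{lang}"  # optional language tag, e.g. @en
    else:
        term = f"<{obj}>"       # the object is itself an IRI
    return f"<{subject}> <{predicate}> {term} ."
```

Each line names the subject, the predicate, and the object in full, which is why this representation is verbose but trivial to process line by line.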
2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith" />
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .
Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).
<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>
Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.
<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>
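[Graph: the node http://example.org/document.html has a dc:title edge to the literal "John's Web page"@en and a dc:creator edge to the node http://example.org/john-smith, which in turn has an rdf:type edge to foaf:Person and a foaf:name edge to the literal "John Smith".]
Figure 2.11: A graph of the RDF document in Figure 2.9.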
categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The mild learning curve of interactive DPSes comes at a price: more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems

Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU is Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The
The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
standard macro packages for groff include man, for the formatting of documentation; me, for the creation of research papers; and the more recent mom, for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
[Output document: the typeset page with the centered title "The Cask of Amontillado", the byline "by Edgar Allan Poe", the opening paragraph set with a three-line drop cap "T", and the page number 1.]
.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's The Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em \dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em \hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
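The thresholds above can be turned into a rough check. The following Python sketch uses a hypothetical function, classify_measure, whose name, parameters, and return strings are assumptions made for illustration; only the numeric limits come from the text.

```python
def classify_measure(chars_per_line: int, columns: int = 1) -> str:
    """Classify a line length against the guidelines quoted above."""
    # Recommended range: 45-75 characters for a single column,
    # 40-50 characters for a multi-column page.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line > 80:
        return "too wide"
    if chars_per_line < 40:
        return "too narrow to justify, set ragged"
    if low <= chars_per_line <= high:
        return "comfortable"
    return "outside the recommended range"
```

A line of 66 characters on a single-column page is classified as comfortable, whereas a 35-character column would be better set ragged.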
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
[Output document: the typeset paragraph, with the English text set in Alegreya Sans and the Greek quotation set in GFS Neohellenic, followed by the page number 1.]
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin{%
  \fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek{%
  \fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
newunicodecharraisebox8ex
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=gr] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
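The arithmetic behind the quoted range can be checked with a few lines of Python. This is only a sketch: the 10 pt type size is an assumption inferred from the quoted figures, and the function name `line_height` is made up for this illustration.

```python
def line_height(type_size_pt: float, lead_pt: float) -> float:
    """Return the line height (leading) in points.

    Follows the convention used in the text: `lead_pt` points of lead
    are added both above and below each line of type.
    """
    return type_size_pt + 2 * lead_pt


# Assumed example: 10 pt body type with 1 pt and with 2.25 pt of lead.
for lead in (1.0, 2.25):
    print(f"{lead} pt of lead gives a {line_height(10.0, lead)} pt line height")
```

Under that assumption, the two values of lead reproduce the two ends of the 12 to 14.5 pt range.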
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a side-note. Sidenotes enliven the page and are easy for the reader to find.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, ¶ 33].
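The whole-multiple rule for headings can be sketched as a small calculation. The function name `heading_block` and the sample sizes are hypothetical and serve only to illustrate the rule; how the resulting padding is split above and below the heading is a separate design decision.

```python
import math


def heading_block(heading_height_pt: float, body_line_height_pt: float):
    """Pad a heading so that it occupies a whole number of body lines.

    Returns (lines, block_height_pt, padding_pt), where padding_pt is the
    extra vertical space to distribute above and below the heading.
    """
    lines = math.ceil(heading_height_pt / body_line_height_pt)
    block_height = lines * body_line_height_pt
    return lines, block_height, block_height - heading_height_pt


# Hypothetical values: an 18 pt heading over body text with 12 pt leading.
print(heading_block(18, 12))
```

A heading that already spans a whole number of body lines receives no padding, so the lines on facing pages stay aligned in either case.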
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, ¶ 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations
Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer’s viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs, and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

– William Shakespeare, King Lear

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
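Deriving the text height from the text width and then snapping it to a whole number of lines can be sketched as follows. The function name `text_block_height`, the 300 pt text width, and the 1 : √2 proportion are assumptions chosen for the illustration; rounding to the nearest line count is one possible policy among several.

```python
import math


def text_block_height(text_width_pt: float, proportion: float,
                      line_height_pt: float):
    """Derive a text height from the text width and snap it to whole lines.

    `proportion` is the desired height-to-width ratio of the text block.
    Returns (lines, height_pt).
    """
    ideal_height = text_width_pt * proportion
    lines = max(1, round(ideal_height / line_height_pt))
    return lines, lines * line_height_pt


# Hypothetical values: a 300 pt wide text block, a 1 : sqrt(2) proportion,
# and a 12 pt body text line height.
print(text_block_height(300, math.sqrt(2), 12))
```

The snapped height differs slightly from the ideal one, which is the price of being able to fill the text block completely with lines of body text.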
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

The general guidelines are to use colored typefaces only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
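The text does not say how contrast should be measured. One common convention, brought in here purely as an assumption, is the contrast ratio defined by the Web Content Accessibility Guidelines (WCAG) 2.x, which ranges from 1 (identical colors) to 21 (black on white). The sketch below computes it for 8-bit sRGB colors; the function names are made up for the illustration.

```python
def _linear(channel_8bit: int) -> float:
    """Convert an 8-bit sRGB channel to its linear-light value."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb) -> float:
    """Relative luminance of an (R, G, B) triple, as defined by WCAG 2.x."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background) -> float:
    """WCAG 2.x contrast ratio between two (R, G, B) colors."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# Red type on a white background, as in the historical use of red ink.
print(round(contrast_ratio((255, 0, 0), (255, 255, 255)), 2))
```

WCAG recommends a ratio of at least 4.5 for ordinary body text, which is one way to make "sufficient contrast" concrete; it says nothing about whether two colors remain distinguishable to a color-blind reader, which has to be checked separately.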
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015).
[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972.
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1986.
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1.
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6.
[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993.
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996.
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996.
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016).
[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008.
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012.
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006.
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124.
[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993.
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O’Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6.
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O’Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986.
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3.
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12.
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4.
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5.
[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014.
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe’s Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML, New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
    dc:title "John's Web page"@en ;
    dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
    a foaf:Person ;
    foaf:name "John Smith" .
Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).
<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>
Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.
<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>
[Graph with the nodes http://example.org/document.html, "John's Web page"@en, http://example.org/john-smith, foaf:Person, and "John Smith", connected by the edges dc:title, foaf:creator, rdf:type, and foaf:name.]
Figure 2.11: A graph of the RDF document in Figure 2.9.
categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the gentle learning curve of interactive DPSes is their more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the use of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems
Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU is Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
Another notable free batch-oriented dps is TEX which wasdeveloped in the 1970s by an American professor of computerscience Donald Knuth after he had received galley proofs for thesecond volume of his monograph the Art of Computer Programmingand found the appearance of mathematical formulae distastefulAs a result the typesetting of mathematics is a central theme inTEX rather than an afterthought which differentiates it from mostother dpses and which contributes to the massive popularity TEXhas enjoyed among academics Much like in the case of troff andits derivatives the language of TEX contains only typographic andprogramming primitives but the creation of logical markup ispossible through user macros A popular TEX macro package thatenables the creation of various types of documentswith just logicalmarkup is LATEX the standard markup language for academic andtechnical documents
2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and typesetting at the expense of a steeper learning curve.

Most interactive DPSes provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text, with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
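As a rough illustration of this decoupling, the following Python sketch stores only the logical class of each section of text and resolves the design at output time. The class names, the fields of `Style`, and the sample values are hypothetical and do not come from any of the systems named in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Style:
    """A hypothetical bundle of presentation settings."""
    font_size_pt: float
    bold: bool = False
    italic: bool = False


# One design per logical class (illustrative values only).
STYLESHEET = {
    "heading": Style(font_size_pt=16, bold=True),
    "body": Style(font_size_pt=10),
    "quotation": Style(font_size_pt=9, italic=True),
}

# The document records what each section of text is, not how it looks.
DOCUMENT = [
    ("heading", "The Cask of Amontillado"),
    ("body", "The thousand injuries of Fortunato I had borne as I best could."),
    ("quotation", "Jesters do oft prove prophets."),
]


def render(document, stylesheet):
    """Pair every section of text with the style of its logical class."""
    return [(text, stylesheet[cls]) for cls, text in document]


# Changing a single entry restyles every section of that class.
restyled = {**STYLESHEET, "body": Style(font_size_pt=11)}
```

Calling `render(DOCUMENT, restyled)` instead of `render(DOCUMENT, STYLESHEET)` changes the size of all body text without touching the document itself, which is the consistency benefit described above.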
The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled – but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's The Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em \dimexpr\hsize-3em 3.28em
    \dimexpr\hsize-3.28em 3.28em
    \dimexpr\hsize-3.28em 0em \hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce markup that is comparatively weak and domain-specific, but also humane, highly intuitive, often profoundly beautiful, and easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that reflect their use cases.
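To make the idea of conversion concrete, here is a minimal Python sketch of a converter for a tiny, Markdown-like subset. The supported syntax (a leading `# ` for a heading, blank lines between paragraphs, `*text*` for emphasis) is an assumption of this sketch and does not follow the specification of any real lightweight markup language.

```python
import html
import re


def lightweight_to_html(source: str) -> str:
    """Convert a tiny, hypothetical lightweight markup subset to HTML."""
    blocks = []
    # Blank lines separate blocks; line breaks inside a block are not significant.
    for block in re.split(r"\n\s*\n", source.strip()):
        text = html.escape(" ".join(block.split()), quote=False)
        # *text* marks emphasis.
        text = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)
        if text.startswith("# "):
            blocks.append(f"<h1>{text[2:]}</h1>")
        else:
            blocks.append(f"<p>{text}</p>")
    return "\n".join(blocks)


sample = """# Lightweight markup

Punctuation and indentation carry the *structure*
of the text."""
print(lightweight_to_html(sample))
```

The source stays readable as plain text, while the converter supplies the more general markup. Real tools handle far more cases (nesting, escaping, lists) than this sketch attempts.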
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well suited to both the document and each other; the design and positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers or elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab serifs or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
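The coverage check described above can be sketched in Python. The sketch assumes that the repertoire of a typeface is available as a plain set of characters, which is a simplification, and the names `decompose`, `plan_glyphs`, and `body_typeface` are illustrative. Unicode canonical decomposition (NFD) is used here only to show which base letter and which diacritical marks an accented character would be assembled from.

```python
import unicodedata


def decompose(char: str) -> tuple[str, ...]:
    """Split a precomposed character into its base letter and combining marks."""
    return tuple(unicodedata.normalize("NFD", char))


def plan_glyphs(text: str, available: set[str]) -> dict[str, tuple[str, ...]]:
    """For every character missing from `available`, list the pieces to combine."""
    plan = {}
    for char in sorted(set(text)):
        if char in available or char.isspace():
            continue
        plan[char] = decompose(char)
    return plan


# Hypothetical typeface that covers only unaccented Latin letters.
body_typeface = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
plan = plan_glyphs("V\u00edt Novotn\u00fd", body_typeface)
```

Here `plan` maps each missing accented letter to a base letter plus a combining acute accent. A character without a canonical decomposition comes back unchanged, which signals that it would have to be borrowed whole from another font family.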
Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
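The thresholds in the preceding paragraph can be restated as a small Python sketch. The function name and the wording of the returned verdicts are hypothetical; only the numeric limits come from the text.

```python
def assess_measure(chars_per_line: int, columns: int = 1) -> str:
    """Classify a line length against the guideline quoted in the text."""
    if chars_per_line > 80:
        return "too wide: long passages strain the eye"
    if chars_per_line < 40:
        return "too narrow to justify: set ragged"
    # 45-75 characters for one column, 40-50 for several columns.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if low <= chars_per_line <= high:
        return "comfortable: justify"
    return "acceptable, but outside the recommended range"
```

For example, `assess_measure(66)` reports a comfortable measure, while `assess_measure(35, columns=2)` suggests ragged setting.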
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
   Ligatures=TeX]}{}
\newunicodechar{·}{\raisebox{.8ex}{.}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.

<style>
@font-face {
  font-family: "Alegreya Sans";
  src: url(AlegreyaSans-Regular.ttf)
       format("truetype");
  unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                 U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
}
@font-face {
  font-family: "GFS Neohellenic";
  src: url(GFSNeohellenic.otf) format("opentype");
  unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                 U+102E0-102FF;
}
p {
  font-family: "Alegreya Sans", "GFS Neohellenic",
               sans-serif;
  line-height: 14pt;
}
[lang=en] {
  font-size: 11pt;
}
[lang=gr] {
  font-size: 13.2pt;
}
</style>
<p><span lang=en>The second function of Soul &ndash; knowing
&ndash; was not at first distinguished from motion. Aristotle
says: </span><span lang=gr>φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang=en>&ldquo;The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.&rdquo;</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with a leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
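The arithmetic behind these figures can be checked with a short Python sketch. The function names are illustrative; the bounds of 120 and 145 percent simply restate the guideline cited from [55].

```python
def leading_range(type_size_pt: float) -> tuple[float, float]:
    """Line height of 120 to 145 percent of the type size."""
    return (type_size_pt * 120 / 100, type_size_pt * 145 / 100)


def lead_per_side(type_size_pt: float, leading_pt: float) -> float:
    """Extra space above and below each line for a given line height."""
    return (leading_pt - type_size_pt) / 2


low, high = leading_range(10)       # line heights for 10 pt type
lead_low = lead_per_side(10, low)   # lead at the tight end
lead_high = lead_per_side(10, high) # lead at the loose end
```

For 10 pt type this yields line heights of 12 and 14.5 pt and a lead of 1 and 2.25 pt per side, matching the numbers in the text.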
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and set ragged otherwise. Much as with prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

          Sizes in inches   Page proportions
A4        8.27 × 11.7       2 : √2       1.41421
B5        6.93 × 9.84       1 : √2       0.707
Letter    8 1/2 × 11        1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
The hierarchy of the headings can be expressed through variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
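One way to satisfy the whole-multiple rule is to pad each heading up to the next multiple of the body line height. The Python sketch below assumes that the heading's own height, including any space set around it, is known in points; the function names and the sample values are illustrative.

```python
import math


def heading_block_height(heading_height_pt: float, line_height_pt: float) -> float:
    """Round the space taken by a heading up to a whole number of body lines."""
    lines = math.ceil(heading_height_pt / line_height_pt)
    return lines * line_height_pt


def heading_padding(heading_height_pt: float, line_height_pt: float) -> float:
    """Extra vertical space needed to keep the following lines on the grid."""
    return heading_block_height(heading_height_pt, line_height_pt) - heading_height_pt


block = heading_block_height(16, 12)  # a 16 pt heading over 12 pt body lines
padding = heading_padding(16, 12)
```

With a 12 pt line height, a 16 pt heading occupies two body lines (24 pt), so 8 pt of padding has to be distributed above and below it.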
3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface as the surrounding text, treat the columns of tables the same way you treat columns in the text, and keep rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or of the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations

Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks, in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
    This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star.

    – William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing. A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
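The multiple-of-line-height constraint can be sketched in Python as follows. The sketch assumes that the vertical space left after subtracting the margins is already known in points; the function name and the sample values (540 pt of space, 12 pt line height) are illustrative only.

```python
def text_block_height(available_pt: float, line_height_pt: float) -> tuple[int, float]:
    """Largest whole number of body lines that fits, and the resulting text height."""
    lines = int(available_pt // line_height_pt)
    return lines, lines * line_height_pt


# 540 pt of vertical space with a 12 pt line height.
lines, height = text_block_height(540, 12)
# Space that is not a whole multiple is rounded down to the grid.
lines_loose, height_loose = text_block_height(550, 12)
```

Both calls give 45 lines and a text height of 540 pt; in the second case the remaining 10 pt would go to the vertical margins.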
3.4 Color

In both print and web design, it is perfectly reasonable to use just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to use colored type only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the type color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
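The text does not say how "sufficient contrast" should be measured. One common yardstick on the web, used here purely as an assumption, is the contrast ratio defined in the W3C Web Content Accessibility Guidelines (WCAG) 2.x, which compares the relative luminance of two sRGB colors. The following Python sketch implements that formula; the function names are illustrative.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance of an sRGB color, from 0 (black) to 1 (white)."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """WCAG contrast ratio between two colors, from 1 to 21."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


black_on_white = contrast_ratio((0, 0, 0), (255, 255, 255))
red_on_white = contrast_ratio((255, 0, 0), (255, 255, 255))
```

Black on white reaches the maximum ratio of 21, whereas pure red on white comes out at roughly 4, which is below the 4.5 that WCAG level AA asks of normal-sized text. This is consistent with the advice to reserve colored type for emphasis.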
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028/ (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>
Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.
<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>
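[Figure: a graph with the nodes http://example.org/document.html, "John's Web page"@en, http://example.org/john-smith, foaf:Person, and "John Smith", connected by the edges dc:title, foaf:creator, rdf:type, and foaf:name.]
Figure 2.11: A graph of the RDF document in Figure 2.9.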
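The graph in Figure 2.11 can also be written down as a list of subject, predicate, object triples. The following Python sketch is only an illustration: the edge directions are an assumption read off the figure labels, the prefixed names are kept as printed there, and the helper `objects` is a hypothetical name.

```python
# Triples as read off Figure 2.11 (edge directions are an assumption).
TRIPLES = [
    ("http://example.org/document.html", "dc:title", '"John\'s Web page"@en'),
    ("http://example.org/document.html", "foaf:creator", "http://example.org/john-smith"),
    ("http://example.org/john-smith", "rdf:type", "foaf:Person"),
    ("http://example.org/john-smith", "foaf:name", '"John Smith"'),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


if __name__ == "__main__":
    print(objects("http://example.org/john-smith", "foaf:name"))
```

Each triple corresponds to one labeled edge of the graph; the subject and the object are the two nodes it connects.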
categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems
Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU is Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.
The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
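The decoupling that logical markup provides can be sketched in a few lines of Python. This is a minimal illustration, not the mechanism of any particular DPS: the class names, the style properties, and the function `resolve` are all hypothetical.

```python
# Hypothetical logical classes mapped to a concrete design.
STYLES = {
    "title": {"size_pt": 16, "bold": True},
    "body": {"size_pt": 10, "bold": False},
}

# The text only records which class each section belongs to.
DOCUMENT = [
    ("title", "The Cask of Amontillado"),
    ("body", "The thousand injuries of Fortunato I had borne as I best could."),
]


def resolve(document, styles):
    """Pair every logically marked-up section with the design of its class."""
    return [(text, styles[cls]) for cls, text in document]


if __name__ == "__main__":
    for text, style in resolve(DOCUMENT, STYLES):
        print(style, text)
```

Changing the single `"title"` entry in the style table restyles every title in the document, whereas presentation markup would require touching each title separately.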
[Rendered output of the listing below: the title, the byline, and the opening paragraph set with a drop cap T, followed by the page number -1-.]
.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's The Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
Logical markup definition
deftitle1bigbfcenterline1
defauthor1itcenterlinebycenterline1
vskip 39em
defchapter1noindentsmashhskip01exlower58ex
hboxllapdropcap1hskip-03ex
parshape=4 3emdimexprhsize-3em 328em
dimexprhsize-328em 328em
dimexprhsize-328em 0emhsize
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
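To give a feeling for how a conversion tool of this kind may work, here is a minimal Python sketch. It does not implement Markdown or any other language named above: the two rules (blocks separated by blank lines, a leading `= ` marking a heading) form a hypothetical toy syntax, and `to_html` is an illustrative name.

```python
import html


def to_html(source: str) -> str:
    """Convert a toy lightweight markup to HTML.

    Assumed rules: blocks are separated by blank lines; a block that
    starts with '= ' is a heading; every other block is a paragraph.
    """
    output = []
    for block in source.split("\n\n"):
        # Join the lines of a block, as line breaks carry no meaning here.
        text = " ".join(line.strip() for line in block.strip().splitlines())
        if not text:
            continue
        if text.startswith("= "):
            output.append(f"<h1>{html.escape(text[2:])}</h1>")
        else:
            output.append(f"<p>{html.escape(text)}</p>")
    return "\n".join(output)


if __name__ == "__main__":
    print(to_html("= Design\n\nFonts matter.\nA lot."))
```

Even this toy version shows the trade-off described above: the input stays readable as plain text, but only the few constructs the converter knows about can be expressed.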
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other, the design and the positioning of the structural elements of the document (such as headings, tables, figures, and lists), and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
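Combining typefaces for Latin and non-Latin passages amounts to choosing a font per character. The Python sketch below illustrates the idea under stated assumptions: it uses the font names Alegreya Sans and GFS Neohellenic and two of the Greek code point ranges from the examples in this chapter, while the functions `font_for` and `split_runs` are hypothetical helpers.

```python
# Greek and Greek Extended blocks (a subset chosen for illustration).
GREEK_RANGES = [(0x0370, 0x03FF), (0x1F00, 0x1FFF)]


def font_for(char, body_font="Alegreya Sans", greek_font="GFS Neohellenic"):
    """Pick the Greek typeface for Greek code points, the body typeface otherwise."""
    code_point = ord(char)
    if any(low <= code_point <= high for low, high in GREEK_RANGES):
        return greek_font
    return body_font


def split_runs(text):
    """Group consecutive characters that share a typeface into (font, run) pairs."""
    runs = []
    for char in text:
        font = font_for(char)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + char)
        else:
            runs.append((font, char))
    return runs


if __name__ == "__main__":
    for font, run in split_runs("soul \u03c8\u03c5\u03c7\u1f74\u03bd"):
        print(font, repr(run))
```

Note that spaces and punctuation fall outside the Greek ranges, so in this simple sketch they are always set in the body typeface; a real system would need a rule for such shared characters.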
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
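The line-length guidelines above are easy to turn into a quick check. The following Python sketch simply encodes the character counts quoted in the text; the function names are illustrative assumptions.

```python
def measure_ok(chars_per_line: int, columns: int = 1) -> bool:
    """Check a line length against the 45-75 (single-column) or
    40-50 (multi-column) character guideline."""
    low, high = (45, 75) if columns == 1 else (40, 50)
    return low <= chars_per_line <= high


def suggested_alignment(chars_per_line: int) -> str:
    """Lines that cannot hold 40 characters are better set ragged."""
    return "ragged" if chars_per_line < 40 else "justified"


if __name__ == "__main__":
    print(measure_ok(66), measure_ok(66, columns=2), suggested_alignment(30))
```

For instance, a 66-character line suits a single-column page but is too wide for a multi-column one, and a narrow 30-character sidenote column would be set ragged.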
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
[Rendered output of the listing below: the English text set in Alegreya Sans with the Greek quotation set in GFS Neohellenic.]
documentclass[11pt]article
usepackagefontspec leading newunicodechar
usepackage[Latin Greek]ucharclasses
setTransitionsForLatin
fontspecAlegreyaSans-Regularttf[Ligatures=TeX]
setTransitionsForGreek
fontspecGFSNeohellenicotf[Scale=12 WordSpace=05
Ligatures=TeX]
newunicodecharraisebox8ex
frenchspacing
leading14pt
begindocument
The second function of Soul -- knowing -- was not at
first distinguished from motion Aristotle says φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy confidence and
fear and again to be angry to perceive and to think
and all these states are held to be movements which
might lead one to suppose that soul itself is moved
enddocument
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=gr] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says, </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
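The arithmetic behind these figures can be reproduced with a short Python sketch. It assumes only the twenty to forty-five percent guideline quoted above; the function names and the use of whole-number percentages are illustrative choices.

```python
def line_height_range(type_size_pt, low_pct=20, high_pct=45):
    """Return the (minimum, maximum) line height for a given typeface size."""
    return (type_size_pt * (100 + low_pct) / 100,
            type_size_pt * (100 + high_pct) / 100)


def lead_per_side(type_size_pt, line_height_pt):
    """Lead added above and below each line for a chosen line height."""
    return (line_height_pt - type_size_pt) / 2


if __name__ == "__main__":
    low, high = line_height_range(10)
    print(low, high, lead_per_side(10, low), lead_per_side(10, high))
```

For a 10 pt typeface, the range is 12 to 14.5 pt, which corresponds to 1 to 2.25 pt of lead on either side of the line.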
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
         Sizes in inches   Page proportions
A4       8.27 × 11.7       2 : √2      1.41421
B5       6.93 × 9.84       1 : √2      0.707
Letter   8 1/2 × 11        1 : 1.294   1.2941
Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
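The proportions of the paper sizes in Table 3.1 can be checked from the dimensions in inches. The Python sketch below computes the height-to-width ratio of each sheet; the dictionary and function names are illustrative, and the sizes are the rounded values from the table.

```python
import math

# Width and height in inches, as listed in Table 3.1.
SIZES_IN = {"A4": (8.27, 11.7), "B5": (6.93, 9.84), "Letter": (8.5, 11.0)}


def proportion(width, height):
    """Return the height-to-width ratio of a sheet."""
    return height / width


if __name__ == "__main__":
    for name, (width, height) in SIZES_IN.items():
        print(f"{name}: 1 : {proportion(width, height):.4f}")
    print(f"sqrt(2) = {math.sqrt(2):.5f}")
```

Because the inch values are rounded, the A4 and B5 ratios only approximate the square root of two, while the Letter ratio differs from it noticeably.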
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
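One way to satisfy the whole-multiple rule is to round the height of a heading, including the space around it, up to the next multiple of the body text line height. The Python sketch below shows this under the assumption that all heights are given in whole points; `heading_block_height` is a hypothetical helper name.

```python
import math


def heading_block_height(content_height_pt, line_height_pt):
    """Round a heading's height up to a whole multiple of the body line height."""
    lines = math.ceil(content_height_pt / line_height_pt)
    return lines * line_height_pt


if __name__ == "__main__":
    # A 20 pt tall heading on a 12 pt grid occupies two body lines.
    total = heading_block_height(20, 12)
    print(total, total - 20)
```

The difference between the rounded height and the actual height of the heading is the vertical space left to distribute above and below it.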
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing. A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
3.2.5 Quotations
Quotations repeat what has already been expressed somewhere else and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
    This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
        William Shakespeare, King Lear
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
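The requirement that the text height be a multiple of the line height can be sketched as follows. The Python function below is a hypothetical helper that assumes heights in whole points: it fits as many body lines as possible into the vertical space left between the margins.

```python
def text_block(available_height_pt, line_height_pt):
    """Return (number of lines, text height) for the tallest text block
    that is a whole multiple of the body text line height."""
    lines = available_height_pt // line_height_pt
    return lines, lines * line_height_pt


if __name__ == "__main__":
    # 550 pt of vertical space with a 12 pt line height.
    print(text_block(550, 12))
```

With 550 pt available and a 12 pt line height, the text block holds 45 lines and is 540 pt tall; the remaining 10 pt go to the vertical margins.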
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standards Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015).
[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972.
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standards Association, June 1986.
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1.
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6.
[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993.
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996.
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996.
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016).
[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008.
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012.
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006.
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124.
[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993.
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6.
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986.
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3.
[28] Jean Paoli Tim Bray and Michael Sperberg-McQueen Ex-tensible Markup Language (xml) 10 w3c Recommendationw3c Feb 1998 url httpwwww3orgTR1998REC-xml-19980210 (visited on 07312015) (cit on pp 23 31)
[29] isoiec jtc1sc18wg8 Proposed TC for Web sgml Adap-tations for sgml isoiec N1929 the International Organi-zation for Standardization June 1997 url httpxmlcoverpagesorgwg8-n1929-ghtml (visited on 07312015)(cit on p 23)
[30] Haringkon Wium Lie and Bert Bos Cascading Style Sheets level1 Recommendation w3c Dec 1996 url httpwwww3orgTRREC-CSS1-961217 (visited on 07312015) (cit onpp 23 29)
[31] C M Sperberg-McQueen and Claus Huitfeldt lsquolsquogoddagA Data Structure for Overlapping Hierarchiesrsquorsquo In DigitalDocuments Systems and Principles 8th International Confer-ence on Digital Documents and Electronic Publishing DDEP2000 5th International Workshop on the Principles of DigitalDocument Processing PODDP 2000 Munich Germany Sep-tember 13-15 2000 Revised Papers Ed by Peter King andEthan V Munson Berlin Heidelberg Springer Berlin Hei-delberg 2004 pp 139ndash160 isbn 978-3-540-39916-2 doi101007978-3-540-39916-2_12 (cit on p 27)
[32] TimBray DaveHollander andAndrewLaymanNamespacesin xml w3c Recommendation w3c Jan 1999 url httpwwww3orgTR1999REC-xml-names-19990114 (visitedon 08212015) (cit on p 27)
[33] M Duerst the Internationalized Resource Identifiers (iris) rfc3987 rfc Editor Jan 2005 url httptoolsietforghtmlrfc3987 (visited on 08312015) (cit on p 27)
[34] Norman Walsh DocBook 5 The Definitive Guide Apr 2010url httpwwwdocbookorgtdgenhtmldocbookhtml(visited on 08182015) (cit on p 28)
BIBLIOGRAPHY 55
[35] Tim Berners-Lee Information Management A Proposal Techrep Mar 1989 url httpwwww3orgHistory1989proposalhtml (visited on 08312015) (cit on p 28)
[36] T Berners-Lee Hypertext Markup Language ndash 20 rfc 1866rfc Editor Nov 1995 url httptoolsietforghtmlrfc1866 (visited on 07312015) (cit on p 28)
[37] Jon Postel DoD standard Transmission Control Protocol rfc761 rfc Editor Jan 1980 url httptoolsietforghtmlrfc761 (visited on 09162016) (cit on p 28)
[38] Ian Hickson et al html5 A vocabulary and associated apisfor html and xhtml Recommendation w3c Oct 2014 urlhttpwwww3orgTR2014REC-html5-20141028 (visitedon 07312015) (cit on p 29)
[39] ecma International Standard ecma-262 - ecmaScript LanguageSpecification Tech rep June 1997 url httpwwwecma-internationalorgpublicationsfilesECMA-ST-ARCH
ECMA-262201st20edition20June201997pdf (visitedon 07312015) (cit on p 29)
[40] Netscape Communications Netscape and Sun announce Java-Script the open cross-platform object scripting language for en-terprise networks and the Internet Dec 1995 url httpwpnetscapecomnewsrefprnewsrelease67html (visited on02132008) (cit on p 29)
[41] Dave Raggett et al Reformulating html in xml w3c Recom-mendation w3c Dec 1998 url httpwwww3orgTR1998WD-html-in-xml-19981205 (visited on 08202015)(cit on p 31)
[42] Steven Pemberton et al xhtmltrade 10 The Extensible HyperTextMarkup Language w3c Recommendation w3c Jan 2000url httpwwww3orgTR2000REC-xhtml1-20000126(visited on 08202015) (cit on p 31)
[43] T Berners-Lee Linked Data Tech rep 2006 url httpswwww3orgDesignIssuesLinkedDatahtml (visited on09172016) (cit on p 31)
56 BIBLIOGRAPHY
[44] Ora Lassila and Ralph R Swick Resource Description Frame-work (rdf) Model and Syntax Specification w3c Recommen-dation w3c Feb 1999 url httpwwww3orgTR1999REC-rdf-syntax-19990222 (visited on 08182015) (cit onpp 31 32)
[45] Dan Brickley and R V Guha rdf Vocabulary DescriptionLanguage 10 rdf Schema w3c Recommendation w3c Feb2004 url httpwwww3orgTR2004REC-rdf-schema-20040210 (visited on 08182015) (cit on p 32)
[46] Deborah L McGuinness and Frank van Harmelen owl WebOntology Language w3c Recommendation w3c Feb 2004url httpwwww3orgTR2004REC-owl-features-20040210 (visited on 08182015) (cit on p 32)
[47] Dan Brickley and R V Guha json-ld 10 A JSON-basedSerialization for Linked Data w3c Recommendation w3cJan 2014 url httpwwww3orgTR2014REC-json-ld-20140116 (visited on 08192015) (cit on p 32)
[48] David Beckett et al rdf 11 Turtle w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-turtle-20140225 (visited on 08292015) (cit on p 32)
[49] David Beckett rdf 11 N-Triples w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-n-triples-20140225 (visited on 08192015) (cit on p 32)
[50] Ben Adida et al rdfa in xhtml Syntax and Processing w3cRecommendation w3c Oct 2008 url httpwwww3org TR 2008 REC - rdfa - syntax - 20081014 (visited on08192015) (cit on p 32)
[51] Peter Schaffter What exactly is mom 2015 url httpwwwschafftercamommom-01html (visited on 09162016)(cit on p 37)
[52] Donald Ervin Knuth Digital Typography The Center for theStudy of Language and Information Publications 1998 i sbn978-0-387-98269-4 (cit on p 36)
[53] Albert Kapr Sto a jedna věta ke knižniacute uacutepravě Trans by An-toniacuten Rambousek Lacerta 1999 url httpwwwsazbacztypoglosytypo101pdf (visited on 10202015) (cit onpp 41 46 47)
BIBLIOGRAPHY 57
[54] Robert Bringhurst the Elements of Typographic Style PointRoberts andWashHartleyampMarks 1992 i sbn 0-88179-110-5(cit on pp 41 42 45ndash48)
[55] Matthew Butterick Butterickrsquos Practical Typography Line spac-ing url httppracticaltypographycomline-spacinghtml (visited on 11022015) (cit on p 42)
[56] Vladimiacuter Beran et al Aktualizovanyacute typografickyacute manuaacutel6th ed Kafka Design 2014 (cit on p 45)
Acronyms

ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standards Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
EMACS: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe’s Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
PICO: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML, New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
VI: The Visual Interactive editor
VIM: vi IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language
Figure 2.11: A graph of the RDF document in Figure 2.9. (Graph labels: the resources http://example.org/document.html and http://example.org/john-smith, the class foaf:Person, the literals “John's Web page” (en) and “John Smith”, and the predicates dc:title, foaf:creator, rdf:type, and foaf:name.)
categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price of the gentle learning curve of interactive DPSes is twofold: more primitive typesetting algorithms, which need to be fast enough to enable real-time user interaction, and reduced flexibility stemming from the use of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.
2.3.1 Batch-oriented Systems
Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU is Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the de facto standard markup language for academic and technical documents.

(The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].)
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
The Cask of Amontillado
by
Edgar Allen Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that gave utterance to a threat. At length I would be avenged; this was a point definitely settled--but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-

    .TITLE "The Cask of Amontillado"
    .AUTHOR "Edgar Allen Poe"
    .PRINTSTYLE TYPESET
    .PAGE 6i 9i .75i .75i .75i .75i
    .START
    .PP
    .DROPCAP T 3
    he thousand injuries of Fortunato I had borne as I best
    could, but when he ventured upon insult, I vowed revenge.
    You, who so well know the nature of my soul, will not
    suppose, however, that gave utterance to a threat.
    \*[IT]At length\*[PREV] I would be avenged; this was a
    point definitely settled\[em]but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe’s Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].
    % Page geometry
    \pdfpagewidth=6in \pdfpageheight=9in
    % Page dimensions
    \hsize=\dimexpr\pdfpagewidth-1.5in
    \vsize=\dimexpr\pdfpageheight-1.5in
    \baselineskip=16.8pt
    \hoffset=-.25in \voffset=-.25in
    % Fonts
    \font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
    \font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
    % Logical markup definition
    \def\title#1{{\bigbf\centerline{#1}}}
    \def\author#1{{\it\centerline{by}\centerline{#1}}
      \vskip 3.9em}
    \def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
      \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
      \parshape=4 3em\dimexpr\hsize-3em 3.28em
      \dimexpr\hsize-3.28em 3.28em
      \dimexpr\hsize-3.28em 0em\hsize}
    % The document
    \title{The Cask of Amontillado}
    \author{Edgar Allen Poe}
    \chapter The thousand injuries of Fortunato I had borne
    as I best could, but when he ventured upon insult, I vowed
    revenge. You, who so well know the nature of my soul,
    will not suppose, however, that gave utterance to a
    threat. {\it At length} I would be avenged; this was a
    point definitely settled---but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors, each tailored to a particular use case.
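To make the idea of conversion to a more general markup language concrete, the following is a minimal sketch in Python. It assumes a tiny, hypothetical Markdown-like subset (blank lines separate blocks, `#` starts a heading, `*word*` marks emphasis) and is not an implementation of Markdown or of any other language named above; the function name `convert` and the sample text are chosen for illustration only.

```python
import html
import re


def convert(source: str) -> str:
    """Convert a tiny, hypothetical Markdown-like subset to HTML."""
    blocks = []
    # Blank lines separate blocks, as in most lightweight markup languages.
    for block in re.split(r"\n\s*\n", source.strip()):
        # Join the lines of a block and escape HTML special characters.
        text = html.escape(" ".join(line.strip() for line in block.splitlines()))
        # *emphasis* becomes <em>emphasis</em>.
        text = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)
        # One to six leading '#' characters mark a heading of that level.
        heading = re.match(r"(#{1,6})\s+(.*)", text)
        if heading:
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            blocks.append(f"<p>{text}</p>")
    return "\n".join(blocks)


SAMPLE = """# The Cask of Amontillado

The thousand injuries of Fortunato I had borne
as I *best* could."""

if __name__ == "__main__":
    print(convert(SAMPLE))
```

The source text stays readable on a plain terminal, while the generated HTML carries the same structure in a heavier syntax. Real converters handle many more constructs and edge cases than this sketch does.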
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other, the design and the positioning of the structural elements of the document (such as headings, tables, figures, and lists), and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements

3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
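The line-length guidelines above can be written down as a small check. The following Python sketch only restates the numbers from the paragraph; the function names `measure_is_comfortable` and `suggested_alignment` are hypothetical, and the hard cut-off at 40 characters is a simplifying assumption rather than a rule taken from the cited sources.

```python
# Recommended characters per line, as stated in the text above.
SINGLE_COLUMN = (45, 75)
MULTI_COLUMN = (40, 50)


def measure_is_comfortable(chars_per_line: int, multi_column: bool = False) -> bool:
    """Return True if the line length falls within the recommended range."""
    low, high = MULTI_COLUMN if multi_column else SINGLE_COLUMN
    return low <= chars_per_line <= high


def suggested_alignment(chars_per_line: int) -> str:
    """Suggest ragged setting for lines too narrow to justify well."""
    # Assumption: lines that cannot hold 40 characters are set ragged.
    return "ragged" if chars_per_line < 40 else "justified"


if __name__ == "__main__":
    for width in (30, 48, 66, 90):
        print(width, measure_is_comfortable(width), suggested_alignment(width))
```

A narrow sidenote column of about 30 characters would therefore be set ragged, while a 66-character single-column page can be justified.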
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says, φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

1

    \documentclass[11pt]{article}
    \usepackage{fontspec, leading, newunicodechar}
    \usepackage[Latin, Greek]{ucharclasses}
    \setTransitionsForLatin
      {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
    \setTransitionsForGreek
      {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
       Ligatures=TeX]}{}
    \newunicodechar{...}{\raisebox{.8ex}{...}}
    \frenchspacing
    \leading{14pt}
    \begin{document}
    The second function of Soul -- knowing -- was not at
    first distinguished from motion. Aristotle says, φαμὲν
    γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
    δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
    δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
    αὐτὴν κινεῖσθαι
    ``The soul is said to feel pain and joy, confidence and
    fear, and again to be angry, to perceive, and to think;
    and all these states are held to be movements, which
    might lead one to suppose that soul itself is moved.''
    \end{document}

Figure 3.1: An excerpt from F. M. Cornford’s From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
    <style>
    @font-face {
      font-family: "Alegreya Sans";
      src: url("AlegreyaSans-Regular.ttf")
        format("truetype");
      unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
        U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
    }
    @font-face {
      font-family: "GFS Neohellenic";
      src: url("GFSNeohellenic.otf") format("opentype");
      unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
        U+102E0-102FF;
    }
    p {
      font-family: "Alegreya Sans", "GFS Neohellenic",
        sans-serif;
      line-height: 14pt;
    }
    [lang=en] {
      font-size: 11pt;
    }
    [lang=gr] {
      font-size: 13.2pt;
    }
    </style>
    <p><span lang=en>The second function of Soul – knowing
    – was not at first distinguished from motion. Aristotle
    says, </span><span lang=gr>φαμὲν γὰρ τὴν ψυχὴν
    λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
    τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
    κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
    κινεῖσθαι </span><span lang=en>“The soul is said to
    feel pain and joy, confidence and fear, and again to be
    angry, to perceive, and to think; and all these states
    are held to be movements, which might lead one to suppose
    that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
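The arithmetic behind these figures is short enough to spell out. The Python sketch below assumes the twenty to forty-five percent rule quoted above; the function names and default ratios are illustrative and not part of any typesetting tool.

```python
from typing import Tuple


def leading_range(type_size_pt: float,
                  min_ratio: float = 0.20,
                  max_ratio: float = 0.45) -> Tuple[float, float]:
    """Return the (minimum, maximum) line height in points."""
    return (type_size_pt + type_size_pt * min_ratio,
            type_size_pt + type_size_pt * max_ratio)


def lead_per_side(type_size_pt: float, line_height_pt: float) -> float:
    """Return the lead added above and below each line, in points."""
    return (line_height_pt - type_size_pt) / 2


if __name__ == "__main__":
    low, high = leading_range(10.0)
    print(low, high)
    print(lead_per_side(10.0, low), lead_per_side(10.0, high))
```

For a 10 pt typeface, the line height ranges from 12 to 14.5 pt, which corresponds to 1 to 2.25 pt of lead on either side of the line, in agreement with the numbers in the text.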
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger
          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
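One way to satisfy the whole-multiple rule is to pad each heading with extra vertical space until it occupies a whole number of body text lines. The following Python sketch illustrates the calculation; the function name `heading_block` and the sample dimensions are assumptions made for the example, not values prescribed by the cited sources.

```python
import math
from typing import Tuple


def heading_block(heading_height_pt: float,
                  body_leading_pt: float) -> Tuple[int, float]:
    """Return (body lines occupied, padding in points) for a heading.

    The padding is the vertical space that must be distributed above and
    below the heading so that its total height becomes a whole multiple
    of the body text line height.
    """
    # A small tolerance keeps exact multiples from rounding up a line.
    lines = max(1, math.ceil(heading_height_pt / body_leading_pt - 1e-9))
    padding = lines * body_leading_pt - heading_height_pt
    return lines, padding


if __name__ == "__main__":
    # A heading 20 pt tall over body text with 12 pt leading.
    print(heading_block(20.0, 12.0))
```

With 12 pt leading, a heading that is 20 pt tall occupies two body lines and needs 4 pt of padding, which the designer may split above and below the heading as the layout requires.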
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks, in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
This is the excellent foppery of the world, that, when we are sick in fortune – often the surfeit of our own behavior – we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
– William Shakespeare, King Lear
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing. A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
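The requirement that the text height be a multiple of the line height can be checked with a short calculation. The Python sketch below is a simplified illustration under stated assumptions: the function name text_block_height, the page size, and the margins are made up for the example, lengths are expressed in points at 72 pt per inch, and details such as the height of the first line are ignored.

```python
def text_block_height(available_height_pt, line_height_pt):
    """Return (line_count, text_height_pt): the tallest text block that
    fits into available_height_pt and is a whole multiple of
    line_height_pt."""
    if available_height_pt <= 0 or line_height_pt <= 0:
        raise ValueError("heights must be positive")
    line_count = int(available_height_pt // line_height_pt)
    return line_count, line_count * line_height_pt


# Hypothetical values: a 9 in page (648 pt) with 54 pt top and bottom
# margins and a 12 pt body text line height.
available = 648 - 2 * 54
print(text_block_height(available, 12))  # (45, 540)
```

If the available height is not an exact multiple of the line height, the leftover space has to be absorbed by the vertical margins.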
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487
The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
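The text does not say how much contrast is sufficient. One widely used yardstick, brought in here as an assumption rather than taken from the cited sources, is the contrast ratio defined in the W3C Web Content Accessibility Guidelines (WCAG) 2, which asks for at least 4.5:1 for normal-size text. The Python sketch below computes that ratio for 8-bit sRGB colors; the function names are chosen for this illustration.

```python
def _linearize(channel_8bit):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2 formula)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Return the WCAG 2 relative luminance of an (r, g, b) triple."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb_a, rgb_b):
    """Return the WCAG 2 contrast ratio, from 1 (identical colors)
    to 21 (black on white)."""
    la, lb = relative_luminance(rgb_a), relative_luminance(rgb_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


black, white, red = (0, 0, 0), (255, 255, 255), (255, 0, 0)
print(round(contrast_ratio(black, white), 2))  # 21.0
print(contrast_ratio(red, white) >= 4.5)       # False (roughly 4:1)
```

Under this particular measure, pure red on a white background falls slightly short of the threshold for body text, which is in line with the advice to reserve color for emphasis.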
Bibliography
[1] Mary Brandel lsquolsquo1963 The debut of asci irsquorsquo InComputerworld(July 1999) url httpeditioncnncomTECHcomputing9907061963idg (visited on 09062015) (cit on p 5)
[2] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1963 10 East 40th Street New York 16 nyusa the American Standard Association June 1963 urlhttp worldpowersystems com J codes X3 4 - 1963
(visited on 01282015) (cit on p 5)[3] i so tc97sc2 Information technology ndash iso 7-bit coded character
set for information interchange i so 6461972 Geneva Switzer-land the International Organization for Standardization1972 (cit on pp 5 7)
[4] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1986 10 East 40th Street New York 16 ny usathe American Standard Association June 1986 (cit on p 6)
[5] Unicode Consortium the Unicode Standard Version 10 Vol 1Reading ma usa Addison-Wesley Developers Press Oct1991 isbn 0-201-56788-1 (cit on p 8)
[6] Unicode Consortium the Unicode Standard Version 10 Vol 2Reading ma usa Addison-Wesley Developers Press June1992 isbn 0-201-60845-6 (cit on p 8)
[7] isoiec jtc1sc2 Information technology ndash the Universalmultiple-octet coded Character Set (ucs) ndash Part 1 Architectureand Basic Multilingual Plane isoiec 10646-11993 Geneva
Switzerland the International Organization for Standard-ization May 1993 (cit on p 8)
[8] i soiec jtc1sc2 Transformation Format for 16 planes of group00 (utf-16) isoiec 10646-11993Amd 11996 GenevaSwitzerland the International Organization for Standard-ization Oct 1996 (cit on p 8)
[9] isoiec jtc1sc2 ucs Transformation Format 8 (utf-8)isoiec 10646-11993Amd 21996 Geneva Switzerlandthe International Organization for Standardization Oct1996 (cit on p 8)
[10] Unicode Consortium the Unicode Standard Version 90 ndash CoreSpecification Tech rep Mountain View ca usa July 2016url httpwwwunicodeorgversionsUnicode900UnicodeStandard-90pdf (visited on 09172015) (cit onpp 8ndash10)
[11] Q-Success Usage of character encodings for websites urlhttpw3techscomtechnologiesoverviewcharacter_
encodingall (visited on 09102015) (cit on p 9)[12] Unicode Consortium Unicode Technical Standard 10 Version
900 Unicode Collation Algorithm Tech rep May 2016 urlhttpwwwunicodeorgreportstr10tr10-34html
(visited on 09172016) (cit on p 10)[13] Unicode Consortium Unicode cldr Project Tech rep url
httpcldrunicodeorg (visited on 09172016) (cit onp 10)
[14] iso tc171sc2 Document management ndash Portable documentformat iso 320002008 Geneva Switzerland the Interna-tional Organization for Standardization July 2008 (cit onp 13)
[15] isoiec jtc1sc34 Document description and processing lan-guages ndash Office Open XML File Formats isoiec 295002012Geneva Switzerland the International Organization forStandardization Oct 2012 (cit on p 13)
[16] isoiec jtc1sc34 Information technology ndash Open DocumentFormat for Office Applications (OpenDocument) v10 isoiec263002006 Geneva Switzerland the International Organi-zation for Standardization Dec 2006 (cit on p 13)
[17] Noam Chomsky lsquolsquoThree models for the description of lan-guagersquorsquo In Information Theory IEEE Transactions on 23 (1956)pp 113ndash124 (cit on p 14)
[18] isoiec jtc1sc22 Information technology ndash the Portable Op-erating System Interface ndash Part 2 Shell and Utilities isoiec9945-21993 Geneva Switzerland the International Organi-zation for Standardization Dec 1993 (cit on p 14)
[19] Jeffrey E F Friedl Mastering Regular Expressions 3rd edOrsquoReilly Media 2006 p 544 isbn 978-0-596-52812-6 (citon p 14)
[20] Unicode Consortium Unicode Technical Standard 18 Version17 Unicode Regular Expressions Tech rep Nov 2013 urlhttpwwwunicodeorgreportstr18tr18-17html
(visited on 09262015) (cit on p 16)[21] Dale Dougherty and Arnold Robbins Sed amp awk Second
Edition OrsquoReilly Media 1997 i sbn 1565922255 url http docstore mik ua orelly unix sedawk (visited on09262015) (cit on p 16)
[22] Ben Collins-Sussman Brian W Fitzpatrick and C MichaelPilato Version Control with Subversion OrsquoReilly 2002 urlhttpsvnbookred-beancom (visited on 09262015)(cit on p 17)
[23] Charles F Goldfarb lsquolsquothe Roots of sgml ndash A Personal Rec-ollectionrsquorsquo In (1996) url httpwwwsgmlsourcecomhistoryrootshtm (visited on 07292015) (cit on p 22)
[24] Charles F Goldfarb lsquolsquosgml The Reason Why and the FirstPublishedHintrsquorsquo In Journal of the American Society for Informa-tion Science 48 (7 July 1997) url httpwwwsgmlsourcecomhistoryjasishtm (visited on 07292015) (cit onp 22)
[25] Charles F Goldfarb lsquolsquoIntroduction to Generalized MarkuprsquorsquoIn (1981) url http www sgmlsource com history AnnexAhtm (visited on 07292015) (cit on p 22)
[26] i soiecjtc1sc34 Information processing ndash Text and office sys-tems ndash Standard Generalized Markup Language (sgml) i soiec88791986 Geneva Switzerland the International Organi-zation for Standardization Oct 1986 (cit on p 22)
[27] Charles F Goldfarb the sgml Handbook New York NY USAOxford University Press Inc 1990 i sbn 978-0-198-53737-3(cit on p 22)
[28] Jean Paoli Tim Bray and Michael Sperberg-McQueen Ex-tensible Markup Language (xml) 10 w3c Recommendationw3c Feb 1998 url httpwwww3orgTR1998REC-xml-19980210 (visited on 07312015) (cit on pp 23 31)
[29] isoiec jtc1sc18wg8 Proposed TC for Web sgml Adap-tations for sgml isoiec N1929 the International Organi-zation for Standardization June 1997 url httpxmlcoverpagesorgwg8-n1929-ghtml (visited on 07312015)(cit on p 23)
[30] Haringkon Wium Lie and Bert Bos Cascading Style Sheets level1 Recommendation w3c Dec 1996 url httpwwww3orgTRREC-CSS1-961217 (visited on 07312015) (cit onpp 23 29)
[31] C M Sperberg-McQueen and Claus Huitfeldt lsquolsquogoddagA Data Structure for Overlapping Hierarchiesrsquorsquo In DigitalDocuments Systems and Principles 8th International Confer-ence on Digital Documents and Electronic Publishing DDEP2000 5th International Workshop on the Principles of DigitalDocument Processing PODDP 2000 Munich Germany Sep-tember 13-15 2000 Revised Papers Ed by Peter King andEthan V Munson Berlin Heidelberg Springer Berlin Hei-delberg 2004 pp 139ndash160 isbn 978-3-540-39916-2 doi101007978-3-540-39916-2_12 (cit on p 27)
[32] TimBray DaveHollander andAndrewLaymanNamespacesin xml w3c Recommendation w3c Jan 1999 url httpwwww3orgTR1999REC-xml-names-19990114 (visitedon 08212015) (cit on p 27)
[33] M Duerst the Internationalized Resource Identifiers (iris) rfc3987 rfc Editor Jan 2005 url httptoolsietforghtmlrfc3987 (visited on 08312015) (cit on p 27)
[34] Norman Walsh DocBook 5 The Definitive Guide Apr 2010url httpwwwdocbookorgtdgenhtmldocbookhtml(visited on 08182015) (cit on p 28)
[35] Tim Berners-Lee Information Management A Proposal Techrep Mar 1989 url httpwwww3orgHistory1989proposalhtml (visited on 08312015) (cit on p 28)
[36] T Berners-Lee Hypertext Markup Language ndash 20 rfc 1866rfc Editor Nov 1995 url httptoolsietforghtmlrfc1866 (visited on 07312015) (cit on p 28)
[37] Jon Postel DoD standard Transmission Control Protocol rfc761 rfc Editor Jan 1980 url httptoolsietforghtmlrfc761 (visited on 09162016) (cit on p 28)
[38] Ian Hickson et al html5 A vocabulary and associated apisfor html and xhtml Recommendation w3c Oct 2014 urlhttpwwww3orgTR2014REC-html5-20141028 (visitedon 07312015) (cit on p 29)
[39] ecma International Standard ecma-262 - ecmaScript LanguageSpecification Tech rep June 1997 url httpwwwecma-internationalorgpublicationsfilesECMA-ST-ARCH
ECMA-262201st20edition20June201997pdf (visitedon 07312015) (cit on p 29)
[40] Netscape Communications Netscape and Sun announce Java-Script the open cross-platform object scripting language for en-terprise networks and the Internet Dec 1995 url httpwpnetscapecomnewsrefprnewsrelease67html (visited on02132008) (cit on p 29)
[41] Dave Raggett et al Reformulating html in xml w3c Recom-mendation w3c Dec 1998 url httpwwww3orgTR1998WD-html-in-xml-19981205 (visited on 08202015)(cit on p 31)
[42] Steven Pemberton et al xhtmltrade 10 The Extensible HyperTextMarkup Language w3c Recommendation w3c Jan 2000url httpwwww3orgTR2000REC-xhtml1-20000126(visited on 08202015) (cit on p 31)
[43] T Berners-Lee Linked Data Tech rep 2006 url httpswwww3orgDesignIssuesLinkedDatahtml (visited on09172016) (cit on p 31)
[44] Ora Lassila and Ralph R Swick Resource Description Frame-work (rdf) Model and Syntax Specification w3c Recommen-dation w3c Feb 1999 url httpwwww3orgTR1999REC-rdf-syntax-19990222 (visited on 08182015) (cit onpp 31 32)
[45] Dan Brickley and R V Guha rdf Vocabulary DescriptionLanguage 10 rdf Schema w3c Recommendation w3c Feb2004 url httpwwww3orgTR2004REC-rdf-schema-20040210 (visited on 08182015) (cit on p 32)
[46] Deborah L McGuinness and Frank van Harmelen owl WebOntology Language w3c Recommendation w3c Feb 2004url httpwwww3orgTR2004REC-owl-features-20040210 (visited on 08182015) (cit on p 32)
[47] Dan Brickley and R V Guha json-ld 10 A JSON-basedSerialization for Linked Data w3c Recommendation w3cJan 2014 url httpwwww3orgTR2014REC-json-ld-20140116 (visited on 08192015) (cit on p 32)
[48] David Beckett et al rdf 11 Turtle w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-turtle-20140225 (visited on 08292015) (cit on p 32)
[49] David Beckett rdf 11 N-Triples w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-n-triples-20140225 (visited on 08192015) (cit on p 32)
[50] Ben Adida et al rdfa in xhtml Syntax and Processing w3cRecommendation w3c Oct 2008 url httpwwww3org TR 2008 REC - rdfa - syntax - 20081014 (visited on08192015) (cit on p 32)
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standards Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
Index
ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42
Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17
sect
Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15
bre 14ndash16bs 6bsd 13
sect
ca 52can 6cern 28
character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44
sect
dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35
dps 13 17 18 32 35 36 39dtd 23 25ndash27dtp 36
sect
ebcdic 5ecma 55Edgar Allen Poe 37
Elements of Style 3em 6Emacs 13endianity 10endnote 47enq 6eot 6erealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15
ere 14ndash16esc 6etb 6120576-TEX 38etx 6euc 5
sectF M Cornford 43ff 6foaf 32 33footnote 47formal grammar 14fortran 4From Religion to Philosophy A Study in
the Origins of Western Speculation 43fs 6fsm 35
sectGit 17gml 22gnuLinux 13nano 13
gnu 13 14 35Google Documents 18Google Pinyin 11grep 16 17groff see troffgs 6gui 13 35
sectHan Unification 9heading 45Henrik Ibsen 27ht 6
html 28ndash32 34 39 44 55sect
ibm 5 12 22iconv 10iec 7 10 51ndash54ime 12ir i 27 28 31 32 54iso 7 10 51ndash54
sectJavaScript 29Jeffrey E F Friedl 14j is 5joe 13JScript 29json 32json-ld 32 56jtc 51ndash54justification see alignment
sectKing Lear 48
sectLATEX 36 43Latin Vulgate Bible 49ld 31 32 55leading see line spacingLeafpad 13lf 6lightweight markup language 39line height 45list 46
sectma 51MakeDoc 39Markdown 39markuplogical 21 29 30 35 36presentation 21 29 30 35 36
mathml 28 31Mercurial 17microformatting 32Microsoft Word 14 20 39
sectN-Triples 32 33nak 6Noam Chomskyhierarchy 14
Noam Chomsky 14note 46Notepad++ 13Notepad 13
nroff see troffnul 6ny 51
sectocr 12odf 13ooxml 13owl 32 56
sectparagraphblock 47indented 45outdented 45
paragraph 42paragraphsblock 45
pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8
sectQuarkXPress 14quotationblock 47run-in 47
sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31
rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6
sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22
sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-
lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6
secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff
man 36
me 36mom 36
troff 35tron 9Turtle 32 33typeface 41
sectucsblock 8ucs-4 8
ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10
us 6usa 51 52utf
utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8
utf 6 8ndash10 52sect
VBScript 29vcscentralized 17decentralized 17
vcs 17ndash20version control 13vi 13vim 13
vt 6sect
w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4
wysiwyg 35sect
XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23
xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23
The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].
standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.
Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.
2.3.2 Interactive Systems
Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.
Most interactive DPSes provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text, with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
The Cask of Amontillado
by
Edgar Allen Poe
The thousand injuries of Fortunato I had borne as I best could; but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that gave utterance to a threat. At length I would be avenged; this was a point definitely settled – but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.
-1-
.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allen Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could; but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allen Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
              \dimexpr\hsize-3.28em 3.28em
              \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could; but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right)
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
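To give a feel for what such a conversion tool does, the following Python sketch translates a tiny, Markdown-like subset into HTML. It is a toy written for this illustration, not an implementation of Markdown or of any of the languages named above: the supported syntax (heading lines starting with '#', blank lines between paragraphs, and *emphasis*) and the function name to_html are assumptions.

```python
import html
import re


def to_html(text):
    """Convert a tiny, Markdown-like subset to HTML.

    Supported (by assumption, for illustration only): blocks starting
    with one to six '#' become headings, blank lines separate
    paragraphs, and *word* becomes emphasis.
    """
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    out = []
    for block in blocks:
        # Collapse line breaks inside a block and escape HTML specials.
        body = html.escape(" ".join(block.split()))
        body = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", body)
        match = re.match(r"(#{1,6})\s+(.*)", body)
        if match:
            level = len(match.group(1))
            out.append(f"<h{level}>{match.group(2)}</h{level}>")
        else:
            out.append(f"<p>{body}</p>")
    return "\n".join(out)


print(to_html("# Design\n\nLegibility is of *foremost* concern."))
# <h1>Design</h1>
# <p>Legibility is of <em>foremost</em> concern.</p>
```

The input stays readable as plain text, while the output carries the same structure in a more general markup language.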
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
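The rules of thumb for line length can be written down as a small decision function. The sketch below is an illustration only; the name `assess_measure` and the wording of the verdicts are made up for the example, and the "acceptable" verdict for lengths that fall between the recommended range and the hard limits is an assumption, since the text does not say how to treat them.

```python
def assess_measure(chars_per_line, columns=1):
    """Classify a line length (in characters) using the guidelines above."""
    # Recommended range: 45-75 characters on a single-column page,
    # 40-50 characters on a multi-column page.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "too narrow to justify: set ragged"
    if chars_per_line > 80:
        return "too wide: strains the eye"
    if low <= chars_per_line <= high:
        return "comfortable: justify"
    # Assumption: lengths outside the ideal range but within 40-80 are tolerable.
    return "acceptable: justify"


if __name__ == "__main__":
    for length in (35, 66, 78, 90):
        print(length, assess_measure(length))
```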
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says, φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”
1
\documentclass[11pt]{article}
\usepackage{fontspec,leading,newunicodechar}
\usepackage[Latin,Greek]{ucharclasses}
\setTransitionsForLatin{%
  \fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek{%
  \fontspec{GFSNeohellenic.otf}[Scale=1.2,WordSpace=0.5,
    Ligatures=TeX]}{}
newunicodecharraisebox8ex
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says, φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
@font-face {
  font-family: "Alegreya Sans";
  src: url("AlegreyaSans-Regular.ttf")
       format("truetype");
  unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                 U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F; }
@font-face {
  font-family: "GFS Neohellenic";
  src: url("GFSNeohellenic.otf") format("opentype");
  unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                 U+102E0-102FF; }
p {
  font-family: "Alegreya Sans", "GFS Neohellenic",
               sans-serif;
  line-height: 14pt; }
[lang=en] {
  font-size: 11pt; }
[lang=gr] {
  font-size: 13.2pt; }
</style>
<p><span lang=en>The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says, </span><span lang=gr>φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang=en>“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
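The arithmetic behind these figures is simple enough to check by hand, but a short Python sketch makes it reusable for other type sizes. The function names are made up for the example, and the default percentages merely restate the twenty to forty-five percent guideline as line heights of 120 to 145 percent of the type size.

```python
def leading_range(type_size_pt, low_percent=120, high_percent=145):
    """Return the (minimum, maximum) line height in points."""
    return (type_size_pt * low_percent / 100,
            type_size_pt * high_percent / 100)


def lead_per_side(type_size_pt, line_height_pt):
    """Return the lead added above and below each line, in points."""
    return (line_height_pt - type_size_pt) / 2


if __name__ == "__main__":
    tightest, loosest = leading_range(10)
    print(tightest, loosest)
    print(lead_per_side(10, tightest), lead_per_side(10, loosest))
```

For a 10 pt typeface, the two functions reproduce the values given above: a line height between 12 and 14.5 pt, and 1 to 2.25 pt of lead on either side of the line.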
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

         Sizes in inches   Page proportions
A4       8.27 × 11.7       2 : √2       1.41421
B5       6.93 × 9.84       1 : √2       0.707
Letter   8 1/2 × 11        1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a side-note. Sidenotes enliven the page and are easy for the reader to find.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
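The whole-multiple rule can be turned into a small calculation: take the natural height of the heading, round it up to the next multiple of the body text line height, and distribute the difference as extra space around the heading. The sketch below assumes, purely for illustration, that the natural height of a heading is 120 percent of its type size; the function name `heading_block` is likewise made up.

```python
import math


def heading_block(heading_size_pt, body_line_height_pt,
                  heading_leading_percent=120):
    """Return (body lines occupied, total height, extra space) for a heading."""
    # Assumed natural height of the heading line, in points.
    natural = heading_size_pt * heading_leading_percent / 100
    # Round up to a whole number of body text lines.
    lines = math.ceil(natural / body_line_height_pt)
    total = lines * body_line_height_pt
    # The remainder is the vertical space to add above and below the heading.
    return lines, total, total - natural


if __name__ == "__main__":
    print(heading_block(16, 12))
```

A 16 pt heading over body text with a 12 pt line height would thus occupy two body lines, with the leftover space split above and below the heading as the design sees fit.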
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs, and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3]:
This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we need to also decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
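One simple way to satisfy the multiple-of-line-height requirement is to start from the space left between the vertical margins and keep only as many whole lines as fit. The sketch below shows that calculation only; it does not attempt to derive the text height from the text width. The function name is made up, and the sample values assume a 9 in page (648 pt at 72 pt per inch) with 0.75 in margins.

```python
def text_block(page_height_pt, top_margin_pt, bottom_margin_pt,
               line_height_pt):
    """Return (number of lines, text height) that fit between the margins."""
    available = page_height_pt - top_margin_pt - bottom_margin_pt
    # Keep only whole lines, so the text block can be filled completely.
    lines = int(available // line_height_pt)
    return lines, lines * line_height_pt


if __name__ == "__main__":
    print(text_block(648, 54, 54, 12))
    print(text_block(648, 54, 54, 14))
```

With a 12 pt line height the available 540 pt are used in full; with 14 pt the text height shrinks to the nearest whole number of lines, and the remainder has to be given back to the margins.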
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
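What counts as sufficient contrast is left open above. One widely used yardstick, brought in here as an outside assumption rather than something this text prescribes, is the contrast ratio defined in the Web Content Accessibility Guidelines (WCAG) 2, which asks for at least 4.5 : 1 for ordinary body text. A minimal sketch for colors given as 8-bit sRGB triples:

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to its linear-light value."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) triple as defined by WCAG 2."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 (none) to 21 (maximum)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    print(contrast_ratio((0, 0, 0), (255, 255, 255)))
    print(contrast_ratio((255, 0, 0), (255, 255, 255)))
```

Black on white reaches the maximum ratio, whereas pure red on white falls short of 4.5 : 1, which agrees with the advice to reserve color for emphasis rather than for the body text. The ratio says nothing about how two hues are told apart by color-blind readers; that has to be judged separately.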
Bibliography
[1] Mary Brandel lsquolsquo1963 The debut of asci irsquorsquo InComputerworld(July 1999) url httpeditioncnncomTECHcomputing9907061963idg (visited on 09062015) (cit on p 5)
[2] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1963 10 East 40th Street New York 16 nyusa the American Standard Association June 1963 urlhttp worldpowersystems com J codes X3 4 - 1963
(visited on 01282015) (cit on p 5)[3] i so tc97sc2 Information technology ndash iso 7-bit coded character
set for information interchange i so 6461972 Geneva Switzer-land the International Organization for Standardization1972 (cit on pp 5 7)
[4] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1986 10 East 40th Street New York 16 ny usathe American Standard Association June 1986 (cit on p 6)
[5] Unicode Consortium the Unicode Standard Version 10 Vol 1Reading ma usa Addison-Wesley Developers Press Oct1991 isbn 0-201-56788-1 (cit on p 8)
[6] Unicode Consortium the Unicode Standard Version 10 Vol 2Reading ma usa Addison-Wesley Developers Press June1992 isbn 0-201-60845-6 (cit on p 8)
[7] isoiec jtc1sc2 Information technology ndash the Universalmultiple-octet coded Character Set (ucs) ndash Part 1 Architectureand Basic Multilingual Plane isoiec 10646-11993 Geneva
Switzerland the International Organization for Standard-ization May 1993 (cit on p 8)
[8] i soiec jtc1sc2 Transformation Format for 16 planes of group00 (utf-16) isoiec 10646-11993Amd 11996 GenevaSwitzerland the International Organization for Standard-ization Oct 1996 (cit on p 8)
[9] isoiec jtc1sc2 ucs Transformation Format 8 (utf-8)isoiec 10646-11993Amd 21996 Geneva Switzerlandthe International Organization for Standardization Oct1996 (cit on p 8)
[10] Unicode Consortium the Unicode Standard Version 90 ndash CoreSpecification Tech rep Mountain View ca usa July 2016url httpwwwunicodeorgversionsUnicode900UnicodeStandard-90pdf (visited on 09172015) (cit onpp 8ndash10)
[11] Q-Success Usage of character encodings for websites urlhttpw3techscomtechnologiesoverviewcharacter_
encodingall (visited on 09102015) (cit on p 9)[12] Unicode Consortium Unicode Technical Standard 10 Version
900 Unicode Collation Algorithm Tech rep May 2016 urlhttpwwwunicodeorgreportstr10tr10-34html
(visited on 09172016) (cit on p 10)[13] Unicode Consortium Unicode cldr Project Tech rep url
httpcldrunicodeorg (visited on 09172016) (cit onp 10)
[14] iso tc171sc2 Document management ndash Portable documentformat iso 320002008 Geneva Switzerland the Interna-tional Organization for Standardization July 2008 (cit onp 13)
[15] isoiec jtc1sc34 Document description and processing lan-guages ndash Office Open XML File Formats isoiec 295002012Geneva Switzerland the International Organization forStandardization Oct 2012 (cit on p 13)
[16] isoiec jtc1sc34 Information technology ndash Open DocumentFormat for Office Applications (OpenDocument) v10 isoiec263002006 Geneva Switzerland the International Organi-zation for Standardization Dec 2006 (cit on p 13)
[17] Noam Chomsky lsquolsquoThree models for the description of lan-guagersquorsquo In Information Theory IEEE Transactions on 23 (1956)pp 113ndash124 (cit on p 14)
[18] isoiec jtc1sc22 Information technology ndash the Portable Op-erating System Interface ndash Part 2 Shell and Utilities isoiec9945-21993 Geneva Switzerland the International Organi-zation for Standardization Dec 1993 (cit on p 14)
[19] Jeffrey E F Friedl Mastering Regular Expressions 3rd edOrsquoReilly Media 2006 p 544 isbn 978-0-596-52812-6 (citon p 14)
[20] Unicode Consortium Unicode Technical Standard 18 Version17 Unicode Regular Expressions Tech rep Nov 2013 urlhttpwwwunicodeorgreportstr18tr18-17html
(visited on 09262015) (cit on p 16)[21] Dale Dougherty and Arnold Robbins Sed amp awk Second
Edition OrsquoReilly Media 1997 i sbn 1565922255 url http docstore mik ua orelly unix sedawk (visited on09262015) (cit on p 16)
[22] Ben Collins-Sussman Brian W Fitzpatrick and C MichaelPilato Version Control with Subversion OrsquoReilly 2002 urlhttpsvnbookred-beancom (visited on 09262015)(cit on p 17)
[23] Charles F Goldfarb lsquolsquothe Roots of sgml ndash A Personal Rec-ollectionrsquorsquo In (1996) url httpwwwsgmlsourcecomhistoryrootshtm (visited on 07292015) (cit on p 22)
[24] Charles F Goldfarb lsquolsquosgml The Reason Why and the FirstPublishedHintrsquorsquo In Journal of the American Society for Informa-tion Science 48 (7 July 1997) url httpwwwsgmlsourcecomhistoryjasishtm (visited on 07292015) (cit onp 22)
[25] Charles F Goldfarb lsquolsquoIntroduction to Generalized MarkuprsquorsquoIn (1981) url http www sgmlsource com history AnnexAhtm (visited on 07292015) (cit on p 22)
[26] i soiecjtc1sc34 Information processing ndash Text and office sys-tems ndash Standard Generalized Markup Language (sgml) i soiec88791986 Geneva Switzerland the International Organi-zation for Standardization Oct 1986 (cit on p 22)
[27] Charles F Goldfarb the sgml Handbook New York NY USAOxford University Press Inc 1990 i sbn 978-0-198-53737-3(cit on p 22)
[28] Jean Paoli Tim Bray and Michael Sperberg-McQueen Ex-tensible Markup Language (xml) 10 w3c Recommendationw3c Feb 1998 url httpwwww3orgTR1998REC-xml-19980210 (visited on 07312015) (cit on pp 23 31)
[29] isoiec jtc1sc18wg8 Proposed TC for Web sgml Adap-tations for sgml isoiec N1929 the International Organi-zation for Standardization June 1997 url httpxmlcoverpagesorgwg8-n1929-ghtml (visited on 07312015)(cit on p 23)
[30] Haringkon Wium Lie and Bert Bos Cascading Style Sheets level1 Recommendation w3c Dec 1996 url httpwwww3orgTRREC-CSS1-961217 (visited on 07312015) (cit onpp 23 29)
[31] C M Sperberg-McQueen and Claus Huitfeldt lsquolsquogoddagA Data Structure for Overlapping Hierarchiesrsquorsquo In DigitalDocuments Systems and Principles 8th International Confer-ence on Digital Documents and Electronic Publishing DDEP2000 5th International Workshop on the Principles of DigitalDocument Processing PODDP 2000 Munich Germany Sep-tember 13-15 2000 Revised Papers Ed by Peter King andEthan V Munson Berlin Heidelberg Springer Berlin Hei-delberg 2004 pp 139ndash160 isbn 978-3-540-39916-2 doi101007978-3-540-39916-2_12 (cit on p 27)
[32] TimBray DaveHollander andAndrewLaymanNamespacesin xml w3c Recommendation w3c Jan 1999 url httpwwww3orgTR1999REC-xml-names-19990114 (visitedon 08212015) (cit on p 27)
[33] M Duerst the Internationalized Resource Identifiers (iris) rfc3987 rfc Editor Jan 2005 url httptoolsietforghtmlrfc3987 (visited on 08312015) (cit on p 27)
[34] Norman Walsh DocBook 5 The Definitive Guide Apr 2010url httpwwwdocbookorgtdgenhtmldocbookhtml(visited on 08182015) (cit on p 28)
[35] Tim Berners-Lee Information Management A Proposal Techrep Mar 1989 url httpwwww3orgHistory1989proposalhtml (visited on 08312015) (cit on p 28)
[36] T Berners-Lee Hypertext Markup Language ndash 20 rfc 1866rfc Editor Nov 1995 url httptoolsietforghtmlrfc1866 (visited on 07312015) (cit on p 28)
[37] Jon Postel DoD standard Transmission Control Protocol rfc761 rfc Editor Jan 1980 url httptoolsietforghtmlrfc761 (visited on 09162016) (cit on p 28)
[38] Ian Hickson et al html5 A vocabulary and associated apisfor html and xhtml Recommendation w3c Oct 2014 urlhttpwwww3orgTR2014REC-html5-20141028 (visitedon 07312015) (cit on p 29)
[39] ecma International Standard ecma-262 - ecmaScript LanguageSpecification Tech rep June 1997 url httpwwwecma-internationalorgpublicationsfilesECMA-ST-ARCH
ECMA-262201st20edition20June201997pdf (visitedon 07312015) (cit on p 29)
[40] Netscape Communications Netscape and Sun announce Java-Script the open cross-platform object scripting language for en-terprise networks and the Internet Dec 1995 url httpwpnetscapecomnewsrefprnewsrelease67html (visited on02132008) (cit on p 29)
[41] Dave Raggett et al Reformulating html in xml w3c Recom-mendation w3c Dec 1998 url httpwwww3orgTR1998WD-html-in-xml-19981205 (visited on 08202015)(cit on p 31)
[42] Steven Pemberton et al xhtmltrade 10 The Extensible HyperTextMarkup Language w3c Recommendation w3c Jan 2000url httpwwww3orgTR2000REC-xhtml1-20000126(visited on 08202015) (cit on p 31)
[43] T Berners-Lee Linked Data Tech rep 2006 url httpswwww3orgDesignIssuesLinkedDatahtml (visited on09172016) (cit on p 31)
[44] Ora Lassila and Ralph R Swick Resource Description Frame-work (rdf) Model and Syntax Specification w3c Recommen-dation w3c Feb 1999 url httpwwww3orgTR1999REC-rdf-syntax-19990222 (visited on 08182015) (cit onpp 31 32)
[45] Dan Brickley and R V Guha rdf Vocabulary DescriptionLanguage 10 rdf Schema w3c Recommendation w3c Feb2004 url httpwwww3orgTR2004REC-rdf-schema-20040210 (visited on 08182015) (cit on p 32)
[46] Deborah L McGuinness and Frank van Harmelen owl WebOntology Language w3c Recommendation w3c Feb 2004url httpwwww3orgTR2004REC-owl-features-20040210 (visited on 08182015) (cit on p 32)
[47] Dan Brickley and R V Guha json-ld 10 A JSON-basedSerialization for Linked Data w3c Recommendation w3cJan 2014 url httpwwww3orgTR2014REC-json-ld-20140116 (visited on 08192015) (cit on p 32)
[48] David Beckett et al rdf 11 Turtle w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-turtle-20140225 (visited on 08292015) (cit on p 32)
[49] David Beckett rdf 11 N-Triples w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-n-triples-20140225 (visited on 08192015) (cit on p 32)
[50] Ben Adida et al rdfa in xhtml Syntax and Processing w3cRecommendation w3c Oct 2008 url httpwwww3org TR 2008 REC - rdfa - syntax - 20081014 (visited on08192015) (cit on p 32)
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: httpwwwsazbacztypoglosytypo101pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
Index
ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42
Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17
sect
Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15
bre 14ndash16bs 6bsd 13
sect
ca 52can 6cern 28
character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44
sect
dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35
dps 13 17 18 32 35 36 39dtd 23 25ndash27dtp 36
sect
ebcdic 5ecma 55Edgar Allen Poe 37
Elements of Style 3em 6Emacs 13endianity 10endnote 47enq 6eot 6erealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15
ere 14ndash16esc 6etb 6120576-TEX 38etx 6euc 5
sectF M Cornford 43ff 6foaf 32 33footnote 47formal grammar 14fortran 4From Religion to Philosophy A Study in
the Origins of Western Speculation 43fs 6fsm 35
sectGit 17gml 22gnuLinux 13nano 13
gnu 13 14 35Google Documents 18Google Pinyin 11grep 16 17groff see troffgs 6gui 13 35
sectHan Unification 9heading 45Henrik Ibsen 27ht 6
html 28ndash32 34 39 44 55sect
ibm 5 12 22iconv 10iec 7 10 51ndash54ime 12ir i 27 28 31 32 54iso 7 10 51ndash54
sectJavaScript 29Jeffrey E F Friedl 14j is 5joe 13JScript 29json 32json-ld 32 56jtc 51ndash54justification see alignment
sectKing Lear 48
sectLATEX 36 43Latin Vulgate Bible 49ld 31 32 55leading see line spacingLeafpad 13lf 6lightweight markup language 39line height 45list 46
sectma 51MakeDoc 39Markdown 39markuplogical 21 29 30 35 36presentation 21 29 30 35 36
mathml 28 31Mercurial 17microformatting 32Microsoft Word 14 20 39
sectN-Triples 32 33nak 6Noam Chomskyhierarchy 14
Noam Chomsky 14note 46Notepad++ 13Notepad 13
nroff see troffnul 6ny 51
sectocr 12odf 13ooxml 13owl 32 56
sectparagraphblock 47indented 45outdented 45
paragraph 42paragraphsblock 45
pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8
sectQuarkXPress 14quotationblock 47run-in 47
sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31
rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6
sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22
sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-
lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6
secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff
man 36
me 36mom 36
troff 35tron 9Turtle 32 33typeface 41
sectucsblock 8ucs-4 8
ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10
us 6usa 51 52utf
utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8
utf 6 8ndash10 52sect
VBScript 29vcscentralized 17decentralized 17
vcs 17ndash20version control 13vi 13vim 13
vt 6sect
w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4
wysiwyg 35sect
XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23
xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23
The Cask of Amontillado
by
Edgar Allen Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that gave utterance to a threat. At length I would be avenged; this was a point definitely settled--but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-
.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allen Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.
Figure 2.12: An excerpt from the beginning of Edgar Allen Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.
Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that reflect their use cases.
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
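The line-length guidance above can be turned into a small check. The following is only a sketch: the function name `assess_measure` and the wording of the returned labels are assumptions made for illustration, while the thresholds (45–75, 40–50, 40, and 80 characters) are the ones quoted in the paragraph.

```python
def assess_measure(chars_per_line, multi_column=False):
    """Classify a line length (in characters) using the ranges quoted above."""
    # Recommended range: 45-75 characters (single column), 40-50 (multi-column).
    low, high = (40, 50) if multi_column else (45, 75)
    if chars_per_line < 40:
        # Justified lines this narrow tend to get overly loose word spacing.
        return "set ragged"
    if chars_per_line > 80:
        # Long passages of such wide lines strain the eye.
        return "too wide"
    if low <= chars_per_line <= high:
        return "justify"
    return "outside the recommended range"


print(assess_measure(66))                     # justify
print(assess_measure(35, multi_column=True))  # set ragged
print(assess_measure(90))                     # too wide
```

A character count is a rough proxy; an actual layout would measure the average number of characters that fit the column in the chosen typeface.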
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”
1
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
\newunicodechar{…}{\raisebox{.8ex}{…}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F; }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF; }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt; }
  [lang=en] {
    font-size: 11pt; }
  [lang=gr] {
    font-size: 13.2pt; }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with a leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
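The arithmetic behind these figures is easy to reproduce. The sketch below assumes the twenty-to-forty-five-percent guideline quoted above; the function names `leading_range` and `lead_per_side` are made up for this illustration.

```python
def leading_range(font_size_pt, low=0.20, high=0.45):
    """Return the (min, max) line height in points for a typeface size."""
    return (font_size_pt * (1 + low), font_size_pt * (1 + high))


def lead_per_side(font_size_pt, line_height_pt):
    """Extra space above and below each line, split evenly."""
    return (line_height_pt - font_size_pt) / 2


lo, hi = leading_range(10)
print(round(lo, 2), round(hi, 2))  # 12.0 14.5
print(lead_per_side(10, 12))       # 1.0
print(lead_per_side(10, 14.5))     # 2.25
```

For a 10 pt typeface this yields the 12 to 14.5 pt line height and the 1 to 2.25 pt of lead per side mentioned in the text.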
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

         Sizes in inches   Page proportions
A4       8.27 × 11.7       2 : √2      1.41421
B5       6.93 × 9.84       1 : √2      0.707
Letter   8 1/2 × 11        1 : 1.294   1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

Sidenote: This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
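The whole-multiple rule can be sketched as a rounding step. The code below is an illustration under assumptions: the function names are hypothetical, and "height of a heading" is taken to mean the heading text plus the space above and below it.

```python
import math


def heading_block_height(heading_height_pt, body_leading_pt):
    """Round a heading's height up to a whole number of body text lines."""
    lines = math.ceil(heading_height_pt / body_leading_pt)
    return lines * body_leading_pt


def heading_padding(heading_height_pt, body_leading_pt):
    """Extra vertical space to distribute above and below the heading."""
    block = heading_block_height(heading_height_pt, body_leading_pt)
    return block - heading_height_pt


print(heading_block_height(16, 12))  # 24
print(heading_padding(16, 12))       # 8
```

With a body text leading of 12 pt, a 16 pt heading would occupy two body lines (24 pt), leaving 8 pt to split between the space above and below the heading.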
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

William Shakespeare, King Lear

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
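As a rough illustration of the multiple-of-line-height rule, the sketch below fits as many whole lines as possible into the space left between the vertical margins. The function name, the example page (9 in high with 0.75 in margins), and the conversion of 72 points per inch are assumptions chosen for the example, not values prescribed by the text.

```python
POINTS_PER_INCH = 72  # assumption: PostScript points; TeX points use 72.27


def text_height(page_height_in, top_margin_in, bottom_margin_in, leading_pt):
    """Return (number of lines, text height in points) that fit the page."""
    available_pt = (page_height_in - top_margin_in - bottom_margin_in) * POINTS_PER_INCH
    # Keep only whole lines so that the text block can be filled completely.
    lines = int(available_pt // leading_pt)
    return lines, lines * leading_pt


print(text_height(9, 0.75, 0.75, 12))  # (45, 540)
print(text_height(9, 0.75, 0.75, 14))  # (38, 532)
```

When the leading does not divide the available space evenly, as with 14 pt here, the leftover (8 pt in this case) would go to the margins.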
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
Bibliography
[1] Mary Brandel lsquolsquo1963 The debut of asci irsquorsquo InComputerworld(July 1999) url httpeditioncnncomTECHcomputing9907061963idg (visited on 09062015) (cit on p 5)
[2] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1963 10 East 40th Street New York 16 nyusa the American Standard Association June 1963 urlhttp worldpowersystems com J codes X3 4 - 1963
(visited on 01282015) (cit on p 5)[3] i so tc97sc2 Information technology ndash iso 7-bit coded character
set for information interchange i so 6461972 Geneva Switzer-land the International Organization for Standardization1972 (cit on pp 5 7)
[4] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1986 10 East 40th Street New York 16 ny usathe American Standard Association June 1986 (cit on p 6)
[5] Unicode Consortium the Unicode Standard Version 10 Vol 1Reading ma usa Addison-Wesley Developers Press Oct1991 isbn 0-201-56788-1 (cit on p 8)
[6] Unicode Consortium the Unicode Standard Version 10 Vol 2Reading ma usa Addison-Wesley Developers Press June1992 isbn 0-201-60845-6 (cit on p 8)
[7] isoiec jtc1sc2 Information technology ndash the Universalmultiple-octet coded Character Set (ucs) ndash Part 1 Architectureand Basic Multilingual Plane isoiec 10646-11993 Geneva
Switzerland the International Organization for Standard-ization May 1993 (cit on p 8)
[8] i soiec jtc1sc2 Transformation Format for 16 planes of group00 (utf-16) isoiec 10646-11993Amd 11996 GenevaSwitzerland the International Organization for Standard-ization Oct 1996 (cit on p 8)
[9] isoiec jtc1sc2 ucs Transformation Format 8 (utf-8)isoiec 10646-11993Amd 21996 Geneva Switzerlandthe International Organization for Standardization Oct1996 (cit on p 8)
[10] Unicode Consortium the Unicode Standard Version 90 ndash CoreSpecification Tech rep Mountain View ca usa July 2016url httpwwwunicodeorgversionsUnicode900UnicodeStandard-90pdf (visited on 09172015) (cit onpp 8ndash10)
[11] Q-Success Usage of character encodings for websites urlhttpw3techscomtechnologiesoverviewcharacter_
encodingall (visited on 09102015) (cit on p 9)[12] Unicode Consortium Unicode Technical Standard 10 Version
900 Unicode Collation Algorithm Tech rep May 2016 urlhttpwwwunicodeorgreportstr10tr10-34html
(visited on 09172016) (cit on p 10)[13] Unicode Consortium Unicode cldr Project Tech rep url
httpcldrunicodeorg (visited on 09172016) (cit onp 10)
[14] iso tc171sc2 Document management ndash Portable documentformat iso 320002008 Geneva Switzerland the Interna-tional Organization for Standardization July 2008 (cit onp 13)
[15] isoiec jtc1sc34 Document description and processing lan-guages ndash Office Open XML File Formats isoiec 295002012Geneva Switzerland the International Organization forStandardization Oct 2012 (cit on p 13)
[16] isoiec jtc1sc34 Information technology ndash Open DocumentFormat for Office Applications (OpenDocument) v10 isoiec263002006 Geneva Switzerland the International Organi-zation for Standardization Dec 2006 (cit on p 13)
[17] Noam Chomsky lsquolsquoThree models for the description of lan-guagersquorsquo In Information Theory IEEE Transactions on 23 (1956)pp 113ndash124 (cit on p 14)
[18] isoiec jtc1sc22 Information technology ndash the Portable Op-erating System Interface ndash Part 2 Shell and Utilities isoiec9945-21993 Geneva Switzerland the International Organi-zation for Standardization Dec 1993 (cit on p 14)
[19] Jeffrey E F Friedl Mastering Regular Expressions 3rd edOrsquoReilly Media 2006 p 544 isbn 978-0-596-52812-6 (citon p 14)
[20] Unicode Consortium Unicode Technical Standard 18 Version17 Unicode Regular Expressions Tech rep Nov 2013 urlhttpwwwunicodeorgreportstr18tr18-17html
(visited on 09262015) (cit on p 16)[21] Dale Dougherty and Arnold Robbins Sed amp awk Second
Edition OrsquoReilly Media 1997 i sbn 1565922255 url http docstore mik ua orelly unix sedawk (visited on09262015) (cit on p 16)
[22] Ben Collins-Sussman Brian W Fitzpatrick and C MichaelPilato Version Control with Subversion OrsquoReilly 2002 urlhttpsvnbookred-beancom (visited on 09262015)(cit on p 17)
[23] Charles F Goldfarb lsquolsquothe Roots of sgml ndash A Personal Rec-ollectionrsquorsquo In (1996) url httpwwwsgmlsourcecomhistoryrootshtm (visited on 07292015) (cit on p 22)
[24] Charles F Goldfarb lsquolsquosgml The Reason Why and the FirstPublishedHintrsquorsquo In Journal of the American Society for Informa-tion Science 48 (7 July 1997) url httpwwwsgmlsourcecomhistoryjasishtm (visited on 07292015) (cit onp 22)
[25] Charles F Goldfarb lsquolsquoIntroduction to Generalized MarkuprsquorsquoIn (1981) url http www sgmlsource com history AnnexAhtm (visited on 07292015) (cit on p 22)
[26] i soiecjtc1sc34 Information processing ndash Text and office sys-tems ndash Standard Generalized Markup Language (sgml) i soiec88791986 Geneva Switzerland the International Organi-zation for Standardization Oct 1986 (cit on p 22)
[27] Charles F Goldfarb the sgml Handbook New York NY USAOxford University Press Inc 1990 i sbn 978-0-198-53737-3(cit on p 22)
[28] Jean Paoli Tim Bray and Michael Sperberg-McQueen Ex-tensible Markup Language (xml) 10 w3c Recommendationw3c Feb 1998 url httpwwww3orgTR1998REC-xml-19980210 (visited on 07312015) (cit on pp 23 31)
[29] isoiec jtc1sc18wg8 Proposed TC for Web sgml Adap-tations for sgml isoiec N1929 the International Organi-zation for Standardization June 1997 url httpxmlcoverpagesorgwg8-n1929-ghtml (visited on 07312015)(cit on p 23)
[30] Haringkon Wium Lie and Bert Bos Cascading Style Sheets level1 Recommendation w3c Dec 1996 url httpwwww3orgTRREC-CSS1-961217 (visited on 07312015) (cit onpp 23 29)
[31] C M Sperberg-McQueen and Claus Huitfeldt lsquolsquogoddagA Data Structure for Overlapping Hierarchiesrsquorsquo In DigitalDocuments Systems and Principles 8th International Confer-ence on Digital Documents and Electronic Publishing DDEP2000 5th International Workshop on the Principles of DigitalDocument Processing PODDP 2000 Munich Germany Sep-tember 13-15 2000 Revised Papers Ed by Peter King andEthan V Munson Berlin Heidelberg Springer Berlin Hei-delberg 2004 pp 139ndash160 isbn 978-3-540-39916-2 doi101007978-3-540-39916-2_12 (cit on p 27)
[32] TimBray DaveHollander andAndrewLaymanNamespacesin xml w3c Recommendation w3c Jan 1999 url httpwwww3orgTR1999REC-xml-names-19990114 (visitedon 08212015) (cit on p 27)
[33] M Duerst the Internationalized Resource Identifiers (iris) rfc3987 rfc Editor Jan 2005 url httptoolsietforghtmlrfc3987 (visited on 08312015) (cit on p 27)
[34] Norman Walsh DocBook 5 The Definitive Guide Apr 2010url httpwwwdocbookorgtdgenhtmldocbookhtml(visited on 08182015) (cit on p 28)
[35] Tim Berners-Lee Information Management A Proposal Techrep Mar 1989 url httpwwww3orgHistory1989proposalhtml (visited on 08312015) (cit on p 28)
[36] T Berners-Lee Hypertext Markup Language ndash 20 rfc 1866rfc Editor Nov 1995 url httptoolsietforghtmlrfc1866 (visited on 07312015) (cit on p 28)
[37] Jon Postel DoD standard Transmission Control Protocol rfc761 rfc Editor Jan 1980 url httptoolsietforghtmlrfc761 (visited on 09162016) (cit on p 28)
[38] Ian Hickson et al html5 A vocabulary and associated apisfor html and xhtml Recommendation w3c Oct 2014 urlhttpwwww3orgTR2014REC-html5-20141028 (visitedon 07312015) (cit on p 29)
[39] ecma International Standard ecma-262 - ecmaScript LanguageSpecification Tech rep June 1997 url httpwwwecma-internationalorgpublicationsfilesECMA-ST-ARCH
ECMA-262201st20edition20June201997pdf (visitedon 07312015) (cit on p 29)
[40] Netscape Communications Netscape and Sun announce Java-Script the open cross-platform object scripting language for en-terprise networks and the Internet Dec 1995 url httpwpnetscapecomnewsrefprnewsrelease67html (visited on02132008) (cit on p 29)
[41] Dave Raggett et al Reformulating html in xml w3c Recom-mendation w3c Dec 1998 url httpwwww3orgTR1998WD-html-in-xml-19981205 (visited on 08202015)(cit on p 31)
[42] Steven Pemberton et al xhtmltrade 10 The Extensible HyperTextMarkup Language w3c Recommendation w3c Jan 2000url httpwwww3orgTR2000REC-xhtml1-20000126(visited on 08202015) (cit on p 31)
[43] T Berners-Lee Linked Data Tech rep 2006 url httpswwww3orgDesignIssuesLinkedDatahtml (visited on09172016) (cit on p 31)
[44] Ora Lassila and Ralph R Swick Resource Description Frame-work (rdf) Model and Syntax Specification w3c Recommen-dation w3c Feb 1999 url httpwwww3orgTR1999REC-rdf-syntax-19990222 (visited on 08182015) (cit onpp 31 32)
[45] Dan Brickley and R V Guha rdf Vocabulary DescriptionLanguage 10 rdf Schema w3c Recommendation w3c Feb2004 url httpwwww3orgTR2004REC-rdf-schema-20040210 (visited on 08182015) (cit on p 32)
[46] Deborah L McGuinness and Frank van Harmelen owl WebOntology Language w3c Recommendation w3c Feb 2004url httpwwww3orgTR2004REC-owl-features-20040210 (visited on 08182015) (cit on p 32)
[47] Dan Brickley and R V Guha json-ld 10 A JSON-basedSerialization for Linked Data w3c Recommendation w3cJan 2014 url httpwwww3orgTR2014REC-json-ld-20140116 (visited on 08192015) (cit on p 32)
[48] David Beckett et al rdf 11 Turtle w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-turtle-20140225 (visited on 08292015) (cit on p 32)
[49] David Beckett rdf 11 N-Triples w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-n-triples-20140225 (visited on 08192015) (cit on p 32)
[50] Ben Adida et al rdfa in xhtml Syntax and Processing w3cRecommendation w3c Oct 2008 url httpwwww3org TR 2008 REC - rdfa - syntax - 20081014 (visited on08192015) (cit on p 32)
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML, New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye
Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.

Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).
2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
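To give a feel for such conversion tools, the following is a minimal Python sketch that turns a tiny Markdown-like subset into HTML. The function name to_html and the supported subset (headings marked by leading # characters, *emphasis*, and paragraphs separated by blank lines) are assumptions made for this illustration only; the sketch does not reproduce the behavior of any actual Markdown implementation.

```python
import html
import re


def to_html(source: str) -> str:
    """Convert a tiny Markdown-like subset to HTML (illustrative only)."""
    # Blocks are separated by one or more blank lines.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", source) if b.strip()]
    out = []
    for block in blocks:
        # Join wrapped lines and escape characters that are special in HTML.
        text = html.escape(" ".join(block.split()), quote=False)
        # *word* becomes emphasized text.
        text = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)
        # One to six leading # characters mark a heading of that level.
        heading = re.match(r"(#{1,6}) +(.*)", text)
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            out.append(f"<p>{text}</p>")
    return "\n".join(out)


print(to_html("# Design\n\nLegibility is of *foremost* concern."))
```

The punctuation carries the whole structure here: the source stays readable as plain text, while the converter recovers the headings, emphasis, and paragraphs from it.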
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
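The line-length guidelines above can be condensed into a small check. The following Python sketch is only an illustration: the function name assess_measure and the four labels it returns are assumptions of this example, while the numeric limits are the ones quoted in the paragraph.

```python
def assess_measure(chars_per_line: int, columns: int = 1) -> str:
    """Classify a line length (in characters) against the guidelines above."""
    if chars_per_line > 80:
        # Extended passages of such lines strain the eye.
        return "too wide"
    if chars_per_line < 40:
        # Justification would loosen the word spacing too much.
        return "set ragged"
    # Recommended range: 45-75 for one column, 40-50 for several columns.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if low <= chars_per_line <= high:
        return "recommended"
    # Inside the 40-80 limits, but outside the recommended range.
    return "tolerable"


for width in (35, 42, 66, 85):
    print(width, assess_measure(width), assess_measure(width, columns=2))
```

A line of 66 characters is thus recommended for a single-column page, but only tolerable in a multi-column layout.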
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says, φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”¹

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}
\newunicodechar{}{\raisebox{.8ex}{}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says, φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.

<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=grc] {
    font-size: 13.2pt;
  }
</style>
<p><span lang=en>The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says, </span><span lang=grc>φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang=en>“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
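The arithmetic behind the leading of a 10 pt body text can be reproduced with a few lines of Python. This is a sketch under the stated rule of thumb (twenty to forty-five percent of the typeface size); the function names line_height_range and lead_per_side are made up for this example.

```python
def line_height_range(size_pt, min_pct=20, max_pct=45):
    """Return the (smallest, largest) recommended line height in points."""
    return (size_pt * (100 + min_pct) / 100,
            size_pt * (100 + max_pct) / 100)


def lead_per_side(size_pt, line_height_pt):
    """Return the lead added above and below each line, in points."""
    return (line_height_pt - size_pt) / 2


low, high = line_height_range(10)
print(low, high)                                        # 12.0 14.5
print(lead_per_side(10, low), lead_per_side(10, high))  # 1.0 2.25
```

For a 12 pt body text, the same rule gives a line height between 14.4 and 17.4 pt.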
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well.

Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes.

A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

         Sizes in inches   Page proportions
A4       8.27 × 11.7       2 : √2      1.41421
B5       6.93 × 9.84       1 : √2      0.707
Letter   8½ × 11           1 : 1.294   1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

[Sidenote: This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.]
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
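One way to satisfy the whole-multiple rule is to round the natural height of a heading (its type size plus the space above and below) up to the next multiple of the body text line height and to distribute the difference as extra space. The Python sketch below illustrates the calculation; the function name snap_to_grid and the idea of padding up to the next grid line are assumptions of this example, not a method prescribed by the cited sources.

```python
from fractions import Fraction
import math


def snap_to_grid(natural_height_pt, line_height_pt):
    """Round a heading height up to a whole multiple of the line height.

    Returns (lines occupied, snapped height in pt, extra space in pt).
    Fractions avoid rounding errors with sizes such as 14.4 pt.
    """
    natural = Fraction(str(natural_height_pt))
    line = Fraction(str(line_height_pt))
    lines = max(1, math.ceil(natural / line))
    snapped = lines * line
    return lines, float(snapped), float(snapped - natural)


# A heading that naturally takes 20 pt on a 12 pt baseline grid.
print(snap_to_grid(20, 12))  # (2, 24.0, 4.0)
```

In this example, the heading occupies two lines of the grid, and 4 pt of extra space has to be added around it.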
3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

    This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
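The requirement that the text height be a multiple of the line height can be turned into a short calculation: take the height that the margins leave available, and keep only as many whole lines as fit into it. The Python sketch below illustrates this; the function name fit_text_height and the sample figures are assumptions of this example.

```python
from fractions import Fraction
import math


def fit_text_height(available_pt, line_height_pt):
    """Return (number of lines, text height in pt) that fills whole lines only.

    Fractions avoid rounding errors with line heights such as 16.8 pt.
    """
    available = Fraction(str(available_pt))
    line = Fraction(str(line_height_pt))
    lines = math.floor(available / line)
    return lines, float(lines * line)


# 540 pt of available height with a 12 pt line height.
print(fit_text_height(540, 12))    # (45, 540.0)
# 542 pt of available height with a 16.8 pt line height.
print(fit_text_height(542, 16.8))  # (32, 537.6)
```

In the second case, the remaining 4.4 pt would go to the vertical margins rather than leave a partially filled line at the bottom of the text block.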
3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standards Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015).

[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972.

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standards Association, June 1986.

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1.

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6.

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993.

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996.

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996.

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016).

[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008.

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012.

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006.
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124.

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993.

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6.

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986.
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3.

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13–15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12.

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4.

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5.

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014.
Acronyms

ACK The ACKnowledgement character
API Application Programming Interface
ASA The American Standards Association
ASCII The American Standard Code for Information Interchange
AT&T The American Telephone and Telegraph corporation
BEL The BELl character
BMP The Basic Multilingual Plane
BRE The Basic Regular Expressions
BS The BackSpace character
BSD The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA California
CAN The CANcel character
CERN The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR The Common Locale Data Repository
CLI Command Line Interface
COBOL The COmmon Business-Oriented Language
CR The Carriage Return character
CSS The Cascading Style Sheets language
DC The Dublin Core
DC1 The Device Control character No. 1
DC2 The Device Control character No. 2
DC3 The Device Control character No. 3
DC4 The Device Control character No. 4
DEL The DELete character
DLE The Data Link Escape character
DPS Document Preparation System
DTD Document Type Definition
DTP DeskTop Publishing
EBCDIC The Extended Binary Coded Decimal Interchange Code
ECMA The European Computer Manufacturers Association
EM The End of Medium
Emacs The Eventually Munches All Computer Storage editor
ENQ The ENQuiry character
EOT The End Of Transmission
ERE The Extended Regular Expressions
ESC The ESCape character
ETB The End of Transmission Block
ETX The End of TeXt
EUC The Extended Unix Code
FF The Form Feed character
FOAF Friend Of A Friend
FORTRAN The FORmula TRANslator
FS The File Separator
FSM The Free Software Movement
GML The General Markup Language
GNU GNU is Not Unix
GS The Group Separator
GUI Graphical User Interface
HT The Horizontal Tab
HTML The HyperText Markup Language
IBM The International Business Machines Corporation
IEC The International Electrotechnical Commission
IME Input Method Editor
IRI The Internationalized Resource Identifier
ISO The International Organization for Standardization
JIS The Japanese Industrial Standards encoding
JOE Joe's Own Editor
JSON The JavaScript Object Notation
JSON-LD JSON for LD
JTC A Joint TC
LD Linked Data
LF The Line Feed
MA Massachusetts
MathML The Mathematical Markup Language
NAK The Negative-AcKnowledgement character
NUL The NULl character
ACRONYMS 61
ny New Yorkocr Optical Character Recognitionodf The Open Document Format for office applicationsooxml The Office Open XML formatowl The Web Ontology Languagepc The ibm Personal Computerpdf The Portable Document Formatpico The PIne COmposerposix The Portable Operating System Interfacerdf The Resource Description Frameworkrdfa rdf in attributesrelax ng The REgular LAnguage for xml New Generationrfc A Request For Commentsrs The Record Separatorsc A SubCommitteesgml The Standard General Markup Languagesi The Shift In characterso The Shift Out charactersoh The Start of Headingsr Sound Recognitionstx The Start of Textsub The SUBstitute charactersvg The Scalable Vector Graphics languagesvn SubVersioNsyn The SYNchronous Idle charactertc A Technical Committeetei The Text Encoding Initiativetron The Real-time Operating system Nucleusucs The Universal multiple-octet coded Character Setus The Unit Separatorusa The United States of Americautf The ucs Transformation Formatvcs Version Control Systemsvi The Visual Interactive editorvim vi IMprovedvt The Vertical Tabw3c The World Wide Web Consortiumwg AWorking Groupwysiwyg What You See Is What You Getxhtml The eXtensible HyperText Markup Language
62 ACRONYMS
xml The eXtensible Markup Language
Index
ACK 6
Adobe FrameMaker 14
Adobe InDesign 14, 39
alignment
    justified 42
    ragged 42
Anton Koberger 49
Apache OpenOffice 13, 20, 39
API 55
ASA 51
ASCII 5–9, 11, 12, 14, 51
AsciiDoc 39
AT&T 35
Atom 13
awk 16, 17

Bazaar 17
BEL 6
BMP 8, 9, 14
Bob Bemer 5
body text 41
BRE 14–16
    alternation operator 15
    backreference 15
    escape character 15
    matching list expression 15
    non-matching list expression 15
    repetition operator 15
    subexpression 15
BS 6
BSD 13

CA 52
CAN 6
CERN 28
character code 5
character encoding 5
Chomsky hierarchy 14
Christian Morgenstern 4
CLDR 52
CLI 13, 16
code page 7
code point 8
Compose key 11
CONCUR 27
control code 5
CR 6
Creole 39
CSS 23, 29–32, 44

DC 32, 33
DC1 6
DC2 6
DC3 6
DC4 6
DEL 6
DLE 6
Donald Knuth 36
DPS 13, 17, 18, 32, 35, 36, 39
    batch-oriented 35
    interactive 13, 35
        desktop publishing 36
        word processing 36
DTD 23, 25–27
DTP 36

EBCDIC 5
ECMA 55
Edgar Allan Poe 37
Elements of Style 3
EM 6
Emacs 13
endianity 10
endnote 47
ENQ 6
EOT 6
ERE 14–16
    alternation operator 15
    backreference 15
    escape character 15
    matching list expression 15
    non-matching list expression 15
    repetition operator 15
    subexpression 15
ESC 6
ETB 6
ε-TeX 38
ETX 6
EUC 5

F. M. Cornford 43
FF 6
FOAF 32, 33
footnote 47
formal grammar 14
FORTRAN 4
From Religion to Philosophy: A Study in the Origins of Western Speculation 43
FS 6
FSM 35

Git 17
GML 22
GNU 13, 14, 35
    Linux 13
    nano 13
Google Documents 18
Google Pinyin 11
grep 16, 17
groff, see troff
GS 6
GUI 13, 35

Han Unification 9
heading 45
Henrik Ibsen 27
HT 6
HTML 28–32, 34, 39, 44, 55

IBM 5, 12, 22
iconv 10
IEC 7, 10, 51–54
IME 12
IRI 27, 28, 31, 32, 54
ISO 7, 10, 51–54

JavaScript 29
Jeffrey E. F. Friedl 14
JIS 5
joe 13
JScript 29
JSON 32
JSON-LD 32, 56
JTC 51–54
justification, see alignment

King Lear 48

LaTeX 36, 43
Latin Vulgate Bible 49
LD 31, 32, 55
leading, see line spacing
Leafpad 13
LF 6
lightweight markup language 39
line height 45
list 46

MA 51
MakeDoc 39
Markdown 39
markup
    logical 21, 29, 30, 35, 36
    presentation 21, 29, 30, 35, 36
MathML 28, 31
Mercurial 17
microformatting 32
Microsoft Word 14, 20, 39

N-Triples 32, 33
NAK 6
Noam Chomsky 14
    hierarchy 14
note 46
Notepad++ 13
Notepad 13
nroff, see troff
NUL 6
NY 51

OCR 12
ODF 13
OOXML 13
OWL 32, 56

paragraph 42
    block 45, 47
    indented 45
    outdented 45
PC 5, 11
PDF 13
pdfTeX 38
Peer Gynt 27
Perl 14
pico 13
pinyin 11
plain TeX 38
POSIX 53
printable character 5
Punycode 8

QuarkXPress 14
quotation
    block 47
    run-in 47

rag, see alignment
RDF 28, 31–35, 56
    literal 32
    object 31
    ontology 32
    predicate 31
    resource 31
    subject 31
    triplet 31
RDFa 32, 34, 56
regex, see regular expression
regular expression 13, 14
regular grammar 14
RELAX NG 23, 25
RFC 54, 55
RS 6

sans-serif 41
SC 51–54
Scribus 13, 14, 39
sed 16, 17
serif 41
Setext 39
SGML 22, 23, 25, 27–29, 39, 53, 54
    application 23
    attribute 22
    element 22
    entity 22
    node 22
    tag 22
SGML: The Reason Why and the First Published Hint 22
SI 6
sidenote 46
small capitals 45
SO 6
SOH 6
SR 12
STX 6
style guide 3
SUB 6
Sublime Text 13
surrogate pair 8
SVG 28, 31
SVN 17–20
SYN 6

table 46
TC 51, 52
TEI 28
text editor 13
text file 4
text processing 4
TextEdit 13, 14
The Art of Computer Programming 36
The Cask of Amontillado 37
The Chicago Manual of Style 3
The Oxford Style Manual 3
The Subversion book 17
Tim Berners-Lee 31
Timothy John Berners-Lee 28
Tortoise SVN 18, 20
Trichter 4
troff 35
    man 36
    me 36
    mom 36
TRON 9
Turtle 32, 33
typeface 41

UCS 6, 8–12, 14, 16, 51, 52
    block 8
    UCS-4 8
Unicode
    case conversion 10
    normalization 10
US 6
USA 51, 52
UTF 6, 8–10, 52
    UTF-16 8, 52
    UTF-32 8
    UTF-7 8
    UTF-8 8, 52

VBScript 29
VCS 17–20
    centralized 17
    decentralized 17
version control 13
vi 13
vim 13
VT 6

W3C 23, 28, 29, 31, 32, 54–56
WG 54
Wikicode 39
William Shakespeare 48
William Strunk 3
Word Online 18
writing rules
    grammar 3
    orthography 3
    typography 4
WYSIWYG 35

X Window System 11
XeTeX 43
XHTML 28, 31, 32, 55, 56
XML 23–29, 31–33, 39, 54, 55
    application 23
    DocBook 28
    format 23
    language 23
    namespace 27
    schema language 23
    Schema 23, 26
    validity 23
    well-formedness 23
xmllint 26
XPath 23
XPointer 23
XQuery 23
Figure 2.14: Logical markup in the interactive DPSs of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right)
2.4 Lightweight Markup Languages
Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that convert them to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors, each tailored to a particular use case.
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts
When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”¹

    \documentclass[11pt]{article}
    \usepackage{fontspec, leading, newunicodechar}
    \usepackage[Latin, Greek]{ucharclasses}
    \setTransitionsForLatin
      {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
    \setTransitionsForGreek
      {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
      Ligatures=TeX]}{}
    \newunicodechar{}{\raisebox{.8ex}{}}
    \frenchspacing
    \leading{14pt}
    \begin{document}
    The second function of Soul -- knowing -- was not at
    first distinguished from motion. Aristotle says: φαμὲν
    γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
    δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
    δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
    αὐτὴν κινεῖσθαι
    ``The soul is said to feel pain and joy, confidence and
    fear, and again to be angry, to perceive, and to think;
    and all these states are held to be movements, which
    might lead one to suppose that soul itself is moved.''
    \end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.

    <style>
      @font-face {
        font-family: "Alegreya Sans";
        src: url(AlegreyaSans-Regular.ttf)
             format("truetype");
        unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                       U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
      }
      @font-face {
        font-family: "GFS Neohellenic";
        src: url(GFSNeohellenic.otf) format("opentype");
        unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                       U+102E0-102FF;
      }
      p {
        font-family: "Alegreya Sans", "GFS Neohellenic",
                     sans-serif;
        line-height: 14pt;
      }
      [lang=en] {
        font-size: 11pt;
      }
      [lang=gr] {
        font-size: 13.2pt;
      }
    </style>
    <p><span lang=en>The second function of Soul – knowing
    – was not at first distinguished from motion. Aristotle
    says: </span><span lang=gr>φαμὲν γὰρ τὴν ψυχὴν
    λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
    τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
    κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
    κινεῖσθαι </span><span lang=en>“The soul is said to
    feel pain and joy, confidence and fear, and again to be
    angry, to perceive, and to think; and all these states
    are held to be movements, which might lead one to suppose
    that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
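The leading arithmetic above can be tried out with a minimal LaTeX sketch. It sets 10 pt type first on 12 pt and then on 14.5 pt leading, the two ends of the recommended range. The mathpazo package is only an assumed stand-in for a Palatino-like face and is not necessarily how this book was typeset.

```latex
\documentclass[10pt]{article}
\usepackage{mathpazo} % assumption: a Palatino-like text face
\begin{document}
% \fontsize{<size>}{<leading>}: 10 pt type on 12 pt leading,
% i.e. 2 pt of lead per line, 1 pt above and 1 pt below.
\fontsize{10pt}{12pt}\selectfont
Body text set on the minimal leading.\par
% Upper end of the range: 10 pt type on 14.5 pt leading,
% i.e. 4.5 pt of lead per line, 2.25 pt above and below.
\fontsize{10pt}{14.5pt}\selectfont
Body text set on the most generous leading.\par
\end{document}
```

Each paragraph is closed with \par before the next \fontsize call, because TeX applies the leading that is in force when a paragraph ends.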
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
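As a hedged illustration of the indented style, the following LaTeX sketch sets a 1 em first-line indent; the value is an arbitrary choice from the range given above. The standard document classes already leave the first paragraph after a heading unindented.

```latex
\documentclass[10pt]{article}
% Indent the first line of each paragraph by 1 em,
% measured in the body font that is current here.
\setlength{\parindent}{1em}
% No extra vertical space between indented paragraphs.
\setlength{\parskip}{0pt}
\begin{document}
\section{A heading}
The first paragraph after a heading. The standard classes
leave it unindented, since there is no ambiguity.

The second paragraph starts with a 1 em indent.
\end{document}
```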
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
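Block paragraphs can be sketched in LaTeX by trading the indent for vertical space. Using exactly one line of space is an assumption made here so that the lines stay on a regular grid; other amounts work too.

```latex
\documentclass[10pt]{article}
% Block paragraphs: drop the indent and separate paragraphs
% with one line's worth of vertical space instead.
\setlength{\parindent}{0pt}
\setlength{\parskip}{\baselineskip}
\begin{document}
First block paragraph.

Second block paragraph, set off by vertical space.
\end{document}
```

Changing \parskip globally also affects lists and other elements built on paragraphs, so a dedicated package such as parskip may be the more robust choice in a real document.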
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

            Sizes in inches    Page proportions
    A4      8.27 × 11.7        2 : √2       1.41421
    B5      6.93 × 9.84        1 : √2       0.707
    Letter  8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
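Two of the three note forms are available in LaTeX without extra packages, as this minimal sketch shows. In the 10 pt article class, \footnotesize selects 8 pt type, the lower bound mentioned above. Endnotes would need an additional package and are left out here.

```latex
\documentclass[10pt]{article}
\begin{document}
% A footnote is set at the bottom of the page and linked to the
% text through a superscript number.
A passage with a footnote\footnote{Set at the bottom of the page.}
% A sidenote is set in the margin next to the passage; it is set
% small and ragged because the margin column is narrow.
and a sidenote.\marginpar{\footnotesize\raggedright
Set in the margin.}
\end{document}
```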
3.2.5 Quotations
Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs, and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
William Shakespeare, King Lear
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing. A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
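The two forms of quotation described above might be marked up in LaTeX as follows. This is a sketch only: the block quotation is abbreviated, and setting it in a smaller size is one of the optional changes mentioned, not a requirement.

```latex
\documentclass[10pt]{article}
\begin{document}
% Run-in quotation: typographic quotation marks from the body face.
A run-in quotation: ``Jesters do oft prove prophets.''

% Block quotation: the quote environment adds vertical space
% above and below and indents the text on both sides.
\begin{quote}
\small This is the excellent foppery of the world \ldots
\end{quote}
\end{document}
```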
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
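One way to make the text height hold a whole number of lines in LaTeX is sketched below. The figure of 40 lines is an arbitrary assumption for illustration. LaTeX places the first baseline \topskip below the top of the text block and every further line one \baselineskip lower, so the text height is computed from those two lengths.

```latex
\documentclass[10pt]{article}
% Make the text block hold exactly 40 lines of body text:
% the first line occupies \topskip, each of the remaining
% 39 lines occupies one \baselineskip.
\setlength{\textheight}{\dimexpr\topskip+39\baselineskip\relax}
\begin{document}
Body text.
\end{document}
```

In the 10 pt article class, where \topskip is 10 pt and \baselineskip is 12 pt, this gives 10 pt + 39 × 12 pt = 478 pt.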
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487
The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
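A single secondary color reserved for emphasis can be sketched in LaTeX with the xcolor package. The color name accent and its dark red value are assumptions chosen for illustration.

```latex
\documentclass[10pt]{article}
\usepackage{xcolor}
% One secondary color, used only for emphasis.
\definecolor{accent}{rgb}{0.7,0.1,0.1} % assumption: a dark red
\begin{document}
Black body text with a single \textcolor{accent}{emphasized}
phrase on a white background.
\end{document}
```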
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015) (cit. on p. 5).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standards Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).
[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standards Association, June 1986 (cit. on p. 6).
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).
[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).
[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).
[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
Chapter 3
Design
After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.
3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-form reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
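Combining typefaces by writing system boils down to splitting the text into runs and assigning a typeface to each run. The following is a minimal sketch of that idea, not the mechanism of any particular typesetting system: the function names and the two-way Latin/Greek classification are illustrative assumptions, and the typeface names are the ones used in Figures 3.1 and 3.2.

```python
import unicodedata

# Hypothetical mapping from a script label to a typeface name.
TYPEFACES = {"latin": "Alegreya Sans", "greek": "GFS Neohellenic"}


def script_of(ch: str) -> str:
    """Classify a character as 'greek' or 'latin' by its code point."""
    cp = ord(ch)
    # Greek and Coptic (U+0370-03FF) and Greek Extended (U+1F00-1FFF).
    if 0x0370 <= cp <= 0x03FF or 0x1F00 <= cp <= 0x1FFF:
        return "greek"
    # Simplifying assumption: everything else is set in the Latin typeface.
    return "latin"


def split_runs(text: str) -> list[tuple[str, str]]:
    """Split text into (script, run) pairs; spaces and combining marks
    stay attached to the run that precedes them."""
    runs: list[tuple[str, str]] = []
    for ch in text:
        script = script_of(ch)
        neutral = ch.isspace() or unicodedata.combining(ch) != 0
        if runs and (neutral or script == runs[-1][0]):
            runs[-1] = (runs[-1][0], runs[-1][1] + ch)
        else:
            runs.append((script, ch))
    return runs


if __name__ == "__main__":
    for script, run in split_runs("Aristotle says \u03c6\u03b1\u03bc\u1f72\u03bd so"):
        print(TYPEFACES[script], repr(run))
```

A real document would need more script classes and a policy for shared punctuation; the sketch only shows where the typeface switch happens.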
Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
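The line-length guidelines above can be turned into a quick check. This is a rough sketch: the function name and the wording of the verdicts are made up for illustration, and only the numeric thresholds come from the text.

```python
def check_measure(chars_per_line: int, columns: int = 1) -> str:
    """Judge an average line length against the guidelines in the text."""
    # Recommended range: 45-75 characters on a single-column page,
    # 40-50 characters on a multi-column page.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line > 80:
        return "too wide"
    if chars_per_line < 40:
        # Justifying such narrow lines loosens the word spacing too much.
        return "set ragged"
    if low <= chars_per_line <= high:
        return "ok"
    return "outside recommended range"


if __name__ == "__main__":
    for chars, cols in [(66, 1), (90, 1), (35, 2), (60, 2)]:
        print(chars, cols, check_measure(chars, cols))
```

The average number of characters per line has to be estimated from the typeface and the column width; the check itself does not measure anything.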
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says, φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

    \documentclass[11pt]{article}
    \usepackage{fontspec,leading,newunicodechar}
    \usepackage[Latin,Greek]{ucharclasses}
    % Switch typefaces whenever the text enters a Latin or a Greek run.
    \setTransitionsForLatin
      {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
    \setTransitionsForGreek
      {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
        Ligatures=TeX]}{}
    % Assumed reconstruction: typeset the Greek ano teleia as a raised period.
    \newunicodechar{·}{\raisebox{.8ex}{.}}
    \frenchspacing
    \leading{14pt}
    \begin{document}
    The second function of Soul -- knowing -- was not at
    first distinguished from motion. Aristotle says, φαμὲν
    γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι
    δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα
    δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν
    αὐτὴν κινεῖσθαι.
    ``The soul is said to feel pain and joy, confidence and
    fear, and again to be angry, to perceive, and to think;
    and all these states are held to be movements, which
    might lead one to suppose that soul itself is moved.''
    \end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.

    <style>
    @font-face {
      font-family: "Alegreya Sans";
      src: url(AlegreyaSans-Regular.ttf)
           format("truetype");
      unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                     U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
    }
    @font-face {
      font-family: "GFS Neohellenic";
      src: url(GFSNeohellenic.otf) format("opentype");
      unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                     U+102E0-102FF;
    }
    p {
      font-family: "Alegreya Sans", "GFS Neohellenic",
                   sans-serif;
      line-height: 14pt;
    }
    [lang="en"] {
      font-size: 11pt;
    }
    /* "el" is the language tag for Greek. */
    [lang="el"] {
      font-size: 13.2pt;
    }
    </style>
    <p><span lang="en">The second function of Soul – knowing
    – was not at first distinguished from motion. Aristotle
    says, </span><span lang="el">φαμὲν γὰρ τὴν ψυχὴν
    λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὸργίζεσθαί
    τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα
    κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν
    κινεῖσθαι. </span><span lang="en">“The soul is said to
    feel pain and joy, confidence and fear, and again to be
    angry, to perceive, and to think; and all these states
    are held to be movements, which might lead one to suppose
    that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
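Since the en and the em are relative units, the admissible indent depends on the type size. A small sketch, assuming that 1 em equals the type size and 1 en equals half of it, as the text states; the function name is illustrative.

```python
def indent_range_pt(type_size_pt: float) -> tuple[float, float]:
    """Return the (smallest, largest) first-line indent in points."""
    em = type_size_pt      # 1 em is as wide as the type size
    en = type_size_pt / 2  # 1 en is half an em
    return (1 * en, 3 * em)


if __name__ == "__main__":
    print(indent_range_pt(10))
```

At a type size of 10 pt, the indent may therefore range from 5 pt to 30 pt.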
If the margins are ample, outdented paragraphs are an intriguing option as well. ¶ Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and set ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

[Sidenote: This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.]

          Sizes in inches    Page proportions
A4        8.27 × 11.7        1 : √2 (≈ 1 : 1.414)
B5        6.93 × 9.84        1 : √2 (≈ 1 : 1.414)
Letter    8½ × 11            1 : 1.294

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
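One way to keep headings on the baseline grid is to round the space reserved for a heading up to the next whole number of body text lines and to distribute the remainder above and below the heading. The sketch below illustrates this under the assumption that the heading occupies a single line; the function name and the sample sizes are illustrative.

```python
import math


def heading_block_height(heading_line_height_pt: float,
                         body_line_height_pt: float) -> float:
    """Round the heading's height up to a whole multiple of the body leading."""
    lines = max(1, math.ceil(heading_line_height_pt / body_line_height_pt))
    return lines * body_line_height_pt


if __name__ == "__main__":
    body_leading = 12
    heading_leading = 18
    block = heading_block_height(heading_leading, body_leading)
    print("reserve", block, "pt;", block - heading_leading, "pt of extra space")
```

With a 12 pt body leading, an 18 pt heading line would occupy two body lines, leaving 6 pt to be split between the space above and below the heading.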
3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations

Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks, in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs, and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

    This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    – William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
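Deriving the text height from the text width can be sketched as follows: scale the width by the desired proportion, then snap the result to the nearest whole number of body text lines. The function name and the sample measurements are illustrative assumptions; the 1 : √2 proportion is the one that Table 3.1 lists for A4 and B5 pages.

```python
import math


def text_height_pt(text_width_pt: float, proportion: float,
                   line_height_pt: float) -> tuple[int, float]:
    """Return (number of lines, text height) for a text block whose
    height approximates text_width_pt * proportion."""
    target = text_width_pt * proportion
    # Snap to a whole number of lines so the text block can be filled exactly.
    lines = max(1, round(target / line_height_pt))
    return lines, lines * line_height_pt


if __name__ == "__main__":
    lines, height = text_height_pt(300, math.sqrt(2), 12)
    print(lines, "lines,", height, "pt")
```

For a 300 pt wide text block with 12 pt leading, the ideal height of roughly 424 pt becomes 35 lines, or 420 pt.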
3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

The general guidelines are to only use colored type for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the type color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
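The text does not say how to measure "sufficient contrast". One widely used yardstick, brought in here as an assumption rather than taken from the text, is the contrast ratio defined by the Web Content Accessibility Guidelines (WCAG 2.x), which ranges from 1:1 to 21:1 and is expected to reach at least 4.5:1 for body-sized text. The sketch below computes it for 8-bit sRGB colors; the function names are illustrative.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance of an sRGB color, from 0 (black) to 1 (white)."""
    r, g, b = (_linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: tuple[int, int, int],
                   background: tuple[int, int, int]) -> float:
    """Contrast ratio between two colors; the order of arguments is irrelevant."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    red_on_white = contrast_ratio((200, 0, 0), (255, 255, 255))
    print(f"red on white: {red_on_white:.2f}:1")
```

The ratio says nothing about color blindness; distinguishing hues for color-blind readers has to be checked separately.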
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015).
[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972.
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1986.
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1.
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6.
[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993.
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996.
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996.
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016).
[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008.
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012.
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006.
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124.
[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993.
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6.
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986.
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3.
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000. Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12.
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028/ (visited on 07/31/2015).
[56] Vladimiacuter Beran et al Aktualizovanyacute typografickyacute manuaacutel6th ed Kafka Design 2014 (cit on p 45)
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  vi IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
Index
ACK, 6
Adobe FrameMaker, 14
Adobe InDesign, 14, 39
alignment
    justified, 42
    ragged, 42
Anton Koberger, 49
Apache OpenOffice, 13, 20, 39
API, 55
ASA, 51
ASCII, 5–9, 11, 12, 14, 51
AsciiDoc, 39
AT&T, 35
Atom, 13
awk, 16, 17

Bazaar, 17
BEL, 6
BMP, 8, 9, 14
Bob Bemer, 5
body text, 41
BRE, 14–16
    alternation operator, 15
    backreference, 15
    escape character, 15
    matching list expression, 15
    non-matching list expression, 15
    repetition operator, 15
    subexpression, 15
BS, 6
BSD, 13

CA, 52
CAN, 6
CERN, 28
character code, 5
character encoding, 5
Chomsky hierarchy, 14
Christian Morgenstern, 4
CLDR, 52
CLI, 13, 16
code page, 7
code point, 8
Compose key, 11
CONCUR, 27
control code, 5
CR, 6
Creole, 39
CSS, 23, 29–32, 44

DC, 32, 33
DC1, 6
DC2, 6
DC3, 6
DC4, 6
DEL, 6
DLE, 6
Donald Knuth, 36
DPS, 13, 17, 18, 32, 35, 36, 39
    batch-oriented, 35
    interactive, 13, 35
        desktop publishing, 36
        word processing, 36
DTD, 23, 25–27
DTP, 36

EBCDIC, 5
ECMA, 55
Edgar Allan Poe, 37
Elements of Style, 3
EM, 6
Emacs, 13
endianity, 10
endnote, 47
ENQ, 6
EOT, 6
ERE, 14–16
    alternation operator, 15
    backreference, 15
    escape character, 15
    matching list expression, 15
    non-matching list expression, 15
    repetition operator, 15
    subexpression, 15
ESC, 6
ETB, 6
ε-TeX, 38
ETX, 6
EUC, 5

F. M. Cornford, 43
FF, 6
FOAF, 32, 33
footnote, 47
formal grammar, 14
FORTRAN, 4
From Religion to Philosophy: A Study in the Origins of Western Speculation, 43
FS, 6
FSM, 35

Git, 17
GML, 22
GNU, 13, 14, 35
    Linux, 13
    nano, 13
Google Documents, 18
Google Pinyin, 11
grep, 16, 17
groff, see troff
GS, 6
GUI, 13, 35

Han Unification, 9
heading, 45
Henrik Ibsen, 27
HT, 6
HTML, 28–32, 34, 39, 44, 55

IBM, 5, 12, 22
iconv, 10
IEC, 7, 10, 51–54
IME, 12
IRI, 27, 28, 31, 32, 54
ISO, 7, 10, 51–54

JavaScript, 29
Jeffrey E. F. Friedl, 14
JIS, 5
JOE, 13
JScript, 29
JSON, 32
JSON-LD, 32, 56
JTC, 51–54
justification, see alignment

King Lear, 48

LaTeX, 36, 43
Latin Vulgate Bible, 49
LD, 31, 32, 55
leading, see line spacing
Leafpad, 13
LF, 6
lightweight markup language, 39
line height, 45
list, 46

MA, 51
MakeDoc, 39
Markdown, 39
markup
    logical, 21, 29, 30, 35, 36
    presentation, 21, 29, 30, 35, 36
MathML, 28, 31
Mercurial, 17
microformatting, 32
Microsoft Word, 14, 20, 39

N-Triples, 32, 33
NAK, 6
Noam Chomsky, 14
    hierarchy, 14
note, 46
Notepad++, 13
Notepad, 13
nroff, see troff
NUL, 6
NY, 51

OCR, 12
ODF, 13
OOXML, 13
OWL, 32, 56

paragraph, 42
    block, 45, 47
    indented, 45
    outdented, 45
PC, 5, 11
PDF, 13
pdfTeX, 38
Peer Gynt, 27
Perl, 14
PICO, 13
pinyin, 11
plain TeX, 38
POSIX, 53
printable character, 5
Punycode, 8

QuarkXPress, 14
quotation
    block, 47
    run-in, 47

rag, see alignment
RDF, 28, 31–35, 56
    literal, 32
    object, 31
    ontology, 32
    predicate, 31
    resource, 31
    subject, 31
    triplet, 31
RDFa, 32, 34, 56
regex, see regular expression
regular expression, 13, 14
regular grammar, 14
RELAX NG, 23, 25
RFC, 54, 55
RS, 6

sans-serif, 41
SC, 51–54
Scribus, 13, 14, 39
sed, 16, 17
serif, 41
Setext, 39
SGML, 22, 23, 25, 27–29, 39, 53, 54
    application, 23
    attribute, 22
    element, 22
    entity, 22
    node, 22
    tag, 22
SGML: The Reason Why and the First Published Hint, 22
SI, 6
sidenote, 46
small capitals, 45
SO, 6
SOH, 6
SR, 12
STX, 6
style guide, 3
SUB, 6
Sublime Text, 13
surrogate pair, 8
SVG, 28, 31
SVN, 17–20
SYN, 6

table, 46
TC, 51, 52
TEI, 28
text editor, 13
text file, 4
text processing, 4
TextEdit, 13, 14
The Art of Computer Programming, 36
The Cask of Amontillado, 37
The Chicago Manual of Style, 3
The Oxford Style Manual, 3
The Subversion book, 17
Tim Berners-Lee, 31
Timothy John Berners-Lee, 28
TortoiseSVN, 18, 20
Trichter, 4
troff, 35
    man, 36
    me, 36
    mom, 36
TRON, 9
Turtle, 32, 33
typeface, 41

UCS, 6, 8–12, 14, 16, 51, 52
    block, 8
    UCS-4, 8
Unicode
    case conversion, 10
    normalization, 10
US, 6
USA, 51, 52
UTF, 6, 8–10, 52
    UTF-7, 8
    UTF-8, 8, 52
    UTF-16, 8, 52
    UTF-32, 8

VBScript, 29
VCS, 17–20
    centralized, 17
    decentralized, 17
version control, 13
vi, 13
Vim, 13
VT, 6

W3C, 23, 28, 29, 31, 32, 54–56
WG, 54
Wikicode, 39
William Shakespeare, 48
William Strunk, 3
Word Online, 18
writing rules
    grammar, 3
    orthography, 3
    typography, 4
WYSIWYG, 35

X Window System, 11
XeTeX, 43
XHTML, 28, 31, 32, 55, 56
XML, 23–29, 31–33, 39, 54, 55
    application, 23
    DocBook, 28
    format, 23
    language, 23
    namespace, 27
    schema language, 23
    Schema, 23, 26
    validity, 23
    well-formedness, 23
xmllint, 26
XPath, 23
XPointer, 23
XQuery, 23
terparts. On low-resolution screens, however, simple low-contrast typefaces with slab serifs or no serifs will often yield the best result.
A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
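Combining typefaces by writing system can be sketched in Python as a lookup from a character's code point to the first font family whose declared range covers it. This is only an illustration: the ranges are copied from the unicode-range descriptors in Figure 3.2, while the names FONT_RANGES and font_for and the fallback value are hypothetical, and real text renderers perform a more involved font matching.

```python
# Font families and the inclusive code point ranges they are asked to cover.
# The ranges mirror the unicode-range descriptors in Figure 3.2.
FONT_RANGES = [
    ("Alegreya Sans", [(0x0000, 0x024F), (0x1E00, 0x1EFF), (0x2000, 0x206F),
                       (0x2C60, 0x2C7F), (0xA720, 0xA7FF), (0xFB00, 0xFB4F)]),
    ("GFS Neohellenic", [(0x2C80, 0x2CFF), (0x0370, 0x03FF), (0x1F00, 0x1FFF),
                         (0x102E0, 0x102FF)]),
]


def font_for(char, fallback="sans-serif"):
    """Return the first font family whose ranges contain the character."""
    code_point = ord(char)
    for family, ranges in FONT_RANGES:
        if any(low <= code_point <= high for low, high in ranges):
            return family
    # No listed family covers the character (hypothetical generic fallback).
    return fallback


# A Latin letter, a Greek letter, and a typographic quotation mark.
for char in ["S", "\u03c8", "\u201c"]:
    print(f"U+{ord(char):04X} -> {font_for(char)}")
```

The order of the list matters: as in a font-family declaration, the first family that covers the character wins, so the body text typeface should come first.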
Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
3.2 Structural Elements
3.2.1 Paragraphs and Stanzas
As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
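The line-length guideline above can be sketched as a small Python check. The function name check_measure and the wording of the returned verdicts are hypothetical; only the numeric thresholds come from the text, and the last verdict covers lengths that the guideline neither recommends nor rules out.

```python
def check_measure(chars_per_line, columns=1):
    """Classify a line length (in characters) against the guideline."""
    # Recommended range: 45-75 on a single-column page, 40-50 otherwise.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "too narrow to justify: set ragged"
    if chars_per_line > 80:
        return "too wide: strains the eye"
    if low <= chars_per_line <= high:
        return "comfortable: justify"
    return "outside the recommended range"


print(check_measure(66))             # single-column body text
print(check_measure(35, columns=2))  # a narrow column, e.g. a sidenote
```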
Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
   Ligatures=TeX]}{}
\newunicodechar{·}{\raisebox{.8ex}{.}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F; }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF; }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt; }
  /* "grc" is the language tag for Ancient Greek. */
  [lang=en] {
    font-size: 11pt; }
  [lang=grc] {
    font-size: 13.2pt; }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
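The leading arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the default bounds of 20 and 45 percent are simply the figures quoted in the text.

```python
def line_height_range(type_size_pt, low=0.20, high=0.45):
    """Return the (minimum, maximum) line height in points."""
    return type_size_pt * (1 + low), type_size_pt * (1 + high)


def lead_per_side(type_size_pt, line_height_pt):
    """Return the lead added above and below each line, in points."""
    return (line_height_pt - type_size_pt) / 2


minimum, maximum = line_height_range(10)
print(f"line height: {minimum:.2f} to {maximum:.2f} pt")
print(f"lead per side: {lead_per_side(10, minimum):.2f} to "
      f"{lead_per_side(10, maximum):.2f} pt")
```

For a 10 pt typeface this gives a line height of 12 to 14.5 pt and 1 to 2.25 pt of lead on each side, matching the figures in the text.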
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger
          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941
Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
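The whole-multiple rule for headings can be sketched as follows. The function name heading_block is hypothetical, and so is the approach of padding the heading with extra vertical space until it occupies a whole number of body text lines; how that padding is divided above and below the heading is left to the designer.

```python
import math


def heading_block(heading_height_pt, body_line_height_pt):
    """Return (lines, padding_pt) such that the heading plus the padding
    is a whole multiple of the body text line height."""
    ratio = heading_height_pt / body_line_height_pt
    # The small tolerance keeps exact multiples from being rounded up
    # by floating-point noise.
    lines = max(1, math.ceil(ratio - 1e-9))
    padding = lines * body_line_height_pt - heading_height_pt
    return lines, padding


# A heading with a 17 pt line height over body text with 12 pt leading.
lines, padding = heading_block(17, 12)
print(f"{lines} body lines, {padding} pt of padding")
```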
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
    This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
    – William Shakespeare, King Lear
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
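Deriving the text height from the text width while keeping it a whole multiple of the line height can be sketched as below. The function name text_block is hypothetical, and the proportion 1.618 in the usage example is only an illustrative value, not a recommendation taken from the text.

```python
def text_block(text_width_pt, height_to_width, line_height_pt):
    """Return (lines, text_height_pt): the text height closest to
    text_width_pt * height_to_width that is a whole multiple of the
    body text line height."""
    target = text_width_pt * height_to_width
    lines = max(1, round(target / line_height_pt))
    return lines, lines * line_height_pt


# A 300 pt wide text block, set with 12 pt leading.
lines, height = text_block(300, 1.618, 12)
print(f"{lines} lines, text height {height} pt")
```

The resulting proportion differs slightly from the requested one, which is the price of a text block that can be filled completely with lines of body text.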
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
(ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).
[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972 (cit. on pp. 5, 7).
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1986 (cit. on p. 6).
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).
[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993 (cit. on p. 8).
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016) (cit. on p. 10).
[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008 (cit. on p. 13).
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012 (cit. on p. 13).
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006 (cit. on p. 13).
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).
[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993 (cit. on p. 14).
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986 (cit. on p. 22).
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
BIBLIOGRAPHY 57
[54] Robert Bringhurst the Elements of Typographic Style PointRoberts andWashHartleyampMarks 1992 i sbn 0-88179-110-5(cit on pp 41 42 45ndash48)
[55] Matthew Butterick Butterickrsquos Practical Typography Line spac-ing url httppracticaltypographycomline-spacinghtml (visited on 11022015) (cit on p 42)
[56] Vladimiacuter Beran et al Aktualizovanyacute typografickyacute manuaacutel6th ed Kafka Design 2014 (cit on p 45)
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standards Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
Emacs  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The Generalized Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
Pico  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML, New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
vi  The Visual Interactive editor
Vim  vi IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”
\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
% assumed arguments: a raised full stop stands in for the Greek ano teleia
\newunicodechar{·}{\raisebox{.8ex}{.}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}
Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.
<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=gr] {
    font-size: 13.2pt;
  }
</style>
<p><span lang=en>The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang=gr>φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang=en>“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
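The numbers above can be tried out with a few lines of LaTeX. The following is only a minimal sketch: the sample sentence is invented for the illustration, and the default Computer Modern typeface stands in for Palatino, so it should not be read as the actual source of this book.

```latex
\documentclass[10pt]{article}
\begin{document}
% 10 pt type on a 12 pt baseline distance,
% i.e. 1 pt of lead above and below each line
\fontsize{10pt}{12pt}\selectfont
% \textsc sets lower-case input as small capitals, so the acronyms
% do not tower over the neighbouring lower-case letters
Markup on the Web is written in \textsc{html} and styled with
\textsc{css}; strings of full capitals such as HTML and CSS would
call for a more generous leading.
\end{document}
```

Changing the second argument of `\fontsize` to a value between 12pt and 14.5pt is a quick way to compare how the same text reads with more lead.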
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. ¶ Paragraphs can also be separated by graphical symbols such as pilcrows, bullets, or boxes.      A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
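The two most common ways of separating paragraphs can be compared in LaTeX by changing two lengths. The sketch below is an illustration with assumed values (a 1 em indent, and one empty line between block paragraphs); the sample sentences are placeholders.

```latex
\documentclass{article}
\begin{document}
% Indented paragraphs: a 1 em indent and no extra vertical space.
\setlength{\parindent}{1em}
\setlength{\parskip}{0pt}
First paragraph of the indented sample text.

Second paragraph, whose first line is indented by 1\,em.

% Block paragraphs: no indent, one empty line between paragraphs.
\setlength{\parindent}{0pt}
\setlength{\parskip}{\baselineskip}
First block paragraph of the sample text.

Second block paragraph, separated by vertical space instead.
\end{document}
```

Both lengths are standard LaTeX parameters, so the same two lines in the preamble switch a whole document from one style to the other.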
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
          Sizes in inches   Page proportions
A4        8.27 × 11.7       2 : √2       1.41421
B5        6.93 × 9.84       1 : √2       0.707
Letter    8½ × 11           1 : 1.294    1.2941
Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
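The rule about the height of headings can be sketched in LaTeX as follows. All figures are assumptions chosen for the example: a 12 pt line height, a one-line heading, and one line of space above and below it. Redefining \section through the LaTeX kernel macro \@startsection is only one of several ways to do this.

```latex
\documentclass[10pt]{article} % 10 pt type on a 12 pt line height
\makeatletter
% Assumed design: 12 pt above + one 12 pt heading line + 12 pt below
% = 36 pt, so the heading takes up the space of three text lines.
% The negative sign of the first skip only suppresses the indent
% of the paragraph that follows the heading.
\renewcommand{\section}{\@startsection{section}{1}{0pt}%
  {-12pt}{12pt}%
  {\normalfont\fontsize{12pt}{12pt}\selectfont\bfseries}}
\makeatother
\begin{document}
Body text before the heading.
\section{A Heading on the Grid}
Body text after the heading.
\end{document}
```

Because both skips are rigid multiples of the line height, the lines after the heading should fall on the same grid as the lines on a facing page that has no heading.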
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
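A table that follows this advice can be sketched in LaTeX with the booktabs package, which is assumed to be installed. The example uses three horizontal rules and no boxes or vertical lines; the paper sizes are the ones listed in Table 3.1.

```latex
\documentclass{article}
\usepackage{booktabs} % \toprule, \midrule and \bottomrule
\begin{document}
\begin{table}
  \centering
  \begin{tabular}{ll}
    \toprule
    Paper  & Size in inches \\
    \midrule
    A4     & $8.27 \times 11.7$ \\
    B5     & $6.93 \times 9.84$ \\
    Letter & $8\frac{1}{2} \times 11$ \\
    \bottomrule
  \end{tabular}
  \caption{Common paper sizes.}
\end{table}
\end{document}
```

The table inherits the typeface of the surrounding text, so nothing but the three rules sets it apart from the body.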
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
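The three forms of notes can be tried side by side in LaTeX. The following sketch assumes that the endnotes package is installed; \marginpar and \footnote are part of LaTeX itself, and the sample sentences are placeholders.

```latex
\documentclass{article}
\usepackage{endnotes} % provides \endnote and \theendnotes
\begin{document}
A sidenote sits in the margin next to the passage.%
\marginpar{\footnotesize This is a sidenote.}
A footnote is linked through a superscript number.%
\footnote{This is a footnote.}
An endnote is collected for later.\endnote{This is an endnote.}

\theendnotes % prints all endnotes gathered so far
\end{document}
```

The \footnotesize command in the sidenote is one way of setting the note in a smaller size than the body text; the footnote and the endnote pick their sizes from the document class and the package.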
3.2.5 Quotations
Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
– William Shakespeare, King Lear
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
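Both forms of quotation are available in plain LaTeX. In the sketch below, the run-in quotation is marked with the input conventions for English quotation marks, and the block quotation uses the standard quote environment; the reduced size and the right-aligned attribution are assumptions of the example, not a rule.

```latex
\documentclass{article}
\begin{document}
% Run-in quotation: set off by quotation marks only.
Shakespeare has it that ``Jesters do oft prove prophets.''

% Block quotation: indented, with vertical space above and below.
\begin{quote}
  \small
  This is the excellent foppery of the world \ldots

  \raggedleft -- William Shakespeare, \emph{King Lear}
\end{quote}
The body text continues after the quotation.
\end{document}
```

The quote environment supplies the vertical space and the indentation by itself, so the designer only has to decide whether to change the typeface or its size as well.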
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size (as described in Section 3.2.1) as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
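In LaTeX, a text block of this kind can be requested through the geometry package, which is assumed to be installed. The values below are illustrative assumptions only: A4 paper, a single 312 pt wide column, a text block of 45 lines, and a 100 pt wide margin for sidenotes.

```latex
\documentclass[10pt]{article}
% The lines option makes geometry derive the text height from the
% number of body text lines, so the text block can be filled
% completely with text.
\usepackage[a4paper, textwidth=312pt, lines=45,
            marginparwidth=100pt]{geometry}
\begin{document}
Sample body text.\marginpar{An occasional sidenote.}
\end{document}
```

Specifying the height as a line count rather than as a length avoids having to recompute it by hand whenever the line height of the body text changes.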
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to use colored typefaces only for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
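A restrained use of a single secondary color can be sketched in LaTeX with the xcolor package, which is assumed to be installed. The heading and the sample sentence are invented for the illustration.

```latex
\documentclass{article}
\usepackage{xcolor} % provides \textcolor and named colors such as red
\begin{document}
\section*{\textcolor{red}{Incipit}}
The body text stays black on a white background; the secondary
color is reserved for emphasis, such as a \textcolor{red}{rubricated}
word.
\end{document}
```

Keeping the color confined to a heading and an occasional word leaves the body text at full contrast with the background.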
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015).
[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972.
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986.
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1.
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6.
[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993.
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996.
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996.
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016).
[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008.
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012.
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006.
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124.
[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993.
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6.
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986.
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3.
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12.
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015).
BIBLIOGRAPHY 55
[35] Tim Berners-Lee Information Management A Proposal Techrep Mar 1989 url httpwwww3orgHistory1989proposalhtml (visited on 08312015) (cit on p 28)
[36] T Berners-Lee Hypertext Markup Language ndash 20 rfc 1866rfc Editor Nov 1995 url httptoolsietforghtmlrfc1866 (visited on 07312015) (cit on p 28)
[37] Jon Postel DoD standard Transmission Control Protocol rfc761 rfc Editor Jan 1980 url httptoolsietforghtmlrfc761 (visited on 09162016) (cit on p 28)
[38] Ian Hickson et al html5 A vocabulary and associated apisfor html and xhtml Recommendation w3c Oct 2014 urlhttpwwww3orgTR2014REC-html5-20141028 (visitedon 07312015) (cit on p 29)
[39] ecma International Standard ecma-262 - ecmaScript LanguageSpecification Tech rep June 1997 url httpwwwecma-internationalorgpublicationsfilesECMA-ST-ARCH
ECMA-262201st20edition20June201997pdf (visitedon 07312015) (cit on p 29)
[40] Netscape Communications Netscape and Sun announce Java-Script the open cross-platform object scripting language for en-terprise networks and the Internet Dec 1995 url httpwpnetscapecomnewsrefprnewsrelease67html (visited on02132008) (cit on p 29)
[41] Dave Raggett et al Reformulating html in xml w3c Recom-mendation w3c Dec 1998 url httpwwww3orgTR1998WD-html-in-xml-19981205 (visited on 08202015)(cit on p 31)
[42] Steven Pemberton et al xhtmltrade 10 The Extensible HyperTextMarkup Language w3c Recommendation w3c Jan 2000url httpwwww3orgTR2000REC-xhtml1-20000126(visited on 08202015) (cit on p 31)
[43] T Berners-Lee Linked Data Tech rep 2006 url httpswwww3orgDesignIssuesLinkedDatahtml (visited on09172016) (cit on p 31)
56 BIBLIOGRAPHY
[44] Ora Lassila and Ralph R Swick Resource Description Frame-work (rdf) Model and Syntax Specification w3c Recommen-dation w3c Feb 1999 url httpwwww3orgTR1999REC-rdf-syntax-19990222 (visited on 08182015) (cit onpp 31 32)
[45] Dan Brickley and R V Guha rdf Vocabulary DescriptionLanguage 10 rdf Schema w3c Recommendation w3c Feb2004 url httpwwww3orgTR2004REC-rdf-schema-20040210 (visited on 08182015) (cit on p 32)
[46] Deborah L McGuinness and Frank van Harmelen owl WebOntology Language w3c Recommendation w3c Feb 2004url httpwwww3orgTR2004REC-owl-features-20040210 (visited on 08182015) (cit on p 32)
[47] Dan Brickley and R V Guha json-ld 10 A JSON-basedSerialization for Linked Data w3c Recommendation w3cJan 2014 url httpwwww3orgTR2014REC-json-ld-20140116 (visited on 08192015) (cit on p 32)
[48] David Beckett et al rdf 11 Turtle w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-turtle-20140225 (visited on 08292015) (cit on p 32)
[49] David Beckett rdf 11 N-Triples w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-n-triples-20140225 (visited on 08192015) (cit on p 32)
[50] Ben Adida et al rdfa in xhtml Syntax and Processing w3cRecommendation w3c Oct 2008 url httpwwww3org TR 2008 REC - rdfa - syntax - 20081014 (visited on08192015) (cit on p 32)
[51] Peter Schaffter What exactly is mom 2015 url httpwwwschafftercamommom-01html (visited on 09162016)(cit on p 37)
[52] Donald Ervin Knuth Digital Typography The Center for theStudy of Language and Information Publications 1998 i sbn978-0-387-98269-4 (cit on p 36)
[53] Albert Kapr Sto a jedna věta ke knižniacute uacutepravě Trans by An-toniacuten Rambousek Lacerta 1999 url httpwwwsazbacztypoglosytypo101pdf (visited on 10202015) (cit onpp 41 46 47)
BIBLIOGRAPHY 57
[54] Robert Bringhurst the Elements of Typographic Style PointRoberts andWashHartleyampMarks 1992 i sbn 0-88179-110-5(cit on pp 41 42 45ndash48)
[55] Matthew Butterick Butterickrsquos Practical Typography Line spac-ing url httppracticaltypographycomline-spacinghtml (visited on 11022015) (cit on p 42)
[56] Vladimiacuter Beran et al Aktualizovanyacute typografickyacute manuaacutel6th ed Kafka Design 2014 (cit on p 45)
Acronyms
ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standards Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
vi  The Visual Interactive editor
vim  vi IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
<style>
  /* Latin glyphs are taken from Alegreya Sans. */
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf") format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  /* Greek glyphs are taken from GFS Neohellenic. */
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic", sans-serif;
    line-height: 14pt;
  }
  /* The Greek text is set larger than the English text. */
  [lang=en] { font-size: 11pt; }
  [lang=gr] { font-size: 13.2pt; }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says, </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>
Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with a leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
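The arithmetic behind these figures can be checked with a short sketch. It assumes a 10 pt type size, which is the size the quoted range implies and the body size this book reports, and it assumes the extra space is split evenly above and below the line. The function name lead_per_side is hypothetical and chosen only for this illustration.

```python
def lead_per_side(type_size_pt, line_height_pt):
    """Return the lead added above and below each line, in points.

    Assumes the difference between the line height and the type size
    is split evenly between the top and the bottom of the line.
    """
    if line_height_pt < type_size_pt:
        raise ValueError("line height is smaller than the type size")
    return (line_height_pt - type_size_pt) / 2


# 10 pt type set on a 12 pt and on a 14.5 pt line height.
print(lead_per_side(10, 12))    # 1.0
print(lead_per_side(10, 14.5))  # 2.25
```

With the 10 pt type and 12 pt leading used for the body text of this book, the sketch gives 1 pt of lead on each side, the lower end of the range.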
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
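As a rough illustration, the indent range can be converted to points for a given type size. The sketch relies only on the definitions given above (1 en is half the type size, 1 em equals the type size); the function name indent_range_pt is hypothetical.

```python
def indent_range_pt(type_size_pt):
    """Return the (minimum, maximum) first-line indent in points."""
    em = float(type_size_pt)  # 1 em equals the type size
    en = em / 2               # 1 en is half the type size
    return (1 * en, 3 * em)


# For 10 pt body text the indent ranges from 5 pt to 30 pt.
print(indent_range_pt(10))  # (5.0, 30.0)
```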
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically justified only when the individual lines are long enough to fill up the column, and set ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941
Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
[Sidenote: This is a side-note. Sidenotes enliven the page and are easy for the reader to find.]
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
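One way to satisfy the whole-multiple rule is to pad the heading with extra vertical space until it occupies a whole number of body lines. The sketch below is only an illustration under that assumption; the function name heading_block and the 18 pt heading line height are hypothetical, while the 12 pt body line height is the leading this book reports.

```python
import math


def heading_block(heading_height_pt, body_line_height_pt):
    """Return (body lines occupied, extra space in points) for a heading.

    The heading is rounded up to the next whole multiple of the body
    line height; the extra space can be distributed above and below it.
    """
    lines = math.ceil(heading_height_pt / body_line_height_pt)
    extra = lines * body_line_height_pt - heading_height_pt
    return lines, extra


# An 18 pt heading over 12 pt body lines takes two lines plus 6 pt of space.
print(heading_block(18, 12))  # (2, 6)
```

How the extra space is split above and below the heading is a design decision that the rule itself leaves open.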
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface as the surrounding text, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, secs. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks, in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
    This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star.
    William Shakespeare, King Lear
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing. A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
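The derivation of the text height can be sketched as follows. The sketch assumes that a target height is first obtained by multiplying the text width by a chosen proportion and is then snapped to the nearest whole number of lines; the rounding strategy, the function name text_height_pt, and the sample width and proportion are all assumptions made for this illustration, not values taken from the text.

```python
def text_height_pt(text_width_pt, proportion, line_height_pt):
    """Return (line count, text height in points) for a text block.

    The target height text_width_pt * proportion is rounded to the
    nearest whole number of body text lines, with a minimum of one line.
    """
    target = text_width_pt * proportion
    lines = max(1, round(target / line_height_pt))
    return lines, lines * line_height_pt


# A 288 pt wide block with a 2 : 3 proportion and 12 pt lines.
print(text_height_pt(288, 1.5, 12))  # (36, 432)
# A slightly wider block snaps to the same number of lines.
print(text_height_pt(290, 1.5, 12))  # (36, 432)
```

Snapping to whole lines means the final proportion may deviate slightly from the one requested, which is usually an acceptable price for a text block that fills completely.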
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
The general guidelines are to use colored typefaces only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
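The text does not say how "sufficient contrast" should be measured. One commonly used measure, adopted here purely as an assumption, is the contrast ratio defined in the W3C Web Content Accessibility Guidelines (WCAG) 2.x, which ranges from 1 for identical colors to 21 for black on white. The function names below are hypothetical.

```python
def relative_luminance(rgb):
    """Relative luminance of an sRGB color given as (r, g, b) in 0-255."""
    def linearize(channel):
        c = channel / 255
        # Piecewise sRGB transfer function as given in WCAG 2.x.
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors; the argument order is irrelevant."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


black, white, red = (0, 0, 0), (255, 255, 255), (255, 0, 0)
print(round(contrast_ratio(black, white), 2))  # 21.0
print(round(contrast_ratio(red, white), 2))
```

Under this measure, pure red on white yields a markedly lower ratio than black on white, which is consistent with the advice to reserve colored type for emphasis.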
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standards Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015).
[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972.
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standards Association, June 1986.
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1.
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6.
[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993.
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996.
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996.
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016).
[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008.
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012.
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006.
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124.
[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993.
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6.
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986.
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3.
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12.
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4.
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5.
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014.
wysiwyg 35sect
XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23
xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23
line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
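The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration, assuming that the quoted range refers to a 10 pt typeface (the numbers are consistent with that reading); the function name line_height is hypothetical and does not come from any typesetting system.

```python
def line_height(type_size_pt: float, lead_pt: float) -> float:
    """Return the line height when lead_pt of lead is added
    both above and below a line set in type_size_pt type."""
    return type_size_pt + 2 * lead_pt


# Assumed 10 pt typeface with the smallest and largest lead from the text.
for lead in (1.0, 2.25):
    print(f"{lead} pt of lead gives a line height of {line_height(10.0, lead)} pt")
```

With 1 pt of lead the sketch yields the 12 pt leading used for the body text of this book; with 2.25 pt it yields the upper bound of 14.5 pt.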
Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].
Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
3.2.2 Headings
Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.
This is a side-note. Sidenotes enliven the page and are easy for the reader to find.
Size      Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941
Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
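The page proportions listed with the paper sizes follow from the sizes themselves. A minimal Python sketch, assuming the proportion is taken as the page height divided by the page width and using the sizes in inches as printed in the table; the names PAPER_SIZES_IN and proportion are illustrative only.

```python
import math

# Width and height in inches, as listed in the table of paper sizes.
PAPER_SIZES_IN = {
    "A4": (8.27, 11.7),
    "B5": (6.93, 9.84),
    "Letter": (8.5, 11.0),
}


def proportion(width_in: float, height_in: float) -> float:
    """Return the height-to-width proportion of a page."""
    return height_in / width_in


for name, (width, height) in PAPER_SIZES_IN.items():
    print(f"{name}: 1 : {proportion(width, height):.4f}")

# The ISO sizes (A4, B5) are designed around the square root of two.
print(f"sqrt(2) = {math.sqrt(2):.5f}")
```

Because the inch values in the table are rounded, the computed proportions of A4 and B5 only approximate the square root of two.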
The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
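One way to satisfy the whole-multiple rule is to pad each heading with extra vertical space until it fills a whole number of body text lines. The following Python sketch shows the calculation under that assumption; the function name heading_block and the sample sizes are hypothetical and not taken from the cited sources.

```python
import math


def heading_block(heading_height_pt: float, body_line_height_pt: float):
    """Return (lines, padding_pt): the number of body text lines that the
    heading occupies and the extra vertical space needed to fill them."""
    lines = math.ceil(heading_height_pt / body_line_height_pt)
    padding_pt = lines * body_line_height_pt - heading_height_pt
    return lines, padding_pt


# Assumed example: a heading 18 pt high over body text with 12 pt leading.
lines, padding = heading_block(18.0, 12.0)
print(f"The heading spans {lines} lines and needs {padding} pt of padding")
```

How the padding is split above and below the heading is a separate design decision that the sketch leaves open.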
3.2.3 Tables and Lists
Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:
1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.
Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
3.2.5 Quotations
Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:
1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].
This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!
– William Shakespeare, King Lear
¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.
The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].
In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
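The two constraints on the text height can be combined in a short calculation. The Python sketch below is only an illustration: it assumes that a target height is first derived from the text width and a chosen proportion and is then rounded down to a whole number of lines. The function name fit_text_height and the sample values are hypothetical.

```python
def fit_text_height(target_height_pt: float, line_height_pt: float):
    """Return (lines, text_height_pt): the largest whole number of lines
    that fits into the target height, and the height they occupy."""
    lines = int(target_height_pt // line_height_pt)
    return lines, lines * line_height_pt


# Assumed example: a 300 pt wide text block, a 1 : 1.5 proportion,
# and body text with 12 pt leading.
text_width_pt = 300.0
target_height_pt = text_width_pt * 1.5
lines, text_height_pt = fit_text_height(target_height_pt, 12.0)
print(f"{lines} lines, text height {text_height_pt} pt")
```

Rounding down keeps the text block inside the intended proportions; rounding to the nearest line would be an equally defensible choice.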
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.
The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015).
[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972.
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986.
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1.
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6.
[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993.
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996.
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996.
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016).
[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008.
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012.
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006.
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124.
[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993.
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6.
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986.
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3.
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12.
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4.
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5.
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014.
Acronyms
ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standard Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution, also known as the Berkeley Unix
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
Emacs: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
Pico: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML Next Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control System
vi: The Visual Interactive editor
Vim: vi IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language
46 CHAPTER 3 DESIGN
Sizes in inches Page proportionsA4 827 times 117 2 ∶ radic2 141421B5 693 times 984 1 ∶ radic2 0707Letter 8 1
2 times 11 1 ∶ 1294 12941
Table 31 An overview of commonpaper sizes used for commercialand industrial printing
This is a side-note Sidenotesenliven the pageand are easy for
the reader to find
variant of the body text typeface or by including the text of the lat-est heading in the margin or the header of the page [54 sec 421]as seen throughout this book
The hierarchy of the headings can be expressed through thevariation of typefaces indentation alignment and numberingalthough alternating the size of the body text typeface is sufficientfor many types of documents In documents that are bound incodex form and read two pages at a time the height of headingsshould be a whole multiple of the line height of the body textso that the headings do not disrupt the alignment of lines on thefacing pages [53 para 33]
323 Tables and ListsTables and lists are structural elements that should fit seamlesslyinto the surrounding text and avoid unnecessary visual clutter Usethe same typeface the surrounding text does treat the columnsof tables the same way you treat columns in the text and keepthe amount of rules boxes dots and extraneous spacing to a bareminimum (see Table 31) [54 sec 2110 and 44]
3.2.4 Notes
Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.
2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].
3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

3.2.5 Quotations
Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks, in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer’s viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.
2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3]:

    This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    – William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
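The remark that run-in quotations follow the orthographic rules of the paragraph's language, and that the typeface must contain the required marks, can be illustrated with a small sketch. The table of marks below covers only four languages and is an illustrative assumption, not an exhaustive reference. The names `QUOTATION_MARKS`, `run_in_quote`, and `missing_marks` are hypothetical.

```python
# Primary quotation marks for a few languages (illustrative subset only).
QUOTATION_MARKS = {
    "en": ("\u201c", "\u201d"),            # “ ”
    "de": ("\u201e", "\u201c"),            # „ “
    "cs": ("\u201e", "\u201c"),            # „ “
    "fr": ("\u00ab\u00a0", "\u00a0\u00bb"),  # « » with no-break spaces
}


def run_in_quote(text, language="en"):
    """Wrap text in the quotation marks customary for the given language."""
    opening, closing = QUOTATION_MARKS[language]
    return f"{opening}{text}{closing}"


def missing_marks(language, available_characters):
    """Return the quotation characters the typeface does not provide.

    `available_characters` stands in for the character set of a font;
    how that set is obtained is outside the scope of this sketch.
    """
    opening, closing = QUOTATION_MARKS[language]
    return {ch for ch in opening + closing if ch not in available_characters}


print(run_in_quote("Jesters do oft prove prophets."))
print(missing_marks("de", set("abc\u201c")))
```

A non-empty result from `missing_marks` would suggest that the body text typeface is unsuitable for run-in quotations in that language.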
3.3 Page Layout
The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
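The rule that the text height should be a whole multiple of the line height can be worked through as a short calculation. The following is a minimal sketch. The function name `text_height`, the 3:2 height-to-width ratio, and the sample measurements are assumptions chosen for illustration, not values taken from this book.

```python
def text_height(text_width_pt, ratio, line_height_pt):
    """Derive a text height from the text width.

    The target height is text_width_pt * ratio; it is then snapped to the
    nearest whole number of lines so that the text block can be filled
    completely. Returns (number_of_lines, height_in_points).
    """
    target = text_width_pt * ratio
    lines = max(1, round(target / line_height_pt))
    return lines, lines * line_height_pt


# A 300pt wide text block, a 3:2 height-to-width ratio, and 13pt line height.
lines, height = text_height(text_width_pt=300.0, ratio=1.5, line_height_pt=13.0)
print(lines, height)  # prints: 35 455.0
```

Here the target of 450pt is not a multiple of 13pt, so the height is adjusted to 35 lines, or 455pt. The remaining difference would be absorbed by the vertical margins.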
3.4 Color
In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

The general guidelines are to use colored type only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the type color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
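The text does not say how "sufficient contrast" should be measured. One common way to quantify it on screen, offered here only as an assumption, is the contrast ratio defined in the W3C Web Content Accessibility Guidelines (WCAG) 2.x, which compares the relative luminance of two sRGB colors. The sketch below implements that formula. The function names are hypothetical, and the 4.5:1 threshold is the WCAG level AA value for normal-sized text, not a figure from this book.

```python
def _linearize(channel):
    """Convert an 8-bit sRGB channel value to linear light (WCAG 2.x formula)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color with 8-bit channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 (none) to 21 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


black, white, red = (0, 0, 0), (255, 255, 255), (255, 0, 0)
print(round(contrast_ratio(black, white), 1))  # prints: 21.0
print(contrast_ratio(red, white) >= 4.5)       # prints: False
```

Pure red on white falls just short of the 4.5:1 threshold, which is consistent with the guideline of reserving color for emphasis rather than body text. The ratio says nothing about color blindness, which has to be checked separately.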
Bibliography
[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015) (cit. on p. 5).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015) (cit. on p. 5).
[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).
[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).
[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).
[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).
[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O’Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O’Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015) (cit. on p. 16).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000. Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standard Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
EMACS: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe’s Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MATHML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
PICO: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFA: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
VI: The Visual Interactive editor
VIM: VI IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language
56 BIBLIOGRAPHY
[44] Ora Lassila and Ralph R Swick Resource Description Frame-work (rdf) Model and Syntax Specification w3c Recommen-dation w3c Feb 1999 url httpwwww3orgTR1999REC-rdf-syntax-19990222 (visited on 08182015) (cit onpp 31 32)
[45] Dan Brickley and R V Guha rdf Vocabulary DescriptionLanguage 10 rdf Schema w3c Recommendation w3c Feb2004 url httpwwww3orgTR2004REC-rdf-schema-20040210 (visited on 08182015) (cit on p 32)
[46] Deborah L McGuinness and Frank van Harmelen owl WebOntology Language w3c Recommendation w3c Feb 2004url httpwwww3orgTR2004REC-owl-features-20040210 (visited on 08182015) (cit on p 32)
[47] Dan Brickley and R V Guha json-ld 10 A JSON-basedSerialization for Linked Data w3c Recommendation w3cJan 2014 url httpwwww3orgTR2014REC-json-ld-20140116 (visited on 08192015) (cit on p 32)
[48] David Beckett et al rdf 11 Turtle w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-turtle-20140225 (visited on 08292015) (cit on p 32)
[49] David Beckett rdf 11 N-Triples w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-n-triples-20140225 (visited on 08192015) (cit on p 32)
[50] Ben Adida et al rdfa in xhtml Syntax and Processing w3cRecommendation w3c Oct 2008 url httpwwww3org TR 2008 REC - rdfa - syntax - 20081014 (visited on08192015) (cit on p 32)
[51] Peter Schaffter What exactly is mom 2015 url httpwwwschafftercamommom-01html (visited on 09162016)(cit on p 37)
[52] Donald Ervin Knuth Digital Typography The Center for theStudy of Language and Information Publications 1998 i sbn978-0-387-98269-4 (cit on p 36)
[53] Albert Kapr Sto a jedna věta ke knižniacute uacutepravě Trans by An-toniacuten Rambousek Lacerta 1999 url httpwwwsazbacztypoglosytypo101pdf (visited on 10202015) (cit onpp 41 46 47)
BIBLIOGRAPHY 57
[54] Robert Bringhurst the Elements of Typographic Style PointRoberts andWashHartleyampMarks 1992 i sbn 0-88179-110-5(cit on pp 41 42 45ndash48)
[55] Matthew Butterick Butterickrsquos Practical Typography Line spac-ing url httppracticaltypographycomline-spacinghtml (visited on 11022015) (cit on p 42)
[56] Vladimiacuter Beran et al Aktualizovanyacute typografickyacute manuaacutel6th ed Kafka Design 2014 (cit on p 45)
Acronyms
ack The ACKnowledgement characterapi Application Programming Interfaceasa The American Standard Associationascii The American Standard Code for Information Interchangeatampt The American Telephone and Telegraph corporationbel The BELl characterbmp The Basic Multilingual Planebre The Basic Regular Expressionsbs The BackSpace characterbsd The Berkeley Software Distribution Also known as the Berke-ley Unixca Californiacan The CANcel charactercern The European Organization for Nuclear Research (la ConseilEuropeacuteen pour la Recherche Nucleacuteaire)cldr The Common Locale Data Repositorycli Command Line Interfacecobol The COmmon Business-Oriented Languagecr The Carriage Return charactercss The Cascading Style Sheets languagedc The Dublin Coredc1 The Device Control character No 1dc2 The Device Control character No 2dc3 The Device Control character No 3dc4 The Device Control character No 4del The DELete characterdle The Data Link Escape characterdps Document Preparation System
60 ACRONYMS
dtd Document Type Declarationdtp DeskTop Publishingebcdic The Extended Binary Coded Decimal Interchange Codeecma The European Computer Manufacturers Associationem The End of Mediumemacs The Eventually Munches All Computer Storage editorenq The ENQuiry charactereot The End Of Transmissionere The Extended Regular Expressionsesc The ESCape characteretb The End of Transmission Blocketx The End of TeXteuc The Extended Unix Codeff The Form Feed characterfoaf Friend Or A Foefortran The FORmula TRANslatorfs The File Separatorfsm The Free Software Movementgml The General Markup Languagegnu gnu is Not Unixgs The Group Separatorgui Graphical User Interfaceht The Horizontal Tabhtml The HyperText Markup Languageibm The International Business Machines Corporationiec The International Electrotechnical Commissionime Input Method Editoriri The Internationalized Resource Identifieriso The International Organization for Standardizationj is The Japanese Industrial Standards encodingjoe The Joersquos Own Editorjson The JavaScript Object Notationjson-ld json for ldjtc A Joint tcld Linked Datalf The Line Feedma Massachusettsmathml The Mathematical Markup Languagenak The Negative-AcKnowledgement characternul The NULl character
ACRONYMS 61
ny New Yorkocr Optical Character Recognitionodf The Open Document Format for office applicationsooxml The Office Open XML formatowl The Web Ontology Languagepc The ibm Personal Computerpdf The Portable Document Formatpico The PIne COmposerposix The Portable Operating System Interfacerdf The Resource Description Frameworkrdfa rdf in attributesrelax ng The REgular LAnguage for xml New Generationrfc A Request For Commentsrs The Record Separatorsc A SubCommitteesgml The Standard General Markup Languagesi The Shift In characterso The Shift Out charactersoh The Start of Headingsr Sound Recognitionstx The Start of Textsub The SUBstitute charactersvg The Scalable Vector Graphics languagesvn SubVersioNsyn The SYNchronous Idle charactertc A Technical Committeetei The Text Encoding Initiativetron The Real-time Operating system Nucleusucs The Universal multiple-octet coded Character Setus The Unit Separatorusa The United States of Americautf The ucs Transformation Formatvcs Version Control Systemsvi The Visual Interactive editorvim vi IMprovedvt The Vertical Tabw3c The World Wide Web Consortiumwg AWorking Groupwysiwyg What You See Is What You Getxhtml The eXtensible HyperText Markup Language
62 ACRONYMS
xml The eXtensible Markup Language
Index
ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42
Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17
sect
Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15
bre 14ndash16bs 6bsd 13
sect
ca 52can 6cern 28
character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44
sect
dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35
dps 13 17 18 32 35 36 39dtd 23 25ndash27dtp 36
sect
ebcdic 5ecma 55Edgar Allen Poe 37
64 INDEX
Elements of Style 3em 6Emacs 13endianity 10endnote 47enq 6eot 6erealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15
ere 14ndash16esc 6etb 6120576-TEX 38etx 6euc 5
sectF M Cornford 43ff 6foaf 32 33footnote 47formal grammar 14fortran 4From Religion to Philosophy A Study in
the Origins of Western Speculation 43fs 6fsm 35
sectGit 17gml 22gnuLinux 13nano 13
gnu 13 14 35Google Documents 18Google Pinyin 11grep 16 17groff see troffgs 6gui 13 35
sectHan Unification 9heading 45Henrik Ibsen 27ht 6
html 28ndash32 34 39 44 55sect
ibm 5 12 22iconv 10iec 7 10 51ndash54ime 12ir i 27 28 31 32 54iso 7 10 51ndash54
sectJavaScript 29Jeffrey E F Friedl 14j is 5joe 13JScript 29json 32json-ld 32 56jtc 51ndash54justification see alignment
sectKing Lear 48
sectLATEX 36 43Latin Vulgate Bible 49ld 31 32 55leading see line spacingLeafpad 13lf 6lightweight markup language 39line height 45list 46
sectma 51MakeDoc 39Markdown 39markuplogical 21 29 30 35 36presentation 21 29 30 35 36
mathml 28 31Mercurial 17microformatting 32Microsoft Word 14 20 39
sectN-Triples 32 33nak 6Noam Chomskyhierarchy 14
Noam Chomsky 14note 46Notepad++ 13Notepad 13
INDEX 65
nroff see troffnul 6ny 51
sectocr 12odf 13ooxml 13owl 32 56
sectparagraphblock 47indented 45outdented 45
paragraph 42paragraphsblock 45
pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8
sectQuarkXPress 14quotationblock 47run-in 47
sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31
rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6
sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22
sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-
lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6
secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff
man 36
66 INDEX
me 36mom 36
troff 35tron 9Turtle 32 33typeface 41
sectucsblock 8ucs-4 8
ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10
us 6usa 51 52utf
utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8
utf 6 8ndash10 52sect
VBScript 29vcscentralized 17decentralized 17
vcs 17ndash20version control 13vi 13vim 13
vt 6sect
w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4
wysiwyg 35sect
XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23
xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23
obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
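As a rough illustration of the rule above, the following Python sketch derives a text height from a text width and then snaps it to a whole number of body text lines. The function name, the example measurements, and the 3:2 proportion are assumptions made for illustration; none of them is prescribed by [54].

```python
def snap_text_height(text_width_pt, proportion, line_height_pt):
    """Return (line count, text height) closest to text_width_pt * proportion.

    The returned height is always a whole multiple of line_height_pt,
    so the text block can be filled completely with lines of body text.
    """
    target_pt = text_width_pt * proportion
    # Round to the nearest whole line, but never go below a single line.
    lines = max(1, round(target_pt / line_height_pt))
    return lines, lines * line_height_pt


# Hypothetical measurements: a 300 pt wide text block, a 3:2 proportion,
# and a 13 pt body text line height.
lines, height_pt = snap_text_height(300, 1.5, 13)
print(lines, height_pt)
```

With these assumed values the target height of 450 pt is not a multiple of 13 pt, so the sketch settles on 35 lines, that is, a text height of 455 pt.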
3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

The general guidelines are to use colored typefaces only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
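The text does not say how much contrast is sufficient. One common way to put a number on it, used here purely as an assumption, is the contrast ratio defined by the Web Content Accessibility Guidelines (WCAG) 2.x, which compares the relative luminance of two sRGB colors. The function names below are illustrative.

```python
def relative_luminance(rgb):
    """Relative luminance of an sRGB color given as 0-255 integers (WCAG 2.x)."""
    def linearize(channel):
        c = channel / 255
        # Undo the sRGB transfer curve to obtain linear light.
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG contrast ratio between two colors, from 1 (none) to 21 (maximum)."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# Red emphasis on a white background, as an example pairing.
print(round(contrast_ratio((255, 0, 0), (255, 255, 255)), 2))
```

Black on white gives the maximum ratio of 21, while identical colors give 1. Which threshold counts as sufficient remains a design decision; WCAG suggests minimum ratios for body text, but those thresholds are outside the scope of this section.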
Bibliography
[1] Mary Brandel. "1963: The debut of ASCII". In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015). (cit. on p. 5)

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015). (cit. on p. 5)

[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972. (cit. on pp. 5, 7)

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1986. (cit. on p. 6)

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1. (cit. on p. 8)

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6. (cit. on p. 8)
[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993. (cit. on p. 8)

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996. (cit. on p. 8)

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996. (cit. on p. 8)

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015). (cit. on pp. 8–10)
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015). (cit. on p. 9)

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016). (cit. on p. 10)

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016). (cit. on p. 10)

[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008. (cit. on p. 13)

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012. (cit. on p. 13)

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006. (cit. on p. 13)
[17] Noam Chomsky. "Three models for the description of language". In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124. (cit. on p. 14)

[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993. (cit. on p. 14)

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6. (cit. on p. 14)

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015). (cit. on p. 16)

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015). (cit. on p. 16)

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015). (cit. on p. 17)
[23] Charles F. Goldfarb. "The Roots of SGML – A Personal Recollection". In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015). (cit. on p. 22)

[24] Charles F. Goldfarb. "SGML: The Reason Why and the First Published Hint". In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015). (cit. on p. 22)

[25] Charles F. Goldfarb. "Introduction to Generalized Markup". In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015). (cit. on p. 22)

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986. (cit. on p. 22)
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3. (cit. on p. 22)

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015). (cit. on pp. 23, 31)

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015). (cit. on p. 23)

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015). (cit. on pp. 23, 29)

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. "GODDAG: A Data Structure for Overlapping Hierarchies". In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12. (cit. on p. 27)

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015). (cit. on p. 27)

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015). (cit. on p. 27)

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015). (cit. on p. 28)
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015). (cit. on p. 28)

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015). (cit. on p. 28)

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016). (cit. on p. 28)

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015). (cit. on p. 29)

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015). (cit. on p. 29)

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008). (cit. on p. 29)

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015). (cit. on p. 31)

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015). (cit. on p. 31)

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016). (cit. on p. 31)
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015). (cit. on pp. 31, 32)

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015). (cit. on p. 32)

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015). (cit. on p. 32)

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015). (cit. on p. 32)

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015). (cit. on p. 32)

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015). (cit. on p. 32)

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015). (cit. on p. 32)

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016). (cit. on p. 37)

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4. (cit. on p. 36)

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015). (cit. on pp. 41, 46, 47)
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5. (cit. on pp. 41, 42, 45–48)

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015). (cit. on p. 42)

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014. (cit. on p. 45)
Acronyms
ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standard Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Declaration
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
Emacs: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
Pico: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
vi: The Visual Interactive editor
Vim: vi IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language
Index
ACK 6
Adobe FrameMaker 14
Adobe InDesign 14, 39
alignment
  justified 42
  ragged 42
Anton Koberger 49
Apache OpenOffice 13, 20, 39
API 55
ASA 51
ASCII 5–9, 11, 12, 14, 51
AsciiDoc 39
AT&T 35
Atom 13
awk 16, 17

Bazaar 17
BEL 6
BMP 8, 9, 14
Bob Bemer 5
body text 41
BRE 14–16
  alternation operator 15
  backreference 15
  escape character 15
  matching list expression 15
  non-matching list expression 15
  repetition operator 15
  subexpression 15
BS 6
BSD 13

CA 52
CAN 6
CERN 28
character code 5
character encoding 5
Chomsky hierarchy 14
Christian Morgenstern 4
CLDR 52
CLI 13, 16
code page 7
code point 8
Compose key 11
CONCUR 27
control code 5
CR 6
Creole 39
CSS 23, 29–32, 44

DC 32, 33
DC1 6
DC2 6
DC3 6
DC4 6
DEL 6
DLE 6
Donald Knuth 36
DPS 13, 17, 18, 32, 35, 36, 39
  batch-oriented 35
  interactive 13, 35
    desktop publishing 36
    word processing 36
DTD 23, 25–27
DTP 36

EBCDIC 5
ECMA 55
Edgar Allan Poe 37
Elements of Style 3
EM 6
Emacs 13
endianity 10
endnote 47
ENQ 6
EOT 6
ERE 14–16
  alternation operator 15
  backreference 15
  escape character 15
  matching list expression 15
  non-matching list expression 15
  repetition operator 15
  subexpression 15
ESC 6
ETB 6
ε-TeX 38
ETX 6
EUC 5

F. M. Cornford 43
FF 6
FOAF 32, 33
footnote 47
formal grammar 14
FORTRAN 4
From Religion to Philosophy: A Study in the Origins of Western Speculation 43
FS 6
FSM 35

Git 17
GML 22
GNU 13, 14, 35
  Linux 13
  nano 13
Google Documents 18
Google Pinyin 11
grep 16, 17
groff, see troff
GS 6
GUI 13, 35

Han Unification 9
heading 45
Henrik Ibsen 27
HT 6
HTML 28–32, 34, 39, 44, 55

IBM 5, 12, 22
iconv 10
IEC 7, 10, 51–54
IME 12
IRI 27, 28, 31, 32, 54
ISO 7, 10, 51–54

JavaScript 29
Jeffrey E. F. Friedl 14
JIS 5
JOE 13
JScript 29
JSON 32
JSON-LD 32, 56
JTC 51–54
justification, see alignment

King Lear 48

LaTeX 36, 43
Latin Vulgate Bible 49
LD 31, 32, 55
leading, see line spacing
Leafpad 13
LF 6
lightweight markup language 39
line height 45
list 46

MA 51
MakeDoc 39
Markdown 39
markup
  logical 21, 29, 30, 35, 36
  presentation 21, 29, 30, 35, 36
MathML 28, 31
Mercurial 17
microformatting 32
Microsoft Word 14, 20, 39

N-Triples 32, 33
NAK 6
Noam Chomsky 14
  hierarchy 14
note 46
Notepad++ 13
Notepad 13
nroff, see troff
NUL 6
NY 51

OCR 12
ODF 13
OOXML 13
OWL 32, 56

paragraph 42
  block 47
  indented 45
  outdented 45
paragraphs
  block 45
PC 5, 11
PDF 13
pdfTeX 38
Peer Gynt 27
Perl 14
Pico 13
pinyin 11
plain TeX 38
POSIX 53
printable character 5
Punycode 8

QuarkXPress 14
quotation
  block 47
  run-in 47

rag, see alignment
RDF 28, 31–35, 56
  literal 32
  object 31
  ontology 32
  predicate 31
  resource 31
  subject 31
  triplet 31
RDFa 32, 34, 56
regex, see regular expression
regular expression 13, 14
regular grammar 14
RELAX NG 23, 25
RFC 54, 55
RS 6

sans-serif 41
SC 51–54
Scribus 13, 14, 39
sed 16, 17
serif 41
Setext 39
SGML 22, 23, 25, 27–29, 39, 53, 54
  application 23
  attribute 22
  element 22
  entity 22
  node 22
  tag 22
SGML: The Reason Why and the First Published Hint 22
SI 6
sidenote 46
small capitals 45
SO 6
SOH 6
SR 12
STX 6
style guide 3
SUB 6
Sublime Text 13
surrogate pair 8
SVG 28, 31
SVN 17–20
SYN 6

table 46
TC 51, 52
TEI 28
text editor 13
text file 4
text processing 4
TextEdit 13, 14
The Art of Computer Programming 36
The Cask of Amontillado 37
The Chicago Manual of Style 3
The Oxford Style Manual 3
The Subversion book 17
Tim Berners-Lee 31
Timothy John Berners-Lee 28
Tortoise SVN 18, 20
Trichter 4
troff 35
  man 36
  me 36
  mom 36
TRON 9
Turtle 32, 33
typeface 41

UCS 6, 8–12, 14, 16, 51, 52
  block 8
  UCS-4 8
Unicode
  case conversion 10
  normalization 10
US 6
USA 51, 52
UTF 6, 8–10, 52
  UTF-16 8, 52
  UTF-32 8
  UTF-7 8
  UTF-8 8, 52

VBScript 29
VCS 17–20
  centralized 17
  decentralized 17
version control 13
vi 13
Vim 13
VT 6

W3C 23, 28, 29, 31, 32, 54–56
WG 54
Wikicode 39
William Shakespeare 48
William Strunk 3
Word Online 18
writing rules
  grammar 3
  orthography 3
  typography 4
WYSIWYG 35

X Window System 11
XeTeX 43
XHTML 28, 31, 32, 55, 56
XML 23–29, 31–33, 39, 54, 55
  application 23
  DocBook 28
  format 23
  language 23
  namespace 27
  schema language 23
  Schema 23, 26
  validity 23
  well-formedness 23
xmllint 26
XPath 23
XPointer 23
XQuery 23
Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
(ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
Bibliography
[1] Mary Brandel. "1963: The debut of ASCII". In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).
[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).
[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).
[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).
[17] Noam Chomsky. "Three models for the description of language". In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).
[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).
[23] Charles F. Goldfarb. "The Roots of SGML – A Personal Recollection". In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).
[24] Charles F. Goldfarb. "SGML: The Reason Why and the First Published Hint". In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).
[25] Charles F. Goldfarb. "Introduction to Generalized Markup". In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).
[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. "GODDAG: A Data Structure for Overlapping Hierarchies". In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).
[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).
[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).
[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
Acronyms
ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standard Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
emacs: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The Generalized Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
joe: Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
pico: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
vi: The Visual Interactive editor
vim: vi IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language
Index
ACK 6
Adobe FrameMaker 14
Adobe InDesign 14, 39
alignment
  justified 42
  ragged 42
Anton Koberger 49
Apache OpenOffice 13, 20, 39
API 55
ASA 51
ASCII 5–9, 11, 12, 14, 51
AsciiDoc 39
AT&T 35
Atom 13
awk 16, 17

Bazaar 17
BEL 6
BMP 8, 9, 14
Bob Bemer 5
body text 41
BRE 14–16
  alternation operator 15
  backreference 15
  escape character 15
  matching list expression 15
  non-matching list expression 15
  repetition operator 15
  subexpression 15
BS 6
BSD 13

CA 52
CAN 6
CERN 28
character code 5
character encoding 5
Chomsky hierarchy 14
Christian Morgenstern 4
CLDR 52
CLI 13, 16
code page 7
code point 8
Compose key 11
CONCUR 27
control code 5
CR 6
Creole 39
CSS 23, 29–32, 44

DC 32, 33
DC1 6
DC2 6
DC3 6
DC4 6
DEL 6
DLE 6
Donald Knuth 36
DPS 13, 17, 18, 32, 35, 36, 39
  batch-oriented 35
  interactive 13, 35
    desktop publishing 36
    word processing 36
DTD 23, 25–27
DTP 36

EBCDIC 5
ECMA 55
Edgar Allan Poe 37
Elements of Style 3
EM 6
Emacs 13
endianity 10
endnote 47
ENQ 6
EOT 6
ERE 14–16
  alternation operator 15
  backreference 15
  escape character 15
  matching list expression 15
  non-matching list expression 15
  repetition operator 15
  subexpression 15
ESC 6
ETB 6
ε-TeX 38
ETX 6
EUC 5

F. M. Cornford 43
FF 6
FOAF 32, 33
footnote 47
formal grammar 14
FORTRAN 4
From Religion to Philosophy: A Study in the Origins of Western Speculation 43
FS 6
FSM 35

Git 17
GML 22
GNU 13, 14, 35
  Linux 13
  nano 13
Google Documents 18
Google Pinyin 11
grep 16, 17
groff, see troff
GS 6
GUI 13, 35

Han Unification 9
heading 45
Henrik Ibsen 27
HT 6
HTML 28–32, 34, 39, 44, 55

IBM 5, 12, 22
iconv 10
IEC 7, 10, 51–54
IME 12
IRI 27, 28, 31, 32, 54
ISO 7, 10, 51–54

JavaScript 29
Jeffrey E. F. Friedl 14
JIS 5
joe 13
JScript 29
JSON 32
JSON-LD 32, 56
JTC 51–54
justification, see alignment

King Lear 48

LaTeX 36, 43
Latin Vulgate Bible 49
LD 31, 32, 55
leading, see line spacing
Leafpad 13
LF 6
lightweight markup language 39
line height 45
list 46

MA 51
MakeDoc 39
Markdown 39
markup
  logical 21, 29, 30, 35, 36
  presentation 21, 29, 30, 35, 36
MathML 28, 31
Mercurial 17
microformatting 32
Microsoft Word 14, 20, 39

N-Triples 32, 33
NAK 6
Noam Chomsky 14
  hierarchy 14
note 46
Notepad++ 13
Notepad 13
nroff, see troff
NUL 6
NY 51

OCR 12
ODF 13
OOXML 13
OWL 32, 56

paragraph 42
  block 47
  indented 45
  outdented 45
paragraphs
  block 45
PC 5, 11
PDF 13
pdfTeX 38
Peer Gynt 27
Perl 14
pico 13
pinyin 11
plain TeX 38
POSIX 53
printable character 5
Punycode 8

QuarkXPress 14
quotation
  block 47
  run-in 47

rag, see alignment
RDF 28, 31–35, 56
  literal 32
  object 31
  ontology 32
  predicate 31
  resource 31
  subject 31
  triplet 31
RDFa 32, 34, 56
regex, see regular expression
regular expression 13, 14
regular grammar 14
RELAX NG 23, 25
RFC 54, 55
RS 6

sans-serif 41
SC 51–54
Scribus 13, 14, 39
sed 16, 17
serif 41
Setext 39
SGML 22, 23, 25, 27–29, 39, 53, 54
  application 23
  attribute 22
  element 22
  entity 22
  node 22
  tag 22
SGML: The Reason Why and the First Published Hint 22
SI 6
sidenote 46
small capitals 45
SO 6
SOH 6
SR 12
STX 6
style guide 3
SUB 6
Sublime Text 13
surrogate pair 8
SVG 28, 31
SVN 17–20
SYN 6

table 46
TC 51, 52
TEI 28
text editor 13
text file 4
text processing 4
TextEdit 13, 14
the Art of Computer Programming 36
the Cask of Amontillado 37
the Chicago Manual of Style 3
the Oxford Style Manual 3
the Subversion book 17
Tim Berners-Lee 31
Timothy John Berners-Lee 28
Tortoise SVN 18, 20
Trichter 4
troff 35
  man 36
  me 36
  mom 36
TRON 9
Turtle 32, 33
typeface 41

UCS 6, 8–12, 14, 16, 51, 52
  block 8
  UCS-4 8
Unicode
  case conversion 10
  normalization 10
US 6
USA 51, 52
UTF 6, 8–10, 52
  UTF-16 8, 52
  UTF-32 8
  UTF-7 8
  UTF-8 8, 52

VBScript 29
VCS 17–20
  centralized 17
  decentralized 17
version control 13
vi 13
vim 13
VT 6

W3C 23, 28, 29, 31, 32, 54–56
WG 54
Wikicode 39
William Shakespeare 48
William Strunk 3
Word Online 18
writing rules
  grammar 3
  orthography 3
  typography 4
WYSIWYG 35

X Window System 11
XeTeX 43
XHTML 28, 31, 32, 55, 56
XML 23–29, 31–33, 39, 54, 55
  application 23
  DocBook 28
  format 23
  language 23
  namespace 27
  schema language 23
  Schema 23, 26
  validity 23
  well-formedness 23
xmllint 26
XPath 23
XPointer 23
XQuery 23
mathml 28 31Mercurial 17microformatting 32Microsoft Word 14 20 39
sectN-Triples 32 33nak 6Noam Chomskyhierarchy 14
Noam Chomsky 14note 46Notepad++ 13Notepad 13
INDEX 65
nroff see troffnul 6ny 51
sectocr 12odf 13ooxml 13owl 32 56
sectparagraphblock 47indented 45outdented 45
paragraph 42paragraphsblock 45
pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8
sectQuarkXPress 14quotationblock 47run-in 47
sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31
rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6
sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22
sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-
lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6
secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff
man 36
66 INDEX
me 36mom 36
troff 35tron 9Turtle 32 33typeface 41
sectucsblock 8ucs-4 8
ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10
us 6usa 51 52utf
utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8
utf 6 8ndash10 52sect
VBScript 29vcscentralized 17decentralized 17
vcs 17ndash20version control 13vi 13vim 13
vt 6sect
w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4
wysiwyg 35sect
XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23
xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23
Acronyms
ack The ACKnowledgement character
api Application Programming Interface
asa The American Standard Association
ascii The American Standard Code for Information Interchange
at&t The American Telephone and Telegraph corporation
bel The BELl character
bmp The Basic Multilingual Plane
bre The Basic Regular Expressions
bs The BackSpace character
bsd The Berkeley Software Distribution. Also known as the Berkeley Unix.
ca California
can The CANcel character
cern The European Organization for Nuclear Research (la Conseil Européen pour la Recherche Nucléaire)
cldr The Common Locale Data Repository
cli Command Line Interface
cobol The COmmon Business-Oriented Language
cr The Carriage Return character
css The Cascading Style Sheets language
dc The Dublin Core
dc1 The Device Control character No. 1
dc2 The Device Control character No. 2
dc3 The Device Control character No. 3
dc4 The Device Control character No. 4
del The DELete character
dle The Data Link Escape character
dps Document Preparation System
dtd Document Type Declaration
dtp DeskTop Publishing
ebcdic The Extended Binary Coded Decimal Interchange Code
ecma The European Computer Manufacturers Association
em The End of Medium
emacs The Eventually Munches All Computer Storage editor
enq The ENQuiry character
eot The End Of Transmission
ere The Extended Regular Expressions
esc The ESCape character
etb The End of Transmission Block
etx The End of TeXt
euc The Extended Unix Code
ff The Form Feed character
foaf Friend Or A Foe
fortran The FORmula TRANslator
fs The File Separator
fsm The Free Software Movement
gml The General Markup Language
gnu gnu is Not Unix
gs The Group Separator
gui Graphical User Interface
ht The Horizontal Tab
html The HyperText Markup Language
ibm The International Business Machines Corporation
iec The International Electrotechnical Commission
ime Input Method Editor
iri The Internationalized Resource Identifier
iso The International Organization for Standardization
jis The Japanese Industrial Standards encoding
joe The Joe's Own Editor
json The JavaScript Object Notation
json-ld json for ld
jtc A Joint tc
ld Linked Data
lf The Line Feed
ma Massachusetts
mathml The Mathematical Markup Language
nak The Negative-AcKnowledgement character
nul The NULl character
ny New York
ocr Optical Character Recognition
odf The Open Document Format for office applications
ooxml The Office Open XML format
owl The Web Ontology Language
pc The ibm Personal Computer
pdf The Portable Document Format
pico The PIne COmposer
posix The Portable Operating System Interface
rdf The Resource Description Framework
rdfa rdf in attributes
relax ng The REgular LAnguage for xml New Generation
rfc A Request For Comments
rs The Record Separator
sc A SubCommittee
sgml The Standard General Markup Language
si The Shift In character
so The Shift Out character
soh The Start of Heading
sr Sound Recognition
stx The Start of Text
sub The SUBstitute character
svg The Scalable Vector Graphics language
svn SubVersioN
syn The SYNchronous Idle character
tc A Technical Committee
tei The Text Encoding Initiative
tron The Real-time Operating system Nucleus
ucs The Universal multiple-octet coded Character Set
us The Unit Separator
usa The United States of America
utf The ucs Transformation Format
vcs Version Control Systems
vi The Visual Interactive editor
vim vi IMproved
vt The Vertical Tab
w3c The World Wide Web Consortium
wg A Working Group
wysiwyg What You See Is What You Get
xhtml The eXtensible HyperText Markup Language
xml The eXtensible Markup Language
Index
ack 6
Adobe FrameMaker 14
Adobe InDesign 14, 39
alignment
    justified 42
    ragged 42
Anton Koberger 49
Apache OpenOffice 13, 20, 39
api 55
asa 51
ascii 5–9, 11, 12, 14, 51
AsciiDoc 39
at&t 35
Atom 13
awk 16, 17
§
Bazaar 17
bel 6
bmp 8, 9, 14
Bob Bemer 5
body text 41
bre 14–16
    alternation operator 15
    backreference 15
    escape character 15
    matching list expression 15
    non-matching list expression 15
    repetition operator 15
    subexpression 15
bs 6
bsd 13
§
ca 52
can 6
cern 28
character code 5
character encoding 5
Chomsky hierarchy 14
Christian Morgenstern 4
cldr 52
cli 13, 16
code page 7
code point 8
Compose key 11
CONCUR 27
control code 5
cr 6
Creole 39
css 23, 29–32, 44
§
dc 32, 33
dc1 6
dc2 6
dc3 6
dc4 6
del 6
dle 6
Donald Knuth 36
dps 13, 17, 18, 32, 35, 36, 39
    batch-oriented 35
    interactive 13, 35
        desktop publishing 36
        word processing 36
dtd 23, 25–27
dtp 36
§
ebcdic 5
ecma 55
Edgar Allan Poe 37
Elements of Style 3
em 6
Emacs 13
endianity 10
endnote 47
enq 6
eot 6
ere 14–16
    alternation operator 15
    backreference 15
    escape character 15
    matching list expression 15
    non-matching list expression 15
    repetition operator 15
    subexpression 15
esc 6
etb 6
ε-TeX 38
etx 6
euc 5
§
F. M. Cornford 43
ff 6
foaf 32, 33
footnote 47
formal grammar 14
fortran 4
From Religion to Philosophy: A Study in the Origins of Western Speculation 43
fs 6
fsm 35
§
Git 17
gml 22
gnu 13, 14, 35
    Linux 13
    nano 13
Google Documents 18
Google Pinyin 11
grep 16, 17
groff see troff
gs 6
gui 13, 35
§
Han Unification 9
heading 45
Henrik Ibsen 27
ht 6
html 28–32, 34, 39, 44, 55
§
ibm 5, 12, 22
iconv 10
iec 7, 10, 51–54
ime 12
iri 27, 28, 31, 32, 54
iso 7, 10, 51–54
§
JavaScript 29
Jeffrey E. F. Friedl 14
jis 5
joe 13
JScript 29
json 32
json-ld 32, 56
jtc 51–54
justification see alignment
§
King Lear 48
§
LaTeX 36, 43
Latin Vulgate Bible 49
ld 31, 32, 55
leading see line spacing
Leafpad 13
lf 6
lightweight markup language 39
line height 45
list 46
§
ma 51
MakeDoc 39
Markdown 39
markup
    logical 21, 29, 30, 35, 36
    presentation 21, 29, 30, 35, 36
mathml 28, 31
Mercurial 17
microformatting 32
Microsoft Word 14, 20, 39
§
N-Triples 32, 33
nak 6
Noam Chomsky 14
    hierarchy 14
note 46
Notepad++ 13
Notepad 13
nroff see troff
nul 6
ny 51
§
ocr 12
odf 13
ooxml 13
owl 32, 56
§
paragraph 42
    block 47
    indented 45
    outdented 45
paragraphs
    block 45
pc 5, 11
pdf 13
pdfTeX 38
Peer Gynt 27
Perl 14
pico 13
pinyin 11
plain TeX 38
posix 53
printable character 5
Punycode 8
§
QuarkXPress 14
quotation
    block 47
    run-in 47
§
rag see alignment
rdf 28, 31–35, 56
    literal 32
    object 31
    ontology 32
    predicate 31
    resource 31
    subject 31
    triplet 31
rdfa 32, 34, 56
regex see regular expression
regular expression 13, 14
regular grammar 14
relax ng 23, 25
rfc 54, 55
rs 6
§
sans-serif 41
sc 51–54
Scribus 13, 14, 39
sed 16, 17
serif 41
Setext 39
sgml 22, 23, 25, 27–29, 39, 53, 54
    application 23
    attribute 22
    element 22
    entity 22
    node 22
    tag 22
sgml: The Reason Why and the First Published Hint 22
si 6
sidenote 46
small capitals 45
so 6
soh 6
sr 12
stx 6
style guide 3
sub 6
Sublime Text 13
surrogate pair 8
svg 28, 31
svn 17–20
syn 6
§
table 46
tc 51, 52
tei 28
text editor 13
text file 4
text processing 4
TextEdit 13, 14
the Art of Computer Programming 36
the Cask of Amontillado 37
the Chicago Manual of Style 3
the Oxford Style Manual 3
the Subversion book 17
Tim Berners-Lee 31
Timothy John Berners-Lee 28
Tortoise svn 18, 20
Trichter 4
troff 35
    man 36
    me 36
    mom 36
tron 9
Turtle 32, 33
typeface 41
§
ucs 6, 8–12, 14, 16, 51, 52
    block 8
    ucs-4 8
Unicode
    case conversion 10
    normalization 10
us 6
usa 51, 52
utf 6, 8–10, 52
    utf-16 8, 52
    utf-32 8
    utf-7 8
    utf-8 8, 52
§
VBScript 29
vcs 17–20
    centralized 17
    decentralized 17
version control 13
vi 13
vim 13
vt 6
§
w3c 23, 28, 29, 31, 32, 54–56
wg 54
Wikicode 39
William Shakespeare 48
William Strunk 3
Word Online 18
writing rules
    grammar 3
    orthography 3
    typography 4
wysiwyg 35
§
X Window System 11
XeTeX 43
xhtml 28, 31, 32, 55, 56
xml 23–29, 31–33, 39, 54, 55
    application 23
    DocBook 28
    format 23
    language 23
    namespace 27
    schema language 23
    Schema 23, 26
    validity 23
    well-formedness 23
xmllint 26
XPath 23
XPointer 23
XQuery 23
BIBLIOGRAPHY 53
[17] Noam Chomsky lsquolsquoThree models for the description of lan-guagersquorsquo In Information Theory IEEE Transactions on 23 (1956)pp 113ndash124 (cit on p 14)
[18] isoiec jtc1sc22 Information technology ndash the Portable Op-erating System Interface ndash Part 2 Shell and Utilities isoiec9945-21993 Geneva Switzerland the International Organi-zation for Standardization Dec 1993 (cit on p 14)
[19] Jeffrey E F Friedl Mastering Regular Expressions 3rd edOrsquoReilly Media 2006 p 544 isbn 978-0-596-52812-6 (citon p 14)
[20] Unicode Consortium Unicode Technical Standard 18 Version17 Unicode Regular Expressions Tech rep Nov 2013 urlhttpwwwunicodeorgreportstr18tr18-17html
(visited on 09262015) (cit on p 16)[21] Dale Dougherty and Arnold Robbins Sed amp awk Second
Edition OrsquoReilly Media 1997 i sbn 1565922255 url http docstore mik ua orelly unix sedawk (visited on09262015) (cit on p 16)
[22] Ben Collins-Sussman Brian W Fitzpatrick and C MichaelPilato Version Control with Subversion OrsquoReilly 2002 urlhttpsvnbookred-beancom (visited on 09262015)(cit on p 17)
[23] Charles F Goldfarb lsquolsquothe Roots of sgml ndash A Personal Rec-ollectionrsquorsquo In (1996) url httpwwwsgmlsourcecomhistoryrootshtm (visited on 07292015) (cit on p 22)
[24] Charles F Goldfarb lsquolsquosgml The Reason Why and the FirstPublishedHintrsquorsquo In Journal of the American Society for Informa-tion Science 48 (7 July 1997) url httpwwwsgmlsourcecomhistoryjasishtm (visited on 07292015) (cit onp 22)
[25] Charles F Goldfarb lsquolsquoIntroduction to Generalized MarkuprsquorsquoIn (1981) url http www sgmlsource com history AnnexAhtm (visited on 07292015) (cit on p 22)
[26] i soiecjtc1sc34 Information processing ndash Text and office sys-tems ndash Standard Generalized Markup Language (sgml) i soiec88791986 Geneva Switzerland the International Organi-zation for Standardization Oct 1986 (cit on p 22)
54 BIBLIOGRAPHY
[27] Charles F Goldfarb the sgml Handbook New York NY USAOxford University Press Inc 1990 i sbn 978-0-198-53737-3(cit on p 22)
[28] Jean Paoli Tim Bray and Michael Sperberg-McQueen Ex-tensible Markup Language (xml) 10 w3c Recommendationw3c Feb 1998 url httpwwww3orgTR1998REC-xml-19980210 (visited on 07312015) (cit on pp 23 31)
[29] isoiec jtc1sc18wg8 Proposed TC for Web sgml Adap-tations for sgml isoiec N1929 the International Organi-zation for Standardization June 1997 url httpxmlcoverpagesorgwg8-n1929-ghtml (visited on 07312015)(cit on p 23)
[30] Haringkon Wium Lie and Bert Bos Cascading Style Sheets level1 Recommendation w3c Dec 1996 url httpwwww3orgTRREC-CSS1-961217 (visited on 07312015) (cit onpp 23 29)
[31] C M Sperberg-McQueen and Claus Huitfeldt lsquolsquogoddagA Data Structure for Overlapping Hierarchiesrsquorsquo In DigitalDocuments Systems and Principles 8th International Confer-ence on Digital Documents and Electronic Publishing DDEP2000 5th International Workshop on the Principles of DigitalDocument Processing PODDP 2000 Munich Germany Sep-tember 13-15 2000 Revised Papers Ed by Peter King andEthan V Munson Berlin Heidelberg Springer Berlin Hei-delberg 2004 pp 139ndash160 isbn 978-3-540-39916-2 doi101007978-3-540-39916-2_12 (cit on p 27)
[32] TimBray DaveHollander andAndrewLaymanNamespacesin xml w3c Recommendation w3c Jan 1999 url httpwwww3orgTR1999REC-xml-names-19990114 (visitedon 08212015) (cit on p 27)
[33] M Duerst the Internationalized Resource Identifiers (iris) rfc3987 rfc Editor Jan 2005 url httptoolsietforghtmlrfc3987 (visited on 08312015) (cit on p 27)
[34] Norman Walsh DocBook 5 The Definitive Guide Apr 2010url httpwwwdocbookorgtdgenhtmldocbookhtml(visited on 08182015) (cit on p 28)
BIBLIOGRAPHY 55
[35] Tim Berners-Lee Information Management A Proposal Techrep Mar 1989 url httpwwww3orgHistory1989proposalhtml (visited on 08312015) (cit on p 28)
[36] T Berners-Lee Hypertext Markup Language ndash 20 rfc 1866rfc Editor Nov 1995 url httptoolsietforghtmlrfc1866 (visited on 07312015) (cit on p 28)
[37] Jon Postel DoD standard Transmission Control Protocol rfc761 rfc Editor Jan 1980 url httptoolsietforghtmlrfc761 (visited on 09162016) (cit on p 28)
[38] Ian Hickson et al html5 A vocabulary and associated apisfor html and xhtml Recommendation w3c Oct 2014 urlhttpwwww3orgTR2014REC-html5-20141028 (visitedon 07312015) (cit on p 29)
[39] ecma International Standard ecma-262 - ecmaScript LanguageSpecification Tech rep June 1997 url httpwwwecma-internationalorgpublicationsfilesECMA-ST-ARCH
ECMA-262201st20edition20June201997pdf (visitedon 07312015) (cit on p 29)
[40] Netscape Communications Netscape and Sun announce Java-Script the open cross-platform object scripting language for en-terprise networks and the Internet Dec 1995 url httpwpnetscapecomnewsrefprnewsrelease67html (visited on02132008) (cit on p 29)
[41] Dave Raggett et al Reformulating html in xml w3c Recom-mendation w3c Dec 1998 url httpwwww3orgTR1998WD-html-in-xml-19981205 (visited on 08202015)(cit on p 31)
[42] Steven Pemberton et al xhtmltrade 10 The Extensible HyperTextMarkup Language w3c Recommendation w3c Jan 2000url httpwwww3orgTR2000REC-xhtml1-20000126(visited on 08202015) (cit on p 31)
[43] T Berners-Lee Linked Data Tech rep 2006 url httpswwww3orgDesignIssuesLinkedDatahtml (visited on09172016) (cit on p 31)
56 BIBLIOGRAPHY
[44] Ora Lassila and Ralph R Swick Resource Description Frame-work (rdf) Model and Syntax Specification w3c Recommen-dation w3c Feb 1999 url httpwwww3orgTR1999REC-rdf-syntax-19990222 (visited on 08182015) (cit onpp 31 32)
[45] Dan Brickley and R V Guha rdf Vocabulary DescriptionLanguage 10 rdf Schema w3c Recommendation w3c Feb2004 url httpwwww3orgTR2004REC-rdf-schema-20040210 (visited on 08182015) (cit on p 32)
[46] Deborah L McGuinness and Frank van Harmelen owl WebOntology Language w3c Recommendation w3c Feb 2004url httpwwww3orgTR2004REC-owl-features-20040210 (visited on 08182015) (cit on p 32)
[47] Dan Brickley and R V Guha json-ld 10 A JSON-basedSerialization for Linked Data w3c Recommendation w3cJan 2014 url httpwwww3orgTR2014REC-json-ld-20140116 (visited on 08192015) (cit on p 32)
[48] David Beckett et al rdf 11 Turtle w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-turtle-20140225 (visited on 08292015) (cit on p 32)
[49] David Beckett rdf 11 N-Triples w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-n-triples-20140225 (visited on 08192015) (cit on p 32)
[50] Ben Adida et al rdfa in xhtml Syntax and Processing w3cRecommendation w3c Oct 2008 url httpwwww3org TR 2008 REC - rdfa - syntax - 20081014 (visited on08192015) (cit on p 32)
[51] Peter Schaffter What exactly is mom 2015 url httpwwwschafftercamommom-01html (visited on 09162016)(cit on p 37)
[52] Donald Ervin Knuth Digital Typography The Center for theStudy of Language and Information Publications 1998 i sbn978-0-387-98269-4 (cit on p 36)
[53] Albert Kapr Sto a jedna věta ke knižniacute uacutepravě Trans by An-toniacuten Rambousek Lacerta 1999 url httpwwwsazbacztypoglosytypo101pdf (visited on 10202015) (cit onpp 41 46 47)
BIBLIOGRAPHY 57
[54] Robert Bringhurst the Elements of Typographic Style PointRoberts andWashHartleyampMarks 1992 i sbn 0-88179-110-5(cit on pp 41 42 45ndash48)
[55] Matthew Butterick Butterickrsquos Practical Typography Line spac-ing url httppracticaltypographycomline-spacinghtml (visited on 11022015) (cit on p 42)
[56] Vladimiacuter Beran et al Aktualizovanyacute typografickyacute manuaacutel6th ed Kafka Design 2014 (cit on p 45)
Acronyms
ack The ACKnowledgement characterapi Application Programming Interfaceasa The American Standard Associationascii The American Standard Code for Information Interchangeatampt The American Telephone and Telegraph corporationbel The BELl characterbmp The Basic Multilingual Planebre The Basic Regular Expressionsbs The BackSpace characterbsd The Berkeley Software Distribution Also known as the Berke-ley Unixca Californiacan The CANcel charactercern The European Organization for Nuclear Research (la ConseilEuropeacuteen pour la Recherche Nucleacuteaire)cldr The Common Locale Data Repositorycli Command Line Interfacecobol The COmmon Business-Oriented Languagecr The Carriage Return charactercss The Cascading Style Sheets languagedc The Dublin Coredc1 The Device Control character No 1dc2 The Device Control character No 2dc3 The Device Control character No 3dc4 The Device Control character No 4del The DELete characterdle The Data Link Escape characterdps Document Preparation System
60 ACRONYMS
dtd Document Type Declarationdtp DeskTop Publishingebcdic The Extended Binary Coded Decimal Interchange Codeecma The European Computer Manufacturers Associationem The End of Mediumemacs The Eventually Munches All Computer Storage editorenq The ENQuiry charactereot The End Of Transmissionere The Extended Regular Expressionsesc The ESCape characteretb The End of Transmission Blocketx The End of TeXteuc The Extended Unix Codeff The Form Feed characterfoaf Friend Or A Foefortran The FORmula TRANslatorfs The File Separatorfsm The Free Software Movementgml The General Markup Languagegnu gnu is Not Unixgs The Group Separatorgui Graphical User Interfaceht The Horizontal Tabhtml The HyperText Markup Languageibm The International Business Machines Corporationiec The International Electrotechnical Commissionime Input Method Editoriri The Internationalized Resource Identifieriso The International Organization for Standardizationj is The Japanese Industrial Standards encodingjoe The Joersquos Own Editorjson The JavaScript Object Notationjson-ld json for ldjtc A Joint tcld Linked Datalf The Line Feedma Massachusettsmathml The Mathematical Markup Languagenak The Negative-AcKnowledgement characternul The NULl character
ACRONYMS 61
ny New Yorkocr Optical Character Recognitionodf The Open Document Format for office applicationsooxml The Office Open XML formatowl The Web Ontology Languagepc The ibm Personal Computerpdf The Portable Document Formatpico The PIne COmposerposix The Portable Operating System Interfacerdf The Resource Description Frameworkrdfa rdf in attributesrelax ng The REgular LAnguage for xml New Generationrfc A Request For Commentsrs The Record Separatorsc A SubCommitteesgml The Standard General Markup Languagesi The Shift In characterso The Shift Out charactersoh The Start of Headingsr Sound Recognitionstx The Start of Textsub The SUBstitute charactersvg The Scalable Vector Graphics languagesvn SubVersioNsyn The SYNchronous Idle charactertc A Technical Committeetei The Text Encoding Initiativetron The Real-time Operating system Nucleusucs The Universal multiple-octet coded Character Setus The Unit Separatorusa The United States of Americautf The ucs Transformation Formatvcs Version Control Systemsvi The Visual Interactive editorvim vi IMprovedvt The Vertical Tabw3c The World Wide Web Consortiumwg AWorking Groupwysiwyg What You See Is What You Getxhtml The eXtensible HyperText Markup Language
62 ACRONYMS
xml The eXtensible Markup Language
Index

ack 6
Adobe FrameMaker 14
Adobe InDesign 14, 39
alignment
    justified 42
    ragged 42
Anton Koberger 49
Apache OpenOffice 13, 20, 39
api 55
asa 51
ascii 5–9, 11, 12, 14, 51
AsciiDoc 39
at&t 35
Atom 13
awk 16, 17

Bazaar 17
bel 6
bmp 8, 9, 14
Bob Bemer 5
body text 41
bre 14–16
    alternation operator 15
    backreference 15
    escape character 15
    matching list expression 15
    non-matching list expression 15
    repetition operator 15
    subexpression 15
bs 6
bsd 13

ca 52
can 6
cern 28
character code 5
character encoding 5
Chomsky hierarchy 14
Christian Morgenstern 4
cldr 52
cli 13, 16
code page 7
code point 8
Compose key 11
CONCUR 27
control code 5
cr 6
Creole 39
css 23, 29–32, 44

dc 32, 33
dc1 6
dc2 6
dc3 6
dc4 6
del 6
dle 6
Donald Knuth 36
dps 13, 17, 18, 32, 35, 36, 39
    batch-oriented 35
    interactive 13, 35
        desktop publishing 36
        word processing 36
dtd 23, 25–27
dtp 36

ebcdic 5
ecma 55
Edgar Allan Poe 37
Elements of Style 3
em 6
Emacs 13
endianity 10
endnote 47
enq 6
eot 6
ere 14–16
    alternation operator 15
    backreference 15
    escape character 15
    matching list expression 15
    non-matching list expression 15
    repetition operator 15
    subexpression 15
esc 6
etb 6
ε-TeX 38
etx 6
euc 5

F. M. Cornford 43
ff 6
foaf 32, 33
footnote 47
formal grammar 14
fortran 4
From Religion to Philosophy: A Study in the Origins of Western Speculation 43
fs 6
fsm 35

Git 17
gml 22
gnu 13, 14, 35
    Linux 13
    nano 13
Google Documents 18
Google Pinyin 11
grep 16, 17
groff, see troff
gs 6
gui 13, 35

Han Unification 9
heading 45
Henrik Ibsen 27
ht 6
html 28–32, 34, 39, 44, 55

ibm 5, 12, 22
iconv 10
iec 7, 10, 51–54
ime 12
iri 27, 28, 31, 32, 54
iso 7, 10, 51–54

JavaScript 29
Jeffrey E. F. Friedl 14
jis 5
joe 13
JScript 29
json 32
json-ld 32, 56
jtc 51–54
justification, see alignment

King Lear 48

LaTeX 36, 43
Latin Vulgate Bible 49
ld 31, 32, 55
leading, see line spacing
Leafpad 13
lf 6
lightweight markup language 39
line height 45
list 46

ma 51
MakeDoc 39
Markdown 39
markup
    logical 21, 29, 30, 35, 36
    presentation 21, 29, 30, 35, 36
mathml 28, 31
Mercurial 17
microformatting 32
Microsoft Word 14, 20, 39

N-Triples 32, 33
nak 6
Noam Chomsky 14
    hierarchy 14
note 46
Notepad++ 13
Notepad 13
nroff, see troff
nul 6
ny 51

ocr 12
odf 13
ooxml 13
owl 32, 56

paragraph 42
    block 47
    indented 45
    outdented 45
paragraphs
    block 45
pc 5, 11
pdf 13
pdfTeX 38
Peer Gynt 27
Perl 14
pico 13
pinyin 11
plain TeX 38
posix 53
printable character 5
Punycode 8

QuarkXPress 14
quotation
    block 47
    run-in 47

rag, see alignment
rdf 28, 31–35, 56
    literal 32
    object 31
    ontology 32
    predicate 31
    resource 31
    subject 31
    triplet 31
rdfa 32, 34, 56
regex, see regular expression
regular expression 13, 14
regular grammar 14
relax ng 23, 25
rfc 54, 55
rs 6

sans-serif 41
sc 51–54
Scribus 13, 14, 39
sed 16, 17
serif 41
Setext 39
sgml 22, 23, 25, 27–29, 39, 53, 54
    application 23
    attribute 22
    element 22
    entity 22
    node 22
    tag 22
sgml: The Reason Why and the First Published Hint 22
si 6
sidenote 46
small capitals 45
so 6
soh 6
sr 12
stx 6
style guide 3
sub 6
Sublime Text 13
surrogate pair 8
svg 28, 31
svn 17–20
syn 6

table 46
tc 51, 52
tei 28
text editor 13
text file 4
text processing 4
TextEdit 13, 14
The Art of Computer Programming 36
The Cask of Amontillado 37
The Chicago Manual of Style 3
The Oxford Style Manual 3
the Subversion book 17
Tim Berners-Lee 31
Timothy John Berners-Lee 28
Tortoise svn 18, 20
Trichter 4
troff 35
    man 36
    me 36
    mom 36
tron 9
Turtle 32, 33
typeface 41

ucs 6, 8–12, 14, 16, 51, 52
    block 8
    ucs-4 8
Unicode
    case conversion 10
    normalization 10
us 6
usa 51, 52
utf 6, 8–10, 52
    utf-16 8, 52
    utf-32 8
    utf-7 8
    utf-8 8, 52

VBScript 29
vcs 17–20
    centralized 17
    decentralized 17
version control 13
vi 13
vim 13
vt 6

w3c 23, 28, 29, 31, 32, 54–56
wg 54
Wikicode 39
William Shakespeare 48
William Strunk 3
Word Online 18
writing rules
    grammar 3
    orthography 3
    typography 4
wysiwyg 35

X Window System 11
XeTeX 43
xhtml 28, 31, 32, 55, 56
xml 23–29, 31–33, 39, 54, 55
    application 23
    DocBook 28
    format 23
    language 23
    namespace 27
    schema language 23
    Schema 23, 26
    validity 23
    well-formedness 23
xmllint 26
XPath 23
XPointer 23
XQuery 23
Index
ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42
Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17
sect
Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15
bre 14ndash16bs 6bsd 13
sect
ca 52can 6cern 28
character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44
sect
dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35
dps 13 17 18 32 35 36 39dtd 23 25ndash27dtp 36
sect
ebcdic 5ecma 55Edgar Allen Poe 37
64 INDEX
Elements of Style 3em 6Emacs 13endianity 10endnote 47enq 6eot 6erealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15
ere 14ndash16esc 6etb 6120576-TEX 38etx 6euc 5
sectF M Cornford 43ff 6foaf 32 33footnote 47formal grammar 14fortran 4From Religion to Philosophy A Study in
the Origins of Western Speculation 43fs 6fsm 35
sectGit 17gml 22gnuLinux 13nano 13
gnu 13 14 35Google Documents 18Google Pinyin 11grep 16 17groff see troffgs 6gui 13 35
sectHan Unification 9heading 45Henrik Ibsen 27ht 6
html 28ndash32 34 39 44 55sect
ibm 5 12 22iconv 10iec 7 10 51ndash54ime 12ir i 27 28 31 32 54iso 7 10 51ndash54
sectJavaScript 29Jeffrey E F Friedl 14j is 5joe 13JScript 29json 32json-ld 32 56jtc 51ndash54justification see alignment
sectKing Lear 48
sectLATEX 36 43Latin Vulgate Bible 49ld 31 32 55leading see line spacingLeafpad 13lf 6lightweight markup language 39line height 45list 46
sectma 51MakeDoc 39Markdown 39markuplogical 21 29 30 35 36presentation 21 29 30 35 36
mathml 28 31Mercurial 17microformatting 32Microsoft Word 14 20 39
sectN-Triples 32 33nak 6Noam Chomskyhierarchy 14
Noam Chomsky 14note 46Notepad++ 13Notepad 13
INDEX 65
nroff see troffnul 6ny 51
sectocr 12odf 13ooxml 13owl 32 56
sectparagraphblock 47indented 45outdented 45
paragraph 42paragraphsblock 45
pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8
sectQuarkXPress 14quotationblock 47run-in 47
sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31
rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6
sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22
sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-
lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6
secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff
man 36
66 INDEX
me 36mom 36
troff 35tron 9Turtle 32 33typeface 41
sectucsblock 8ucs-4 8
ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10
us 6usa 51 52utf
utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8
utf 6 8ndash10 52sect
VBScript 29vcscentralized 17decentralized 17
vcs 17ndash20version control 13vi 13vim 13
vt 6sect
w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4
wysiwyg 35sect
XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23
xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23
BIBLIOGRAPHY 55
[35] Tim Berners-Lee Information Management A Proposal Techrep Mar 1989 url httpwwww3orgHistory1989proposalhtml (visited on 08312015) (cit on p 28)
[36] T Berners-Lee Hypertext Markup Language ndash 20 rfc 1866rfc Editor Nov 1995 url httptoolsietforghtmlrfc1866 (visited on 07312015) (cit on p 28)
[37] Jon Postel DoD standard Transmission Control Protocol rfc761 rfc Editor Jan 1980 url httptoolsietforghtmlrfc761 (visited on 09162016) (cit on p 28)
[38] Ian Hickson et al html5 A vocabulary and associated apisfor html and xhtml Recommendation w3c Oct 2014 urlhttpwwww3orgTR2014REC-html5-20141028 (visitedon 07312015) (cit on p 29)
[39] ecma International Standard ecma-262 - ecmaScript LanguageSpecification Tech rep June 1997 url httpwwwecma-internationalorgpublicationsfilesECMA-ST-ARCH
ECMA-262201st20edition20June201997pdf (visitedon 07312015) (cit on p 29)
[40] Netscape Communications Netscape and Sun announce Java-Script the open cross-platform object scripting language for en-terprise networks and the Internet Dec 1995 url httpwpnetscapecomnewsrefprnewsrelease67html (visited on02132008) (cit on p 29)
[41] Dave Raggett et al Reformulating html in xml w3c Recom-mendation w3c Dec 1998 url httpwwww3orgTR1998WD-html-in-xml-19981205 (visited on 08202015)(cit on p 31)
[42] Steven Pemberton et al xhtmltrade 10 The Extensible HyperTextMarkup Language w3c Recommendation w3c Jan 2000url httpwwww3orgTR2000REC-xhtml1-20000126(visited on 08202015) (cit on p 31)
[43] T Berners-Lee Linked Data Tech rep 2006 url httpswwww3orgDesignIssuesLinkedDatahtml (visited on09172016) (cit on p 31)
56 BIBLIOGRAPHY
[44] Ora Lassila and Ralph R Swick Resource Description Frame-work (rdf) Model and Syntax Specification w3c Recommen-dation w3c Feb 1999 url httpwwww3orgTR1999REC-rdf-syntax-19990222 (visited on 08182015) (cit onpp 31 32)
[45] Dan Brickley and R V Guha rdf Vocabulary DescriptionLanguage 10 rdf Schema w3c Recommendation w3c Feb2004 url httpwwww3orgTR2004REC-rdf-schema-20040210 (visited on 08182015) (cit on p 32)
[46] Deborah L McGuinness and Frank van Harmelen owl WebOntology Language w3c Recommendation w3c Feb 2004url httpwwww3orgTR2004REC-owl-features-20040210 (visited on 08182015) (cit on p 32)
[47] Dan Brickley and R V Guha json-ld 10 A JSON-basedSerialization for Linked Data w3c Recommendation w3cJan 2014 url httpwwww3orgTR2014REC-json-ld-20140116 (visited on 08192015) (cit on p 32)
[48] David Beckett et al rdf 11 Turtle w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-turtle-20140225 (visited on 08292015) (cit on p 32)
[49] David Beckett rdf 11 N-Triples w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-n-triples-20140225 (visited on 08192015) (cit on p 32)
[50] Ben Adida et al rdfa in xhtml Syntax and Processing w3cRecommendation w3c Oct 2008 url httpwwww3org TR 2008 REC - rdfa - syntax - 20081014 (visited on08192015) (cit on p 32)
[51] Peter Schaffter What exactly is mom 2015 url httpwwwschafftercamommom-01html (visited on 09162016)(cit on p 37)
[52] Donald Ervin Knuth Digital Typography The Center for theStudy of Language and Information Publications 1998 i sbn978-0-387-98269-4 (cit on p 36)
[53] Albert Kapr Sto a jedna věta ke knižniacute uacutepravě Trans by An-toniacuten Rambousek Lacerta 1999 url httpwwwsazbacztypoglosytypo101pdf (visited on 10202015) (cit onpp 41 46 47)
BIBLIOGRAPHY 57
[54] Robert Bringhurst the Elements of Typographic Style PointRoberts andWashHartleyampMarks 1992 i sbn 0-88179-110-5(cit on pp 41 42 45ndash48)
[55] Matthew Butterick Butterickrsquos Practical Typography Line spac-ing url httppracticaltypographycomline-spacinghtml (visited on 11022015) (cit on p 42)
[56] Vladimiacuter Beran et al Aktualizovanyacute typografickyacute manuaacutel6th ed Kafka Design 2014 (cit on p 45)
Acronyms
ack The ACKnowledgement characterapi Application Programming Interfaceasa The American Standard Associationascii The American Standard Code for Information Interchangeatampt The American Telephone and Telegraph corporationbel The BELl characterbmp The Basic Multilingual Planebre The Basic Regular Expressionsbs The BackSpace characterbsd The Berkeley Software Distribution Also known as the Berke-ley Unixca Californiacan The CANcel charactercern The European Organization for Nuclear Research (la ConseilEuropeacuteen pour la Recherche Nucleacuteaire)cldr The Common Locale Data Repositorycli Command Line Interfacecobol The COmmon Business-Oriented Languagecr The Carriage Return charactercss The Cascading Style Sheets languagedc The Dublin Coredc1 The Device Control character No 1dc2 The Device Control character No 2dc3 The Device Control character No 3dc4 The Device Control character No 4del The DELete characterdle The Data Link Escape characterdps Document Preparation System
60 ACRONYMS
dtd Document Type Declarationdtp DeskTop Publishingebcdic The Extended Binary Coded Decimal Interchange Codeecma The European Computer Manufacturers Associationem The End of Mediumemacs The Eventually Munches All Computer Storage editorenq The ENQuiry charactereot The End Of Transmissionere The Extended Regular Expressionsesc The ESCape characteretb The End of Transmission Blocketx The End of TeXteuc The Extended Unix Codeff The Form Feed characterfoaf Friend Or A Foefortran The FORmula TRANslatorfs The File Separatorfsm The Free Software Movementgml The General Markup Languagegnu gnu is Not Unixgs The Group Separatorgui Graphical User Interfaceht The Horizontal Tabhtml The HyperText Markup Languageibm The International Business Machines Corporationiec The International Electrotechnical Commissionime Input Method Editoriri The Internationalized Resource Identifieriso The International Organization for Standardizationj is The Japanese Industrial Standards encodingjoe The Joersquos Own Editorjson The JavaScript Object Notationjson-ld json for ldjtc A Joint tcld Linked Datalf The Line Feedma Massachusettsmathml The Mathematical Markup Languagenak The Negative-AcKnowledgement characternul The NULl character
ACRONYMS 61
ny New Yorkocr Optical Character Recognitionodf The Open Document Format for office applicationsooxml The Office Open XML formatowl The Web Ontology Languagepc The ibm Personal Computerpdf The Portable Document Formatpico The PIne COmposerposix The Portable Operating System Interfacerdf The Resource Description Frameworkrdfa rdf in attributesrelax ng The REgular LAnguage for xml New Generationrfc A Request For Commentsrs The Record Separatorsc A SubCommitteesgml The Standard General Markup Languagesi The Shift In characterso The Shift Out charactersoh The Start of Headingsr Sound Recognitionstx The Start of Textsub The SUBstitute charactersvg The Scalable Vector Graphics languagesvn SubVersioNsyn The SYNchronous Idle charactertc A Technical Committeetei The Text Encoding Initiativetron The Real-time Operating system Nucleusucs The Universal multiple-octet coded Character Setus The Unit Separatorusa The United States of Americautf The ucs Transformation Formatvcs Version Control Systemsvi The Visual Interactive editorvim vi IMprovedvt The Vertical Tabw3c The World Wide Web Consortiumwg AWorking Groupwysiwyg What You See Is What You Getxhtml The eXtensible HyperText Markup Language
62 ACRONYMS
xml The eXtensible Markup Language
Index
ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42
Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17
sect
Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15
bre 14ndash16bs 6bsd 13
sect
ca 52can 6cern 28
character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44
sect
dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35
dps 13 17 18 32 35 36 39dtd 23 25ndash27dtp 36
sect
ebcdic 5ecma 55Edgar Allen Poe 37
64 INDEX
Elements of Style 3em 6Emacs 13endianity 10endnote 47enq 6eot 6erealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15
ere 14ndash16esc 6etb 6120576-TEX 38etx 6euc 5
sectF M Cornford 43ff 6foaf 32 33footnote 47formal grammar 14fortran 4From Religion to Philosophy A Study in
the Origins of Western Speculation 43fs 6fsm 35
sectGit 17gml 22gnuLinux 13nano 13
gnu 13 14 35Google Documents 18Google Pinyin 11grep 16 17groff see troffgs 6gui 13 35
sectHan Unification 9heading 45Henrik Ibsen 27ht 6
html 28ndash32 34 39 44 55sect
ibm 5 12 22iconv 10iec 7 10 51ndash54ime 12ir i 27 28 31 32 54iso 7 10 51ndash54
sectJavaScript 29Jeffrey E F Friedl 14j is 5joe 13JScript 29json 32json-ld 32 56jtc 51ndash54justification see alignment
sectKing Lear 48
sectLATEX 36 43Latin Vulgate Bible 49ld 31 32 55leading see line spacingLeafpad 13lf 6lightweight markup language 39line height 45list 46
sectma 51MakeDoc 39Markdown 39markuplogical 21 29 30 35 36presentation 21 29 30 35 36
mathml 28 31Mercurial 17microformatting 32Microsoft Word 14 20 39
sectN-Triples 32 33nak 6Noam Chomskyhierarchy 14
Noam Chomsky 14note 46Notepad++ 13Notepad 13
INDEX 65
nroff see troffnul 6ny 51
sectocr 12odf 13ooxml 13owl 32 56
sectparagraphblock 47indented 45outdented 45
paragraph 42paragraphsblock 45
pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8
sectQuarkXPress 14quotationblock 47run-in 47
sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31
rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6
sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22
sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-
lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6
secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff
man 36
66 INDEX
me 36mom 36
troff 35tron 9Turtle 32 33typeface 41
sectucsblock 8ucs-4 8
ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10
us 6usa 51 52utf
utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8
utf 6 8ndash10 52sect
VBScript 29vcscentralized 17decentralized 17
vcs 17ndash20version control 13vi 13vim 13
vt 6sect
w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4
wysiwyg 35sect
XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23
xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23
56 BIBLIOGRAPHY
[44] Ora Lassila and Ralph R Swick Resource Description Frame-work (rdf) Model and Syntax Specification w3c Recommen-dation w3c Feb 1999 url httpwwww3orgTR1999REC-rdf-syntax-19990222 (visited on 08182015) (cit onpp 31 32)
[45] Dan Brickley and R V Guha rdf Vocabulary DescriptionLanguage 10 rdf Schema w3c Recommendation w3c Feb2004 url httpwwww3orgTR2004REC-rdf-schema-20040210 (visited on 08182015) (cit on p 32)
[46] Deborah L McGuinness and Frank van Harmelen owl WebOntology Language w3c Recommendation w3c Feb 2004url httpwwww3orgTR2004REC-owl-features-20040210 (visited on 08182015) (cit on p 32)
[47] Dan Brickley and R V Guha json-ld 10 A JSON-basedSerialization for Linked Data w3c Recommendation w3cJan 2014 url httpwwww3orgTR2014REC-json-ld-20140116 (visited on 08192015) (cit on p 32)
[48] David Beckett et al rdf 11 Turtle w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-turtle-20140225 (visited on 08292015) (cit on p 32)
[49] David Beckett rdf 11 N-Triples w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-n-triples-20140225 (visited on 08192015) (cit on p 32)
[50] Ben Adida et al rdfa in xhtml Syntax and Processing w3cRecommendation w3c Oct 2008 url httpwwww3org TR 2008 REC - rdfa - syntax - 20081014 (visited on08192015) (cit on p 32)
[51] Peter Schaffter What exactly is mom 2015 url httpwwwschafftercamommom-01html (visited on 09162016)(cit on p 37)
[52] Donald Ervin Knuth Digital Typography The Center for theStudy of Language and Information Publications 1998 i sbn978-0-387-98269-4 (cit on p 36)
[53] Albert Kapr Sto a jedna věta ke knižniacute uacutepravě Trans by An-toniacuten Rambousek Lacerta 1999 url httpwwwsazbacztypoglosytypo101pdf (visited on 10202015) (cit onpp 41 46 47)
BIBLIOGRAPHY 57
[54] Robert Bringhurst the Elements of Typographic Style PointRoberts andWashHartleyampMarks 1992 i sbn 0-88179-110-5(cit on pp 41 42 45ndash48)
[55] Matthew Butterick Butterickrsquos Practical Typography Line spac-ing url httppracticaltypographycomline-spacinghtml (visited on 11022015) (cit on p 42)
[56] Vladimiacuter Beran et al Aktualizovanyacute typografickyacute manuaacutel6th ed Kafka Design 2014 (cit on p 45)
Acronyms
ack The ACKnowledgement characterapi Application Programming Interfaceasa The American Standard Associationascii The American Standard Code for Information Interchangeatampt The American Telephone and Telegraph corporationbel The BELl characterbmp The Basic Multilingual Planebre The Basic Regular Expressionsbs The BackSpace characterbsd The Berkeley Software Distribution Also known as the Berke-ley Unixca Californiacan The CANcel charactercern The European Organization for Nuclear Research (la ConseilEuropeacuteen pour la Recherche Nucleacuteaire)cldr The Common Locale Data Repositorycli Command Line Interfacecobol The COmmon Business-Oriented Languagecr The Carriage Return charactercss The Cascading Style Sheets languagedc The Dublin Coredc1 The Device Control character No 1dc2 The Device Control character No 2dc3 The Device Control character No 3dc4 The Device Control character No 4del The DELete characterdle The Data Link Escape characterdps Document Preparation System
60 ACRONYMS
dtd Document Type Declarationdtp DeskTop Publishingebcdic The Extended Binary Coded Decimal Interchange Codeecma The European Computer Manufacturers Associationem The End of Mediumemacs The Eventually Munches All Computer Storage editorenq The ENQuiry charactereot The End Of Transmissionere The Extended Regular Expressionsesc The ESCape characteretb The End of Transmission Blocketx The End of TeXteuc The Extended Unix Codeff The Form Feed characterfoaf Friend Or A Foefortran The FORmula TRANslatorfs The File Separatorfsm The Free Software Movementgml The General Markup Languagegnu gnu is Not Unixgs The Group Separatorgui Graphical User Interfaceht The Horizontal Tabhtml The HyperText Markup Languageibm The International Business Machines Corporationiec The International Electrotechnical Commissionime Input Method Editoriri The Internationalized Resource Identifieriso The International Organization for Standardizationj is The Japanese Industrial Standards encodingjoe The Joersquos Own Editorjson The JavaScript Object Notationjson-ld json for ldjtc A Joint tcld Linked Datalf The Line Feedma Massachusettsmathml The Mathematical Markup Languagenak The Negative-AcKnowledgement characternul The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard General Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language
Index
ACK 6
Adobe FrameMaker 14
Adobe InDesign 14, 39
alignment
  justified 42
  ragged 42
Anton Koberger 49
Apache OpenOffice 13, 20, 39
API 55
ASA 51
ASCII 5–9, 11, 12, 14, 51
AsciiDoc 39
AT&T 35
Atom 13
awk 16, 17
§
Bazaar 17
BEL 6
BMP 8, 9, 14
Bob Bemer 5
body text 41
BRE
  alternation operator 15
  backreference 15
  escape character 15
  matching list expression 15
  non-matching list expression 15
  repetition operator 15
  subexpression 15
BRE 14–16
BS 6
BSD 13
§
CA 52
CAN 6
CERN 28
character code 5
character encoding 5
Chomsky hierarchy 14
Christian Morgenstern 4
CLDR 52
CLI 13, 16
code page 7
code point 8
Compose key 11
CONCUR 27
control code 5
CR 6
Creole 39
CSS 23, 29–32, 44
§
DC 32, 33
DC1 6
DC2 6
DC3 6
DC4 6
DEL 6
DLE 6
Donald Knuth 36
DPS
  batch-oriented 35
  interactive
    desktop publishing 36
    word processing 36
  interactive 13, 35
DPS 13, 17, 18, 32, 35, 36, 39
DTD 23, 25–27
DTP 36
§
EBCDIC 5
ECMA 55
Edgar Allan Poe 37
Elements of Style 3
EM 6
Emacs 13
endianity 10
endnote 47
ENQ 6
EOT 6
ERE
  alternation operator 15
  backreference 15
  escape character 15
  matching list expression 15
  non-matching list expression 15
  repetition operator 15
  subexpression 15
ERE 14–16
ESC 6
ETB 6
ε-TeX 38
ETX 6
EUC 5
§
F. M. Cornford 43
FF 6
FOAF 32, 33
footnote 47
formal grammar 14
FORTRAN 4
From Religion to Philosophy: A Study in the Origins of Western Speculation 43
FS 6
FSM 35
§
Git 17
GML 22
GNU
  Linux 13
  nano 13
GNU 13, 14, 35
Google Documents 18
Google Pinyin 11
grep 16, 17
groff, see troff
GS 6
GUI 13, 35
§
Han Unification 9
heading 45
Henrik Ibsen 27
HT 6
HTML 28–32, 34, 39, 44, 55
§
IBM 5, 12, 22
iconv 10
IEC 7, 10, 51–54
IME 12
IRI 27, 28, 31, 32, 54
ISO 7, 10, 51–54
§
JavaScript 29
Jeffrey E. F. Friedl 14
JIS 5
JOE 13
JScript 29
JSON 32
JSON-LD 32, 56
JTC 51–54
justification, see alignment
§
King Lear 48
§
LaTeX 36, 43
Latin Vulgate Bible 49
LD 31, 32, 55
leading, see line spacing
Leafpad 13
LF 6
lightweight markup language 39
line height 45
list 46
§
MA 51
MakeDoc 39
Markdown 39
markup
  logical 21, 29, 30, 35, 36
  presentation 21, 29, 30, 35, 36
MathML 28, 31
Mercurial 17
microformatting 32
Microsoft Word 14, 20, 39
§
N-Triples 32, 33
NAK 6
Noam Chomsky
  hierarchy 14
Noam Chomsky 14
note 46
Notepad++ 13
Notepad 13
nroff, see troff
NUL 6
NY 51
§
OCR 12
ODF 13
OOXML 13
OWL 32, 56
§
paragraph
  block 47
  indented 45
  outdented 45
paragraph 42
paragraphs
  block 45
PC 5, 11
PDF 13
pdfTeX 38
Peer Gynt 27
Perl 14
PICO 13
pinyin 11
plain TeX 38
POSIX 53
printable character 5
Punycode 8
§
QuarkXPress 14
quotation
  block 47
  run-in 47
§
rag, see alignment
RDF
  literal 32
  object 31
  ontology 32
  predicate 31
  resource 31
  subject 31
  triplet 31
RDF 28, 31–35, 56
RDFa 32, 34, 56
regex, see regular expression
regular expression 13, 14
regular grammar 14
RELAX NG 23, 25
RFC 54, 55
RS 6
§
sans-serif 41
SC 51–54
Scribus 13, 14, 39
sed 16, 17
serif 41
Setext 39
SGML
  application 23
  attribute 22
  element 22
  entity 22
  node 22
  tag 22
SGML 22, 23, 25, 27–29, 39, 53, 54
SGML: The Reason Why and the First Published Hint 22
SI 6
sidenote 46
small capitals 45
SO 6
SOH 6
SR 12
STX 6
style guide 3
SUB 6
Sublime Text 13
surrogate pair 8
SVG 28, 31
SVN 17–20
SYN 6
§
table 46
TC 51, 52
TEI 28
text editor 13
text file 4
text processing 4
TextEdit 13, 14
the Art of Computer Programming 36
the Cask of Amontillado 37
the Chicago Manual of Style 3
the Oxford Style Manual 3
the Subversion book 17
Tim Berners-Lee 31
Timothy John Berners-Lee 28
Tortoise SVN 18, 20
Trichter 4
troff
  man 36
  me 36
  mom 36
troff 35
TRON 9
Turtle 32, 33
typeface 41
§
UCS
  block 8
  UCS-4 8
UCS 6, 8–12, 14, 16, 51, 52
Unicode
  case conversion 10
  normalization 10
US 6
USA 51, 52
UTF
  UTF-16 52
  UTF-16 8
  UTF-32 8
  UTF-7 8
  UTF-8 52
  UTF-8 8
UTF 6, 8–10, 52
§
VBScript 29
VCS
  centralized 17
  decentralized 17
VCS 17–20
version control 13
VI 13
VIM 13
VT 6
§
W3C 23, 28, 29, 31, 32, 54–56
WG 54
Wikicode 39
William Shakespeare 48
William Strunk 3
Word Online 18
writing rules
  grammar 3
  orthography 3
  typography 4
WYSIWYG 35
§
X Window System 11
XƎTeX 43
XHTML 28, 31, 32, 55, 56
XML
  application 23
  DocBook 28
  format 23
  language 23
  namespace 27
  schema language 23
  Schema 23, 26
  validity 23
  well-formedness 23
XML 23–29, 31–33, 39, 54, 55
xmllint 26
XPath 23
XPointer 23
XQuery 23
BIBLIOGRAPHY 57
[54] Robert Bringhurst. the Elements of Typographic Style. Point Roberts and Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick's Practical Typography. Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).
62 ACRONYMS
xml The eXtensible Markup Language
Index
ack 6
Adobe FrameMaker 14
Adobe InDesign 14, 39
alignment
    justified 42
    ragged 42
Anton Koberger 49
Apache OpenOffice 13, 20, 39
api 55
asa 51
ascii 5–9, 11, 12, 14, 51
AsciiDoc 39
at&t 35
Atom 13
awk 16, 17
§
Bazaar 17
bel 6
bmp 8, 9, 14
Bob Bemer 5
body text 41
bre 14–16
    alternation operator 15
    backreference 15
    escape character 15
    matching list expression 15
    non-matching list expression 15
    repetition operator 15
    subexpression 15
bs 6
bsd 13
§
ca 52
can 6
cern 28
character code 5
character encoding 5
Chomsky hierarchy 14
Christian Morgenstern 4
cldr 52
cli 13, 16
code page 7
code point 8
Compose key 11
CONCUR 27
control code 5
cr 6
Creole 39
css 23, 29–32, 44
§
dc 32, 33
dc1 6
dc2 6
dc3 6
dc4 6
del 6
dle 6
Donald Knuth 36
dps 13, 17, 18, 32, 35, 36, 39
    batch-oriented 35
    interactive 13, 35
        desktop publishing 36
        word processing 36
dtd 23, 25–27
dtp 36
§
ebcdic 5
ecma 55
Edgar Allan Poe 37
Elements of Style 3
em 6
Emacs 13
endianity 10
endnote 47
enq 6
eot 6
ere 14–16
    alternation operator 15
    backreference 15
    escape character 15
    matching list expression 15
    non-matching list expression 15
    repetition operator 15
    subexpression 15
esc 6
etb 6
ε-TeX 38
etx 6
euc 5
§
F. M. Cornford 43
ff 6
foaf 32, 33
footnote 47
formal grammar 14
fortran 4
From Religion to Philosophy: A Study in the Origins of Western Speculation 43
fs 6
fsm 35
§
Git 17
gml 22
gnu 13, 14, 35
    Linux 13
    nano 13
Google Documents 18
Google Pinyin 11
grep 16, 17
groff see troff
gs 6
gui 13, 35
§
Han Unification 9
heading 45
Henrik Ibsen 27
ht 6
html 28–32, 34, 39, 44, 55
§
ibm 5, 12, 22
iconv 10
iec 7, 10, 51–54
ime 12
iri 27, 28, 31, 32, 54
iso 7, 10, 51–54
§
JavaScript 29
Jeffrey E. F. Friedl 14
jis 5
joe 13
JScript 29
json 32
json-ld 32, 56
jtc 51–54
justification see alignment
§
King Lear 48
§
LaTeX 36, 43
Latin Vulgate Bible 49
ld 31, 32, 55
leading see line spacing
Leafpad 13
lf 6
lightweight markup language 39
line height 45
list 46
§
ma 51
MakeDoc 39
Markdown 39
markup
    logical 21, 29, 30, 35, 36
    presentation 21, 29, 30, 35, 36
mathml 28, 31
Mercurial 17
microformatting 32
Microsoft Word 14, 20, 39
§
N-Triples 32, 33
nak 6
Noam Chomsky 14
    hierarchy 14
note 46
Notepad++ 13
Notepad 13
nroff see troff
nul 6
ny 51
§
ocr 12
odf 13
ooxml 13
owl 32, 56
§
paragraph 42
    block 47
    indented 45
    outdented 45
paragraphs
    block 45
pc 5, 11
pdf 13
pdfTeX 38
Peer Gynt 27
Perl 14
pico 13
pinyin 11
plain TeX 38
posix 53
printable character 5
Punycode 8
§
QuarkXPress 14
quotation
    block 47
    run-in 47
§
rag see alignment
rdf 28, 31–35, 56
    literal 32
    object 31
    ontology 32
    predicate 31
    resource 31
    subject 31
    triplet 31
rdfa 32, 34, 56
regex see regular expression
regular expression 13, 14
regular grammar 14
relax ng 23, 25
rfc 54, 55
rs 6
§
sans-serif 41
sc 51–54
Scribus 13, 14, 39
sed 16, 17
serif 41
Setext 39
sgml 22, 23, 25, 27–29, 39, 53, 54
    application 23
    attribute 22
    element 22
    entity 22
    node 22
    tag 22
sgml: The Reason Why and the First Published Hint 22
si 6
sidenote 46
small capitals 45
so 6
soh 6
sr 12
stx 6
style guide 3
sub 6
Sublime Text 13
surrogate pair 8
svg 28, 31
svn 17–20
syn 6
§
table 46
tc 51, 52
tei 28
text editor 13
text file 4
text processing 4
TextEdit 13, 14
the Art of Computer Programming 36
the Cask of Amontillado 37
the Chicago Manual of Style 3
the Oxford Style Manual 3
the Subversion book 17
Tim Berners-Lee 31
Timothy John Berners-Lee 28
Tortoise svn 18, 20
Trichter 4
troff 35
    man 36
    me 36
    mom 36
tron 9
Turtle 32, 33
typeface 41
§
ucs 6, 8–12, 14, 16, 51, 52
    block 8
    ucs-4 8
Unicode
    case conversion 10
    normalization 10
us 6
usa 51, 52
utf 6, 8–10, 52
    utf-16 8, 52
    utf-32 8
    utf-7 8
    utf-8 8, 52
§
VBScript 29
vcs 17–20
    centralized 17
    decentralized 17
version control 13
vi 13
vim 13
vt 6
§
w3c 23, 28, 29, 31, 32, 54–56
wg 54
Wikicode 39
William Shakespeare 48
William Strunk 3
Word Online 18
writing rules
    grammar 3
    orthography 3
    typography 4
wysiwyg 35
§
X Window System 11
XeTeX 43
xhtml 28, 31, 32, 55, 56
xml 23–29, 31–33, 39, 54, 55
    application 23
    DocBook 28
    format 23
    language 23
    namespace 27
    schema language 23
    Schema 23, 26
    validity 23
    well-formedness 23
xmllint 26
XPath 23
XPointer 23
XQuery 23