Electronic Document Preparation Pocket Primer Vít Novotný September 20, 2016


Contents

Introduction

1 Writing
  1.1 Text Processing
    1.1.1 Character Encoding
    1.1.2 Text Input
    1.1.3 Text Editors
    1.1.4 Interactive Document Preparation Systems
    1.1.5 Regular Expressions
  1.2 Version Control

2 Markup
  2.1 Meta Markup Languages
    2.1.1 The General Markup Language
    2.1.2 The Extensible Markup Language
  2.2 Markup on the World Wide Web
    2.2.1 The Hypertext Markup Language
    2.2.2 The Extensible Hypertext Markup Language
    2.2.3 The Semantic Web and Linked Data
  2.3 Document Preparation Systems
    2.3.1 Batch-oriented Systems
    2.3.2 Interactive Systems
  2.4 Lightweight Markup Languages

3 Design
  3.1 Fonts
  3.2 Structural Elements
    3.2.1 Paragraphs and Stanzas
    3.2.2 Headings
    3.2.3 Tables and Lists
    3.2.4 Notes
    3.2.5 Quotations
  3.3 Page Layout
  3.4 Color

Bibliography

Acronyms

Index

Introduction

With the advent of the digital age, typesetting has become available to virtually anyone equipped with a personal computer. Beautiful text documents can now be crafted using free and consumer-grade software, which often obviates the need for the involvement of a professional designer and typesetter. The level playing field of the Internet, coupled with the rising popularity of digital-only documents, then allows the author to bypass the publisher as well, if they so wish, without jeopardizing their chance of recognition.

The aim of this book is to provide a general overview of the tools and techniques tied with writing, designing, typesetting, and distributing text documents, one of the principal means of knowledge preservation and transfer known to man. Each chapter describes one discrete step of document preparation along with practical examples and references to literature for those interested in further study.

The chapters are filled with examples that illustrate the subject matter. These should be consulted whenever the concepts described in the text are unclear to the reader. Although care was taken not to favor any computing environment, some examples feature utilities for Unix and Unix-like operating systems. These utilities may or may not have a suitable counterpart in operating systems such as Windows. To try the corresponding examples out, the reader is advised to install a free Unix-like environment, such as Cygwin for Windows, on their computer.

Margin note: This document was prepared in accordance with William Strunk's Elements of Style, an American English style guide for general use.

Chapter 1

Writing

The essence of a document is the idea it represents. In the case of a text document, this idea is articulated through speech, which is transcribed using text, optionally accompanied by figures, and then laid out on a sheet of paper according to a design. Since the text is typically independent of the design, whose task is to support and elicit the internal structure of the text, it is writing that is the logical first step in the creation of a text document.

The essentials of writing in any given natural language include grammar rules, which specify the structure of spoken language, and orthographic rules, which impose additional requirements on written text. The complexity of either set of rules depends entirely on the language in question. Some writing systems, such as those that incorporate Chinese characters, are not phonographic, and the correspondence between the spoken words and the written symbols needs to be memorized by the writer on a word-by-word basis. Other languages may use vastly different grammar rules for speaking and for writing, which means that a spoken sentence needs to be translated first before it is written down. A writer needs to recognize these specifics.

On top of grammar and orthographic rules stand style guides, which, in order to improve consistency, codify how common language patterns are encoded. More comprehensive style guides, such as the Chicago Manual of Style or the Oxford Style Manual, often go beyond writing and provide guidelines on design and typesetting as well, making them an indispensable reference on the editorial tradition.

Above all stand the typographic rules, which specify how the resulting document should be typeset so that it doesn't disturb the eye of the reader. These, as well as the orthographic rules on hyphenation, can be left out of consideration during writing, as it is the page that should be formed around the writing and not the other way around.

    Zwei Trichter wandeln durch die Nacht.
    Durch ihres Rumpfs verengten Schacht
         fließt weißes Mondlicht
            still und heiter
               auf ihren
                Waldweg
                u. s. w.

Figure 1.1: Exceptions that prove the rule about the separation of text and design can sometimes be encountered in poetry. Above is Christian Morgenstern's Trichter, where the text and its form are intimately intertwined.

1.1 Text Processing

Originally the domain of the pen, the quill, the stylus, and the more recent typewriter, manuscripts of today are produced mainly using the personal computer and stored in text files. The discipline of creating and manipulating digital text is called text processing and will be the focus of this section.

1.1.1 Character Encoding

Although computing at its most primal has no use for anything but numbers, it has nevertheless been accompanied by text from the very outset. Even the earliest computers from the 1950s were programmed with both raw machine code and the text programming language of the FORmula TRANslator (FORTRAN). The digital representation of letters, digits, and other characters was initially closely tied to each specific application and processor architecture, but with the advent of computer networking in the 1960s, mutual intelligibility became a point of concern. "We had over sixty different ways to represent characters in computers. It was a real Tower of Babel," explains Bob Bemer [1], an American computer scientist who worked at IBM during 1956–1962 and who drafted the American Standard Code for Information Interchange (ASCII) [2], a character encoding from 1963 that unified the digital representation of text across the computer industry and enabled computer networking on a large scale.

Margin note: EBCDIC by IBM was the default encoding on IBM's System/360 mainframes and was in active use until the introduction of the PC in 1981. In writing systems using Chinese characters, special encodings such as Big5, JIS, and EUC are used to this day. For brevity, the text focuses on the mainstream of international encodings.

ASCII

In ASCII, every character is represented by a number from zero to 127, which is transformed to a seven-bit integer called a character code. These 128 codes are used to encode printable characters (spanning the letters of the English alphabet, digits, punctuation, and other symbols) and control codes, as depicted in Table 1.1. Unlike printable characters, control codes have no fixed visual representation, and they were used to implement application-specific communication protocols and text formatting; their precise semantics were defined in a much later standard from 1972 [3]. Unconstrained by the bandwidth and the storage limitations of the 1960s and 1970s, today's communication protocols and text formats gravitate towards markup constructed from printable characters, which, unlike control codes, are easy for humans to read and write.

                Ctrl codes    Symbols       Upper case    Lower case
Bit 7           0      0      0      0      1      1      1      1
Bit 6           0      0      1      1      0      0      1      1
Bit 5           0      1      0      1      0      1      0      1
Bits 4 3 2 1
0 0 0 0         NUL    DLE    SP     0      @      P      `      p
0 0 0 1         SOH    DC1    !      1      A      Q      a      q
0 0 1 0         STX    DC2    "      2      B      R      b      r
0 0 1 1         ETX    DC3    #      3      C      S      c      s
0 1 0 0         EOT    DC4    $      4      D      T      d      t
0 1 0 1         ENQ    NAK    %      5      E      U      e      u
0 1 1 0         ACK    SYN    &      6      F      V      f      v
0 1 1 1         BEL    ETB    '      7      G      W      g      w
1 0 0 0         BS     CAN    (      8      H      X      h      x
1 0 0 1         HT     EM     )      9      I      Y      i      y
1 0 1 0         LF     SUB    *      :      J      Z      j      z
1 0 1 1         VT     ESC    +      ;      K      [      k      {
1 1 0 0         FF     FS     ,      <      L      \      l      |
1 1 0 1         CR     GS     -      =      M      ]      m      }
1 1 1 0         SO     RS     .      >      N      ^      n      ~
1 1 1 1         SI     US     /      ?      O      _      o      DEL

Table 1.1: The ASCII encoding as specified in the 1986 revision of the standard [4]. SP denotes the space character.

Code point range     Encoding
0–127                0xxxxxxx
128–2047             110xxxxx 10xxxxxx
2048–65535           1110xxxx 10xxxxxx 10xxxxxx
65536–1114111        11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Table 1.2: The UTF-8 encoding. Each x represents one bit of the UCS code point in binary.

Character   Code point (decimal)   Code point (binary)   UTF-8 encoding
Ř           344                    101011000             11000101 10011000
e           101                    1100101               01100101
č           269                    100001101             11000100 10001101

Table 1.3: An example of the UTF-8 encoding.

The following properties make it easy to manipulate and reason about character strings encoded in ASCII:

• Each character is represented by exactly seven bits. This makes it easy to allocate space for character strings of fixed length, to measure the number of characters stored in a memory region, and to perform basic operations such as adjacent character retrieval or text truncation.

• Characters are alphabetically ordered. Character strings can therefore be collated by comparing character code binary values.

• Lowercase and uppercase letters, digits, and control codes form contiguous ranges of character codes. This simplifies classification.

• There is precisely one way to encode any printable character. The conversion between the lower- and uppercase letters is a matter of inverting one bit.

This comes at the expense of support for non-English writing systems. As a temporary workaround, a set of ASCII derivatives that replaced the less-needed characters of #, $, @, [, \, ], ^, `, {, |, }, and ~ with international characters was specified in the ISO 646 standard from 1972 [3].
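The ASCII properties listed above can be checked directly. The following is a minimal Python sketch added for illustration; the helper names `toggle_case` and `classify` are hypothetical, and the bit numbering follows Table 1.1, where bit 6 carries the value 32.

```python
def toggle_case(letter: str) -> str:
    """Swap the case of one ASCII letter by inverting bit 6 (value 32)."""
    if len(letter) != 1 or not (letter.isascii() and letter.isalpha()):
        raise ValueError("expected a single ASCII letter")
    return chr(ord(letter) ^ 0b0100000)


def classify(character: str) -> str:
    """Classify one ASCII character using nothing but range comparisons."""
    code = ord(character)
    if code > 127:
        raise ValueError("not an ASCII character")
    if code < 32 or code == 127:
        return "control code"
    if 48 <= code <= 57:
        return "digit"
    if 65 <= code <= 90:
        return "uppercase letter"
    if 97 <= code <= 122:
        return "lowercase letter"
    return "symbol"


# Every ASCII character code fits into seven bits.
assert all(code < 2 ** 7 for code in "Pocket Primer".encode("ascii"))

# Inverting a single bit converts between the cases.
print(toggle_case("a"), toggle_case("Q"))  # A q

# Within one case, comparing character codes yields alphabetical order.
print(sorted(["pear", "apple", "fig"], key=lambda word: word.encode("ascii")))

# Classification needs no lookup table, only the contiguous ranges.
print(classify("\t"), classify("7"), classify("x"), classify("~"), sep=", ")
```

Note that the code-based ordering is alphabetical only within one case: every uppercase letter sorts before every lowercase one.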

Eight-bit Encodings

With the byte size stabilizing at eight bits, new character encodings emerged that were based on ASCII and used the additional bit to encode characters of non-English writing systems while retaining complete backwards compatibility with ASCII. Besides the numerous vendor-specific encodings (called code pages), a set of fifteen eight-bit encodings covering all major modern writing systems whose characters fit within the space of 128 additional combinations was standardized in the ISO/IEC 8859 series released during 1986–2001.

Compared to ASCII, eight-bit encodings introduced an additional level of complexity to text processing:

• Each character is exactly eight bits wide. String manipulation is therefore as straightforward as with ASCII.

• Character strings can no longer be collated by character code comparison. Each encoding requires separate collation tables.

• Classes of characters, such as uppercase and lowercase letters or punctuation, no longer form contiguous ranges, and their position varies among encodings. This impedes character classification.

• Idiosyncrasies such as the ligature æ and invisible hyphenation hints are included in several encodings, which makes it more difficult to determine character string equivalence. Algorithms for case conversion vary among encodings.

• There exists no standard mechanism to detect which encoding is being used. The distinction needs to be made on the application level using either heuristics, additional metadata, or human intervention. Consequently, no standard mechanism exists to use different character encodings within a single text document.
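The absence of a detection mechanism is easy to demonstrate. In the Python sketch below, added for illustration, the same byte is decoded under four parts of ISO/IEC 8859; the codec names are those of the Python standard library, and the sample word is an assumption chosen for the example.

```python
# One byte, four meanings: nothing in the data says which encoding applies.
raw = bytes([0xE8])

for codec in ("iso8859_1", "iso8859_2", "iso8859_5", "iso8859_7"):
    print(codec, raw.decode(codec))

# Decoding with the wrong table raises no error; it silently yields other text.
czech = "čaj".encode("iso8859_2")
assert czech.decode("iso8859_2") == "čaj"
assert czech.decode("iso8859_1") == "èaj"

# All parts agree on the lower half of the table, which is plain ASCII.
assert b"tea".decode("iso8859_2") == b"tea".decode("iso8859_7") == "tea"
```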

A portion of this complexity is inherent in the task of encoding the characters of all modern writing systems, but the overhead caused by the character encoding fragmentation proved to be unnecessary.

The Universal Character Set and Unicode

In the early 1990s, the continual increase in the available bandwidth and storage led to the creation of the standards of Unicode [5, 6] and the Universal multiple-octet coded Character Set (UCS) [7] in an attempt to create a text encoding that would contain the characters of all the world's languages and succeed ASCII as the lingua franca of text interchange.

Margin note: Also notable are the seven-bit encodings of UTF-7 and Punycode, which bring Unicode support to protocols that were designed with the seven-bit ASCII in mind, such as e-mail.

UCS is an ever-expanding catalogue of characters from writing systems both modern and ancient, and of symbols ranging from diacritical marks, punctuation, and ideograms to mahjong tiles, alchemical symbols, and the ancient Greek musical notation. Each of these characters is assigned a number called a code point, ranging from 0 to 2147483647 (7F FF FF FF in the hexadecimal notation), with the numbers of the most common characters in the range from 0 to 65535 (FF FF), called the Basic Multilingual Plane (BMP). The smallest units of division in UCS are blocks, which contain 256 thematically related characters. UCS encodings map code points to binary character codes and vice versa.

Three major encodings are specified in the UCS standard and its amendments [8, 9]:

1. UTF-32 directly encodes UCS characters by transforming their code points to four-byte integers. UTF-32 is also known as UCS-4.

2. UTF-16 directly encodes characters within the BMP by transforming their code points to two-byte integers. Code points in the range from 65536 to 1114111 (01 00 00–10 FF FF) are transformed into pairs of two-byte integers, called surrogate pairs, ranging from 55296 to 57343 (D8 00–DF FF). To enable the UTF-16 encoding, the code points in this range will never be assigned to characters [10, sec. 3.4, D15]. The same is true of code points above 1114111 (10 FF FF), which allows UTF-16 to encode any UCS character.

3. UTF-8 directly transforms code points ranging from 0 to 127 (7F) to one-byte integers. Since the first UCS block of the BMP matches ASCII, any text encoded in eight-bit ASCII is also encoded in UTF-8. Code points in the range from 128 to 1114111 (00 00 80–10 FF FF) are transformed into two to four one-byte integers ranging from 128 to 244 (80–F4). The encoding is illustrated in tables 1.2 and 1.3.

Margin note: One of the design goals of UCS was to avoid assigning code points to different glyphs that carry the same meaning. As a result, the visually distinctive Han characters used in the East Asian countries of China, Japan, Korea, and Vietnam were merged into a set of 75,960 ideograms in a process referred to as the Han Unification [10, sec. 18.1]. This simplifies text processing, but also makes it impossible to encode a text in multiple East Asian languages without having to rely on external markup to select appropriate regional fonts. As a result, a derivative of UCS that doesn't implement the Han Unification was developed for use in operating systems based on the Real-time Operating system Nucleus (TRON) and is used in East Asia alongside UCS and region-specific encodings.

    餐 甑 逞 扉 牙 慨

Figure 1.2: Several Han characters in the traditional Chinese, Japanese, Korean, and Vietnamese variants.

UTF-32 is primarily used for the fixed-width internal representation of individual UCS characters inside programs, UTF-16 fulfills a similar role in programs that only work with the BMP, and UTF-8 is used for text storage and interchange. Since 2010, the majority of text content on the Web has been encoded in ASCII and UTF-8 [11].
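The three encodings can be compared side by side. The Python sketch below is an illustration rather than a reference implementation: the function `utf8_encode` is a hypothetical helper that follows the bit patterns of Table 1.2 by hand, and it is checked against the codecs of the standard library. The sample characters are assumptions chosen to cover one to four UTF-8 bytes.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point by hand, following the bit patterns of Table 1.2.

    Unlike a production encoder, this sketch does not reject surrogates.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the range of UTF-8")
    if code_point < 0x80:
        return bytes([code_point])
    if code_point < 0x800:
        return bytes([0b11000000 | (code_point >> 6),
                      0b10000000 | (code_point & 0b111111)])
    if code_point < 0x10000:
        return bytes([0b11100000 | (code_point >> 12),
                      0b10000000 | ((code_point >> 6) & 0b111111),
                      0b10000000 | (code_point & 0b111111)])
    return bytes([0b11110000 | (code_point >> 18),
                  0b10000000 | ((code_point >> 12) & 0b111111),
                  0b10000000 | ((code_point >> 6) & 0b111111),
                  0b10000000 | (code_point & 0b111111)])


for character in "eŘ€𝄞":
    code_point = ord(character)
    # The hand-written encoder agrees with the built-in codec.
    assert utf8_encode(code_point) == character.encode("utf-8")
    print(f"U+{code_point:04X}",
          "UTF-8:", character.encode("utf-8").hex(" "),
          "UTF-16:", character.encode("utf-16-be").hex(" "),
          "UTF-32:", character.encode("utf-32-be").hex(" "))

# A character outside the BMP becomes a surrogate pair in UTF-16.
assert "𝄞".encode("utf-16-be") == bytes.fromhex("d834dd1e")
```

The big-endian codec variants are used so that no byte order mark is prepended to the output.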

Unicode was a competing standard for universal text encoding that underwent a merger with UCS in version 1.1, and since then the standards have been kept closely synchronized. Unicode is a superset of UCS that additionally defines information about UCS characters (such as their general category, directionality, case, or numeric value [10, sec. 3.5 and ch. 4]), various text processing algorithms, and implementation guidelines.

    Ǻ (U+01FA) = Å (U+00C5) + combining acute accent (U+0301)
               = A (U+0041) + combining ring above (U+030A) + combining acute accent (U+0301)

Figure 1.3: Some UCS characters can be either input as a single entity or composed from several combining characters. Regarding Unicode normalization forms, all of the above representations are canonically equivalent.

    iconv -f latin2 -t utf8 -- old.txt > new.txt

Figure 1.4: Text files can be converted between encodings using the iconv command-line tool. The sample code shows the file old.txt being converted from the ISO/IEC 8859-2 encoding to UTF-8. The result of the conversion is stored in the file new.txt.

Regarding text processing, Unicode and UCS represent a compromise between the simplicity of the seven-bit ASCII and the heterogeneity of eight-bit encodings:

• If simple text manipulation is preferred over space efficiency, each character can be made exactly two bytes wide using UTF-16 (for text confined to the BMP) or exactly four bytes wide using UTF-32.

• Although character strings cannot be collated by a simple character code comparison, a collation algorithm is defined in the Unicode specification [12], and collation tables for major locales [13] are maintained by the Unicode Consortium.

• Classes of characters, such as uppercase letters, lowercase letters, numbers, and punctuation, do not form contiguous ranges, but their position is directly specified in the standard [10, sec. 4.5].

• Although idiosyncrasies, such as ligatures, invisible hyphenation hints, and combining characters, are present in UCS, explicit normalization algorithms for character string equivalence testing are specified by the standard [10, sec. 2.12]. An algorithm for case conversion is also specified [10, sec. 3.13].

• The byte order mark (FE FF) character can be inserted at the beginning of a text as a signature of Unicode encodings. As the name suggests, the order in which the FE and FF bytes arrive also indicates the order of bytes (called endianness) that was used to encode integers. In UTF-32 and UTF-16, endianness can be chosen arbitrarily by the encoding application. In UTF-8, one-byte integers are used and the notion of endianness is therefore meaningless.
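Several of the points above can be tried out with the Python standard library. The sketch below is an illustration under stated assumptions: it uses the `unicodedata` and `codecs` modules, the three spellings of Ǻ from Figure 1.3, and sample characters chosen for the example.

```python
import codecs
import unicodedata

composed = "\u01FA"            # Ǻ as a single code point
partial = "\u00C5\u0301"       # Å followed by a combining acute accent
decomposed = "A\u030A\u0301"   # A, combining ring above, combining acute accent
variants = (composed, partial, decomposed)

# The three strings differ code point by code point...
assert len(set(variants)) == 3
# ...but normalization maps all of them to a single representation.
assert {unicodedata.normalize("NFC", v) for v in variants} == {composed}
assert {unicodedata.normalize("NFD", v) for v in variants} == {decomposed}

# Character classes are looked up as properties, not as code ranges.
assert unicodedata.category("Ř") == "Lu"    # uppercase letter
assert unicodedata.numeric("¾") == 0.75     # numeric value

# Case folding goes beyond a one-to-one mapping of letters.
assert "Straße".lower() != "STRASSE".lower()
assert "Straße".casefold() == "STRASSE".casefold()

# The byte order mark tells the decoder which endianness was used.
for bom, codec in ((codecs.BOM_UTF16_BE, "utf-16-be"),
                   (codecs.BOM_UTF16_LE, "utf-16-le")):
    data = bom + "Ř".encode(codec)
    assert data.decode("utf-16") == "Ř"
```

Normalizing both operands to the same form before comparing them is one common way to test canonical equivalence.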

Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. Above is the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.

    Compose + O + R = ®
    Compose + 3 + 4 = ¾
    Compose + s + s = ß
    Compose + ~ + ' + a = ấ

Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.

    Alt + 1 + 6 + 0 = á
    Alt + 0 + 2 + 2 + 5 = á
    Alt + + + E + 1 = á

Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces a character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the corresponding UCS code point in hexadecimal.

1.1.2 Text Input

To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On handheld devices, the use of either a numeric keypad or a touchscreen is more typical.

An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind and, as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn't apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.
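The three Alt sequences of Figure 1.7 arrive at the same character by different routes, which can be reproduced with the code page tables shipped with Python. This is a rough sketch, assuming an English locale; the helper `alt_code` is hypothetical and ignores the special handling that Windows applies to some numbers, such as those below 32.

```python
def alt_code(typed: str) -> str:
    """Return the character a decimal Alt sequence selects in an English locale."""
    # A leading zero selects the Windows code page, otherwise the IBM one.
    codec = "cp1252" if typed.startswith("0") else "cp437"
    return bytes([int(typed)]).decode(codec)


# Alt + 1 + 6 + 0: position 160 of the IBM code page 437.
assert alt_code("160") == "á"
# Alt + 0 + 2 + 2 + 5: position 225 of the Windows code page 1252.
assert alt_code("0225") == "á"
# Alt + + + E + 1: the UCS code point itself, written in hexadecimal.
assert chr(0xE1) == "á"

# The same number means different characters in the two code pages.
print(alt_code("169"), alt_code("0169"))
```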

1.1.3 Text Editors

A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free joe, GNU nano, and pico.

More advanced text editors come with support for regular expressions and version control, which will be covered in sections 1.1.5 and 1.2, and with user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower the users to perform complex text editing.

1.1.4 Interactive Document Preparation Systems

Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].

The primary difference between text editors and DPSes is the fact that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files, a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.

1.1.5 Regular Expressions

The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.

Margin note: Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.

Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) [18, part 1, ch. 9], described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.

More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.

BRE regex / Description / Matches

we\{1,2\}p
    The repetition expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
    Matches: weeps, wept

e*ne
    Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
    Matches: never, enemy, Kleene

\(⟨regex⟩\)
    A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
    Matches: ⟨regex⟩

^ar
    At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
    Matches: argument, arrow keys

ore$
    At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
    Matches: iron ore, dumbledore

.be
    A period (.) matches any single character.
    Matches: or not to be

be[ea]
    A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities, omitted here for brevity.
    Matches: beehive, grizzly bear, glass beads

be[^ea]
    A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
    Matches: obeah, bend, libela

\^\$
    Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
    Matches: ^$

\(..\).*\1
    A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
    Matches: ara ararauna, dardanelles, nationality

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below).

ERE regex / Description / Matches

we{1,2}p
    Unlike in BREs, braces aren't escaped.
    Matches: weeps, wept

pe+r?l?
    The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
    Matches: persona, peer, speech, perl

(⟨regex⟩)
    Unlike in BREs, parentheses aren't escaped.
    Matches: ⟨regex⟩

(on|t)
    Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
    Matches: one, two, trophy, truth

(..).*\1
    EREs do not support backreferences.
    Matches: ⟨undefined⟩

Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (20 0B), zero width non-joiner (20 0C), zero width joiner (20 0D), and zero width no-break space (FE FF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at least partially implement the Unicode standard for Regular Expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.

Regex / Description

\x{⟨n⟩}
    Matches the UCS character with code point ⟨n⟩ in hexadecimal.
\N{⟨n⟩}
    Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.
\p{⟨p⟩}
    Matches any UCS character with property ⟨p⟩.
\P{⟨p⟩}
    Matches any UCS character without property ⟨p⟩.

Property / Description

Letter
    This property is satisfied by any letter.
Punctuation
    This property is satisfied by any punctuation.
Symbol
    This property is satisfied by any symbol.
Mark
    This property is satisfied by any mark.
Number
    This property is satisfied by any number.
Separator
    This property is satisfied by any separator.
Other
    This property is satisfied by any UCS character that doesn't belong to any of the above-listed categories.
Block=⟨b⟩
    This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.
Script=⟨s⟩
    This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.
Numeric Value=⟨n⟩
    This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.

Margin note: The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.
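The pitfalls of matching UCS text with regexes can be reproduced in a few lines. The sketch below uses the `re` module of the Python standard library, which is an assumption made for illustration: its syntax is neither BRE nor ERE, and it does not implement the `\p{…}` properties of Table 1.5. The sample strings are likewise invented for the example.

```python
import re
import unicodedata

# Byte-oriented matching sees the two UTF-8 bytes of "Ř" as two characters.
assert re.fullmatch(rb".", "Ř".encode("utf-8")) is None
assert re.fullmatch(r".", "Ř") is not None

# An ASCII-only notion of a letter breaks words at accented characters.
assert re.findall(r"\w+", "Vít Novotný") == ["Vít", "Novotný"]
assert re.findall(r"\w+", "Vít Novotný", flags=re.ASCII) == ["V", "t", "Novotn"]

# Canonically equivalent text does not match until it has been normalized.
pattern = re.compile("\u01FA")            # Ǻ as a single code point
text = "A\u030A\u0301ngstrom"             # the same letter, fully decomposed
assert pattern.search(text) is None
assert pattern.search(unicodedata.normalize("NFC", text)) is not None

# Simple case-insensitive matching misses one-to-many case mappings.
assert re.fullmatch("straße", "STRASSE", flags=re.IGNORECASE) is None
assert "straße".casefold() == "STRASSE".casefold()

# Invisible characters cause false negatives unless they are stripped first.
INVISIBLE = re.compile("[\u200B\u200C\u200D\uFEFF]")
hidden = "zero\u200Bwidth"
assert re.search("zerowidth", hidden) is None
assert re.search("zerowidth", INVISIBLE.sub("", hidden)) is not None
```

Normalizing and case-folding both the pattern source and the searched text before matching is one way to sidestep several of these problems at once.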

The most elementary text processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool will present lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for the Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.

Margin note: The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.

VCS can be divided by their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally, and its operation is fully dependent on the availability of the server. An example of a centralized VCS is SubVersioN (SVN).

By comparison, there is no designated server in decentralized VCS, and the users can upload and download new versions directly to and from one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage size requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.

Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

Margin note: An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

Figure 1.8: The basic SVN workflow (svnadmin create, svn checkout, svn update, svn commit). After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.
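The way a VCS displays the changes between two versions of a text file can be sketched in a few lines of Python. This is only an illustration that relies on the standard `difflib` module; the sample sentences and the version labels are made up, and real VCS use their own, more elaborate diff implementations.

```python
import difflib


def show_changes(old_text: str, new_text: str) -> str:
    """Return a unified diff between two versions of a text file."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile="draft (version 1)",  # hypothetical version labels
        tofile="draft (version 2)",
    )
    return "".join(diff)


# Two made-up versions of the same manuscript.
old = "Chapter 1\nIt was a dark night.\nThe end.\n"
new = "Chapter 1\nIt was a dark and stormy night.\nThe end.\n"

print(show_changes(old, new))
```

Removed lines are prefixed with `-` and added lines with `+`. The same approach yields nothing useful for binary output files, which is why the interactive DPSes mentioned above need dedicated tooling.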

Figure 1.9: The diagram above depicts the basic Git workflow (git init, git clone, git pull, git push, git commit, git reset). The diagram below depicts the use of the Git program with an SVN repository (svnadmin create, git svn clone, git svn rebase, git svn dcommit, git commit, git reset); this bears all the advantages and disadvantages associated with decentralized VCS. After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.

Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.

Chapter 2

Markup

A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should be already met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.

Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The Generalized Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

Margin note: More information about the project can be found within The Roots of SGML: A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

Margin note: The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.

Although the described structure is shared by all SGML documents, the actual syntax as well as the restrictions regarding the contents and the attributes of individual elements are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also called a language or a format).

Margin note: A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
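The first of the two levels of correctness, well-formedness, can be checked with nothing but the Python standard library. The sketch below is an illustration under that assumption; the helper name `is_well_formed` and the two sample documents are made up. Checking validity against a DTD or a schema is not covered here, since the standard `xml.etree.ElementTree` parser does not validate; an external tool is needed for that.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Report whether the string parses as a well-formed XML document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Properly nested elements: well-formed.
good = "<recipe><name>Palatschinken</name></recipe>"
# The closing tags are swapped, so the elements overlap: not well-formed.
bad = "<recipe><name>Palatschinken</recipe></name>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

A well-formed document may still be invalid, for example when a required element is missing; only a validating parser that knows the DTD or schema can tell.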

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
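To give a feel for data retrieval with XPath, the following Python sketch queries a shortened version of the recipe document. It assumes only the standard `xml.etree.ElementTree` module, which implements a limited subset of XPath rather than the full language; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# A shortened version of the recipe document used in this chapter.
RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# Path expression: every ingredient element inside ingredientList.
names = [e.text for e in root.findall("./ingredientList/ingredient")]

# Predicate: the ingredient whose amount attribute equals "2".
eggs = root.find(".//ingredient[@amount='2']")

# Attribute access on the element selected by a path.
serves = root.find("ingredientList").get("serves")

print(names, eggs.text, serves)
```

Full XPath and XQuery processors support far richer expressions (functions, axes, joins) than the subset used here.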

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE recipe SYSTEM "recipe.dtd">
    <recipe>
      <name>Palatschinken</name>
      <description>A Slavic crêpe-like dish.</description>
      <ingredientList serves="8">
        <ingredient amount="120g">Plain flour</ingredient>
        <ingredient amount="2">Egg</ingredient>
        <ingredient amount="300ml">Milk</ingredient>
        <ingredient amount="1 tblspn">Oil</ingredient>
        <ingredient amount="1 pinch">Salt</ingredient>
      </ingredientList>
      <stepList>
        <step>Combine the ingredients and whisk until
          you have a smooth batter.</step>
        <step>Heat oil on a pan, pour in a tablespoonful
          of the batter, fry until golden brown.</step>
        <step>Repeat until there is no batter left.</step>
        <step>Serve rolled and filled with jam.</step>
      </stepList>
    </recipe>

Figure 2.1: An example XML document (recipe.xml).

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd">
    <!DOCTYPE recipe SYSTEM "recipe.dtd">

    <!DOCTYPE recipe [
      <!ELEMENT recipe (name, description, ingredientList,
                        stepList)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
      <!ELEMENT ingredientList (ingredient+)>
      <!ATTLIST ingredientList serves CDATA #REQUIRED>
      <!ELEMENT ingredient (#PCDATA)>
      <!ATTLIST ingredient amount CDATA #REQUIRED>
      <!ELEMENT stepList (step+)>
      <!ELEMENT step (#PCDATA)> ]>

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd" [
      <!-- Omitted for brevity --> ]>
    <!DOCTYPE recipe SYSTEM "recipe.dtd" [
      <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD. DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

    element recipe {
      element name { text },
      element description { text },
      element ingredientList {
        attribute serves { xsd:positiveInteger },
        element ingredient {
          attribute amount { text }, text
        }+
      },
      element stepList {
        element step { text }+
      }
    }

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.

    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
    <element name="recipe"><complexType><all>
      <element name="name" type="string" minOccurs="1"/>
      <element name="description" type="string"
        minOccurs="1"/>
      <element
        name="ingredientList"><complexType><sequence>
        <element name="ingredient" minOccurs="1"
          maxOccurs="unbounded">
          <complexType><simpleContent>
            <extension base="string">
              <attribute name="amount" type="string"/>
            </extension>
          </simpleContent></complexType>
        </element></sequence>
        <attribute name="serves" type="positiveInteger"
          use="required"/>
      </complexType></element>
      <element name="stepList"><complexType><sequence>
        <element name="step" type="string" minOccurs="1"
          maxOccurs="unbounded"/>
      </sequence></complexType></element>
    </all></complexType></element>
    </schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).

    xmllint --noout --dtdvalid recipe.dtd recipe.xml
    xmllint --noout --schema recipe.xsd recipe.xml
    trang recipe.rnc recipe.rng  # Compact -> Full Relax NG
    xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.

A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature called CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.

    Speech
    AASE: Peer, you're lying!
    PEER: No, I'm not!
    AASE: Well then, swear to me it's true!
    PEER: Swear? Why should I?
    AASE: See, you dare not! Every word of it's a lie!

    Verse
    Peer, you're lying! No, I'm not!
    Well then, swear to me it's true!
    Swear? Why should I? See, you dare not!
    Every word of it's a lie!

    <(V)line>
    <(S)speech who="Aase">Peer, you're lying!</(S)speech>
    <(S)speech who="Peer">No, I'm not!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Aase">Well then,
      swear to me it's true!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Peer">Swear? Why should I?</(S)speech>
    <(S)speech who="Aase">See, you dare not!
    </(V)line><(V)line>
    Every word of it's a lie!</(S)speech>
    </(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

Margin note: The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook, and its source code is publicly available at http://docbook.org.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Margin note: Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for the presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

Margin note: JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

    <b>Bold <i>bold and italic</b> italic</i>
    <b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

    <font face="Verdana" size="4">
    <font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
    <br><br>There is a continuing need to show the power of
    <i>CSS</i>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The <i>HTML
    </i> remains the same, the only thing that has changed
    is the external <i>CSS</i> file. Yes, really.
    </font>

    <style>
      body {
        font: large Verdana;
        font-size: large;
      }
      h1 {
        font-size: x-large;
        text-transform: uppercase;
      }
      abbr {
        font-style: italic;
      }
    </style>
    <h1>So what is this about?</h1>
    <p>There is a continuing need to show the power of
    <abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The
    <abbr>HTML</abbr> remains the same, the only thing that
    has changed is the external <abbr>CSS</abbr> file. Yes,
    really.</p>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

Margin note (on Linked Data, Section 2.2.3): The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications (such as MathML and SVG) directly into web documents through XML namespaces.
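The difference between a tolerant HTML parser and a strict XML parser can be observed with the Python standard library. The sketch below feeds the malformed first line of Figure 2.7 to both. It is an illustration only: `html.parser.HTMLParser` is a simple tokenizer that reports tags as it meets them and does not implement the error recovery documented in the HTML5 specification, and the class name `TagLogger` and the wrapping `<p>` element are made up for the example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagLogger(HTMLParser):
    """Record start and end tags in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


# The overlapping elements from the first line of Figure 2.7.
MALFORMED = "<b>Bold <i>bold and italic</b> italic</i>"

# The HTML tokenizer accepts the input without complaint.
logger = TagLogger()
logger.feed(MALFORMED)
logger.close()

# The XML parser must refuse it, because the elements overlap.
try:
    ET.fromstring("<p>" + MALFORMED + "</p>")
    xml_accepted = True
except ET.ParseError:
    xml_accepted = False

print(logger.events)
print(xml_accepted)
```

The strict behavior is what keeps XML parsers simple, whereas a browser-grade HTML parser has to carry additional logic to repair such input consistently.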

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.

Margin note: A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
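The two readings of a triplet described in this section can be sketched as a small Python data model. This is a hedged illustration, not an RDF library: the class names `Resource`, `Literal`, and `Triple` and the function `describe` are made up, the field order follows the (p, s, o) convention used in the text, and the IRIs are taken from the example in Figure 2.9. Note that the RDF specification itself calls triplets "triples".

```python
from typing import NamedTuple, Union


class Resource(NamedTuple):
    """A resource identified by an IRI."""
    iri: str


class Literal(NamedTuple):
    """A plain literal value."""
    value: str


class Triple(NamedTuple):
    """A triplet (p, s, o) in the order used in the text."""
    predicate: Resource
    subject: Resource
    obj: Union[Resource, Literal]


def describe(t: Triple) -> str:
    """Spell out the interpretation of a triplet in plain English."""
    if isinstance(t.obj, Resource):
        return (f"<{t.subject.iri}> is in relation "
                f"<{t.predicate.iri}> with <{t.obj.iri}>")
    return (f"<{t.subject.iri}> has property "
            f'<{t.predicate.iri}> with value "{t.obj.value}"')


john = Resource("http://example.org/john-smith")
page = Resource("http://example.org/document.html")

triples = [
    Triple(Resource("http://purl.org/dc/terms/creator"), page, john),
    Triple(Resource("http://xmlns.com/foaf/0.1/name"), john,
           Literal("John Smith")),
]

for t in triples:
    print(describe(t))
```

Because a resource may appear as the object of one triplet and the subject of another, a set of triplets naturally forms a directed graph.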

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually replace the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.

    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/terms/"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description
          rdf:about="http://example.org/document.html">
        <dc:title xml:lang="en">John's Web page</dc:title>
        <dc:creator
            rdf:resource="http://example.org/john-smith"/>
      </rdf:Description>
      <rdf:Description
          rdf:about="http://example.org/john-smith">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
        <foaf:name>John Smith</foaf:name>
      </rdf:Description>
    </rdf:RDF>

    <http://example.org/document.html>
      <http://purl.org/dc/terms/title> "John's Web page"@en .
    <http://example.org/document.html>
      <http://purl.org/dc/terms/creator>
      <http://example.org/john-smith> .
    <http://example.org/john-smith>
      <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
      <http://xmlns.com/foaf/0.1/Person> .
    <http://example.org/john-smith>
      <http://xmlns.com/foaf/0.1/name> "John Smith" .

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dc: <http://purl.org/dc/terms/> .
    <http://example.org/document.html>
      dc:title "John's Web page"@en ;
      dc:creator <http://example.org/john-smith> .
    <http://example.org/john-smith>
      a foaf:Person ;
      foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).

    <!DOCTYPE html>
    <html lang="en">
      <head>
        <link rel="meta" type="application/rdf+xml"
              href="john.rdf">
        <link rel="meta" type="text/turtle" href="john.ttl">
        <link rel="meta" type="application/n-triples"
              href="john.nt">
        <title>John's Web page</title>
      </head>
      <body>
        Hi, I'm John Smith.
      </body>
    </html>

    <!DOCTYPE html>
    <html lang="en">
      <head vocab="http://purl.org/dc/terms/"
            about="http://example.org/document.html">
        <title property="title" lang="en">John's Web
          page</title>
        <meta property="creator"
              href="http://example.org/john-smith">
      </head>
      <body vocab="http://xmlns.com/foaf/0.1/"
            about="http://example.org/john-smith"
            typeof="Person">
        Hi, I'm <span property="name">John Smith</span>.
      </body>
    </html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

    http://example.org/document.html  --dc:title-->    "John's Web page"@en
    http://example.org/document.html  --dc:creator-->  http://example.org/john-smith
    http://example.org/john-smith     --rdf:type-->    foaf:Person
    http://example.org/john-smith     --foaf:name-->   "John Smith"

Figure 2.11: A graph of the RDF document in Figure 2.9.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Margin note: The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.

(Above: the typeset output page, showing the centered title "The Cask of Amontillado", the byline "by Edgar Allen Poe", the opening paragraph set with a three-line drop cap, and the page number 1.)

    .TITLE "The Cask of Amontillado"
    .AUTHOR "Edgar Allen Poe"
    .PRINTSTYLE TYPESET
    .PAGE 6i 9i .75i .75i .75i .75i
    .START
    .PP
    .DROPCAP T 3
    he thousand injuries of Fortunato I had borne as I best
    could, but when he ventured upon insult, I vowed revenge.
    You who so well know the nature of my soul will not
    suppose, however, that gave utterance to a threat.
    \*[IT]At length\*[PREV] I would be avenged; this was a
    point definitely settled\[em]but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allen Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].

38 CHAPTER 2 MARKUP

% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
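To make the conversion step concrete, the following Python sketch converts a tiny Markdown-like subset to HTML. The subset (`#` headings, `*emphasis*`, paragraphs separated by blank lines) and the function name `to_html` are assumptions chosen for illustration; the sketch does not reproduce the behavior of any real Markdown tool.

```python
import html
import re


def to_html(source: str) -> str:
    """Convert a tiny Markdown-like subset to HTML (illustrative only)."""
    blocks = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", source.strip()):
        # Join the lines of a block and escape HTML special characters.
        text = html.escape(" ".join(block.split()), quote=False)
        # *word* becomes emphasis.
        text = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)
        heading = re.match(r"(#{1,6}) (.*)", text)
        if heading:
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            blocks.append(f"<p>{text}</p>")
    return "\n".join(blocks)


sample = "# Design\n\nLegibility is of *foremost* concern."
print(to_html(sample))
```

The punctuation in the input stays readable as plain text, which is the property that sets lightweight markup apart from the tag-based output.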

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. At low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
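The thresholds above can be collected into a small Python sketch. The function name `assess_measure` and the wording of its verdicts are assumptions made for illustration; only the numeric limits come from the text.

```python
def assess_measure(chars_per_line: int, columns: int = 1) -> str:
    """Classify a line length using the limits quoted in the text."""
    # Recommended range: 45-75 characters (single column),
    # 40-50 characters (multiple columns).
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "too narrow to justify; set ragged"
    if chars_per_line > 80:
        return "too wide; strains the eye"
    if low <= chars_per_line <= high:
        return "comfortable"
    return "acceptable, but outside the recommended range"


print(assess_measure(66))
print(assess_measure(35, columns=2))
```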

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says, φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. "The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved."

1

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
                                  Ligatures=TeX]}{}
\newunicodechar{}{\raisebox{.8ex}{}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says, φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=gr] {
    font-size: 13.2pt;
  }
</style>
<p><span lang=en>The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says, </span><span lang=gr>φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang=en>“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
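The way the unicode-range descriptors in Figure 3.2 divide the text between the two typefaces can be approximated with a short Python sketch. The code point ranges are copied from the figure; the functions `font_for` and `font_runs` are assumptions made for illustration and only mimic, in simplified form, the per-character font selection that a browser performs.

```python
# Code point ranges taken from the unicode-range descriptors in Figure 3.2.
FONTS = [
    ("Alegreya Sans", [(0x00, 0x24F), (0x1E00, 0x1EFF), (0x2000, 0x206F),
                       (0x2C60, 0x2C7F), (0xA720, 0xA7FF), (0xFB00, 0xFB4F)]),
    ("GFS Neohellenic", [(0x2C80, 0x2CFF), (0x370, 0x3FF), (0x1F00, 0x1FFF),
                         (0x102E0, 0x102FF)]),
]


def font_for(char):
    """Return the first font whose ranges cover the character."""
    code_point = ord(char)
    for name, ranges in FONTS:
        if any(low <= code_point <= high for low, high in ranges):
            return name
    return "sans-serif"  # generic fallback, as in the font-family list


def font_runs(text):
    """Split the text into consecutive runs set in the same font."""
    runs = []
    for char in text:
        name = font_for(char)
        if runs and runs[-1][0] == name:
            runs[-1][1] += char
        else:
            runs.append([name, char])
    return [(name, run) for name, run in runs]


print(font_runs("Aristotle says τις"))
```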


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
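The leading arithmetic can be checked with a few lines of Python. The function names are assumptions made for illustration; the 120 % and 145 % factors follow from the twenty to forty-five percent of additional space quoted from [55].

```python
def leading_range(type_size_pt):
    """Return the (minimum, maximum) line height for a type size in points."""
    return (type_size_pt * 120 / 100, type_size_pt * 145 / 100)


def lead_per_side(type_size_pt, line_height_pt):
    """Return the lead added above and below each line."""
    return (line_height_pt - type_size_pt) / 2


low, high = leading_range(10)
print(low, high)                                         # 12.0 14.5
print(lead_per_side(10, low), lead_per_side(10, high))   # 1.0 2.25
```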

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
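A minimal Python sketch of the whole-multiple rule follows. The function name `heading_padding` and the example values are assumptions made for illustration: given the natural height of a heading, it reports how many body text lines the heading should occupy and how much extra vertical space has to be distributed above and below it.

```python
import math


def heading_padding(heading_height_pt, line_height_pt):
    """Return (lines occupied, extra space in points) for a heading."""
    # Round the heading up to the next whole number of body text lines.
    lines = math.ceil(heading_height_pt / line_height_pt)
    return lines, lines * line_height_pt - heading_height_pt


# A heading that is 16 pt tall over a 12 pt body text line height.
print(heading_padding(16, 12))
```

In this example, the heading would occupy two lines of the body text, with 8 pt of space to be added around it.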

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: "Jesters do oft prove prophets." From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs, and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that when we are sick in fortune – often the surfeit of our own behavior – we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

– William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size (as described in Section 3.2.1), as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
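The requirement that the text height be a multiple of the line height can be sketched in Python as follows. The function name `text_block` and the example dimensions are assumptions made for illustration, and the conversion of 72 pt to the inch is likewise an assumption (TeX, for instance, uses 72.27 pt to the inch).

```python
def text_block(page_height_pt, top_margin_pt, bottom_margin_pt, line_height_pt):
    """Return (number of lines, text height in points) that fit the page."""
    available = page_height_pt - top_margin_pt - bottom_margin_pt
    # Only whole lines count; any remainder is returned to the margins.
    lines = int(available // line_height_pt)
    return lines, lines * line_height_pt


# A 9 in page (648 pt) with 0.75 in margins (54 pt) and 12 pt leading.
print(text_block(648, 54, 54, 12))
```

In this example, 45 lines fill a text height of 540 pt exactly; with a smaller bottom margin of 50 pt, the 4 pt remainder would not hold another line and would be added back to the margins.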

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to use colored typefaces only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, except where losing the distinction between the colors would not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
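The text does not say how "sufficient contrast" should be measured. One common yardstick on the web, used here purely as an assumption, is the contrast ratio defined by the Web Content Accessibility Guidelines (WCAG) 2, which asks for a ratio of at least 4.5 : 1 for body-sized text. The Python sketch below computes that ratio for two sRGB colors; the function names are illustrative.

```python
def relative_luminance(rgb):
    """Relative luminance of an sRGB color given as 0-255 integers (WCAG 2)."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    red, green, blue = (linearize(channel) for channel in rgb)
    return 0.2126 * red + 0.7152 * green + 0.0722 * blue


def contrast_ratio(color_a, color_b):
    """WCAG 2 contrast ratio between two colors, from 1 (none) to 21."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


black, white, red = (0, 0, 0), (255, 255, 255), (255, 0, 0)
print(round(contrast_ratio(black, white), 2))
print(round(contrast_ratio(red, white), 2))
```

By this measure, pure red on white falls short of 4.5 : 1, which is consistent with reserving color for emphasis rather than for the body text.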

Bibliography

[1] Mary Brandel. "1963: The debut of ASCII". In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard 10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016) (cit. on p. 10).

[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006 (cit. on p. 13).


[17] Noam Chomsky. "Three models for the description of language". In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard 18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. "The Roots of SGML – A Personal Recollection". In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. "SGML: The Reason Why and the First Published Hint". In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. "Introduction to Generalized Markup". In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986 (cit. on p. 22).


[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. "GODDAG: A Data Structure for Overlapping Hierarchies". In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).


[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).


[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typo/glosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).


[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standard Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System


DTD: Document Type Declaration
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
EMACS: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character


NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
PICO: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
VI: The Visual Interactive editor
VIM: VI IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language


XML: The eXtensible Markup Language

Index

ACK 6
Adobe FrameMaker 14
Adobe InDesign 14, 39
alignment
    justified 42
    ragged 42
Anton Koberger 49
Apache OpenOffice 13, 20, 39
API 55
ASA 51
ASCII 5–9, 11, 12, 14, 51
AsciiDoc 39
AT&T 35
Atom 13
awk 16, 17

Bazaar 17
BEL 6
BMP 8, 9, 14
Bob Berner 5
body text 41
BRE 14–16
    alternation operator 15
    backreference 15
    escape character 15
    matching list expression 15
    non-matching list expression 15
    repetition operator 15
    subexpression 15
BS 6
BSD 13

CA 52
CAN 6
CERN 28
character code 5
character encoding 5
Chomsky hierarchy 14
Christian Morgenstern 4
CLDR 52
CLI 13, 16
code page 7
code point 8
Compose key 11
CONCUR 27
control code 5
CR 6
Creole 39
CSS 23, 29–32, 44

DC 32, 33
DC1 6
DC2 6
DC3 6
DC4 6
DEL 6
DLE 6
Donald Knuth 36
DPS 13, 17, 18, 32, 35, 36, 39
    batch-oriented 35
    interactive 13, 35
        desktop publishing 36
        word processing 36
DTD 23, 25–27
DTP 36

EBCDIC 5
ECMA 55
Edgar Allen Poe 37


Elements of Style 3
EM 6
Emacs 13
endianity 10
endnote 47
ENQ 6
EOT 6
ERE 14–16
    alternation operator 15
    backreference 15
    escape character 15
    matching list expression 15
    non-matching list expression 15
    repetition operator 15
    subexpression 15
ESC 6
ETB 6
ε-TeX 38
ETX 6
EUC 5

F. M. Cornford 43
FF 6
FOAF 32, 33
footnote 47
formal grammar 14
FORTRAN 4
From Religion to Philosophy: A Study in the Origins of Western Speculation 43
FS 6
FSM 35

Git 17
GML 22
GNU 13, 14, 35
    Linux 13
    nano 13
Google Documents 18
Google Pinyin 11
grep 16, 17
groff, see troff
GS 6
GUI 13, 35

Han Unification 9
heading 45
Henrik Ibsen 27
HT 6
HTML 28–32, 34, 39, 44, 55

IBM 5, 12, 22
iconv 10
IEC 7, 10, 51–54
IME 12
IRI 27, 28, 31, 32, 54
ISO 7, 10, 51–54

JavaScript 29
Jeffrey E. F. Friedl 14
JIS 5
JOE 13
JScript 29
JSON 32
JSON-LD 32, 56
JTC 51–54
justification, see alignment

King Lear 48

LaTeX 36, 43
Latin Vulgate Bible 49
LD 31, 32, 55
leading, see line spacing
Leafpad 13
LF 6
lightweight markup language 39
line height 45
list 46

MA 51
MakeDoc 39
Markdown 39
markup
    logical 21, 29, 30, 35, 36
    presentation 21, 29, 30, 35, 36
MathML 28, 31
Mercurial 17
microformatting 32
Microsoft Word 14, 20, 39

N-Triples 32, 33
NAK 6
Noam Chomsky 14
    hierarchy 14
note 46
Notepad++ 13
Notepad 13

INDEX 65

nroff see troffnul 6ny 51

sectocr 12odf 13ooxml 13owl 32 56

sectparagraphblock 47indented 45outdented 45

paragraph 42paragraphsblock 45

pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8

sectQuarkXPress 14quotationblock 47run-in 47

sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31

rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6

sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22

sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-

lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6

secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff

man 36

66 INDEX

me 36mom 36

troff 35tron 9Turtle 32 33typeface 41

sectucsblock 8ucs-4 8

ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10

us 6usa 51 52utf

utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8

utf 6 8ndash10 52sect

VBScript 29vcscentralized 17decentralized 17

vcs 17ndash20version control 13vi 13vim 13

vt 6sect

w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4

wysiwyg 35sect

XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23

xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23



Introduction

With the advent of the digital age, typesetting has become available to virtually anyone equipped with a personal computer. Beautiful text documents can now be crafted using free and consumer-grade software, which often obviates the need for the involvement of a professional designer and typesetter. The level playing field of the Internet, coupled with the rising popularity of digital-only documents, then allows the author to bypass the publisher as well, if they so wish, without jeopardizing their chance of recognition.

The aim of this book is to provide a general overview of the tools and techniques associated with writing, designing, typesetting, and distributing text documents, one of the principal means of knowledge preservation and transfer known to man. Each chapter describes one discrete step of document preparation along with practical examples and references to literature for those interested in further study.

The chapters are filled with examples that illustrate the subject matter. These should be consulted whenever the concepts described in the text are unclear to the reader. Although care was taken not to favor any computing environment, some examples feature utilities for Unix and Unix-like operating systems. These utilities may or may not have a suitable counterpart in operating systems such as Windows. To try the corresponding examples out, the reader is advised to install a free Unix-like environment (such as Cygwin for Windows) on their computer.

This document was prepared in accordance with William Strunk’s Elements of Style, an American English style guide for general use.

Chapter 1

Writing

The essence of a document is the idea it represents. In the case of a text document, this idea is articulated through speech, which is transcribed using text, optionally accompanied by figures, and then laid out on a sheet of paper according to a design. Since the text is typically independent of the design, whose task is to support and elicit the internal structure of the text, it is writing that is the logical first step in the creation of a text document.

The essentials of writing in any given natural language include grammar rules, which specify the structure of spoken language, and orthographic rules, which impose additional requirements on written text. The complexity of either set of rules depends entirely on the language in question. Some writing systems, such as those that incorporate Chinese characters, are not phonographic, and the correspondence between the spoken words and the written symbols needs to be memorized by the writer on a word-to-word basis. Other languages may use vastly different grammar rules for speaking and for writing, which means that a spoken sentence needs to be translated first before it is written down. A writer needs to recognize these specifics.

On top of grammar and orthographic rules stand style guides, which, in order to improve consistency, codify how common language patterns are encoded. More comprehensive style guides, such as the Chicago Manual of Style or the Oxford Style Manual, often go beyond writing and provide guidelines on design and typesetting as well, making them an indispensable reference on the editorial tradition.

Zwei Trichter wandeln durch die Nacht.
Durch ihres Rumpfs verengten Schacht
       fließt weißes Mondlicht
          still und heiter
             auf ihren
              Waldweg
               u. s.
                w.

Figure 1.1: Exceptions that prove the rule about the separation of text and design can sometimes be encountered in poetry. Above is Christian Morgenstern’s Trichter, where the text and its form are intimately intertwined.

Above all stand the typographic rules, which specify how the resulting document should be typeset so that it doesn’t disturb the eye of the reader. These, as well as the orthographic rules on hyphenation, can be left out of consideration during writing, as it is the page that should be formed around the writing and not the other way around.

1.1 Text Processing

Originally the domain of the pen, the quill, the stylus, and the more recent typewriter, manuscripts of today are produced mainly using the personal computer and stored in text files. The discipline of creating and manipulating digital text is called text processing and will be the focus of this section.

1.1.1 Character Encoding

Although computing at its most primal has no use for anything but numbers, it has nevertheless been accompanied by text from the very outset. Even the earliest computers from the 1950s were programmed with both raw machine code and the text programming language of the FORmula TRANslator (FORTRAN). The digital representation of letters, digits, and other characters was initially closely tied to each specific application and processor architecture, but with the advent of computer networking in the 1960s, mutual intelligibility became a point of concern. “We had over sixty different ways to represent characters in computers. It was a real Tower of Babel,” explains Bob Bemer [1], an American computer scientist who worked at IBM during 1956–1962 and who drafted the American Standard Code for Information Interchange (ASCII) [2], a character encoding from 1963 that unified the digital representation of text across the computer industry and enabled computer networking on a large scale.

Sidenote: EBCDIC by IBM was the default encoding on IBM’s System/360 mainframes and was in active use until the introduction of the PC in 1981. In writing systems using Chinese characters, special encodings such as Big5, JIS, and EUC are used to this day. For brevity, the text focuses on the mainstream of international encodings.

ASCII

In ASCII, every character is represented by a number from zero to 127, which is transformed to a seven-bit integer called a character code. These 128 codes are used to encode printable characters (spanning the letters of the English alphabet, digits, punctuation, and other symbols) and control codes, as depicted in Table 1.1. Unlike printable characters, control codes have no fixed visual representation, and they were used to implement application-specific communication protocols and text formatting; their precise semantics were defined in a much later standard from 1972 [3]. Unconstrained by the bandwidth and the storage limitations of the 1960s and 1970s, today’s communication protocols and text formats gravitate towards markup constructed from printable characters, which, unlike control codes, are easy for humans to read and write.

Bit 7          0      0      0      0      1      1      1      1
Bit 6          0      0      1      1      0      0      1      1
Bit 5          0      1      0      1      0      1      0      1
Bits 4 3 2 1   Ctrl codes    Symbols       Upper case    Lower case
0 0 0 0        NUL    DLE    space  0      @      P      `      p
0 0 0 1        SOH    DC1    !      1      A      Q      a      q
0 0 1 0        STX    DC2    "      2      B      R      b      r
0 0 1 1        ETX    DC3    #      3      C      S      c      s
0 1 0 0        EOT    DC4    $      4      D      T      d      t
0 1 0 1        ENQ    NAK    %      5      E      U      e      u
0 1 1 0        ACK    SYN    &      6      F      V      f      v
0 1 1 1        BEL    ETB    '      7      G      W      g      w
1 0 0 0        BS     CAN    (      8      H      X      h      x
1 0 0 1        HT     EM     )      9      I      Y      i      y
1 0 1 0        LF     SUB    *      :      J      Z      j      z
1 0 1 1        VT     ESC    +      ;      K      [      k      {
1 1 0 0        FF     FS     ,      <      L      \      l      |
1 1 0 1        CR     GS     -      =      M      ]      m      }
1 1 1 0        SO     RS     .      >      N      ^      n      ~
1 1 1 1        SI     US     /      ?      O      _      o      DEL

Table 1.1: The ASCII encoding as specified in the 1986 revision of the standard [4].

Code point range   Encoding
0–127              0xxxxxxx
128–2047           110xxxxx 10xxxxxx
2048–65535         1110xxxx 10xxxxxx 10xxxxxx
65536–1114111      11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Table 1.2: The UTF-8 encoding. Each x represents one bit of the UCS code point in binary.

Character   Code point (decimal)   Code point (binary)   Encoding
Ř           344                    101011000             11000101 10011000
e           101                    1100101               01100101
č           269                    100001101             11000100 10001101

Table 1.3: An example of the UTF-8 encoding.

The following properties make it easy to manipulate and reason about character strings encoded in ASCII:

• Each character is represented by exactly seven bits. This makes it easy to allocate space for character strings of fixed length, to measure the number of characters stored in a memory region, and to perform basic operations such as adjacent character retrieval or text truncation.

• Characters are alphabetically ordered. Character strings can therefore be collated by comparing character code binary values.

• Lowercase and uppercase letters, digits, and control codes form contiguous ranges of character codes. This simplifies classification.

• There is precisely one way to encode any printable character. The conversion between lowercase and uppercase letters is a matter of inverting one bit.

This comes at the expense of support for non-English writing systems. As a temporary workaround, a set of ASCII derivatives that replaced the less-needed characters of #, $, @, [, \, ], ^, `, {, |, }, and ~ with international characters was specified in the ISO 646 standard from 1972 [3].
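The properties of ASCII listed above can be tried out directly. The following Python sketch is an illustration only; the helper names toggle_case and is_ascii_digit are invented for this example and do not come from any standard.

```python
# Illustrative helpers; the names are invented for this sketch.

def toggle_case(letter: str) -> str:
    """Switch an ASCII letter between cases by inverting the bit worth 32."""
    if len(letter) != 1 or not (letter.isascii() and letter.isalpha()):
        raise ValueError("expected a single ASCII letter")
    return chr(ord(letter) ^ 0b0100000)


def is_ascii_digit(char: str) -> bool:
    """Digits form the contiguous range 48-57, so one range check suffices."""
    return ord("0") <= ord(char) <= ord("9")


# Sorting the encoded bytes collates the strings by character code.
# Every uppercase letter precedes every lowercase one in this order.
collated = sorted([b"pear", b"Zebra", b"apple"])

print(toggle_case("a"), toggle_case("Q"))
print(is_ascii_digit("7"), is_ascii_digit("x"))
print(collated)
```

The inverted bit is bit 6 in the numbering of Table 1.1, which is the only bit that differs between the uppercase and lowercase columns.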

Eight-bit Encodings

With the byte size stabilizing at eight bits, new character encodings emerged that were based on ASCII and used the additional bit to encode characters of non-English writing systems while retaining complete backwards compatibility with ASCII. Besides the numerous vendor-specific encodings (called code pages), a set of fifteen eight-bit encodings covering all major modern writing systems whose characters fit within the space of 128 additional combinations was standardized in the ISO/IEC 8859 series released during 1986–2001.

Compared to ASCII, eight-bit encodings introduced an additional level of complexity to text processing:

• Each character is exactly eight bits wide. String manipulation is therefore as straightforward as with ASCII.

• Character strings can no longer be collated by character code comparison. Each encoding requires separate collation tables.

• Classes of characters, such as uppercase and lowercase letters or punctuation, no longer form contiguous ranges, and their position varies among encodings. This impedes character classification.

• Idiosyncrasies such as the ligature æ and invisible hyphenation hints are included in several encodings, which makes it more difficult to determine character string equivalence. Algorithms for case conversion vary among encodings.

• There exists no standard mechanism to detect which encoding is being used. The distinction needs to be made on the application level using either heuristics, additional metadata, or human intervention. Consequently, no standard mechanism exists to use different character encodings within a single text document.
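The last point in the list above is easy to reproduce. The following Python sketch decodes one and the same byte under three eight-bit encodings; the codec names are those of Python's standard library, and the byte value 0xE8 was picked arbitrarily for illustration.

```python
# One byte, three readings: nothing in the byte itself says which
# eight-bit encoding its author had in mind.
raw = bytes([0xE8])

as_latin1 = raw.decode("iso8859_1")  # ISO/IEC 8859-1, Western European
as_latin2 = raw.decode("iso8859_2")  # ISO/IEC 8859-2, Central European
as_cp437 = raw.decode("cp437")       # the IBM code page 437

print(as_latin1, as_latin2, as_cp437)

# Bytes below 128 are plain ASCII and decode identically in all three.
readings = {b"text".decode(name) for name in ("iso8859_1", "iso8859_2", "cp437")}
print(readings)
```

A program that receives such a byte without metadata can only guess, which is why the choice of encoding has to be settled outside the text itself.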

A portion of this complexity is inherent in the task of encoding the characters of all modern writing systems, but the overhead caused by the character encoding fragmentation proved to be unnecessary.

The Universal Character Set and Unicode

In the early 1990s, the continual increase in the available bandwidth and storage led to the creation of the standards of Unicode [5, 6] and the Universal multiple-octet coded Character Set (UCS) [7] in an attempt to create a text encoding that would contain the characters of all the world’s languages and succeed ASCII as the lingua franca of text interchange.

Sidenote: Also notable are the seven-bit encodings of UTF-7 and Punycode, which bring Unicode support to protocols that were designed with the seven-bit ASCII in mind, such as e-mail.

UCS is an ever-expanding catalogue of characters from writing systems both modern and ancient, and of symbols ranging from diacritical marks, punctuation, and ideograms to mahjong tiles, alchemical symbols, and the ancient Greek musical notation. Each of these characters is assigned a number called a code point, ranging from 0 to 2147483647 (7F FF FF FF in the hexadecimal notation), with the numbers of the most common characters in the range from 0 to 65535 (FF FF), called the Basic Multilingual Plane (BMP). The smallest units of division in UCS are blocks, which contain 256 thematically related characters. UCS encodings map code points to binary character codes and vice versa.
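Code points can be inspected in most programming languages. As a small illustration, the Python sketch below uses the built-in ord function to look up the code points of three characters and checks whether they fall within the BMP; the helper in_bmp and the constant BMP_LIMIT are names made up for this example.

```python
BMP_LIMIT = 0xFFFF  # the last code point of the Basic Multilingual Plane


def in_bmp(char: str) -> bool:
    """Return True when the character's code point lies within the BMP."""
    return ord(char) <= BMP_LIMIT


# A Latin letter, a Czech letter, and one of the mahjong tiles.
for char in ("e", "Ř", "\N{MAHJONG TILE RED DRAGON}"):
    print(f"decimal {ord(char):>6}  hexadecimal {ord(char):06X}  BMP: {in_bmp(char)}")
```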

Sidenote: One of the design goals of UCS was to avoid assigning code points to different glyphs that carry the same meaning. As a result, the visually distinctive Han characters used in the East Asian countries of China, Japan, Korea, and Vietnam were merged into a set of 75960 ideograms in a process referred to as the Han Unification [10, sec. 18.1]. This simplifies text processing but also makes it impossible to encode a text in multiple East Asian languages without having to rely on external markup to select appropriate regional fonts. As a result, a derivative of UCS that doesn’t implement the Han Unification was developed for use in operating systems based on the Real-time Operating system Nucleus (TRON) and is used in East Asia alongside UCS and region-specific encodings.

餐甑逞扉牙慨

Figure 1.2: Several Han characters in the traditional Chinese, Japanese, Korean, and Vietnamese variants.

Three major encodings are specified in the UCS standard and its amendments [8, 9]:

1. UTF-32 directly encodes UCS characters by transforming their code points to four-byte integers. UTF-32 is also known as UCS-4.

2. UTF-16 directly encodes characters within the BMP by transforming their code points to two-byte integers. Code points in the range from 65536 to 1114111 (01 00 00–10 FF FF) are transformed into pairs of two-byte integers, called surrogate pairs, ranging from 55296 to 57343 (D8 00–DF FF). To enable the UTF-16 encoding, the code points in this range will never be assigned to characters [10, sec. 3.4, D15]. The same is true of code points above 1114111 (10 FF FF), which allows UTF-16 to encode any UCS character.

3. UTF-8 directly transforms code points ranging from 0 to 127 (7F) to one-byte integers. Since the first UCS block of the BMP matches ASCII, any ASCII text stored in eight-bit bytes is also valid UTF-8. Code points in the range from 128 to 1114111 (00 00 80–10 FF FF) are transformed into two to four one-byte integers ranging from 128 to 244 (80–F4). The encoding is illustrated in Tables 1.2 and 1.3.
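The bit layouts of UTF-8 and the surrogate pairs of UTF-16 can be worked through by hand. The following Python sketch does so under the assumption that code points are limited to the range 0 to 1114111; the function names utf8_encode and utf16_surrogates are invented for this illustration, and real programs would simply call the built-in str.encode.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point following the UTF-8 bit layout."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not characters")
    if code_point < 0x80:      # 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def utf16_surrogates(code_point: int) -> tuple:
    """Split a code point above the BMP into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP need surrogates")
    offset = code_point - 0x10000          # a 20-bit number
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


for char in "Řeč":
    print(char, " ".join(f"{byte:08b}" for byte in utf8_encode(ord(char))))
print([f"{unit:04X}" for unit in utf16_surrogates(0x1F004)])
```

Each branch of utf8_encode corresponds to one row of the UTF-8 layout: the leading byte announces the length of the sequence, and every following byte carries six bits of the code point.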

UTF-32 is primarily used for the fixed-width internal representation of individual UCS characters inside programs. UTF-16 fulfills a similar role in programs that only work with the BMP, and UTF-8 is used for text storage and interchange. Since 2010, the majority of text content on the Web has been encoded in ASCII and UTF-8 [11].
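The trade-off between the three encodings shows in the number of bytes that each of them spends on a character. The Python sketch below is one way to compare them; the helper encoded_sizes is a made-up name, and the codec variants ending in -be are used so that no byte order mark is added to the output.

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes the text occupies in each encoding."""
    return {name: len(text.encode(name))
            for name in ("utf-8", "utf-16-be", "utf-32-be")}


# An ASCII letter, a BMP character, and a character outside the BMP.
for char in ("e", "Ř", "\N{MAHJONG TILE RED DRAGON}"):
    print(f"{ord(char):06X}", encoded_sizes(char))
```

Only UTF-32 spends the same number of bytes on every character, which is what makes it convenient inside programs and wasteful for storage.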

Unicode was a competing standard for universal text encoding that underwent a merger with UCS in version 1.1, and since then the standards have been kept closely synchronized. Unicode is a superset of UCS that defines additional information about UCS characters (such as their general category, directionality, case, or numeric value [10, sec. 3.5 and ch. 4]), various text processing algorithms, and implementation guidelines.
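Both the character properties and one of the algorithms, normalization, are exposed by the unicodedata module of Python's standard library. The sketch below is only an illustration; the three spellings of the letter Ǻ were chosen as an example of canonically equivalent sequences.

```python
import unicodedata

# Properties that Unicode defines on top of the bare code points.
print(unicodedata.name("Ř"), unicodedata.category("Ř"))
print(unicodedata.bidirectional("A"), unicodedata.numeric("¾"))

# Three ways to spell the same letter.
composed = "\u01FA"            # a single code point
partly = "\u00C5\u0301"        # A with ring above, then a combining acute accent
decomposed = "A\u030A\u0301"   # A, combining ring above, combining acute accent

# As raw strings they differ, but each normalization form maps all
# three spellings to one canonical sequence.
nfc = {unicodedata.normalize("NFC", s) for s in (composed, partly, decomposed)}
nfd = {unicodedata.normalize("NFD", s) for s in (composed, partly, decomposed)}
print(len({composed, partly, decomposed}), len(nfc), len(nfd))
```

Comparing strings only after bringing them to the same normalization form is the usual way to test them for equivalence.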

Regarding text processing, Unicode and UCS represent a compromise between the simplicity of the seven-bit ASCII and the heterogeneity of eight-bit encodings:

• If simple text manipulation is preferred over space efficiency, each character can be made exactly two bytes wide (for text confined to the BMP) or four bytes wide using the UTF-16 and UTF-32 encodings, respectively.

• Although character strings cannot be collated by a simple character code comparison, a collation algorithm is defined in the Unicode specification [12], and collation tables for major locales [13] are maintained by the Unicode Consortium.

• Classes of characters, such as uppercase letters, lowercase letters, numbers, and punctuation, do not form contiguous ranges, but their position is directly specified in the standard [10, sec. 4.5].

• Although idiosyncrasies, such as ligatures, invisible hyphenation hints, and combining characters, are present in UCS, explicit normalization algorithms for character string equivalence testing are specified by the standard [10, sec. 2.12]. An algorithm for case conversion is also specified [10, sec. 3.13].

• The byte order mark (FE FF) character can be inserted at the beginning of a text as a signature of Unicode encodings. As the name suggests, the order in which the FE and FF bytes arrive also indicates the order of bytes (called endianity) that was used to encode integers. In UTF-32 and UTF-16, endianity can be chosen arbitrarily by the encoding application. In UTF-8, one-byte integers are used, and the notion of endianity is therefore meaningless.

Ǻ = Å + combining acute accent = A + combining ring above + combining acute accent

Figure 1.3: Some UCS characters can be either input as a single entity or composed from several combining characters. Regarding Unicode normalization forms, all of the above representations are canonically equivalent.

    iconv -f latin2 -t utf8 -- old.txt > new.txt

Figure 1.4: Text files can be converted between encodings using the iconv command-line tool. The sample code shows the file old.txt being converted from the ISO/IEC 8859-2 encoding to UTF-8. The result of the conversion is stored in the file new.txt.
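Where iconv is not at hand, the conversion from Figure 1.4 can be approximated in a few lines of Python. The sketch below is an assumption-laden illustration rather than a replacement for the tool: the function convert is a made-up helper, the files live in a temporary folder, and the sample word is arbitrary. The second half shows the byte order mark that Python's utf-16 codec prepends.

```python
import codecs
import tempfile
from pathlib import Path


def convert(source: Path, target: Path, from_enc: str, to_enc: str) -> None:
    """Decode the source file with one encoding and re-encode it with another."""
    target.write_text(source.read_text(encoding=from_enc), encoding=to_enc)


with tempfile.TemporaryDirectory() as folder:
    old, new = Path(folder, "old.txt"), Path(folder, "new.txt")
    old.write_bytes("Řeč".encode("iso8859_2"))  # one byte per character
    convert(old, new, "iso8859_2", "utf-8")
    converted = new.read_bytes()

print(converted)

# The plain utf-16 codec starts its output with a byte order mark in the
# byte order of the machine; the -be and -le variants fix the order and
# therefore emit no mark.
with_bom = "e".encode("utf-16")
starts_with_bom = with_bom[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)
big, little = "e".encode("utf-16-be"), "e".encode("utf-16-le")
print(starts_with_bom, big, little)
```

A reader of the file can use the first two bytes to tell the two byte orders apart, which is precisely the purpose of the mark.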

Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. The figure shows the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.

Compose + O + R = ®
Compose + 3 + 4 = ¾
Compose + s + s = ß
Compose + ~ + ’ + a = ấ

Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.

Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á

Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces a character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the corresponding UCS code point in hexadecimal.
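The rule from Figure 1.7 can be mimicked with the code page codecs of Python's standard library. The following sketch assumes an English locale and is a simplification: the function alt_code is a made-up name, control codes are not mapped to the symbols that Windows would produce for them, and a few byte values that the code page 1252 leaves undefined make the decoding fail.

```python
def alt_code(digits: str) -> str:
    """Look up a decimal Alt code the way the figure describes it."""
    if not digits.isdigit() or int(digits) > 255:
        raise ValueError("expected a decimal number from 0 to 255")
    # A leading zero selects the Windows code page, otherwise the IBM one.
    code_page = "cp1252" if digits.startswith("0") else "cp437"
    return bytes([int(digits)]).decode(code_page)


print(alt_code("160"), alt_code("0225"), chr(0xE1))
```

All three lookups arrive at the same letter by different routes: byte 160 of the IBM code page, byte 225 of the Windows code page, and the UCS code point E1.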

1.1.2 Text Input

To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On hand-held devices, the use of either a numeric keypad or a touch-screen is more typical.

An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind, and, as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn’t apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.

1.1.3 Text Editors

A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free joe, GNU nano, and pico.

More advanced text editors come with support for regular expressions and version control (which will be covered in sections 1.1.5 and 1.2) and with user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower the users to perform complex text editing.

1.1.4 Interactive Document Preparation Systems

Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].

The primary difference between text editors and DPSes is the fact that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.

1.1.5 Regular Expressions

The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.

Sidenote: Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.

Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) [18, part 1, ch. 9] described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.

More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
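The re module of the Python standard library uses one of the Perl-derived syntaxes and can serve to illustrate the point. In the sketch below, the first pattern stays within what regular expressions can express, while the second relies on a backreference and matches any text that consists of the same sequence written twice, which no regular grammar can describe. The sample words are arbitrary.

```python
import re

# An interval: one or two occurrences of "e" between "w" and "p".
interval = re.compile(r"we{1,2}p")

# A backreference: \1 must repeat whatever the first group matched.
doubled = re.compile(r"(.+)\1")

matched = [word for word in ("weeps", "wept", "wp", "weeeep")
           if interval.search(word)]
print(matched)

print(doubled.fullmatch("abcabc") is not None)
print(doubled.fullmatch("abcabd") is not None)
```

The price of the extra power is that matching a pattern with backreferences can take considerably longer than matching a truly regular one.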

Many regex syntaxes and the software that implements themwere designed for the processing of asci i text and may behavein surprising ways when confronted with ucs characters Thesoftware may assume that each character is exactly one byte wideand fail to recognize any character that occupies several bytes Itmay also assume that all ucs characters fall within bmp and exhibitthe same problem with characters outside bmp More subtle butno less precarious can be the lack of support for Unicode caseconversion and normalization algorithms which makes it difficultto perform robust case-insensitive matching and the matchingof characters that can be encoded in several different ways Thelack of awareness of the invisible characters that can appear inucs textmdashsuch as the zero width space (20 0B) zero widthnon-joiner (20 0C) zero width joiner (20 0D) and zero widthno-break space (FE FF)mdash is also problematic and can lead tofalse negative matches Conversely modern regex syntaxes that at

11 TEXT PROCESSING 15

bre regex Description Matcheswe12p The repetition expression in the form of

119888119898119899matches the character 119888 repeated119896 isin ⟨119898 119899⟩ times Other forms include 119888119898

for 119896 isin ⟨119898 infin) and 119888119898 for 119896 = 119898

weeps wept

ene Star () is a repetition operator equivalent to theinterval expression of 0

never enemyKleene

(⟨regex⟩) A subexpression is a parenthesized regex Anyinterval expression or repetition operator usedimmediately after a subexpression applies tothe entire parenthesized regex

⟨regex⟩

^ar At the beginning of a regex or a subexpressiona caret (^) matches the beginning of a string

argumentarrow keys

ore$ At the end of a regex or a subexpression thedollar sign ($) matches the end of a string

iron oredumbledore

be A period () matches any single character or not to bebe[ea] A matching list expression is enclosed in square

brackets ([ ]) and contains a list of charactersthat the bracket expression matches It maycontain other entities omitted here for brevity

beehivegrizzly bearglass beads

be[^ea] A non-matching list expression contains a caret(^) as its first character and matches anycharacter that the corresponding matching listexpression would not match

obeah bendlibela

^$ Backslash () is an escape character that eithersuppresses or activates the special meaning ofthe following character

^$

()1 A backreference in the form of an escapednumber 119899 isin ⟨1 9⟩ (1 2 hellip 9) matchesanything the 119899th subexpression matched

ara araraunadardanellesnationality

Table 14 An informal description of the bre syntax (above) andthe differences in the ere syntax (below)

ere regex Description Matcheswe12p Unlike in bres braces arenrsquot escaped weeps weptpe+rl The plus sign (+) and the question mark () are

repetition operators equivalent to the intervalexpressions of 1 and 01

personapeer speechperl

(⟨regex⟩) Unlike in bres parentheses arenrsquot escaped ⟨regex⟩(on|t) Vertical line (|) is an alternation operator that

separates multiple regexes The whole regexmatches any of the alternative regexes

one twotrophy truth

()1 eres do not support backreferences ⟨undefined⟩

16 CHAPTER 1 WRITING

Regex: Description
\x{⟨n⟩}: Matches the UCS character with code point ⟨n⟩ in hexadecimal.
\N{⟨n⟩}: Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.
\p{⟨p⟩}: Matches any UCS character with property ⟨p⟩.
\P{⟨p⟩}: Matches any UCS character without property ⟨p⟩.

Property: Description
Letter: This property is satisfied by any letter.
Punctuation: This property is satisfied by any punctuation.
Symbol: This property is satisfied by any symbol.
Mark: This property is satisfied by any mark.
Number: This property is satisfied by any number.
Separator: This property is satisfied by any separator.
Other: This property is satisfied by any UCS character that doesn't belong to any of the above-listed categories.
Block=⟨b⟩: This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.
Script=⟨s⟩: This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.
Numeric Value=⟨n⟩: This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.

The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.

least partially implement the Unicode standard for Regular Expressions [20] (such as those of Perl 5.2 or Java 7) are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
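The character properties that such regex engines consult can also be inspected directly. The sketch below is an assumption-laden illustration, not the Perl or Java syntax of Table 1.5: Python's built-in `re` module does not accept `\p{…}`, so the example queries the Unicode character database through the standard `unicodedata` module instead. The helper name `chars_with_category` is hypothetical.

```python
import unicodedata


def chars_with_category(text, prefix):
    """Return the characters of `text` whose general category starts with `prefix`.

    The prefixes follow the Unicode general categories: "L" for letters,
    "N" for numbers, "P" for punctuation, "S" for symbols, "Z" for separators.
    """
    return [ch for ch in text if unicodedata.category(ch).startswith(prefix)]


sample = "a1,€ "
letters = chars_with_category(sample, "L")
symbols = chars_with_category(sample, "S")

# Other properties from the table: the character name and the numeric value.
euro = unicodedata.lookup("EURO SIGN")
half = unicodedata.numeric("½")
```

A regex engine with Unicode support performs an equivalent lookup whenever it evaluates a property such as Letter or Numeric Value.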

The most elementary text-processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool presents the lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
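The searching role of grep and the transforming role of sed described above can be approximated in a few lines. This is a minimal sketch that assumes Python's `re` syntax rather than BRE or ERE; the function names `grep` and `substitute` are hypothetical stand-ins and not the programs themselves.

```python
import re


def grep(pattern, lines):
    """Yield (line number, line) for every line that contains a match."""
    regex = re.compile(pattern)
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            yield number, line


def substitute(pattern, replacement, lines):
    """Return the lines with every match replaced, loosely like sed's s/…/…/g."""
    regex = re.compile(pattern)
    return [regex.sub(replacement, line) for line in lines]


lines = ["alpha beta", "gamma", "beta gamma"]
matches = list(grep(r"bet+a", lines))
transformed = substitute(r"gam+a", "delta", lines)
```

Like grep, the first function reports whole lines rather than the matched substrings; like sed, the second leaves non-matching lines untouched.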

1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.
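The rudimentary behavior described above (recording changes with descriptions and authorship, viewing them, and reverting them) can be sketched as a toy in-memory history. The class `History` and its methods are hypothetical and exist only for illustration; real VCS store versions far more efficiently and persist them to disk.

```python
import difflib


class History:
    """A toy, in-memory version history of a single text document."""

    def __init__(self):
        self.versions = []  # a list of (author, message, text) tuples

    def commit(self, author, message, text):
        """Record a new version along with its description and authorship."""
        self.versions.append((author, message, text))
        return len(self.versions) - 1  # the number of the new version

    def diff(self, old, new):
        """Return the changes between two recorded versions as a unified diff."""
        old_lines = self.versions[old][2].splitlines()
        new_lines = self.versions[new][2].splitlines()
        return list(difflib.unified_diff(old_lines, new_lines,
                                         f"version {old}", f"version {new}",
                                         lineterm=""))

    def revert(self, author, version):
        """Revert by recording the text of an earlier version once more."""
        text = self.versions[version][2]
        return self.commit(author, f"Revert to version {version}", text)


history = History()
first = history.commit("alice", "Initial draft", "Hello\nworld")
second = history.commit("bob", "Change the greeting", "Hello\nthere")
changes = history.diff(first, second)
third = history.revert("alice", first)
```

Reverting is modeled as one more recorded version, so the history itself is never rewritten and the authorship of the revert stays traceable.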

VCS can be dichotomized based on their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally, and its operation is fully dependent on the availability of the server. An example of a centralized VCS is SubVersioN (SVN).

By comparison, there is no designated server in decentralized VCS, and the users can upload and download new versions directly to and from one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage size requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.

Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

Figure 1.8: The basic SVN workflow (svnadmin create, svn checkout, svn update, svn commit). After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.

Figure 1.9: The basic Git workflow (top: git init, git clone, git pull, git push, git reset, git commit) and the use of the Git program with an SVN repository (bottom: svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit); the latter bears all the advantages and disadvantages associated with decentralized VCS.

After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.


Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.

Chapter 2

Markup

A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.

Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The Generalized Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.


A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.

Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
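To give a flavor of data retrieval with XPath, the following sketch queries a shortened recipe document. It assumes Python's standard `xml.etree.ElementTree` module, which implements only a limited subset of XPath; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# A location path that selects every ingredient element.
ingredients = [e.text for e in root.findall("./ingredientList/ingredient")]

# A predicate that selects the ingredient with a given attribute value.
egg = root.find("./ingredientList/ingredient[@amount='2']")

# The value of an attribute on a selected element.
serves = root.find("ingredientList").get("serves")
```

Full XPath and XQuery processors support many more axes and functions than the subset used here.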


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>

Figure 2.1: An example XML document (recipe.xml).

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD. DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.


<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
      name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).

xmllint -noout --dtdvalid recipe.dtd recipe.xml
xmllint -noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng # Compact -> Full Relax NG
xmllint -noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.

A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature of CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.

Speech
AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!

Verse
Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? Why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org/.

Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
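The way XML namespaces let elements of several XML applications coexist in one document can be observed with any namespace-aware parser. The sketch below assumes Python's standard `xml.etree.ElementTree` module and a made-up document that mixes XHTML with SVG; the prefixes `h` and `s` in the lookup table are arbitrary local choices, since only the IRIs identify the applications.

```python
import xml.etree.ElementTree as ET

DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <body>
    <p>A circle:</p>
    <svg:svg width="20" height="20">
      <svg:circle cx="10" cy="10" r="8"/>
    </svg:svg>
  </body>
</html>"""

# The prefixes used for lookups need not match the prefixes in the document.
NS = {"h": "http://www.w3.org/1999/xhtml",
      "s": "http://www.w3.org/2000/svg"}

root = ET.fromstring(DOC)
paragraph = root.find(".//h:p", NS)
circle = root.find(".//s:circle", NS)

# The parser expands each element name to "{IRI}local-name".
expanded_name = circle.tag
```

Note that the parser only resolves the names; as the text points out, nothing in the IRIs tells it where to find a schema for validation.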

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
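The lenient treatment of malformed HTML can be observed even with a simple tokenizer. The sketch below assumes Python's standard `html.parser` module, which reports tags and text as it encounters them and never rejects its input; it does not build a document tree and does not implement the error recovery documented in the HTML5 specification. The class `EventLogger` and the function `tokenize` are hypothetical names.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Record the start tags, end tags, and text runs found in the input."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def tokenize(markup):
    """Feed the markup to the lenient parser and return the recorded events."""
    parser = EventLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


# Overlapping elements: malformed, yet accepted without any error.
events = tokenize("<b>Bold <i>bold and italic</b> italic</i>")
tags = [event for event in events if event[0] != "data"]
```

Deciding what the overlapping tags should mean is left to whoever consumes the events, which is precisely the interpretive work that browsers had to agree on.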

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and, as such, can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com/. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

<style>
body {
  font: large Verdana;
  font-size: large;
}
h1 {
  font-size: x-large;
  text-transform: uppercase;
}
abbr {
  font-style: italic;
}
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly in web documents through XML namespaces.

The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
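The strictness that this section attributes to XML parsers is easy to demonstrate. The following sketch assumes Python's standard `xml.etree.ElementTree` module; the helper `is_well_formed` is a hypothetical name, and the two inputs are the overlapping and the properly nested markup wrapped in a single root element.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False if it refuses it."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


overlapping = "<p><b>Bold <i>bold and italic</b> italic</i></p>"
nested = "<p><b>Bold </b><i><b>bold and italic</b> italic</i></p>"

refused = not is_well_formed(overlapping)
accepted = is_well_formed(nested)
```

The parser stops at the first mismatched tag, which also illustrates why a partially downloaded XHTML document cannot be declared well-formed before its last byte arrives.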

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.


A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.

If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
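The two readings of a triplet can be made concrete in a few lines. This is a minimal sketch in which all class and function names (`IRI`, `Literal`, `Triplet`, `interpret`, `to_ntriples`) are hypothetical; the serializer writes one N-Triples line per triplet and, as a simplifying assumption, performs no escaping of special characters and ignores language tags and datatypes.

```python
from typing import NamedTuple, Union


class IRI(str):
    """A resource, identified by an IRI."""


class Literal(str):
    """A literal value, such as a name or a title."""


class Triplet(NamedTuple):
    predicate: IRI
    subject: IRI
    obj: Union[IRI, Literal]


def interpret(t):
    """Read a triplet as a relation or as a property, depending on its object."""
    if isinstance(t.obj, IRI):
        return f"{t.subject} is in relation {t.predicate} with {t.obj}"
    return f"{t.subject} has property {t.predicate} with value {t.obj}"


def to_ntriples(t):
    """Serialize a triplet as one N-Triples line (subject, predicate, object)."""
    obj = f"<{t.obj}>" if isinstance(t.obj, IRI) else f'"{t.obj}"'
    return f"<{t.subject}> <{t.predicate}> {obj} ."


JOHN = IRI("http://example.org/john-smith")
creator = Triplet(IRI("http://purl.org/dc/terms/creator"),
                  IRI("http://example.org/document.html"), JOHN)
name = Triplet(IRI("http://xmlns.com/foaf/0.1/name"), JOHN, Literal("John Smith"))
```

Keeping resources and literals as distinct types is what lets `interpret` choose between the two readings without inspecting the text of the object.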

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in Attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata in the documents on the Web, a technique known as microformatting.

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).

<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>

Figure 2.11: A graph of the RDF document in Figure 2.9 (nodes: http://example.org/document.html, "John's Web page"@en, http://example.org/john-smith, foaf:Person, "John Smith"; edges: dc:title, dc:creator, rdf:type, foaf:name).

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph, The Art of Computer Programming, and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.


The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could; but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled--but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could; but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].


% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could; but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
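As a rough illustration of what such a conversion tool does, the following Python sketch turns a tiny Markdown-like subset into HTML. The function name `to_html` and the supported subset (a single `#` heading level, `*emphasis*`, and paragraphs separated by blank lines) are assumptions made for this example; real converters cover far more syntax and many edge cases.

```python
import html
import re


def to_html(source: str) -> str:
    """Convert a tiny Markdown-like subset to HTML (illustrative only)."""
    blocks = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", source.strip()):
        # Join the lines of the block and escape HTML special characters.
        text = html.escape(" ".join(line.strip() for line in block.splitlines()))
        # *emphasis* becomes <em>emphasis</em>.
        text = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)
        if text.startswith("# "):
            # A leading "# " marks a top-level heading.
            blocks.append(f"<h1>{text[2:]}</h1>")
        else:
            blocks.append(f"<p>{text}</p>")
    return "\n".join(blocks)


print(to_html("# Lightweight markup\n\nEasy to *read* and write."))
```

The punctuation in the input stays readable on a plain text terminal, while the output carries the same structure in a more general markup language.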

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other, the design and the positioning of the structural elements of the document (such as headings, tables, figures, and lists), and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
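Before choosing typefaces, it can help to list the writing systems that actually occur in a manuscript. The Python sketch below does this with the standard `unicodedata` module. The function name `scripts_used` is hypothetical, and taking the first word of the Unicode character name as the script is a simplification assumed for this example rather than a full script-detection method.

```python
import unicodedata


def scripts_used(text: str) -> set[str]:
    """Return approximate script names of the letters in text.

    The script is approximated by the first word of the Unicode character
    name (for example LATIN or GREEK); this is a simplification.
    """
    scripts = set()
    for char in text:
        if char.isalpha():
            name = unicodedata.name(char, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


print(sorted(scripts_used("Aristotle says φαμὲν γὰρ τὴν ψυχὴν")))
```

A result containing more than one script suggests that the body text typeface should be checked for coverage and, if necessary, paired with a second typeface.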

Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
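The thresholds above can be collected into a small decision helper. The following Python sketch is only an illustration: the function name `recommend_setting`, its parameters, and the wording of the returned strings are assumptions, while the numeric ranges are the ones quoted in the paragraph.

```python
def recommend_setting(chars_per_line: int, columns: int = 1) -> str:
    """Classify a measure (characters per line) using the ranges quoted above."""
    # 45-75 characters for a single column, 40-50 for multiple columns.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line > 80:
        return "too wide: shorten the line"
    if chars_per_line < 40:
        return "too narrow to justify: set ragged"
    if low <= chars_per_line <= high:
        return "within the recommended range: justify"
    return "acceptable, but outside the recommended range"


print(recommend_setting(66))
print(recommend_setting(30))
print(recommend_setting(60, columns=2))
```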

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says, φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

1

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}
\newunicodechar{…}{\raisebox{.8ex}{…}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says, φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


<style>
@font-face {
  font-family: "Alegreya Sans";
  src: url(AlegreyaSans-Regular.ttf)
       format("truetype");
  unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                 U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F; }
@font-face {
  font-family: "GFS Neohellenic";
  src: url(GFSNeohellenic.otf) format("opentype");
  unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                 U+102E0-102FF; }
p {
  font-family: "Alegreya Sans", "GFS Neohellenic",
               sans-serif;
  line-height: 14pt; }
[lang=en] {
  font-size: 11pt; }
[lang=gr] {
  font-size: 13.2pt; }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says, </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
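The arithmetic behind these figures can be checked with a short Python sketch. The function name `leading_range` and its percentage parameters are assumptions made for this illustration; the 20 and 45 percent bounds are the rule of thumb quoted above.

```python
def leading_range(type_size_pt, min_pct=20, max_pct=45):
    """Return the (smallest, largest) line height in points.

    The line height is the type size plus 20 to 45 percent of the type size.
    """
    smallest = type_size_pt * (100 + min_pct) / 100
    largest = type_size_pt * (100 + max_pct) / 100
    return smallest, largest


low, high = leading_range(10)
print(low, high)                        # line heights for 10 pt type
print((low - 10) / 2, (high - 10) / 2)  # lead above and below each line
```

For 10 pt type, this yields line heights of 12 and 14.5 pt, that is, 1 to 2.25 pt of lead on either side of each line, in agreement with the text.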

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

[Sidenote: This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.]

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
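One way to satisfy the whole-multiple rule is to round the space reserved for a heading up to the next multiple of the body line height and distribute the remainder above and below the heading. The Python sketch below shows the rounding step; the function name `heading_block_height` and the example values (a 16 pt heading over a 12 pt line height) are assumptions for illustration.

```python
import math


def heading_block_height(heading_height_pt, line_height_pt):
    """Round the heading height up to a whole multiple of the body line height."""
    lines = math.ceil(heading_height_pt / line_height_pt)
    return lines * line_height_pt


block = heading_block_height(16, 12)
print(block, block - 16)  # total block height, and space left to distribute
```

With these values, the heading occupies two body lines, and the leftover space can be split between the space above and below the heading without shifting the lines that follow.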

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

    This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    -- William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size (as described in Section 3.2.1), as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
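The multiple-of-line-height requirement amounts to rounding the available height down to a whole number of lines. The following Python sketch illustrates this under assumed values: the function name `fit_text_block` is hypothetical, and the 540 pt of available height and the 14 pt line height are example inputs, not measurements from this book.

```python
def fit_text_block(available_height_pt, line_height_pt):
    """Return (number_of_lines, text_height_pt) for the tallest block that fits."""
    lines = int(available_height_pt // line_height_pt)
    return lines, lines * line_height_pt


print(fit_text_block(540, 14))
```

The difference between the available height and the returned text height is then absorbed by the vertical margins.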

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to use colored type only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
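The text does not define "sufficient contrast". One commonly used measure on the Web, which is an assumption brought in for this illustration and not a method prescribed by the sources cited above, is the contrast ratio from the W3C Web Content Accessibility Guidelines (WCAG) 2. The Python sketch below computes it for two sRGB colors given as 8-bit triples; the function names are hypothetical.

```python
def relative_luminance(rgb):
    """Relative luminance of an 8-bit sRGB color, following the WCAG 2 formula."""
    def linear(channel_8bit):
        c = channel_8bit / 255
        # Undo the sRGB transfer curve to obtain linear light.
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linear(channel) for channel in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 (identical) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))    # black on white
print(round(contrast_ratio((255, 0, 0), (255, 255, 255)), 2))  # red on white
```

Pure red on white scores far lower than black on white, which is consistent with the advice to reserve colored type for emphasis. The ratio measures luminance contrast only and does not by itself show whether two hues remain distinct for a color-blind reader.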

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard 10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).

[16] isoiec jtc1sc34 Information technology ndash Open DocumentFormat for Office Applications (OpenDocument) v10 isoiec263002006 Geneva Switzerland the International Organi-zation for Standardization Dec 2006 (cit on p 13)


[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard 18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).


[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).


[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).


[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom. 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).


[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standard Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System


DTD: Document Type Declaration
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
EMACS: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character


NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
PICO: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard General Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
VI: The Visual Interactive editor
VIM: VI IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language


XML: The eXtensible Markup Language



Introduction

With the advent of the digital age, typesetting has become available to virtually anyone equipped with a personal computer. Beautiful text documents can now be crafted using free and consumer-grade software, which often obviates the need for the involvement of a professional designer and typesetter. The level playing field of the Internet, coupled with the rising popularity of digital-only documents, then allows the author to bypass the publisher as well, if they so wish, without jeopardizing their chance of recognition.

The aim of this book is to provide a general overview of the tools and techniques tied with writing, designing, typesetting, and distributing text documents, one of the principal means of knowledge preservation and transfer known to man. Each chapter describes one discrete step of document preparation along with practical examples and references to literature for those interested in further study.

The chapters are filled with examples that illustrate the subject matter. These should be consulted whenever the concepts described in the text are unclear to the reader. Although care was taken not to favor any computing environment, some examples feature utilities for Unix and Unix-like operating systems. These utilities may or may not have a suitable counterpart in operating systems such as Windows. To try the corresponding examples out, the reader is advised to install a free Unix-like environment (such as Cygwin for Windows) on their computer.

This document was prepared in accordance with William Strunk's Elements of Style, an American English style guide for general use.

Chapter 1

Writing

The essence of a document is the idea it represents. In the case of a text document, this idea is articulated through speech, which is transcribed using text, optionally accompanied by figures, and then laid out on a sheet of paper according to a design. Since the text is typically independent of the design, whose task is to support and elicit the internal structure of the text, it is writing that is the logical first step in the creation of a text document.

The essentials of writing in any given natural language include grammar rules, which specify the structure of spoken language, and orthographic rules, which impose additional requirements on written text. The complexity of either set of rules depends entirely on the language in question. Some writing systems, such as those that incorporate Chinese characters, are not phonographic, and the correspondence between the spoken words and the written symbols needs to be memorized by the writer on a word-by-word basis. Other languages may use vastly different grammar rules for speaking and for writing, which means that a spoken sentence needs to be translated first before it is written down. A writer needs to recognize these specifics.

On top of grammar and orthographic rules stand style guides, which, in order to improve consistency, codify how common language patterns are encoded. More comprehensive style guides (such as the Chicago Manual of Style or the Oxford Style Manual) often go beyond writing and provide guidelines on design and typesetting as well, making them an indispensable reference on the editorial tradition.

Zwei Trichter wandeln durch die Nacht.
Durch ihres Rumpfs verengten Schacht
fließt weißes Mondlicht
still und heiter
auf ihren
Waldweg
u. s.
w.

Figure 1.1: Exceptions that prove the rule about the separation of text and design can sometimes be encountered in poetry. Above is Christian Morgenstern's Trichter, where the text and its form are intimately intertwined.

Above all stand the typographic rules, which specify how the resulting document should be typeset so that it doesn't disturb the eye of the reader. These, as well as the orthographic rules on hyphenation, can be left out of consideration during writing, as it is the page that should be formed around the writing and not the other way around.

1.1 Text Processing

Originally the domain of the pen, the quill, the stylus, and the more recent typewriter, manuscripts of today are produced mainly using the personal computer and stored in text files. The discipline of creating and manipulating digital text is called text processing and will be the focus of this section.

1.1.1 Character Encoding

Although computing at its most primal has no use for anything but numbers, it has nevertheless been accompanied by text from the very outset. Even the earliest computers from the 1950s were programmed with both raw machine code and the text programming language of the FORmula TRANslator (FORTRAN). The digital representation of letters, digits, and other characters was initially closely tied to each specific application and processor architecture, but with the advent of computer networking in the 1960s, mutual intelligibility became a point of concern. "We had over sixty different ways to represent characters in computers. It was a real Tower of Babel," explains Bob Bemer [1], an American computer scientist who worked at IBM during 1956–1962 and who drafted the American Standard Code for Information Interchange (ASCII) [2], a character encoding from 1963 that unified the digital representation of text across the computer industry and enabled computer networking on a large scale.

Margin note: EBCDIC by IBM was the default encoding on IBM's System/360 mainframes and was in active use until the introduction of the PC in 1981. In writing systems using Chinese characters, special encodings such as Big5, JIS, and EUC are used to this day. For brevity, the text focuses on the mainstream of international encodings.

ASCII

In ASCII, every character is represented by a number from zero to 127, which is transformed to a seven-bit integer called a character code. These 128 codes are used to encode printable characters (spanning the letters of the English alphabet, digits, punctuation, and other symbols) and control codes, as depicted in Table 1.1. Unlike printable characters, control codes have no fixed visual representation, and they were used to implement application-specific communication protocols and text formatting; their precise semantics were defined in a much later standard from 1972 [3]. Unconstrained by the bandwidth and the storage limitations of the 1960s and 1970s, today's communication protocols and text formats gravitate towards markup constructed from printable characters, which, unlike control codes, are easy for humans to read and write.

The following properties make it easy to manipulate and reason about character strings encoded in ASCII:

• Each character is represented by exactly seven bits. This makes it easy to allocate space for character strings of fixed length, to measure the number of characters stored in a memory region, and to perform basic operations such as adjacent character retrieval or text truncation.

• Characters are alphabetically ordered. Character strings can therefore be collated by comparing the binary values of character codes.

• Lowercase and uppercase letters, digits, and control codes form contiguous ranges of character codes. This simplifies classification.


               Bit 7:  0    0    0    0    1    1    1    1
               Bit 6:  0    0    1    1    0    0    1    1
               Bit 5:  0    1    0    1    0    1    0    1
Bits 4 3 2 1           Ctrl codes  Symbols  Upper case  Lower case
     0 0 0 0           NUL  DLE  SP   0    @    P    `    p
     0 0 0 1           SOH  DC1  !    1    A    Q    a    q
     0 0 1 0           STX  DC2  "    2    B    R    b    r
     0 0 1 1           ETX  DC3  #    3    C    S    c    s
     0 1 0 0           EOT  DC4  $    4    D    T    d    t
     0 1 0 1           ENQ  NAK  %    5    E    U    e    u
     0 1 1 0           ACK  SYN  &    6    F    V    f    v
     0 1 1 1           BEL  ETB  '    7    G    W    g    w
     1 0 0 0           BS   CAN  (    8    H    X    h    x
     1 0 0 1           HT   EM   )    9    I    Y    i    y
     1 0 1 0           LF   SUB  *    :    J    Z    j    z
     1 0 1 1           VT   ESC  +    ;    K    [    k    {
     1 1 0 0           FF   FS   ,    <    L    \    l    |
     1 1 0 1           CR   GS   -    =    M    ]    m    }
     1 1 1 0           SO   RS   .    >    N    ^    n    ~
     1 1 1 1           SI   US   /    ?    O    _    o    DEL

Table 1.1: The ASCII encoding as specified in the 1986 revision of the standard [4]. SP denotes the space character.

Code point range    Encoding
0–127               0xxxxxxx
128–2047            110xxxxx 10xxxxxx
2048–65535          1110xxxx 10xxxxxx 10xxxxxx
65536–1114111       11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Table 1.2: The UTF-8 encoding. Each x represents one bit of the UCS code point in binary.

Character   Code point (decimal)   Code point (binary)   Encoding
Ř           344                    101011000             11000101 10011000
e           101                    1100101               01100101
č           269                    100001101             11000100 10001101

Table 1.3: An example of the UTF-8 encoding.


• There is precisely one way to encode any printable character. The conversion between lowercase and uppercase letters is a matter of inverting one bit.

This comes at the expense of support for non-English writing systems. As a temporary workaround, a set of ASCII derivatives that replaced the less-needed characters of #, $, @, [, \, ], ^, `, {, |, }, and ~ with international characters was specified in the ISO 646 standard from 1972 [3].
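The ASCII properties listed above can be tried out in a few lines of Python. This is only an illustrative sketch: the helper names `flip_case`, `classify`, and `collate` are made up for this example and do not come from any standard.

```python
# A sketch of the ASCII properties described above.
# The function names are hypothetical and exist only for illustration.

def flip_case(letter: str) -> str:
    """Toggle the case of one ASCII letter by inverting bit 6 (value 32)."""
    if len(letter) != 1 or not (letter.isascii() and letter.isalpha()):
        raise ValueError("expected a single ASCII letter")
    return chr(ord(letter) ^ 0b0100000)

def classify(char: str) -> str:
    """Classify an ASCII character using contiguous code ranges only."""
    code = ord(char)
    if code < 32 or code == 127:
        return "control"
    if 48 <= code <= 57:
        return "digit"
    if 65 <= code <= 90:
        return "uppercase"
    if 97 <= code <= 122:
        return "lowercase"
    return "symbol"

def collate(words):
    """Sort words of a single letter case by comparing raw character codes."""
    return sorted(words, key=lambda word: word.encode("ascii"))

print(flip_case("a"), flip_case("Q"))
print([classify(c) for c in "\n7Gq~"])
print(collate(["pear", "apple", "fig"]))
```

Note that comparing raw codes places every uppercase letter before every lowercase one, so the collation is only alphabetical within a single letter case.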

Eight-bit Encodings

With the byte size stabilizing at eight bits, new character encodings emerged that were based on ASCII and used the additional bit to encode characters of non-English writing systems while retaining complete backwards compatibility with ASCII. Besides the numerous vendor-specific encodings (called code pages), a set of fifteen eight-bit encodings, covering all major modern writing systems whose characters fit within the space of 128 additional combinations, was standardized in the ISO/IEC 8859 series released during 1986–2001.

Compared to ASCII, eight-bit encodings introduced an additional level of complexity to text processing:

• Each character is exactly eight bits wide. String manipulation is therefore as straightforward as with ASCII.

• Character strings can no longer be collated by character code comparison. Each encoding requires separate collation tables.

• Classes of characters, such as uppercase and lowercase letters or punctuation, no longer form contiguous ranges, and their position varies among encodings. This impedes character classification.

• Idiosyncrasies, such as the ligature æ and invisible hyphenation hints, are included in several encodings, which makes it more difficult to determine character string equivalence. Algorithms for case conversion vary among encodings.

• There exists no standard mechanism to detect which encoding is being used. The distinction needs to be made at the application level using heuristics, additional metadata, or human intervention. Consequently, no standard mechanism exists to use different character encodings within a single text document.

A portion of this complexity is inherent in the task of encoding the characters of all modern writing systems, but the overhead caused by the fragmentation of character encodings proved to be unnecessary.

Margin note: Also notable are the seven-bit encodings of UTF-7 and Punycode, which bring Unicode support to protocols that were designed with the seven-bit ASCII in mind, such as e-mail.
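The lack of a detection mechanism can be illustrated with a short Python sketch that decodes the same bytes under three eight-bit encodings. The helper `decode_with` is a hypothetical name; the codec identifiers are the ones used by Python's standard library for ISO/IEC 8859-1, ISO/IEC 8859-2, and the IBM code page 437.

```python
# One byte sequence, several readings: nothing in the bytes themselves
# says which eight-bit encoding was used to produce them.

def decode_with(raw: bytes, codec_names):
    """Decode the same bytes under each of the named encodings."""
    return {name: raw.decode(name) for name in codec_names}

# The ASCII letters "caf" followed by the single byte E8.
sample = bytes([0x63, 0x61, 0x66, 0xE8])

for name, text in decode_with(sample, ["iso8859_1", "iso8859_2", "cp437"]).items():
    print(f"{name:10} {text}")
```

The first three characters come out the same every time because all three encodings are backwards compatible with ASCII; only the final byte changes its meaning.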

The Universal Character Set and Unicode

In the early 1990s, the continual increase in the available bandwidth and storage led to the creation of the standards of Unicode [5, 6] and the Universal multiple-octet coded Character Set (UCS) [7] in an attempt to create a text encoding that would contain the characters of all the world's languages and succeed ASCII as the lingua franca of text interchange.

UCS is an ever-expanding catalogue of characters from writing systems both modern and ancient, and of symbols ranging from diacritical marks, punctuation, and ideograms to mahjong tiles, alchemical symbols, and the ancient Greek musical notation. Each of these characters is assigned a number called a code point, ranging from 0 to 2147483647 (7F FF FF FF in hexadecimal notation), with the numbers of the most common characters in the range from 0 to 65535 (FF FF) called the Basic Multilingual Plane (BMP). The smallest unit of division in UCS is a block, which contains 256 thematically related characters. UCS encodings map code points to binary character codes and vice versa.

Three major encodings are specified in the UCS standard and its amendments [8, 9]:

1. UTF-32 directly encodes UCS characters by transforming their code points to four-byte integers. UTF-32 is also known as UCS-4.

2. UTF-16 directly encodes characters within the BMP by transforming their code points to two-byte integers. Code points in the range from 65536 to 1114111 (01 00 00–10 FF FF) are transformed into pairs of two-byte integers, called surrogate pairs, ranging from 55296 to 57343 (D8 00–DF FF). To enable the UTF-16 encoding, the code points in this range will never be assigned to characters [10, sec. 3.4, D15]. The same is true of code points above 1114111 (10 FF FF), which allows UTF-16 to encode any UCS character.

3. UTF-8 directly transforms code points ranging from 0 to 127 (7F) to one-byte integers. Since the first UCS block of the BMP matches ASCII, any text encoded in eight-bit ASCII is also encoded in UTF-8. Code points in the range from 128 to 1114111 (00 00 80–10 FF FF) are transformed into two to four one-byte integers ranging from 128 to 244 (80–F4). The encoding is illustrated in tables 1.2 and 1.3.

Margin note: One of the design goals of UCS was to avoid assigning code points to different glyphs that carry the same meaning. As a result, the visually distinctive Han characters used in the East Asian countries of China, Japan, Korea, and Vietnam were merged into a set of 75960 ideograms in a process referred to as the Han Unification [10, sec. 18.1]. This simplifies text processing but also makes it impossible to encode a text in multiple East Asian languages without having to rely on external markup to select appropriate regional fonts. As a result, a derivative of UCS that doesn't implement the Han Unification was developed for use in operating systems based on the Real-time Operating system Nucleus (TRON) and is used in East Asia alongside UCS and region-specific encodings.

餐甑逞扉牙慨

Figure 1.2: Several Han characters in the traditional Chinese, Japanese, Korean, and Vietnamese variants.
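The bit layouts of UTF-8 and the surrogate pairs of UTF-16 can be written out by hand and checked against Python's built-in codecs. The sketch below is a minimal illustration under the four-byte limit described above; the function names `utf8_encode` and `utf16_units` are hypothetical.

```python
# A hand-written sketch of the UTF-8 and UTF-16 transformations.
# Real programs should rely on the codecs of their language instead.

def utf8_encode(code_point: int) -> bytes:
    """Encode one code point following the bit layout of UTF-8."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable code point")
    if code_point < 0x80:        # 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0b11000000 | (code_point >> 6),
                      0b10000000 | (code_point & 0b111111)])
    if code_point < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0b11100000 | (code_point >> 12),
                      0b10000000 | ((code_point >> 6) & 0b111111),
                      0b10000000 | (code_point & 0b111111)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0b11110000 | (code_point >> 18),
                  0b10000000 | ((code_point >> 12) & 0b111111),
                  0b10000000 | ((code_point >> 6) & 0b111111),
                  0b10000000 | (code_point & 0b111111)])

def utf16_units(code_point: int) -> list:
    """Return the two-byte integers that represent a code point in UTF-16."""
    if code_point < 0x10000:
        return [code_point]
    offset = code_point - 0x10000            # a 20-bit number
    return [0xD800 | (offset >> 10),         # high surrogate
            0xDC00 | (offset & 0x3FF)]       # low surrogate

print(utf8_encode(344).hex(" "), utf8_encode(101).hex(" "))
print([hex(unit) for unit in utf16_units(0x1F600)])
```

Comparing the results with `chr(code_point).encode("utf-8")` and `chr(code_point).encode("utf-16-be")` is a quick way to confirm that the sketch agrees with a production implementation.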

UTF-32 is primarily used for the fixed-width internal representation of individual UCS characters inside programs, UTF-16 fulfills a similar role in programs that only work with the BMP, and UTF-8 is used for text storage and interchange. Since 2010, the majority of text content on the Web has been encoded in ASCII and UTF-8 [11].

Unicode was a competing standard for universal text encoding that underwent a merger with UCS in version 1.1, and since then the standards have been kept closely synchronised. Unicode is a superset of UCS, which defines additional information about UCS characters (such as their general category, directionality, case, or numeric value [10, sec. 3.5 and ch. 4]), various text processing algorithms, and implementation guidelines.
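The character properties that Unicode adds on top of the bare code points can be inspected with Python's standard `unicodedata` module. The helper `describe` below is a hypothetical name used only for this sketch, and the five sample characters were picked arbitrarily.

```python
import unicodedata

def describe(char: str):
    """Return the name, general category, directionality and numeric value."""
    return (
        unicodedata.name(char),
        unicodedata.category(char),       # e.g. Lu = uppercase letter
        unicodedata.bidirectional(char),  # e.g. L = left-to-right
        unicodedata.numeric(char, None),  # None if the character is no number
    )

# Latin capital A, r with caron, Arabic-Indic digit three,
# the fraction one half, and the Hebrew letter alef.
for char in ("A", "\u0159", "\u0663", "\u00BD", "\u05D0"):
    print(char, *describe(char))
```

The output depends on the version of the Unicode database bundled with the interpreter, although the properties of long-established characters such as these are stable.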

Regarding text processing, Unicode and UCS represent a compromise between the simplicity of the seven-bit ASCII and the heterogeneity of eight-bit encodings:

• If simple text manipulation is preferred over space efficiency, each character can be made exactly two or four bytes wide using the UTF-16 and UTF-32 encodings (in the case of UTF-16, only for text that stays within the BMP).

• Although character strings cannot be collated by a simple character code comparison, a collation algorithm is defined in the Unicode specification [12], and collation tables for major locales [13] are maintained by the Unicode Consortium.

• Classes of characters (such as uppercase letters, lowercase letters, numbers, and punctuation) do not form contiguous ranges, but their position is directly specified in the standard [10, sec. 4.5].

• Although idiosyncrasies (such as ligatures, invisible hyphenation hints, and combining characters) are present in UCS, explicit normalization algorithms for character string equivalence testing are specified by the standard [10, sec. 2.12]. An algorithm for case conversion is also specified [10, sec. 3.13].

• The byte order mark (FE FF) character can be inserted at the beginning of a text as a signature of Unicode encodings. As the name suggests, the order in which the FE and FF bytes arrive also indicates the order of bytes (called endianness) that was used to encode integers. In UTF-32 and UTF-16, endianness can be chosen arbitrarily by the encoding application. In UTF-8, one-byte integers are used and the notion of endianness is therefore meaningless.

Ǻ (U+01FA) = Å (U+00C5) + U+0301 = A (U+0041) + U+030A + U+0301

Figure 1.3: Some UCS characters can be either input as a single entity or composed from several combining characters. Regarding Unicode normalization forms, all of the above representations are canonically equivalent.

    iconv -f latin2 -t utf8 -- old.txt > new.txt

Figure 1.4: Text files can be converted between encodings using the iconv command-line tool. The sample code shows the file old.txt being converted from the ISO/IEC 8859-2 encoding to UTF-8. The result of the conversion is stored in the file new.txt.
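Canonical equivalence, case-insensitive comparison, and the byte order mark can all be explored with Python's standard library. The sketch below makes some simplifying assumptions: the helper names are hypothetical, only the UTF-16 signature is checked, and `str.casefold` stands in for the full case conversion algorithm of the standard.

```python
import codecs
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """Compare two strings after canonical decomposition (form NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def utf16_byte_order(data: bytes):
    """Guess the byte order of UTF-16 data from its byte order mark."""
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "little-endian"
    return None                                # no signature present

single = "\u01FA"          # one precomposed character
partial = "\u00C5\u0301"   # A with ring above, then a combining acute accent
full = "A\u030A\u0301"     # A, combining ring above, combining acute accent

print(single == full)                          # plain code point comparison
print(canonically_equivalent(single, partial),
      canonically_equivalent(single, full))
print("Straße".casefold() == "STRASSE".casefold())
print(utf16_byte_order(codecs.BOM_UTF16_LE + "a".encode("utf-16-le")))
```

A plain comparison of code points treats the three spellings as different strings, which is why equivalence testing should normalize both sides first.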

Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. Above is the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.

Compose + O + R = ®
Compose + 3 + 4 = ¾
Compose + s + s = ß
Compose + ~ + ' + a = ấ

Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.

Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á

Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces a character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the corresponding UCS code point in hexadecimal.
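The three Alt sequences can be reproduced as code page lookups. The following Python sketch assumes an English locale as described in the caption and ignores the special cases of the real Windows implementation; `alt_decimal` and `alt_hexadecimal` are hypothetical helper names.

```python
# A simplified model of Alt codes in an English locale.

def alt_decimal(digits: str) -> str:
    """Look a decimal Alt code up in code page 1252 or 437."""
    codec = "cp1252" if digits.startswith("0") else "cp437"
    return bytes([int(digits)]).decode(codec)

def alt_hexadecimal(digits: str) -> str:
    """Interpret the digits as a UCS code point in hexadecimal."""
    return chr(int(digits, 16))

print(alt_decimal("160"), alt_decimal("0225"), alt_hexadecimal("E1"))
```

All three calls yield the same letter, although each arrives at it through a different table.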

1.1.2 Text Input

To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On handheld devices, the use of either a numeric keypad or a touchscreen is more typical.

An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind, and as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn't apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.

1.1.3 Text Editors

A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free joe, GNU nano, and pico.

More advanced text editors come with support for regular expressions and version control (which will be covered in sections 1.1.5 and 1.2) and user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower the users to perform complex text editing.

1.1.4 Interactive Document Preparation Systems

Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully-formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].

The primary difference between text editors and DPSes is the fact that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files, a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.

1.1.5 Regular Expressions

The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.

Margin note: Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.

Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) [18, part 1, ch. 9], described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.

More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
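The claim that regexes are stronger than regular can be tried out with Python's `re` module, whose syntax derives from that of Perl. The sketch below uses two hypothetical helpers: `found` performs ordinary pattern matching, whereas `mirrored` uses a backreference to recognize strings with the same number of letters a on either side of a b, a language that no regular grammar can describe.

```python
import re

def found(pattern: str, text: str) -> bool:
    """Report whether the pattern matches anywhere within the text."""
    return re.search(pattern, text) is not None

def mirrored(text: str) -> bool:
    """Accept n letters a, a single b, and the same n letters a again."""
    return re.fullmatch(r"(a+)b\1", text) is not None

# An interval expression and a non-matching list, written without
# the escaped braces that the BRE syntax requires.
print(found(r"we{1,2}p", "weeps"), found(r"we{1,2}p", "wp"))
print(found(r"be[^ea]", "bend"), found(r"be[^ea]", "bear"))

# The backreference \1 must repeat exactly what the group matched.
print(mirrored("aabaa"), mirrored("aaba"))
```

The braces and parentheses are unescaped here, as in the ERE syntax, but unlike in standard ERE, the backreference is supported.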

Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for the Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (20 0B), zero width non-joiner (20 0C), zero width joiner (20 0D), and zero width no-break space (FE FF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at


Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below).

BRE regex: we\{1,2\}p
Description: The repetition expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
Matches: weeps, wept

BRE regex: ene
Description: Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
Matches: never, enemy, Kleene

BRE regex: \(⟨regex⟩\)
Description: A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
Matches: ⟨regex⟩

BRE regex: ^ar
Description: At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
Matches: argument, arrow keys

BRE regex: ore$
Description: At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
Matches: iron ore, dumbledore

BRE regex: be
Description: A period (.) matches any single character.
Matches: or not to be

BRE regex: be[ea]
Description: A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity.
Matches: beehive, grizzly bear, glass beads

BRE regex: be[^ea]
Description: A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
Matches: obeah bend, libela

BRE regex: \^\$
Description: Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
Matches: ^$

BRE regex: ()1
Description: A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
Matches: ara ararauna, dardanelles, nationality

ERE regex: we{1,2}p
Description: Unlike in BREs, braces aren't escaped.
Matches: weeps, wept

ERE regex: pe+rl
Description: The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
Matches: persona, peer speech, perl

ERE regex: (⟨regex⟩)
Description: Unlike in BREs, parentheses aren't escaped.
Matches: ⟨regex⟩

ERE regex: (on|t)
Description: Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
Matches: one two, trophy truth

ERE regex: ()1
Description: EREs do not support backreferences.
Matches: ⟨undefined⟩

16 CHAPTER 1 WRITING

Regex Descriptionx⟨n⟩ Matches the ucs character with code point ⟨n⟩ in hexadecimalN⟨n⟩ Matches the ucs character whose Name property Name_Alias

property or code point label tag equals ⟨n⟩p⟨p⟩ Matches any ucs character with property ⟨p⟩P⟨p⟩ Matches any ucs character without property ⟨p⟩

Property DescriptionLetter This property is satisfied by any letterPunctua-

tion

This property is satisfied by any punctuation

Symbol This property is satisfied by any symbolMark This property is satisfied by any markNumber This property is satisfied by any numberSeparator This property is satisfied by any separatorOther This property is satisfied by any ucs character that doesnrsquot belong

to any of the abovelisted categoriesBlock=⟨b⟩ This property is satisfied by characters that reside in the ucs

block ⟨b⟩ ucs blocks include Basic Latin Greek Arabic etcScript=⟨s⟩ This property is satisfied by characters that belong to the writing

system ⟨s⟩ Writing systems include Latin Korean Chinese etcNumeric

Value=⟨n⟩This property is satisfied by any ucs character with the numericvalue ⟨n⟩

Table 15 The elements of the Unicode regex syntax implementedby Perl 52 and Java 7 The list of properties is not exhaustive

The authoritativeresource on grep

sed and awk isSed amp awk [21]

which explains eachprogram as well asthe bre and ere syn-taxes in full detail

least partially implement the Unicode standard for Regular Expres-sions [20]mdashsuch as those of Perl 52 or Java 7mdashare actively awareof ucs and provide features that enable the matching of charactersbased on their general category numeric value directionality andother properties defined by Unicode as shown in Table 15
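The pitfalls described above can be illustrated with a small sketch in Python, whose standard `re` module does not implement the `\p{...}` property syntax. The sketch therefore preprocesses the text with the `unicodedata` module instead. The helper names `canonical`, `contains`, and `is_letter` are assumptions made for this illustration, and deleting the zero width joiners wholesale is a simplification that would be wrong for scripts in which they carry meaning.

```python
import re
import unicodedata

# Invisible characters named in the text: U+200B, U+200C, U+200D, U+FEFF.
# Mapping a code point to None makes str.translate delete it.
INVISIBLE = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))


def canonical(text: str) -> str:
    """Delete invisible characters, then normalize and case-fold the text."""
    visible = text.translate(INVISIBLE)
    folded = unicodedata.normalize("NFC", visible).casefold()
    # Case folding can denormalize a string, so normalize once more.
    return unicodedata.normalize("NFC", folded)


def contains(needle: str, haystack: str) -> bool:
    """Search for needle in haystack after bringing both to canonical form."""
    pattern = re.escape(canonical(needle))
    return re.search(pattern, canonical(haystack)) is not None


def is_letter(char: str) -> bool:
    """Stand-in for \\p{Letter}: the general categories of letters start with L."""
    return unicodedata.category(char).startswith("L")


print(re.search("regex", "reg\u200bex") is not None)  # False: a false negative
print(contains("regex", "reg\u200bex"))               # True
print(contains("CAFE\u0301", "un caf\u00e9 noir"))    # True
```

The last call succeeds even though the needle is upper-case and uses a combining accent, while the haystack is lower-case and uses a precomposed character.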

The most elementary text-processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool presents the lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.

Margin note: The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.

VCS can be divided, based on their architecture, into centralized and decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally, and its operation is fully dependent on the availability of the server. An example of a centralized VCS is Subversion (SVN).

By comparison, there is no designated server in decentralized VCS, and the users can upload and download new versions directly to and from one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.
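Regardless of the architecture, the unit that a VCS records, displays, and reverts is a change between two versions of a file. The following is a minimal sketch of that idea using the `difflib` module of the Python standard library; it is not how SVN or Git store their data, and the file name and the two versions are made up for the illustration.

```python
import difflib

version_1 = ["Palatschinken\n", "A crepe-like dish.\n"]
version_2 = ["Palatschinken\n", "A Slavic crepe-like dish.\n", "Serves eight.\n"]

# View the change: a unified diff marks removed lines with "-"
# and added lines with "+".
change = list(difflib.unified_diff(version_1, version_2,
                                   fromfile="recipe.txt (version 1)",
                                   tofile="recipe.txt (version 2)"))
print("".join(change), end="")

# Revert the change: a full delta is enough to rebuild either version.
delta = list(difflib.ndiff(version_1, version_2))
reverted = list(difflib.restore(delta, 1))
print(reverted == version_1)
```

Line-based comparison of this kind is the reason why text files are so well suited to version control.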

Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

Margin note: An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

Figure 1.8: The basic SVN workflow (svnadmin create, then svn checkout, followed by repeated svn update and svn commit). After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.

Figure 1.9: The first diagram depicts the basic Git workflow (git init, then git clone, followed by git commit and git reset in the local repository and by periodic git pull and git push). The second diagram depicts the use of the Git program with an SVN repository (svnadmin create, then git svn clone, followed by git commit and git reset in the local repository and by periodic git svn rebase and git svn dcommit); this bears all the advantages and disadvantages associated with decentralized VCS. After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.


Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).

Figure 1.11: TortoiseSVN is a graphical front end for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.

Chapter 2

Markup

A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules, such as the correct spelling, capitalization, word breaks, and punctuation, that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.

Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The Generalized Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

Margin note: More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.

Margin note: The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.

Margin note: A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.

Although the described structure is shared by all SGML documents, the actual syntax as well as the restrictions regarding the contents and the attributes of individual elements are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML, such as W3C XML Schema, RELAX NG, or Schematron, that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
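The lower of the two levels of correctness can be checked with any XML parser. The following is a minimal sketch using the `xml.etree.ElementTree` module of the Python standard library, which tests well-formedness only and does not validate documents against a DTD or a schema. The function name `is_well_formed` and the sample documents are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document as well-formed XML."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# The second document never closes the <name> element.
print(is_well_formed("<recipe><name>Palatschinken</name></recipe>"))  # True
print(is_well_formed("<recipe><name>Palatschinken</recipe>"))         # False
```

A well-formed document may still be invalid: a recipe without a name would pass this check and only fail once it is validated against a DTD or a schema that requires one.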

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
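As a small taste of data retrieval, the sketch below queries an abridged recipe document with the limited subset of XPath that the `xml.etree.ElementTree` module of the Python standard library supports; a full XPath or XQuery processor offers considerably more. The abridged document and the variable names are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# A location path that selects every ingredient element.
names = [element.text for element in root.findall("./ingredientList/ingredient")]

# A predicate that selects the ingredient with a given attribute value.
milk = root.find(".//ingredient[@amount='300ml']")

# Attributes are read from the selected element.
serves = root.find("ingredientList").get("serves")

print(names)      # ['Plain flour', 'Egg', 'Milk']
print(milk.text)  # Milk
print(serves)     # 8
```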

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE recipe SYSTEM "recipe.dtd">
    <recipe>
      <name>Palatschinken</name>
      <description>A Slavic crêpe-like dish.</description>
      <ingredientList serves="8">
        <ingredient amount="120g">Plain flour</ingredient>
        <ingredient amount="2">Egg</ingredient>
        <ingredient amount="300ml">Milk</ingredient>
        <ingredient amount="1 tblspn">Oil</ingredient>
        <ingredient amount="1 pinch">Salt</ingredient>
      </ingredientList>
      <stepList>
        <step>Combine the ingredients and whisk until
          you have a smooth batter.</step>
        <step>Heat oil on a pan, pour in a tablespoonful
          of the batter, fry until golden brown.</step>
        <step>Repeat until there is no batter left.</step>
        <step>Serve rolled and filled with jam.</step>
      </stepList>
    </recipe>

Figure 2.1: An example XML document (recipe.xml).

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd">

    <!DOCTYPE recipe SYSTEM "recipe.dtd">

    <!DOCTYPE recipe [
      <!ELEMENT recipe (name, description, ingredientList,
                        stepList)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
      <!ELEMENT ingredientList (ingredient+)>
      <!ATTLIST ingredientList serves CDATA #REQUIRED>
      <!ELEMENT ingredient (#PCDATA)>
      <!ATTLIST ingredient amount CDATA #REQUIRED>
      <!ELEMENT stepList (step+)>
      <!ELEMENT step (#PCDATA)>
    ]>

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd" [
      <!-- Omitted for brevity -->
    ]>

    <!DOCTYPE recipe SYSTEM "recipe.dtd" [
      <!-- Omitted for brevity -->
    ]>

Figure 2.2: An example DTD. DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

    element recipe {
      element name { text },
      element description { text },
      element ingredientList {
        attribute serves { xsd:positiveInteger },
        element ingredient {
          attribute amount { text },
          text
        }+
      },
      element stepList {
        element step { text }+
      }
    }

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.

    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
      <element name="recipe"><complexType><all>
        <element name="name" type="string" minOccurs="1"/>
        <element name="description" type="string"
                 minOccurs="1"/>
        <element
            name="ingredientList"><complexType><sequence>
          <element name="ingredient" minOccurs="1"
                   maxOccurs="unbounded">
            <complexType><simpleContent>
              <extension base="string">
                <attribute name="amount" type="string"/>
              </extension>
            </simpleContent></complexType>
          </element></sequence>
          <attribute name="serves" type="positiveInteger"
                     use="required"/>
        </complexType></element>
        <element name="stepList"><complexType><sequence>
          <element name="step" type="string" minOccurs="1"
                   maxOccurs="unbounded"/>
        </sequence></complexType></element>
      </all></complexType></element>
    </schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).

    xmllint --noout --dtdvalid recipe.dtd recipe.xml
    xmllint --noout --schema recipe.xsd recipe.xml
    trang recipe.rnc recipe.rng   # Compact -> Full RELAX NG
    xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.

A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature, CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.

Speech view:

    AASE: Peer, you're lying!
    PEER: No, I'm not!
    AASE: Well then, swear to me it's true!
    PEER: Swear? Why should I?
    AASE: See, you dare not! Every word of it's a lie!

Verse view:

    Peer, you're lying! No, I'm not!
    Well then, swear to me it's true!
    Swear? Why should I? See, you dare not!
    Every word of it's a lie!

Combined markup:

    <(V)line>
    <(S)speech who="Aase">Peer, you're lying!</(S)speech>
    <(S)speech who="Peer">No, I'm not!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Aase">Well then,
    swear to me it's true!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Peer">Swear? Why should I?</(S)speech>
    <(S)speech who="Aase">See, you dare not!
    </(V)line><(V)line>
    Every word of it's a lie!</(S)speech>
    </(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

Margin note: The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook, and its source code is publicly available at http://docbook.org.

Margin note: Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
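Namespaces are what allows several of these applications to meet in one document. The following is a minimal sketch, using the `xml.etree.ElementTree` module of the Python standard library, of a document that embeds an SVG drawing in XHTML and of a program that tells the two applications apart by their IRIs. The sample document and the query prefixes `h` and `s` are assumptions made for the illustration; the prefixes used in a query are independent of those used in the document, since only the IRIs are compared.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <body>
    <p>A circle:</p>
    <svg:svg width="20" height="20">
      <svg:circle cx="10" cy="10" r="8"/>
    </svg:svg>
  </body>
</html>"""

# Query prefixes mapped to the IRIs that identify each XML application.
NAMESPACES = {
    "h": "http://www.w3.org/1999/xhtml",
    "s": "http://www.w3.org/2000/svg",
}

root = ET.fromstring(DOCUMENT)
paragraph = root.find(".//h:p", NAMESPACES)
circle = root.find(".//s:circle", NAMESPACES)

# The parser expands every element name with the IRI of its namespace.
print(circle.tag)      # {http://www.w3.org/2000/svg}circle
print(paragraph.text)  # A circle:
```

Note that the parser only resolves the names; as the text explains, nothing in the namespace IRIs tells it which schemata to validate the two parts of the document against.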

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Margin note: JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language made it possible to specify the visual properties of any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for the presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the contemporary leading web browsers and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

    <b>Bold <i>bold and italic</b> italic</i>
    <b>Bold <i>bold and italic</i></b><i> italic</i>

Figure 2.7: The first line contains overlapping elements and, as such, can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

    <font face="Verdana" size="4">
    <font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
    <br><br>There is a continuing need to show the power of
    <i>CSS</i>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The <i>HTML
    </i> remains the same, the only thing that has changed
    is the external <i>CSS</i> file. Yes, really.
    </font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

    <style>
      body {
        font: large Verdana;
        font-size: large;
      }
      h1 {
        font-size: x-large;
        text-transform: uppercase;
      }
      abbr {
        font-style: italic;
      }
    </style>
    <h1>So what is this about?</h1>
    <p>There is a continuing need to show the power of
    <abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The
    <abbr>HTML</abbr> remains the same, the only thing that
    has changed is the external <abbr>CSS</abbr> file. Yes,
    really.</p>

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly into web documents through XML namespaces.

Margin note: The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
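The difference between the lenient processing of HTML and the strict parsing required of XHTML can be sketched with two modules of the Python standard library. Note the assumptions: `html.parser` is only a tolerant tokenizer and does not implement the error recovery that the HTML5 specification prescribes for browsers, and the class name `TagLogger` and the function name `xml_accepts` are made up for the illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Markup with overlapping elements, as in the first line of Figure 2.7.
MALFORMED = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Record the tags that the tolerant HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.events.append("</" + tag + ">")


def xml_accepts(fragment: str) -> bool:
    """Return True if an XML parser accepts the fragment inside a <p> element."""
    try:
        ET.fromstring("<p>" + fragment + "</p>")
    except ET.ParseError:
        return False
    return True


logger = TagLogger()
logger.feed(MALFORMED)
logger.close()

print(logger.events)           # ['<b>', '<i>', '</b>', '</i>']
print(xml_accepts(MALFORMED))  # False: the XML parser refuses the fragment
```

The HTML tokenizer reports the overlapping tags without complaint and leaves the recovery to its caller, whereas the XML parser stops at the first mismatched end tag, which is also why a partially downloaded XHTML document cannot be safely rendered.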

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triples. Each triple comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triple (p, s, o) is also a resource, the triple can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triple can be interpreted as a subject s having a property p with the value o.

Margin note: A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually supersede the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
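The data model behind all of these representations is small enough to sketch in a few lines of Python. The sketch below stores triples as tuples and writes them out in the N-Triples language; it is a simplification that escapes only backslashes and quotation marks and supports neither blank nodes nor typed literals. The class names `IRI`, `Literal`, and `Triple` and the function name `to_ntriples` are assumptions made for the illustration, and the example.org resources are placeholders.

```python
from typing import NamedTuple, Union


class IRI(str):
    """A resource identified by an IRI."""


class Literal(NamedTuple):
    """A literal value with an optional language tag."""
    value: str
    lang: str = ""


class Triple(NamedTuple):
    subject: IRI
    predicate: IRI
    obj: Union[IRI, Literal]


def to_ntriples(triple: Triple) -> str:
    """Serialize a single triple as one line of N-Triples."""
    def term(item) -> str:
        if isinstance(item, Literal):
            escaped = item.value.replace("\\", "\\\\").replace('"', '\\"')
            return f'"{escaped}"' + (f"@{item.lang}" if item.lang else "")
        return f"<{item}>"
    return f"{term(triple.subject)} {term(triple.predicate)} {term(triple.obj)} ."


DOCUMENT = IRI("http://example.org/document.html")
JOHN = IRI("http://example.org/john-smith")

TRIPLES = [
    # The object is a literal: the document has the property "title".
    Triple(DOCUMENT, IRI("http://purl.org/dc/terms/title"),
           Literal("John's Web page", "en")),
    # The object is a resource: the document is in the relation "creator" with John.
    Triple(DOCUMENT, IRI("http://purl.org/dc/terms/creator"), JOHN),
    Triple(JOHN, IRI("http://xmlns.com/foaf/0.1/name"), Literal("John Smith")),
]

for triple in TRIPLES:
    print(to_ntriples(triple))
```

Because every triple stands on its own line, N-Triples documents can be merged by simple concatenation, which is one reason the format is popular for data exchange despite its verbosity.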

23 Document Preparation SystemsSome of the existing markup languages are tied directly to spe-cific Document Preparation Systems (dpses) These dpses can be

23 DOCUMENT PREPARATION SYSTEMS 33

ltxml version=10 encoding=UTF-8gt

ltrdfRDF xmlnsrdf=httpwwww3org19990222-

rdf-syntax-ns

xmlnsdc=httppurlorgdcterms

xmlnsfoaf=httpxmlnscomfoaf01gt

ltrdfDescription

rdfabout=httpexampleorgdocumenthtmlgt

ltdctitle xmllang=engtJohns Web pageltdctitlegt

ltdccreator

rdfresource=httpexampleorgjohn-smithgt

ltrdfDescriptiongt

ltrdfDescription

rdfabout=httpexampleorgjohn-smithgt

ltrdftype rdfresource=foafPersongt

ltfoafnamegtJohn Smithltfoafnamegt

ltrdfDescriptiongt

ltrdfRDFgt

lthttpexampleorgdocumenthtmlgt

lthttppurlorgdctermstitlegt Johns Web pageen

lthttpexampleorgdocumenthtmlgt

lthttppurlorgdctermscreatorgt

lthttpexampleorgjohn-smithgt

lthttpexampleorgjohn-smithgt

lthttpwwww3org19990222-rdf-syntax-nstypegt

lthttpxmlnscomfoaf01Persongt

lthttpexampleorgjohn-smithgt

lthttpxmlnscomfoaf01namegt John Smith

prefix foaf lthttpxmlnscomfoaf01gt

prefix dc lthttppurlorgdcelements11gt

lthttpexampleorgdocumenthtmlgt

dctitle Johns Web pageen

dccreator lthttpexampleorgjohn-smithgt

lthttpexampleorgjohn-smithgt

a foafPerson

foafname John Smith

Figure 29 An example rdf document using the dc and foafontologies in the languages of rdfxml (johnrd top) N-Triples(johnnt middle) and Turtle (johnttl bottom)


<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>


[Graph with the nodes http://example.org/document.html, "John's Web page"@en, http://example.org/john-smith, foaf:Person, and "John Smith", connected by edges for the title, creator, rdf:type, and foaf:name properties.]

Figure 2.11: A graph of the RDF document in Figure 2.9.

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is twofold: the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph, The Art of Computer Programming, and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
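The decoupling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the style descriptions, and the render function are made up and do not correspond to the interface of any particular word processor or DTP application.

```python
# A document stored with logical markup: each span carries a class, not a look.
document = [
    ("title", "The Cask of Amontillado"),
    ("author", "Edgar Allen Poe"),
    ("body", "The thousand injuries of Fortunato I had borne as I best could."),
]

# The design lives in one place; the style values are made up for illustration.
stylesheet = {
    "title": "16 pt bold, centered",
    "author": "12.5 pt italic, centered",
    "body": "12.5 pt regular, justified",
}


def render(document, stylesheet):
    """Pair every span with the design currently assigned to its class."""
    return [f"[{stylesheet[cls]}] {text}" for cls, text in document]


for line in render(document, stylesheet):
    print(line)

# Restyling every title in the document takes a single change.
stylesheet["title"] = "20 pt small caps, centered"
```

With presentation markup, the same restyling would require visiting every title in the text, which is why logical markup makes consistent design changes easier.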


The Cask of Amontillado
by
Edgar Allen Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that gave utterance to a threat. At length I would be avenged; this was a point definitely settled--but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allen Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allen Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].


% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
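To make the idea of such a conversion tool concrete, the following Python sketch translates a tiny Markdown-like subset into HTML. It is an assumption-laden toy, not an implementation of Markdown or of any existing converter: the function name to_html is hypothetical, and only blank-line-separated paragraphs, headings introduced by one to six number signs, and emphasis between asterisks are recognized.

```python
import html
import re


def to_html(text):
    """Convert a tiny Markdown-like subset to HTML (illustrative only)."""
    blocks = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        line = " ".join(block.split())           # join wrapped lines
        line = html.escape(line, quote=False)    # protect <, >, and &
        line = re.sub(r"\*(.+?)\*", r"<em>\1</em>", line)
        heading = re.match(r"(#{1,6}) (.*)", line)
        if heading:
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            blocks.append(f"<p>{line}</p>")
    return "\n".join(blocks)


source = """# Design

After a manuscript has been *written* and marked up,
it is time to create a visual system."""

print(to_html(source))
```

The input remains readable as plain text, while the output carries the same structure in a more general markup language.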

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
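The line-length guidelines above can be turned into a small check. The following Python sketch is only one possible reading of them: the function name measure_advice and the wording of its verdicts are hypothetical, and the thresholds are the character counts quoted in the paragraph.

```python
def measure_advice(chars_per_line, columns=1):
    """Classify a line length (in characters) against the guidelines above."""
    # Recommended range: 45-75 on one column, 40-50 on several columns.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "too narrow to justify, set ragged"
    if chars_per_line > 80:
        return "too wide, strains the eye"
    if low <= chars_per_line <= high:
        return "recommended"
    return "usable, but outside the recommended range"


for width in (35, 66, 78, 90):
    print(width, measure_advice(width))
```

A check of this kind is most useful early in the design, when the column width and the body text size are still being chosen together.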

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

1

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}

newunicodecharraisebox8ex

\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
      format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
      U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
      U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
      sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=grc] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
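The arithmetic behind these figures is short enough to spell out. In the following Python sketch, the function names leading_range and lead_per_side are hypothetical, and the default percentages are the twenty to forty-five percent quoted above.

```python
def leading_range(size_pt, low=0.20, high=0.45):
    """Return the (minimum, maximum) line height for a typeface size in points."""
    return size_pt + size_pt * low, size_pt + size_pt * high


def lead_per_side(size_pt, leading_pt):
    """Return the lead added above and below each line, in points."""
    return (leading_pt - size_pt) / 2


tightest, loosest = leading_range(10)
print(tightest, loosest)
print(lead_per_side(10, tightest), lead_per_side(10, loosest))
```

For a 10 pt typeface, the range works out to 12 to 14.5 pt of leading, that is, 1 to 2.25 pt of lead on either side of the line, in agreement with the numbers in the text.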

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
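One way to satisfy the whole-multiple rule is to pad the heading with extra vertical space until it occupies a whole number of body text lines. The Python sketch below illustrates the arithmetic; the function name heading_block and the min_space_pt parameter are hypothetical, and how the padding is split above and below the heading is left to the designer.

```python
import math


def heading_block(heading_leading_pt, body_leading_pt, min_space_pt=0):
    """Return (body lines occupied, extra space in points) for a heading.

    The heading together with its extra space fills a whole number of
    body text lines, so the lines after it fall back on the shared grid.
    """
    needed = heading_leading_pt + min_space_pt
    lines = math.ceil(needed / body_leading_pt)
    extra_space = lines * body_leading_pt - heading_leading_pt
    return lines, extra_space


# An 18 pt heading line over a body text with 12 pt leading.
print(heading_block(18, 12))
```

In this example, the heading occupies two body text lines, with 6 pt of space left to distribute around it.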

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

-- William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
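The requirement that the text height be a multiple of the line height can be checked with a few lines of Python. The sketch below makes two assumptions that are not stated in the text: the function name text_height is hypothetical, and lengths are given in points with 72 points to the inch.

```python
def text_height(page_height_pt, top_margin_pt, bottom_margin_pt, leading_pt):
    """Return (number of lines, text height in points) for a page.

    The text height is the largest whole multiple of the body text
    leading that fits between the top and bottom margins.
    """
    available = page_height_pt - top_margin_pt - bottom_margin_pt
    lines = int(available // leading_pt)
    return lines, lines * leading_pt


# A 9 in page (648 pt) with 0.75 in (54 pt) margins and 12 pt leading.
print(text_height(648, 54, 54, 12))
```

If the available height is not already a whole multiple of the leading, the remainder is returned to the vertical margins.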

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
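The text does not prescribe a way to measure whether the contrast between a typeface color and its background is sufficient. One widely used measure, assumed here purely for illustration, is the contrast ratio from the Web Content Accessibility Guidelines (WCAG) 2.x, which ranges from 1 (identical colors) to 21 (black on white). The function names in the following Python sketch are hypothetical.

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to its linear-light value."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color as defined by WCAG 2.x."""
    r, g, b = (_linear(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG contrast ratio between two (R, G, B) colors."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


black, white, red = (0, 0, 0), (255, 255, 255), (255, 0, 0)
print(round(contrast_ratio(black, white), 2))
print(round(contrast_ratio(red, white), 2))
```

WCAG recommends a ratio of at least 4.5 for body text, which pure red on white does not reach. This is consistent with the advice to reserve colored typefaces for emphasis.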

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).


[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).


[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).


[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028/ (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015).

[47] Manu Sporny, Gregg Kellogg, and Markus Lanthaler. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4.

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5.

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014.

Acronyms

ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standard Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System

DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
Emacs: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character

NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
Pico: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard General Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
vi: The Visual Interactive editor
Vim: vi IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language

Introduction

With the advent of the digital age, typesetting has become available to virtually anyone equipped with a personal computer. Beautiful text documents can now be crafted using free and consumer-grade software, which often obviates the need for the involvement of a professional designer and typesetter. The level playing field of the Internet, coupled with the rising popularity of digital-only documents, then allows the author to bypass the publisher as well, if they so wish, without jeopardizing their chance of recognition.

The aim of this book is to provide a general overview of the tools and techniques tied with writing, designing, typesetting, and distributing text documents, one of the principal means of knowledge preservation and transfer known to man. Each chapter describes one discrete step of document preparation along with practical examples and references to literature for those interested in further study.

The chapters are filled with examples that illustrate the subject matter. These should be consulted whenever the concepts described in the text are unclear to the reader. Although care was taken not to favor any computing environment, some examples feature utilities for Unix and Unix-like operating systems. These utilities may or may not have a suitable counterpart in operating systems such as Windows. To try the corresponding examples out, the reader is advised to install a free Unix-like environment (such as Cygwin for Windows) on their computer.

This document was prepared in accordance with William Strunk's Elements of Style, an American English style guide for general use.

Chapter 1

Writing

The essence of a document is the idea it represents. In the case of a text document, this idea is articulated through speech, which is transcribed using text, optionally accompanied by figures, and then laid out on a sheet of paper according to a design. Since the text is typically independent of the design, whose task is to support and elicit the internal structure of the text, it is writing that is the logical first step in the creation of a text document.

The essentials of writing in any given natural language include grammar rules, which specify the structure of spoken language, and orthographic rules, which impose additional requirements on written text. The complexity of either set of rules depends entirely on the language in question. Some writing systems, such as those that incorporate Chinese characters, are not phonographic, and the correspondence between the spoken words and the written symbols needs to be memorized by the writer on a word-by-word basis. Other languages may use vastly different grammar rules for speaking and for writing, which means that a spoken sentence needs to be translated first before being written down. A writer needs to recognize these specifics.

On top of grammar and orthographic rules stand style guides, which codify how common language patterns are encoded in order to improve consistency. More comprehensive style guides, such as the Chicago Manual of Style or the Oxford Style Manual, often go beyond writing and provide guidelines on design and typesetting as well, making them an indispensable reference on the editorial tradition.

Zwei Trichter wandeln durch die Nacht.
Durch ihres Rumpfs verengten Schacht
fließt weißes Mondlicht
still und heiter
auf ihren
Waldweg
usw.

Figure 1.1: Exceptions that prove the rule about the separation of text and design can sometimes be encountered in poetry. Above is Christian Morgenstern's Trichter, where the text and its form are intimately intertwined.

Above all stand the typographic rules, which specify how the resulting document should be typeset so that it doesn't disturb the eye of the reader. These, as well as the orthographic rules on hyphenation, can be left out of consideration during writing, as it is the page that should be formed around the writing and not the other way around.

1.1 Text Processing

Originally the domain of the pen, the quill, the stylus, and the more recent typewriter, manuscripts of today are produced mainly using the personal computer and stored in text files. The discipline of creating and manipulating digital text is called text processing and will be the focus of this section.

1.1.1 Character Encoding

Although computing at its most primal has no use for anything but numbers, it has nevertheless been accompanied by text from the very outset. Even the earliest computers from the 1950s were programmed with both raw machine code and the text-based programming language of the FORmula TRANslator (FORTRAN). The digital representation of letters, digits, and other characters was initially closely tied to each specific application and processor architecture, but with the advent of computer networking in the 1960s, mutual intelligibility became a point of concern. "We had over sixty different ways to represent characters in computers. It was a real Tower of Babel," explains Bob Bemer [1], an American computer scientist who worked at IBM during 1956–1962 and who drafted the American Standard Code for Information Interchange (ASCII) [2], a character encoding from 1963 that unified the digital representation of text across the computer industry and enabled computer networking on a large scale.

Margin note: EBCDIC by IBM was the default encoding on IBM's System/360 mainframes and was in active use until the introduction of the PC in 1981. In writing systems using Chinese characters, special encodings such as Big5, JIS, and EUC are used to this day. For brevity, the text focuses on the mainstream of international encodings.

ASCII

In ASCII, every character is represented by a number from zero to 127, which is stored as a seven-bit integer called a character code. These 128 codes are used to encode printable characters (spanning the letters of the English alphabet, digits, punctuation, and other symbols) and control codes, as depicted in Table 1.1. Unlike printable characters, control codes have no fixed visual representation, and they were used to implement application-specific communication protocols and text formatting; their precise semantics were defined in a much later standard from 1972 [3]. Unconstrained by the bandwidth and storage limitations of the 1960s and 1970s, today's communication protocols and text formats gravitate towards markup constructed from printable characters, which, unlike control codes, are easy for humans to read and write.

Bits 7-5     000   001   010   011   100   101   110   111
Bits 4-1     Ctrl codes  Symbols     Upper case  Lower case
0000         NUL   DLE   SP    0     @     P     `     p
0001         SOH   DC1   !     1     A     Q     a     q
0010         STX   DC2   "     2     B     R     b     r
0011         ETX   DC3   #     3     C     S     c     s
0100         EOT   DC4   $     4     D     T     d     t
0101         ENQ   NAK   %     5     E     U     e     u
0110         ACK   SYN   &     6     F     V     f     v
0111         BEL   ETB   '     7     G     W     g     w
1000         BS    CAN   (     8     H     X     h     x
1001         HT    EM    )     9     I     Y     i     y
1010         LF    SUB   *     :     J     Z     j     z
1011         VT    ESC   +     ;     K     [     k     {
1100         FF    FS    ,     <     L     \     l     |
1101         CR    GS    -     =     M     ]     m     }
1110         SO    RS    .     >     N     ^     n     ~
1111         SI    US    /     ?     O     _     o     DEL

Table 1.1: The ASCII encoding as specified in the 1986 revision of the standard [4]. SP denotes the space character.

Code point range    Encoding
0–127               0xxxxxxx
128–2047            110xxxxx 10xxxxxx
2048–65535          1110xxxx 10xxxxxx 10xxxxxx
65536–1114111       11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Table 1.2: The UTF-8 encoding. Each x represents one bit of the UCS code point in binary.

Character   Code point (decimal)   Code point (binary)   UTF-8 encoding
Ř           344                    101011000             11000101 10011000
e           101                    1100101               01100101
č           269                    100001101             11000100 10001101

Table 1.3: An example of the UTF-8 encoding.

The following properties make it easy to manipulate and reason about character strings encoded in ASCII:

• Each character is represented by exactly seven bits. This makes it easy to allocate space for character strings of fixed length, to measure the number of characters stored in a memory region, and to perform basic operations such as adjacent character retrieval or text truncation.

• Characters are alphabetically ordered. Character strings can therefore be collated by comparing the binary values of character codes.

• Lowercase and uppercase letters, digits, and control codes form contiguous ranges of character codes. This simplifies classification.

• There is precisely one way to encode any printable character. The conversion between lowercase and uppercase letters is a matter of inverting one bit.

This comes at the expense of support for non-English writing systems. As a temporary workaround, a set of ASCII derivatives that replaced the less-needed characters #, $, @, [, \, ], ^, `, {, |, }, and ~ with international characters was specified in the ISO 646 standard from 1972 [3].
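The ASCII properties listed above can be tried out in a few lines of Python. The following is only an illustrative sketch: the helper names `is_upper`, `is_lower`, and `swap_case` are made up for this example and do not come from any standard.

```python
# Sketch of the ASCII properties, using only built-in types.
# The helper names are hypothetical and exist only for this example.

def is_upper(code: int) -> bool:
    """Uppercase letters occupy the contiguous range 0x41-0x5A."""
    return 0x41 <= code <= 0x5A


def is_lower(code: int) -> bool:
    """Lowercase letters occupy the contiguous range 0x61-0x7A."""
    return 0x61 <= code <= 0x7A


def swap_case(code: int) -> int:
    """Invert bit 6 (value 0x20) of a letter; leave other codes alone."""
    if is_upper(code) or is_lower(code):
        return code ^ 0x20
    return code


text = b"Pocket Primer"
swapped = bytes(swap_case(code) for code in text)
print(swapped.decode("ascii"))  # pOCKET pRIMER

# Collation by a plain comparison of character codes.
words = [b"stylus", b"pen", b"quill"]
print(sorted(words))  # [b'pen', b'quill', b'stylus']
```

Note that collation by character code places all uppercase letters before all lowercase ones, so it only matches dictionary order when the compared strings use consistent letter case.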

Eight-bit Encodings

With the byte size stabilizing at eight bits, new character encodings emerged that were based on ASCII and used the additional bit to encode characters of non-English writing systems while retaining complete backwards compatibility with ASCII. Besides the numerous vendor-specific encodings (called code pages), a set of fifteen eight-bit encodings, covering all major modern writing systems whose characters fit within the space of 128 additional combinations, was standardized in the ISO/IEC 8859 series released during 1986–2001.

Compared to ASCII, eight-bit encodings introduced an additional level of complexity to text processing:

• Each character is exactly eight bits wide. The manipulation of strings is therefore as straightforward as with ASCII.

• Character strings can no longer be collated by character code comparison. Each encoding requires separate collation tables.

• Classes of characters, such as uppercase and lowercase letters or punctuation, no longer form contiguous ranges, and their position varies among encodings. This impedes character classification.

• Idiosyncrasies, such as the ligature æ and invisible hyphenation hints, are included in several encodings, which makes it more difficult to determine character string equivalence. Algorithms for case conversion vary among encodings.

• There exists no standard mechanism to detect which encoding is being used. The distinction needs to be made at the application level using heuristics, additional metadata, or human intervention. Consequently, no standard mechanism exists to use different character encodings within a single text document.
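The last point can be illustrated with a short Python sketch. The byte sequence below is an arbitrary example chosen for this illustration; nothing in the bytes themselves says which of the ISO/IEC 8859 encodings produced them, so both interpretations are equally valid as far as the software can tell.

```python
# Three bytes of unknown provenance (an arbitrary example).
raw = bytes([0xE8, 0x61, 0x6A])

# The same bytes decode without error under two different
# eight-bit encodings, but yield different text.
as_latin1 = raw.decode("iso-8859-1")
as_latin2 = raw.decode("iso-8859-2")

print(as_latin1)  # èaj
print(as_latin2)  # čaj

# The ASCII-compatible lower half is unaffected by the choice.
print(as_latin1[1:] == as_latin2[1:])  # True
```

Only a reader who knows that čaj is the Czech word for tea can tell that the second interpretation was probably the intended one, which is the kind of heuristic or human intervention the text refers to.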

A portion of this complexity is inherent in the task of encoding the characters of all modern writing systems, but the overhead caused by the fragmentation of character encodings proved to be unnecessary.

The Universal Character Set and Unicode

In the early 1990s, the continual increase in the available bandwidth and storage led to the creation of the standards of Unicode [5, 6] and the Universal multiple-octet coded Character Set (UCS) [7] in an attempt to create a text encoding that would contain the characters of all the world's languages and succeed ASCII as the lingua franca of text interchange.

UCS is an ever-expanding catalogue of characters from writing systems both modern and ancient, and of symbols ranging from diacritical marks, punctuation, and ideograms to mahjong tiles, alchemical symbols, and the ancient Greek musical notation. Each of these characters is assigned a number called a code point, ranging from 0 to 2147483647 (7F FF FF FF in hexadecimal notation), with the most common characters occupying the range from 0 to 65535 (FF FF), called the Basic Multilingual Plane (BMP). The smallest units of division in UCS are blocks, which contain 256 thematically related characters. UCS encodings map code points to binary character codes and vice versa.

Three major encodings are specified in the UCS standard and its amendments [8, 9]:

1. UTF-32 directly encodes UCS characters by transforming their code points to four-byte integers. UTF-32 is also known as UCS-4.

2. UTF-16 directly encodes characters within the BMP by transforming their code points to two-byte integers. Code points in the range from 65536 to 1114111 (01 00 00–10 FF FF) are transformed into pairs of two-byte integers, called surrogate pairs, ranging from 55296 to 57343 (D8 00–DF FF). To enable the UTF-16 encoding, the code points in this range will never be assigned to characters [10, sec. 3.4, D15]. The same is true of code points above 1114111 (10 FF FF), which allows UTF-16 to encode any UCS character.

3. UTF-8 directly transforms code points ranging from 0 to 127 (7F) to one-byte integers. Since the first UCS block of the BMP matches ASCII, any ASCII text stored with one character per eight-bit byte is also valid UTF-8. Code points in the range from 128 to 1114111 (00 00 80–10 FF FF) are transformed into sequences of two to four one-byte integers ranging from 128 to 244 (80–F4). The encoding is illustrated in Tables 1.2 and 1.3.

Margin note: Also notable are the seven-bit encodings UTF-7 and Punycode, which bring Unicode support to protocols that were designed with seven-bit ASCII in mind, such as e-mail.

Margin note: One of the design goals of UCS was to avoid assigning code points to different glyphs that carry the same meaning. As a result, the visually distinctive Han characters used in the East Asian countries of China, Japan, Korea, and Vietnam were merged into a set of 75960 ideograms in a process referred to as the Han Unification [10, sec. 18.1]. This simplifies text processing, but it also makes it impossible to encode a text in multiple East Asian languages without having to rely on external markup to select appropriate regional fonts. As a result, a derivative of UCS that doesn't implement the Han Unification was developed for use in operating systems based on the Real-time Operating system Nucleus (TRON) and is used in East Asia alongside UCS and region-specific encodings.

餐甑逞扉牙慨

Figure 1.2: Several Han characters in the traditional Chinese, Japanese, Korean, and Vietnamese variants. (The regional glyph variants shown in the original figure cannot be reproduced in plain text.)
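The bit patterns of Table 1.2 and the surrogate pair arithmetic of UTF-16 can be written out by hand, as in the following Python sketch. The function names `encode_utf8` and `surrogate_pair` are invented for this example; in practice one would simply call the built-in `str.encode`, which the sketch uses as a cross-check.

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode one code point by filling in the bit patterns of Table 1.2."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not characters")
    if code_point <= 0x7F:
        return bytes([code_point])
    if code_point <= 0x7FF:
        return bytes([0b11000000 | (code_point >> 6),
                      0b10000000 | (code_point & 0b111111)])
    if code_point <= 0xFFFF:
        return bytes([0b11100000 | (code_point >> 12),
                      0b10000000 | ((code_point >> 6) & 0b111111),
                      0b10000000 | (code_point & 0b111111)])
    return bytes([0b11110000 | (code_point >> 18),
                  0b10000000 | ((code_point >> 12) & 0b111111),
                  0b10000000 | ((code_point >> 6) & 0b111111),
                  0b10000000 | (code_point & 0b111111)])


def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above the BMP into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP need surrogates")
    offset = code_point - 0x10000
    return 0xD800 | (offset >> 10), 0xDC00 | (offset & 0x3FF)


# Cross-check the hand-written encoder against Python's own codec.
for character in "Ře\U0001F004":
    code_point = ord(character)
    encoded = encode_utf8(code_point)
    assert encoded == character.encode("utf-8")
    print(code_point, " ".join(f"{byte:08b}" for byte in encoded))

high, low = surrogate_pair(0x1F004)  # a mahjong tile outside the BMP
print(f"{high:04X} {low:04X}")  # D83C DC04
```

The first two lines of output reproduce the rows for Ř and e in Table 1.3.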

UTF-32 is primarily used for the fixed-width internal representation of individual UCS characters inside programs, UTF-16 fulfills a similar role in programs that only work with the BMP, and UTF-8 is used for text storage and interchange. Since 2010, the majority of text content on the Web has been encoded in ASCII and UTF-8 [11].

Unicode was a competing standard for universal text encoding that underwent a merger with UCS in version 1.1; since then, the standards have been kept closely synchronised. Unicode is a superset of UCS that defines additional information about UCS characters (such as their general category, directionality, case, or numeric value [10, sec. 3.5 and ch. 4]), various text processing algorithms, and implementation guidelines.

Regarding text processing, Unicode and UCS represent a compromise between the simplicity of the seven-bit ASCII and the heterogeneity of eight-bit encodings:

• If simple text manipulation is preferred over space efficiency, each character can be made exactly four bytes wide using the UTF-32 encoding, or exactly two bytes wide using the UTF-16 encoding when the text is confined to the BMP.

• Although character strings cannot be collated by a simple character code comparison, a collation algorithm is defined in the Unicode specification [12], and collation tables for major locales [13] are maintained by the Unicode Consortium.

• Classes of characters (such as uppercase letters, lowercase letters, numbers, and punctuation) do not form contiguous ranges, but the class of each character is directly specified in the standard [10, sec. 4.5].

• Although idiosyncrasies (such as ligatures, invisible hyphenation hints, and combining characters) are present in UCS, explicit normalization algorithms for character string equivalence testing are specified by the standard [10, sec. 2.12]. An algorithm for case conversion is also specified [10, sec. 3.13].

• The byte order mark character (FE FF) can be inserted at the beginning of a text as a signature of Unicode encodings. As the name suggests, the order in which the FE and FF bytes arrive also indicates the order of bytes (called endianness) that was used to encode integers. In UTF-32 and UTF-16, endianness can be chosen arbitrarily by the encoding application. In UTF-8, one-byte integers are used, and the notion of endianness is therefore meaningless.

Ǻ = Å + ◌́ = A + ◌̊ + ◌́

Figure 1.3: Some UCS characters can be either input as a single entity or composed from several combining characters. With respect to the Unicode normalization forms, all of the above representations are canonically equivalent.

    iconv -f latin2 -t utf8 -- old.txt > new.txt

Figure 1.4: Text files can be converted between encodings using the iconv command-line tool. The sample code shows the file old.txt being converted from the ISO/IEC 8859-2 encoding to UTF-8. The result of the conversion is stored in the file new.txt.
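Normalization, case folding, and the byte order mark can all be explored with the Python standard library. The sketch below uses the three representations of Ǻ from Figure 1.3; the variable names are arbitrary, and the choice of the NFD normalization form for the comparison is an assumption made for this example (any single normalization form would serve).

```python
import codecs
import unicodedata

precomposed = "\u01FA"        # Ǻ as a single code point
partly = "\u00C5\u0301"       # Å followed by a combining acute accent
decomposed = "A\u030A\u0301"  # A, combining ring above, combining acute

forms = [precomposed, partly, decomposed]

# The raw strings differ in length and do not compare equal.
print([len(form) for form in forms])

# After normalization, all three collapse into one representation.
normalized = {unicodedata.normalize("NFD", form) for form in forms}
print(len(normalized))

# Case-insensitive comparison via case folding rather than lower().
print("STRASSE".casefold() == "straße".casefold())

# The byte order mark reveals the byte order chosen by the encoder.
bom = "\uFEFF"
print(bom.encode("utf-16-be") == codecs.BOM_UTF16_BE)
print(bom.encode("utf-16-le") == codecs.BOM_UTF16_LE)
```

Comparing strings only after normalizing both sides to the same form is the usual way to make equality testing robust against the alternative representations described above.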

Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. The figure shows the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.

Compose + O + R = ®
Compose + 3 + 4 = ¾
Compose + s + s = ß
Compose + ~ + ' + a = ấ

Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.

Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á

Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces the character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the corresponding UCS code point in hexadecimal.

1.1.2 Text Input

To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On hand-held devices, the use of either a numeric keypad or a touch-screen is more typical.

An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind, and as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn't apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.

1.1.3 Text Editors

A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free JOE, GNU nano, and Pico.

More advanced text editors come with support for regular expressions and version control (which will be covered in sections 1.1.5 and 1.2) and user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and Vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower the users to perform complex text editing.

1.1.4 Interactive Document Preparation Systems

Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].

The primary difference between text editors and DPSes is that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files, a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.

1.1.5 Regular Expressions

The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.

Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.

Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) [18, part 1, ch. 9], described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.

More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
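To illustrate why such syntaxes are no longer regular, the following sketch uses a backreference to recognize strings that consist of some text written twice in a row, a language that no regular grammar can describe. Python's re module serves here only as a convenient stand-in for a Perl-style regex syntax, and the helper name is_doubled is made up for this example.

```python
import re

# (.+) captures a non-empty string; \1 demands the very same string again.
DOUBLED = re.compile(r"(.+)\1")


def is_doubled(text: str) -> bool:
    """Return True if text is some non-empty string written twice in a row."""
    return DOUBLED.fullmatch(text) is not None


print(is_doubled("bonbon"))
print(is_doubled("bonbons"))
```

The backreference forces the engine to remember an arbitrarily long captured string, which is exactly what a finite automaton cannot do.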

BRE regex | Description | Matches
we\{1,2\}p | The interval expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m. | weeps, wept
e*ne | Star (*) is a repetition operator equivalent to the interval expression of \{0,\}. | never, enemy, Kleene
\(⟨regex⟩\) | A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex. | ⟨regex⟩
^ar | At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string. | argument, arrow keys
ore$ | At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string. | iron ore, dumbledore
.be | A period (.) matches any single character. | or not to be
be[ea] | A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity. | beehive, grizzly bear, glass beads
be[^ea] | A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match. | obeah, bend, libela
\^\$ | Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character. | ^$
\(⟨regex⟩\)\1 | A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched. | ara ararauna, dardanelles, nationality

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below).

ERE regex | Description | Matches
we{1,2}p | Unlike in BREs, braces aren't escaped. | weeps, wept
pe+r?l? | The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}. | persona, peer, speech, perl
(⟨regex⟩) | Unlike in BREs, parentheses aren't escaped. | ⟨regex⟩
(on|t) | Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes. | one, two, trophy, truth
(⟨regex⟩)\1 | EREs do not support backreferences. | ⟨undefined⟩

Regex | Description
\x{⟨n⟩} | Matches the UCS character with code point ⟨n⟩ in hexadecimal.
\N{⟨n⟩} | Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.
\p{⟨p⟩} | Matches any UCS character with property ⟨p⟩.
\P{⟨p⟩} | Matches any UCS character without property ⟨p⟩.

Property | Description
Letter | This property is satisfied by any letter.
Punctuation | This property is satisfied by any punctuation.
Symbol | This property is satisfied by any symbol.
Mark | This property is satisfied by any mark.
Number | This property is satisfied by any number.
Separator | This property is satisfied by any separator.
Other | This property is satisfied by any UCS character that doesn't belong to any of the above-listed categories.
Block=⟨b⟩ | This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.
Script=⟨s⟩ | This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.
Numeric Value=⟨n⟩ | This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.

The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.

Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (U+200B), zero width non-joiner (U+200C), zero width joiner (U+200D), and zero width no-break space (U+FEFF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at least partially implement the Unicode standard for Regular Expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
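The pitfalls described above can be reproduced, and partly worked around, with a few lines of Python. The sketch below is only one possible approach: it strips the zero-width characters named in the text, applies NFC normalization, and case-folds both the pattern and the text before searching. The helper names canonical and contains are made up for this example, and the approach does not claim to cover every Unicode corner case.

```python
import re
import unicodedata

# Zero-width characters named in the text: U+200B, U+200C, U+200D, U+FEFF.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))


def canonical(text: str) -> str:
    """Drop zero-width characters, normalize to NFC, then case-fold."""
    visible = text.translate(ZERO_WIDTH)
    return unicodedata.normalize("NFC", visible).casefold()


def contains(needle: str, haystack: str) -> bool:
    """Search for needle in haystack after canonicalizing both."""
    pattern = re.escape(canonical(needle))
    return re.search(pattern, canonical(haystack)) is not None


# One character is not one byte: U+1D450 lies outside the BMP.
print(len("\U0001d450"), len("\U0001d450".encode("utf-8")))

# A precomposed letter versus "E" followed by a combining circumflex.
print(contains("cr\u00eape", "A Slavic CRE\u0302PE-like dish"))

# A zero width space hidden inside a word.
print(contains("typesetting", "type\u200bsetting"))
```

Canonicalizing both sides before matching trades fidelity for robustness: positions reported by the search refer to the canonicalized text, not to the original.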

The most elementary text processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool presents the user with the lines that contain one or more matches. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
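The default behavior of grep described above, printing every line that contains at least one match, can be approximated in a few lines of Python. This is a rough sketch rather than a reimplementation: the function name grep is reused purely for illustration, and Python's regex syntax differs from both BRE and ERE in its details.

```python
import re
from typing import Iterable, Iterator


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield the lines that contain at least one match of the regex."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line.rstrip("\n")


sample = ["iron ore\n", "argument\n", "dumbledore\n"]
print(list(grep(r"ore$", sample)))
print(list(grep(r"^ar", sample)))
```

Passing an open file object as lines would filter a text file in the same way.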

1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.
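The rudimentary functionality described above (recording changes with a description and an author, viewing them, and reverting them) can be modeled in a few lines. The following toy sketch is not how any real VCS stores data; the class names Change and History and all method names are invented for this illustration, and each change simply keeps the full text of the document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Change:
    author: str
    description: str
    content: str  # full text of the document after the change


@dataclass
class History:
    changes: List[Change] = field(default_factory=list)

    def commit(self, author: str, description: str, content: str) -> None:
        """Record a new version along with its author and description."""
        self.changes.append(Change(author, description, content))

    def log(self) -> List[str]:
        """List all recorded changes, oldest first."""
        return [f"{i}: {c.author}: {c.description}"
                for i, c in enumerate(self.changes)]

    def revert(self, author: str, version: int) -> None:
        """Record a new change that restores the content of an older version."""
        old = self.changes[version]
        self.commit(author, f"Revert to version {version}", old.content)

    def current(self) -> str:
        """Return the content of the latest version."""
        return self.changes[-1].content


history = History()
history.commit("alice", "Initial draft", "Hello")
history.commit("bob", "Shorten greeting", "Helo")
history.revert("alice", 0)
print(history.log())
print(history.current())
```

Reverting by appending a new change, rather than deleting the old one, keeps the record of who did what intact.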

VCS can be divided based on their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally and its operation is fully dependent on the availability of the server. An example of a centralized VCS is Subversion (SVN).

By comparison, there is no designated server in decentralized VCS, and the users can exchange new versions directly with one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.

Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

Figure 1.8: The basic SVN workflow (svnadmin create, svn checkout, svn update, svn commit). After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.

Figure 1.9: The basic Git workflow (git init, git clone, git pull, git push, git reset, git commit) and the use of the Git program with an SVN repository (svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit); the latter bears all the advantages and disadvantages associated with decentralized VCS. After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.

Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.

Chapter 2

Markup

A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.

Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The Generalized Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.

The authoritative resource on SGML is The SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.

Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).

A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
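The lower of the two levels of correctness, well-formedness, can be checked with the XML parser from the Python standard library, as sketched below. The helper name is_well_formed is made up for this example. Note that this parser does not validate documents against a DTD or a schema, so the sketch says nothing about validity; a validating tool is needed for that.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document as well-formed XML."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Properly nested elements: well-formed.
print(is_well_formed("<recipe><name>Palatschinken</name></recipe>"))

# The <name> element is never closed: not well-formed.
print(is_well_formed("<recipe><name>Palatschinken</recipe>"))
```

A document such as `<recipe><dessert/></recipe>` would pass this check even though a schema for recipes may well reject it, which is precisely the gap between well-formedness and validity.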

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish.</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>

Figure 2.1: An example XML document (recipe.xml)
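To show how a program sees elements and attributes, the following sketch reads a shortened version of the recipe from Figure 2.1 with the ElementTree module of the Python standard library. The variable names are arbitrary, and the abridged document is embedded in the code only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

# An abridged copy of recipe.xml, embedded to keep the sketch self-contained.
RECIPE = b"""<?xml version="1.0" encoding="UTF-8"?>
<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# Attributes are read with get(); element content is available as text.
serves = int(root.find("ingredientList").get("serves"))
ingredients = [(item.get("amount"), item.text)
               for item in root.iter("ingredient")]

print(root.find("name").text, "serves", serves)
for amount, name in ingredients:
    print(amount, name)
```

The meaning of the serves and amount attributes is not known to the parser; as with SGML, their interpretation is left to the program.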

DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">

<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>

<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD

element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
        name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)

xmllint -noout --dtdvalid recipe.dtd recipe.xml
xmllint -noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng # Compact -> Full Relax NG
xmllint -noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint

A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of the more expressive SGML feature CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.

Speech
AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!

Verse
Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

Initially, HTML only comprised a mixture of logical and presentation markup with a fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and, as such, can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

<style>
body {
  font: large Verdana;
  font-size: large;
}
h1 {
  font-size: x-large;
  text-transform: uppercase;
}
abbr {
  font-style: italic;
}
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>

The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications (such as MathML and SVG) directly in web documents through XML namespaces.
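The difference in strictness can be observed with two parsers from the Python standard library, fed with the overlapping markup from Figure 2.7. The sketch below is only an illustration: html.parser is a lenient tokenizer that reports tags as they come and does not implement the error recovery rules of the HTML5 specification, and the class name TagLogger is made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The overlapping markup from the first line of Figure 2.7.
MALFORMED = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Record every start and end tag that the HTML parser reports."""

    def __init__(self) -> None:
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.events.append("</" + tag + ">")


# The lenient HTML parser carries on despite the overlapping elements.
logger = TagLogger()
logger.feed(MALFORMED)
logger.close()
print(logger.events)

# The XML parser refuses the very same input as not well-formed.
try:
    ET.fromstring(MALFORMED)
    xml_accepts = True
except ET.ParseError as error:
    xml_accepts = False
    print("XML parser refused the document:", error)
```

The XML parser can stop at the first error, whereas an HTML parser has to decide what the author most likely meant, which is where the added complexity comes from.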

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
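The namespace mechanism mentioned above, which lets an XHTML document embed elements of another XML application such as SVG, can be illustrated as follows. The embedded document is a made-up minimal example, and the sketch assumes the namespace IRIs published by W3C for XHTML and SVG. It shows that a namespace-aware parser identifies each element by the IRI of its application rather than by the prefix used in the source.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

# A made-up XHTML fragment that embeds a small SVG drawing.
DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <body>
    <p>A circle:</p>
    <svg:svg width="20" height="20">
      <svg:circle cx="10" cy="10" r="8"/>
    </svg:svg>
  </body>
</html>"""

root = ET.fromstring(DOCUMENT)

# ElementTree expands each prefix into the form {namespace IRI}local-name.
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)

# Elements are looked up by IRI; the prefix in the query is arbitrary.
circle = root.find(".//drawing:circle", {"drawing": SVG})
print(circle.get("r"))
```

A browser without SVG support would see the same element names but would have no way of rendering them, which is the limitation described in the paragraph above.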

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.

A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.
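The triplet model described above is simple enough to be sketched directly. The following example is only an illustration of the two interpretations of a triplet, not an RDF library: the names IRI, Triple, and describe are invented here, the field order follows the (p, s, o) notation of the text, and the sample IRIs use the example.org domain together with the Dublin Core and FOAF vocabularies.

```python
from typing import NamedTuple, Union


class IRI(str):
    """A resource identified by an IRI, as opposed to a plain literal."""


class Triple(NamedTuple):
    predicate: IRI
    subject: IRI
    obj: Union[IRI, str]  # either another resource or a literal value


def describe(triple: Triple) -> str:
    """Render a triplet using the two interpretations given in the text."""
    if isinstance(triple.obj, IRI):
        return (f"{triple.subject} is in relation {triple.predicate} "
                f"with {triple.obj}")
    return (f"{triple.subject} has property {triple.predicate} "
            f"with value {triple.obj!r}")


creator = Triple(IRI("http://purl.org/dc/terms/creator"),
                 IRI("http://example.org/document.html"),
                 IRI("http://example.org/john-smith"))
name = Triple(IRI("http://xmlns.com/foaf/0.1/name"),
              IRI("http://example.org/john-smith"),
              "John Smith")

print(describe(creator))
print(describe(name))
```

Since an RDF document is a set of such triplets, a whole document could be held in a Python set of Triple values.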

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in, or linked to, HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually replace microformatting: the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in documents on the Web.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be


    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/terms/"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description
          rdf:about="http://example.org/document.html">
        <dc:title xml:lang="en">John's Web page</dc:title>
        <dc:creator
            rdf:resource="http://example.org/john-smith"/>
      </rdf:Description>
      <rdf:Description
          rdf:about="http://example.org/john-smith">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
        <foaf:name>John Smith</foaf:name>
      </rdf:Description>
    </rdf:RDF>

    <http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
    <http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
    <http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
    <http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dc: <http://purl.org/dc/terms/> .

    <http://example.org/document.html>
      dc:title "John's Web page"@en ;
      dc:creator <http://example.org/john-smith> .

    <http://example.org/john-smith>
      a foaf:Person ;
      foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).


    <!DOCTYPE html>
    <html lang="en">
      <head>
        <link rel="meta" type="application/rdf+xml"
              href="john.rdf">
        <link rel="meta" type="text/turtle" href="john.ttl">
        <link rel="meta" type="application/n-triples"
              href="john.nt">
        <title>John's Web page</title>
      </head>
      <body>
        Hi, I'm John Smith.
      </body>
    </html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

    <!DOCTYPE html>
    <html lang="en">
      <head vocab="http://purl.org/dc/terms/"
            about="http://example.org/document.html">
        <title property="title" lang="en">John's Web
          page</title>
        <meta property="creator"
              href="http://example.org/john-smith">
      </head>
      <body vocab="http://xmlns.com/foaf/0.1/"
            about="http://example.org/john-smith"
            typeof="Person">
        Hi, I'm <span property="name">John Smith</span>.
      </body>
    </html>


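[Graph: the node http://example.org/document.html has a dc:title edge to the literal "John's Web page"@en and a dc:creator edge to the node http://example.org/john-smith, which in turn has an rdf:type edge to foaf:Person and a foaf:name edge to the literal "John Smith".]

Figure 2.11: A graph of the RDF document in Figure 2.9.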

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also known as What You See Is What You Get, or WYSIWYG), which allow the user to directly edit an approximation of the output document through a visual editor. Interactive DPSes pay for their gentle learning curve with more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and with the reduced flexibility of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

[Sidenote: The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].]

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph, The Art of Computer Programming, and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by Desktop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.


The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled -- but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-

    .TITLE "The Cask of Amontillado"
    .AUTHOR "Edgar Allan Poe"
    .PRINTSTYLE TYPESET
    .PAGE 6i 9i .75i .75i .75i .75i
    .START
    .PP
    .DROPCAP T 3
    he thousand injuries of Fortunato I had borne as I best
    could, but when he ventured upon insult, I vowed revenge.
    You, who so well know the nature of my soul, will not
    suppose, however, that I gave utterance to a threat.
    \*[IT]At length\*[PREV] I would be avenged; this was a
    point definitely settled\[em]but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's The Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].


    % Page geometry
    \pdfpagewidth=6in \pdfpageheight=9in
    % Page dimensions
    \hsize=\dimexpr\pdfpagewidth-1.5in
    \vsize=\dimexpr\pdfpageheight-1.5in
    \baselineskip=16.8pt
    \hoffset=-.25in \voffset=-.25in
    % Fonts
    \font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
    \font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
    % Logical markup definition
    \def\title#1{{\bigbf\centerline{#1}}}
    \def\author#1{{\it\centerline{by}\centerline{#1}}
      \vskip 3.9em}
    \def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
      \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
      \parshape=4 3em\dimexpr\hsize-3em 3.28em
      \dimexpr\hsize-3.28em 3.28em
      \dimexpr\hsize-3.28em 0em\hsize}
    % The document
    \title{The Cask of Amontillado}
    \author{Edgar Allan Poe}
    \chapter The thousand injuries of Fortunato I had borne
    as I best could, but when he ventured upon insult, I vowed
    revenge. You, who so well know the nature of my soul,
    will not suppose, however, that I gave utterance to a
    threat. {\it At length} I would be avenged; this was a
    point definitely settled---but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab serifs or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
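The line-length guidelines above can be turned into a small check. The following Python sketch is illustrative only; the function names `measure_is_comfortable` and `should_set_ragged` are assumptions made for this example, and the thresholds are simply the figures quoted in the paragraph.

```python
def measure_is_comfortable(chars_per_line: int, columns: int = 1) -> bool:
    """Check a line length against the 45-75 (single-column) or
    40-50 (multi-column) character guideline."""
    low, high = (45, 75) if columns == 1 else (40, 50)
    return low <= chars_per_line <= high


def should_set_ragged(chars_per_line: int) -> bool:
    """Lines too narrow for 40 characters are better set ragged
    than justified, since justification loosens the word spacing."""
    return chars_per_line < 40


print(measure_is_comfortable(66))             # single column
print(measure_is_comfortable(66, columns=2))  # two columns
print(should_set_ragged(35))                  # a narrow sidenote
```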

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

1

    \documentclass[11pt]{article}
    \usepackage{fontspec, leading, newunicodechar}
    \usepackage[Latin, Greek]{ucharclasses}
    \setTransitionsForLatin{%
      \fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
    \setTransitionsForGreek{%
      \fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
        Ligatures=TeX]}{}
    \newunicodechar{}{\raisebox{.8ex}{}}
    \frenchspacing
    \leading{14pt}
    \begin{document}
    The second function of Soul -- knowing -- was not at
    first distinguished from motion. Aristotle says: φαμὲν
    γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
    δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
    δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
    αὐτὴν κινεῖσθαι.
    ``The soul is said to feel pain and joy, confidence and
    fear, and again to be angry, to perceive, and to think;
    and all these states are held to be movements, which
    might lead one to suppose that soul itself is moved.''
    \end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


    <style>
      @font-face {
        font-family: "Alegreya Sans";
        src: url("AlegreyaSans-Regular.ttf")
             format("truetype");
        unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                       U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
      }
      @font-face {
        font-family: "GFS Neohellenic";
        src: url("GFSNeohellenic.otf") format("opentype");
        unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                       U+102E0-102FF;
      }
      p {
        font-family: "Alegreya Sans", "GFS Neohellenic",
                     sans-serif;
        line-height: 14pt;
      }
      [lang=en] {
        font-size: 11pt;
      }
      [lang=grc] {
        font-size: 13.2pt;
      }
    </style>
    <p><span lang="en">The second function of Soul – knowing
    – was not at first distinguished from motion. Aristotle
    says: </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
    λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
    τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
    κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
    κινεῖσθαι. </span><span lang="en">“The soul is said to
    feel pain and joy, confidence and fear, and again to be
    angry, to perceive, and to think; and all these states
    are held to be movements, which might lead one to suppose
    that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
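The arithmetic behind these figures can be sketched in Python. The function names `leading_range` and `lead_per_side` are assumptions made for this illustration; the 20 and 45 percent bounds are the ones quoted in the text.

```python
def leading_range(type_size_pt: float) -> tuple[float, float]:
    """Return the smallest and largest recommended line height,
    i.e. the type size plus 20 to 45 percent."""
    return (type_size_pt * 120 / 100, type_size_pt * 145 / 100)


def lead_per_side(type_size_pt: float, line_height_pt: float) -> float:
    """Return the lead added above and below each line when the extra
    vertical space is split evenly."""
    return (line_height_pt - type_size_pt) / 2


low, high = leading_range(10)
print(low, high)                # line heights for 10 pt type
print(lead_per_side(10, low))   # lead per side at the lower bound
print(lead_per_side(10, high))  # lead per side at the upper bound
```

For 10 pt type, this reproduces the 12 to 14.5 pt line height and the 1 to 2.25 pt of lead on each side of the line.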

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

            Sizes in inches    Page proportions
    A4      8.27 × 11.7        2 : √2       1.41421
    B5      6.93 × 9.84        1 : √2       0.707
    Letter  8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

[Sidenote: This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.]

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

    This is the excellent foppery of the world, that when we are sick in fortune (often the surfeit of our own behavior) we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    – William Shakespeare, King Lear

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
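As a rough illustration of the multiple-of-line-height rule, the following Python sketch rounds the available height down to a whole number of lines. The function name `text_block`, the margin values in the usage lines, and the conversion of 72 points per inch (the PostScript point; the TeX point is slightly smaller) are assumptions made for this example.

```python
import math

POINTS_PER_INCH = 72.0  # assumption: PostScript points


def text_block(page_height_in: float, top_margin_in: float,
               bottom_margin_in: float,
               line_height_pt: float) -> tuple[int, float]:
    """Return the number of whole lines that fit between the vertical
    margins and the resulting text height in points."""
    available_pt = (page_height_in - top_margin_in
                    - bottom_margin_in) * POINTS_PER_INCH
    # The tiny epsilon guards against floating-point error when the
    # available height is an exact multiple of the line height.
    lines = math.floor(available_pt / line_height_pt + 1e-9)
    return lines, lines * line_height_pt


# A 9 in tall page with 0.75 in vertical margins and 12 pt leading.
print(text_block(9, 0.75, 0.75, 12))
# The same page with 16.8 pt leading leaves a small remainder,
# which has to be absorbed by one of the margins.
print(text_block(9, 0.75, 0.75, 16.8))
```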

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006 (cit. on p. 13).


[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986 (cit. on p. 22).


[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000. Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Manu Sporny, Gregg Kellogg, and Markus Lanthaler. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
Emacs  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
joe  The Joe’s Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
pico  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
vi  The Visual Interactive editor
vim  vi IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language



This document was prepared in accordance with William Strunk’s Elements of Style, an American English style guide for general use.

Chapter 1

Writing

The essence of a document is the idea it represents. In the case of a text document, this idea is articulated through speech, which is transcribed using text, optionally accompanied by figures, and then laid out on a sheet of paper according to a design. Since the text is typically independent of the design, whose task is to support and elicit the internal structure of the text, it is writing that is the logical first step in the creation of a text document.

The essentials of writing in any given natural language include grammar rules, which specify the structure of spoken language, and orthographic rules, which impose additional requirements on written text. The complexity of either set of rules depends entirely on the language in question. Some writing systems, such as those that incorporate Chinese characters, are not phonographic, and the correspondence between the spoken words and the written symbols needs to be memorized by the writer on a word-by-word basis. Other languages may use vastly different grammar rules for speaking and for writing, which means that a spoken sentence needs to be translated before it is written down. A writer needs to recognize these specifics.

On top of grammar and orthographic rules stand style guides, which, in order to improve consistency, codify how common language patterns are encoded. More comprehensive style guides, such as the Chicago Manual of Style or the Oxford Style Manual, often go beyond writing and provide guidelines on design and typesetting as well, making them an indispensable reference on the editorial tradition.

Zwei Trichter wandeln durch die Nacht.
Durch ihres Rumpfs verengten Schacht
fließt weißes Mondlicht
still und heiter
auf ihren
Waldweg
u. s.
w.

Figure 1.1: Exceptions that prove the rule about the separation of text and design can sometimes be encountered in poetry. Above is Christian Morgenstern’s Trichter, where the text and its form are intimately intertwined.

Above all stand the typographic rules, which specify how the resulting document should be typeset so that it doesn’t disturb the eye of the reader. These, as well as the orthographic rules on hyphenation, can be left out of consideration during writing, as it is the page that should be formed around the writing and not the other way around.

1.1 Text Processing

Originally the domain of the pen, the quill, the stylus, and the more recent typewriter, manuscripts of today are produced mainly using the personal computer and stored in text files. The discipline of creating and manipulating digital text is called text processing and will be the focus of this section.

1.1.1 Character Encoding

Although computing at its most primal has no use for anything but numbers, it has nevertheless been accompanied by text from the very outset. Even the earliest computers from the 1950s were programmed with both raw machine code and the text programming language of the FORmula TRANslator (FORTRAN). The digital representation of letters, digits, and other characters was initially closely tied to each specific application and processor architecture, but with the advent of computer networking in the 1960s, mutual intelligibility became a point of concern. “We had over sixty different ways to represent characters in computers. It was a real Tower of Babel,” explains Bob Bemer [1], an American computer scientist who worked at IBM during 1956–1962 and who drafted the American Standard Code for Information Interchange (ASCII) [2], a character encoding from 1963 that unified the digital representation of text across the computer industry and enabled computer networking on a large scale.

EBCDIC by IBM was the default encoding on IBM’s System/360 mainframes and was in active use until the introduction of the PC in 1981. In writing systems using Chinese characters, special encodings such as Big5, JIS, and EUC are used to this day. For brevity, the text focuses on the main stream of international encodings.

ASCII

In ASCII, every character is represented by a number from zero to 127, which is stored as a seven-bit integer called a character code. These 128 codes are used to encode printable characters (spanning the letters of the English alphabet, digits, punctuation, and other symbols) and control codes, as depicted in Table 1.1. Unlike printable characters, control codes have no fixed visual representation; they were used to implement application-specific communication protocols and text formatting, and their precise semantics were defined in a much later standard from 1972 [3]. Unconstrained by the bandwidth and storage limitations of the 1960s and 1970s, today’s communication protocols and text formats gravitate towards markup constructed from printable characters, which, unlike control codes, are easy for humans to read and write.

The following properties make it easy to manipulate and reason about character strings encoded in ASCII:

• Each character is represented by exactly seven bits. This makes it easy to allocate space for character strings of fixed length, to measure the number of characters stored in a memory region, and to perform basic operations such as adjacent character retrieval or text truncation.

• Characters are alphabetically ordered. Character strings can therefore be collated by comparing character code binary values.

• Lowercase and uppercase letters, digits, and control codes form contiguous ranges of character codes. This simplifies classification.

Bits 7 6 5     000  001  010  011  100  101  110  111
Bits 4 3 2 1   Ctrl codes  Symbols   Upper case  Lower case
0 0 0 0        NUL  DLE  SP   0    @    P    `    p
0 0 0 1        SOH  DC1  !    1    A    Q    a    q
0 0 1 0        STX  DC2  "    2    B    R    b    r
0 0 1 1        ETX  DC3  #    3    C    S    c    s
0 1 0 0        EOT  DC4  $    4    D    T    d    t
0 1 0 1        ENQ  NAK  %    5    E    U    e    u
0 1 1 0        ACK  SYN  &    6    F    V    f    v
0 1 1 1        BEL  ETB  '    7    G    W    g    w
1 0 0 0        BS   CAN  (    8    H    X    h    x
1 0 0 1        HT   EM   )    9    I    Y    i    y
1 0 1 0        LF   SUB  *    :    J    Z    j    z
1 0 1 1        VT   ESC  +    ;    K    [    k    {
1 1 0 0        FF   FS   ,    <    L    \    l    |
1 1 0 1        CR   GS   -    =    M    ]    m    }
1 1 1 0        SO   RS   .    >    N    ^    n    ~
1 1 1 1        SI   US   /    ?    O    _    o    DEL

Table 1.1: The ASCII encoding as specified in the 1986 revision of the standard [4]. SP denotes the space character.

Code point range   Encoding
0–127              0xxxxxxx
128–2047           110xxxxx 10xxxxxx
2048–65535         1110xxxx 10xxxxxx 10xxxxxx
65536–1114111      11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Table 1.2: The UTF-8 encoding. Each x represents one bit of the UCS code point in binary.

Character   Code point (decimal)   Code point (binary)   Encoding
Ř           344                    101011000             11000101 10011000
e           101                    1100101               01100101
č           269                    100001101             11000100 10001101

Table 1.3: An example of the UTF-8 encoding.

• There is precisely one way to encode any printable character. The conversion between lower- and uppercase letters is a matter of inverting one bit.

This comes at the expense of support for non-English writing systems. As a temporary workaround, a set of ASCII derivatives that replaced the less-needed characters of #, $, @, [, \, ], ^, `, {, |, }, and ~ with international characters was specified in the ISO 646 standard from 1972 [3].
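The ASCII properties listed above can be tried out in a few lines of Python. The following is only a minimal sketch: the helper names (is_upper, is_lower, is_digit, is_control, swap_case) are invented for this illustration and do not come from any standard.

```python
# A sketch of the ASCII properties, working directly on character codes 0-127.
# All function names are hypothetical and chosen for illustration only.

def is_upper(code: int) -> bool:
    # Uppercase letters occupy the contiguous range 65-90 ("A"-"Z").
    return 0x41 <= code <= 0x5A

def is_lower(code: int) -> bool:
    # Lowercase letters occupy the contiguous range 97-122 ("a"-"z").
    return 0x61 <= code <= 0x7A

def is_digit(code: int) -> bool:
    # Digits occupy the contiguous range 48-57 ("0"-"9").
    return 0x30 <= code <= 0x39

def is_control(code: int) -> bool:
    # Control codes occupy 0-31; DEL (127) sits apart at the end of the table.
    return code <= 0x1F or code == 0x7F

def swap_case(code: int) -> int:
    # Upper- and lowercase letters differ only in the bit with the value 0x20.
    if is_upper(code) or is_lower(code):
        return code ^ 0x20
    return code

text = b"Hello, World 42"
swapped = bytes(swap_case(code) for code in text)

# Byte strings compare by character code, which for lowercase ASCII words
# coincides with alphabetical order.
words = sorted([b"pear", b"apple", b"fig"])

print(swapped)
print(words)
```

Because every character occupies the same amount of space, len(text) is also the number of characters, and slicing such as text[:5] truncates the string without ever cutting a character in half.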

Eight-bit Encodings

With the byte size stabilizing at eight bits, new character encodings emerged that were based on ASCII and used the additional bit to encode characters of non-English writing systems while retaining complete backwards compatibility with ASCII. Besides the numerous vendor-specific encodings (called code pages), a set of fifteen eight-bit encodings covering all major modern writing systems whose characters fit within the space of 128 additional combinations was standardized in the ISO/IEC 8859 series, released during 1986–2001.

Compared to ASCII, eight-bit encodings introduced an additional level of complexity to text processing:

• Each character is exactly eight bits wide. String manipulation is therefore as straightforward as with ASCII.

• Character strings can no longer be collated by character code comparison. Each encoding requires separate collation tables.

• Classes of characters, such as uppercase and lowercase letters or punctuation, no longer form contiguous ranges, and their position varies among encodings. This impedes character classification.

• Idiosyncrasies such as the ligature æ and invisible hyphenation hints are included in several encodings, which makes it more difficult to determine character string equivalence. Algorithms for case conversion vary among encodings.

• There exists no standard mechanism to detect which encoding is being used. The distinction needs to be made on the application level using either heuristics, additional metadata, or human intervention. Consequently, no standard mechanism exists to use different character encodings within a single text document.
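The last point, the lack of a detection mechanism, can be illustrated with a short Python sketch. The three bytes below were chosen for this example only; the codec names iso8859_1 and iso8859_2 are the identifiers that Python’s standard library uses for ISO/IEC 8859-1 and ISO/IEC 8859-2.

```python
# The same three bytes, with no encoding label attached to them.
raw = bytes([0xD8, 0x65, 0xE8])

# Interpreted as ISO/IEC 8859-1 (Western European) ...
as_latin1 = raw.decode("iso8859_1")

# ... and as ISO/IEC 8859-2 (Central European).
as_latin2 = raw.decode("iso8859_2")

# Only the byte below 128 (0x65, "e") means the same thing in both
# encodings, because both are backwards compatible with ASCII.
print(as_latin1)  # Øeè
print(as_latin2)  # Řeč

# Once the intended encoding is known, the text can be converted to UTF-8.
as_utf8 = as_latin2.encode("utf-8")
print(as_utf8.hex(" "))
```

Nothing in the bytes themselves says which of the two readings is the intended one; that knowledge has to come from outside the text.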

A portion of this complexity is inherent in the task of encoding the characters of all modern writing systems, but the overhead caused by the fragmentation of character encodings proved to be unnecessary.

Also notable are the seven-bit encodings of UTF-7 and Punycode, which bring Unicode support to protocols that were designed with the seven-bit ASCII in mind, such as e-mail.

The Universal Character Set and Unicode

In the early 1990s, the continual increase in the available bandwidth and storage led to the creation of the standards of Unicode [5, 6] and the Universal multiple-octet coded Character Set (UCS) [7] in an attempt to create a text encoding that would contain the characters of all the world’s languages and succeed ASCII as the lingua franca of text interchange.

UCS is an ever-expanding catalogue of characters from writing systems both modern and ancient, and of symbols ranging from diacritical marks, punctuation, and ideograms to mahjong tiles, alchemical symbols, and the ancient Greek musical notation. Each of these characters is assigned a number called a code point, ranging from 0 to 2147483647 (7F FF FF FF in hexadecimal notation), with the numbers of the most common characters in the range from 0 to 65535 (FF FF), called the Basic Multilingual Plane (BMP). The smallest units of division in UCS are blocks, which contain 256 thematically related characters. UCS encodings map code points to binary character codes and vice versa.

Three major encodings are specified in the UCS standard and its amendments [8, 9]:

1. UTF-32 directly encodes UCS characters by transforming their code points to four-byte integers. UTF-32 is also known as UCS-4.

2. UTF-16 directly encodes characters within the BMP by transforming their code points to two-byte integers. Code points in the range from 65536 to 1114111 (01 00 00–10 FF FF) are transformed into pairs of two-byte integers, called surrogate pairs, ranging from 55296 to 57343 (D8 00–DF FF). To enable the UTF-16 encoding, the code points in this range will never be assigned to characters [10, sec. 3.4, D15]. The same is true of code points above 1114111 (10 FF FF), which allows UTF-16 to encode any UCS character.

3. UTF-8 directly transforms code points ranging from 0 to 127 (7F) to one-byte integers. Since the first UCS block of the BMP matches ASCII, any text encoded in eight-bit ASCII is also encoded in UTF-8. Code points in the range from 128 to 1114111 (00 00 80–10 FF FF) are transformed into two to four one-byte integers ranging from 128 to 244 (80–F4). The encoding is illustrated in tables 1.2 and 1.3.

One of the design goals of UCS was to avoid assigning code points to different glyphs that carry the same meaning. As a result, the visually distinctive Han characters used in the East Asian countries of China, Japan, Korea, and Vietnam were merged into a set of 75960 ideograms in a process referred to as the Han Unification [10, sec. 18.1]. This simplifies text processing, but also makes it impossible to encode a text in multiple East Asian languages without having to rely on external markup to select appropriate regional fonts. As a result, a derivative of UCS that doesn’t implement the Han Unification was developed for use in operating systems based on the Real-time Operating system Nucleus (TRON) and is used in East Asia alongside UCS and region-specific encodings.

[Figure: the Han characters 餐甑逞扉牙慨 rendered in four regional variants]

Figure 1.2: Several Han characters in the traditional Chinese, Japanese, Korean, and Vietnamese variants.
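The UTF-8 transformation described in the third item can be sketched directly from the bit patterns of the UTF-8 encoding table. The function name utf8_encode is hypothetical; in practice, Python’s built-in str.encode("utf-8") does the same job and is used here only to cross-check the sketch.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode a single code point by filling the UTF-8 bit patterns."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        # Values above 10 FF FF and the surrogate range are never
        # assigned to characters, so they cannot be encoded.
        raise ValueError("not an encodable code point")
    if code_point <= 0x7F:
        # 0xxxxxxx: identical to ASCII.
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0b11000000 | (code_point >> 6),
                      0b10000000 | (code_point & 0b111111)])
    if code_point <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0b11100000 | (code_point >> 12),
                      0b10000000 | ((code_point >> 6) & 0b111111),
                      0b10000000 | (code_point & 0b111111)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0b11110000 | (code_point >> 18),
                  0b10000000 | ((code_point >> 12) & 0b111111),
                  0b10000000 | ((code_point >> 6) & 0b111111),
                  0b10000000 | (code_point & 0b111111)])

for character in "Řeč":
    encoded = utf8_encode(ord(character))
    bits = " ".join(format(byte, "08b") for byte in encoded)
    print(character, ord(character), bits)
```

The leading bits of the first byte announce how many bytes belong to the character, and every continuation byte starts with 10, which lets a reader that starts in the middle of a text find the next character boundary.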

UTF-32 is primarily used for the fixed-width internal representation of individual UCS characters inside programs, UTF-16 fulfills a similar role in programs that only work with the BMP, and UTF-8 is used for text storage and interchange. Since 2010, the majority of text content on the Web has been encoded in ASCII and UTF-8 [11].
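Programs that rely on UTF-16 must still cope with surrogate pairs as soon as a character outside the BMP appears. The arithmetic behind them can be sketched as follows; the function name to_surrogate_pair is an assumption of this example, and the result is compared against Python’s own utf-16-be codec.

```python
from typing import Tuple

def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above the BMP into two two-byte integers."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP need a surrogate pair")
    offset = code_point - 0x10000       # a 20-bit value
    high = 0xD800 | (offset >> 10)      # upper ten bits, D8 00-DB FF
    low = 0xDC00 | (offset & 0x3FF)     # lower ten bits, DC 00-DF FF
    return high, low

code_point = 0x1F600                    # a character outside the BMP
high, low = to_surrogate_pair(code_point)
print(format(high, "04X"), format(low, "04X"))

# Cross-check against the built-in big-endian UTF-16 codec.
expected = chr(code_point).encode("utf-16-be")
print(expected == high.to_bytes(2, "big") + low.to_bytes(2, "big"))
```

A program that treats every two-byte integer as one character will count such a character twice and may split it in half when truncating text.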

Unicode was a competing standard for universal text encoding that underwent a merger with UCS in version 1.1, and the standards have been kept closely synchronised since then. Unicode is a superset of UCS which defines additional information about UCS characters (such as their general category, directionality, case, or numeric value [10, sec. 3.5 and ch. 4]), various text processing algorithms, and implementation guidelines.

Regarding text processing, Unicode and UCS represent a compromise between the simplicity of the seven-bit ASCII and the heterogeneity of eight-bit encodings:

Ǻ = Å + ◌́ = A + ◌̊ + ◌́

Figure 1.3: Some UCS characters can be either input as a single entity or composed from several combining characters. Regarding Unicode normalization forms, all of the above representations are canonically equivalent.

iconv -f latin2 -t utf8 -- old.txt > new.txt

Figure 1.4: Text files can be converted between encodings using the iconv command-line tool. The sample code shows the file old.txt being converted from the ISO/IEC 8859-2 encoding to UTF-8. The result of the conversion is stored in the file new.txt.

• If simple text manipulation is preferred over space efficiency, each character can be made exactly two bytes wide (for text within the BMP) or four bytes wide using the UTF-16 and UTF-32 encodings, respectively.

• Although character strings cannot be collated by a simple character code comparison, a collation algorithm is defined in the Unicode specification [12], and collation tables for major locales [13] are maintained by the Unicode Consortium.

• Classes of characters (such as uppercase letters, lowercase letters, numbers, and punctuation) do not form contiguous ranges, but their position is directly specified in the standard [10, sec. 4.5].

• Although idiosyncrasies (such as ligatures, invisible hyphenation hints, and combining characters) are present in UCS, explicit normalization algorithms for character string equivalence testing are specified by the standard [10, sec. 2.12]. An algorithm for case conversion is also specified [10, sec. 3.13].

• The byte order mark (FE FF) character can be inserted at the beginning of a text as a signature of Unicode encodings. As the name suggests, the order in which the FE and FF bytes arrive also indicates the order of bytes (called endianness) that was used to encode integers. In UTF-32 and UTF-16, endianness can be chosen arbitrarily by the encoding application. In UTF-8, one-byte integers are used, and the notion of endianness is therefore meaningless.
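Several of the points above can be observed with Python’s standard unicodedata and codecs modules. The sketch below is a simplification rather than a complete implementation of the algorithms in the standard, and the helper names (canonically_equivalent, caseless_match, utf16_byte_order) are invented for this example.

```python
import codecs
import unicodedata

# Three ways to write the letter A with a ring above and an acute accent.
precomposed = "\u01FA"          # a single code point
partly = "\u00C5\u0301"         # Å + combining acute accent
decomposed = "A\u030A\u0301"    # A + combining ring above + combining acute accent

def canonically_equivalent(a: str, b: str) -> bool:
    # Two strings are canonically equivalent if they agree once both
    # have been brought to the same normalization form.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def caseless_match(a: str, b: str) -> bool:
    # A simplified caseless comparison: decompose, case-fold, decompose again.
    def key(text: str) -> str:
        folded = unicodedata.normalize("NFD", text).casefold()
        return unicodedata.normalize("NFD", folded)
    return key(a) == key(b)

def utf16_byte_order(data: bytes) -> str:
    # The byte order mark reveals the byte order of UTF-16 text.
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little-endian"
    return "unknown"

print(precomposed == decomposed)                        # plain comparison
print(canonically_equivalent(precomposed, decomposed))  # after normalization
print(canonically_equivalent(partly, decomposed))
print(caseless_match("Straße", "STRASSE"))
print(unicodedata.category("Ř"))                        # general category
print(utf16_byte_order("\ufeffa".encode("utf-16-le")))
```

The plain comparison of the precomposed and the decomposed string fails even though both render identically, which is why robust text comparison normalizes its input first.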

Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. Above is the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.

Compose + O + R = ®
Compose + 3 + 4 = ¾
Compose + s + s = ß
Compose + ~ + ' + a = ấ

Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.

Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á

Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces a character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the corresponding UCS code point in hexadecimal.
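The three Alt sequences in the figure can be checked against the code pages that Python ships with. The function below only models the lookup rule described in the caption and assumes an English locale; its name, alt_code, and its parameters are hypothetical, and it is in no way an interface to Windows itself.

```python
def alt_code(keys: str, ibm_page: str = "cp437", windows_page: str = "cp1252") -> str:
    """Model which character an Alt sequence would produce."""
    if keys.startswith("+"):
        # Alt and +, followed by a hexadecimal UCS code point.
        return chr(int(keys[1:], 16))
    # A leading zero selects the Windows code page, otherwise the IBM one.
    code_page = windows_page if keys.startswith("0") else ibm_page
    return bytes([int(keys)]).decode(code_page)

for sequence in ["160", "0225", "+E1"]:
    print("Alt +", sequence, "=", alt_code(sequence))
```

All three sequences arrive at the same letter by different routes: position 160 of code page 437, position 225 of code page 1252, and the UCS code point E1.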

1.1.2 Text Input

To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On hand-held devices, the use of either a numeric keypad or a touch-screen is more typical.

An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind, and as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn’t apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.

1.1.3 Text Editors

A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free joe, GNU nano, and pico.

More advanced text editors come with support for regular expressions and version control (which will be covered in sections 1.1.5 and 1.2) and user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower the users to perform complex text editing.

114 Interactive Document Preparation Systems

Interactive Document Preparation Systems (dpses) are a breed of texteditors that produces fully-formatted text documents instead of(or along with) text files The reader is advices to avoid interactivedpses that use proprietary undocumented or obscure file formatswhich lock the user into using the respective dps Well-definedinteractive dps file formats include the Portable Document Format(pdf) [14] the Office Open XML format (ooxml) [15] and the OpenDocument Format for office applications (odf) [16]

The primary difference between text editors and DPSes is the fact that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus, and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.

1.1.5 Regular Expressions

The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.

Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.

Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) syntaxes [18, part 1, ch. 9] described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.

More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
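The claim that such syntaxes exceed regular grammars can be illustrated with a backreference. The following is a minimal sketch, assuming Python and its standard re module (a Perl-derived syntax that the text does not itself discuss); the helper name is_doubled is made up for this illustration. The pattern accepts exactly those strings that consist of some substring written twice in a row, a set that is a textbook example of a language no regular grammar can generate.

```python
import re

# (.+) captures a non-empty substring and \1 demands the very same
# substring again; fullmatch() requires the pattern to cover the whole word.
DOUBLED = re.compile(r"(.+)\1")


def is_doubled(word: str) -> bool:
    """Return True if the word is some substring repeated exactly twice."""
    return DOUBLED.fullmatch(word) is not None


for word in ["murmur", "bonbon", "banana"]:
    print(word, is_doubled(word))
```

A finite automaton has no memory in which to store the arbitrarily long captured substring, which is why the backreference pushes the pattern out of the regular class.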

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below).

BRE regex: we\{1,2\}p
Description: The repetition expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
Matches: weeps, wept

BRE regex: e*ne
Description: Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
Matches: never, enemy, Kleene

BRE regex: \(⟨regex⟩\)
Description: A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
Matches: ⟨regex⟩

BRE regex: ^ar
Description: At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
Matches: argument, arrow keys

BRE regex: ore$
Description: At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
Matches: iron ore, dumbledore

BRE regex: .be
Description: A period (.) matches any single character.
Matches: or not to be

BRE regex: be[ea]
Description: A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity.
Matches: beehive, grizzly bear, glass beads

BRE regex: be[^ea]
Description: A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
Matches: obeah bend, libela

BRE regex: \^\$
Description: Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
Matches: ^$

BRE regex: \(..\).*\1
Description: A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
Matches: ara ararauna, dardanelles, nationality

ERE regex: we{1,2}p
Description: Unlike in BREs, braces aren't escaped.
Matches: weeps, wept

ERE regex: pe+r?l?
Description: The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
Matches: persona, peer, speech, perl

ERE regex: (⟨regex⟩)
Description: Unlike in BREs, parentheses aren't escaped.
Matches: ⟨regex⟩

ERE regex: (on|t)
Description: Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
Matches: one, two, trophy, truth

ERE regex: (..).*\1
Description: EREs do not support backreferences.
Matches: ⟨undefined⟩

The authoritative resource on grep, sed, and awk is sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.

Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (U+200B), zero width non-joiner (U+200C), zero width joiner (U+200D), and zero width no-break space (U+FEFF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at least partially implement the Unicode standard for Regular Expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.

Regex: \x{⟨n⟩}
Description: Matches the UCS character with code point ⟨n⟩ in hexadecimal.

Regex: \N{⟨n⟩}
Description: Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.

Regex: \p{⟨p⟩}
Description: Matches any UCS character with property ⟨p⟩.

Regex: \P{⟨p⟩}
Description: Matches any UCS character without property ⟨p⟩.

Property: Letter
Description: This property is satisfied by any letter.

Property: Punctuation
Description: This property is satisfied by any punctuation.

Property: Symbol
Description: This property is satisfied by any symbol.

Property: Mark
Description: This property is satisfied by any mark.

Property: Number
Description: This property is satisfied by any number.

Property: Separator
Description: This property is satisfied by any separator.

Property: Other
Description: This property is satisfied by any UCS character that doesn't belong to any of the categories listed above.

Property: Block=⟨b⟩
Description: This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.

Property: Script=⟨s⟩
Description: This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.

Property: Numeric Value=⟨n⟩
Description: This property is satisfied by any UCS character with the numeric value ⟨n⟩.
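Some of the pitfalls named above can be worked around even where the regex syntax itself has no Unicode property support. The following is a minimal sketch, assuming Python, whose standard re module does not implement the \p{…} notation; the helper names canonical, contains, and letters_only are invented for this illustration. It strips the four zero-width characters listed in the text, applies NFC normalization and case folding before matching, and filters by general category using the standard unicodedata module. It is a simplification rather than a full implementation of the Unicode matching algorithms.

```python
import re
import unicodedata

# Translation table that deletes the invisible characters named in the text:
# U+200B, U+200C, U+200D, and U+FEFF.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))


def canonical(text: str) -> str:
    """Drop zero-width characters, normalize to NFC, and case-fold."""
    visible = text.translate(ZERO_WIDTH)
    return unicodedata.normalize("NFC", visible).casefold()


def contains(needle: str, haystack: str) -> bool:
    """Case-insensitive search that ignores encoding variants."""
    pattern = re.escape(canonical(needle))
    return re.search(pattern, canonical(haystack)) is not None


def letters_only(text: str) -> str:
    """Keep characters whose general category is a Letter (L*)."""
    return "".join(
        ch for ch in text if unicodedata.category(ch).startswith("L")
    )


# "E" followed by a combining circumflex matches the precomposed letter.
print(contains("cr\u00eape", "CRE\u0302PE"))
# A zero width space hidden inside a word no longer breaks the match.
print(contains("zero", "ze\u200bro"))
print(letters_only("Ab1, \u010d!"))
```

Normalizing both the pattern and the searched text is the important design choice here: normalizing only one side would reintroduce the false negatives the paragraph warns about.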

The most elementary text processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool will present lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.

The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.

VCS can be dichotomized based on their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally and its operation is fully dependent on the availability of the server. An example of a centralized VCS is Subversion (SVN).

By comparison, there is no designated server in decentralized VCS, and the users can exchange new versions directly with one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage size requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.

Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Docs, form a category of their own.

An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.

Figure 1.8: The basic SVN workflow (svnadmin create, svn checkout, svn update, svn commit).

After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.

Figure 1.9: The basic Git workflow (top diagram: git init, git clone, git pull, git push, git reset, git commit) and the use of the Git program with an SVN repository (bottom diagram: svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit). The latter bears all the advantages and disadvantages associated with decentralized VCS.


Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.

Chapter 2

Markup

A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should be already met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.

Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The Generalized Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.


A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.

Although the described structure is shared by all SGML documents, the actual syntax as well as the restrictions regarding the contents and the attributes of individual elements are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML, such as W3C XML Schema, RELAX NG, or Schematron, that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
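The lower of the two levels of correctness is easy to probe programmatically. The following is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module, neither of which the text prescribes; the helper name is_well_formed is made up here. It reports only whether a string is well-formed XML. Checking validity against a DTD or a schema is beyond this module and would need a validating tool such as xmllint.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the string parses as well-formed XML."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        # The parser refuses the document outright, as XML requires.
        return False
    return True


print(is_well_formed("<recipe><name>Palatschinken</name></recipe>"))
print(is_well_formed("<recipe><name>Palatschinken</recipe>"))
```

The second document fails because the name element is never closed, which is a violation of XML syntax itself rather than of any particular DTD.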

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
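As an illustration of data retrieval with XPath, the following is a minimal sketch assuming Python and its standard xml.etree.ElementTree module, which implements only a limited subset of XPath. The document is an abridged copy of the recipe used in the figures of this section, and the variable names are chosen for this example only.

```python
import xml.etree.ElementTree as ET

RECIPE = """\
<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>
"""

root = ET.fromstring(RECIPE)

# A location path that selects every ingredient element in document order.
ingredients = [
    node.text for node in root.findall("./ingredientList/ingredient")
]

# A predicate that selects the ingredient with a given attribute value.
measured_in_ml = root.find(".//ingredient[@amount='300ml']").text

# Attributes are read from the selected element.
serves = int(root.find("ingredientList").get("serves"))

print(ingredients, measured_in_ml, serves)
```

Full XPath, XPointer, and XQuery processors support far richer expressions (axes, functions, joins) than the subset shown here.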


    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE recipe SYSTEM "recipe.dtd">
    <recipe>
      <name>Palatschinken</name>
      <description>A Slavic crêpe-like dish</description>
      <ingredientList serves="8">
        <ingredient amount="120g">Plain flour</ingredient>
        <ingredient amount="2">Egg</ingredient>
        <ingredient amount="300ml">Milk</ingredient>
        <ingredient amount="1 tblspn">Oil</ingredient>
        <ingredient amount="1 pinch">Salt</ingredient>
      </ingredientList>
      <stepList>
        <step>Combine the ingredients and whisk until
          you have a smooth batter.</step>
        <step>Heat oil on a pan, pour in a tablespoonful
          of the batter, fry until golden brown.</step>
        <step>Repeat until there is no batter left.</step>
        <step>Serve rolled and filled with jam.</step>
      </stepList>
    </recipe>

Figure 2.1: An example XML document (recipe.xml).

DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd">

    <!DOCTYPE recipe SYSTEM "recipe.dtd">

    <!DOCTYPE recipe [
      <!ELEMENT recipe (name, description, ingredientList,
                        stepList)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
      <!ELEMENT ingredientList (ingredient+)>
      <!ATTLIST ingredientList serves CDATA #REQUIRED>
      <!ELEMENT ingredient (#PCDATA)>
      <!ATTLIST ingredient amount CDATA #REQUIRED>
      <!ELEMENT stepList (step+)>
      <!ELEMENT step (#PCDATA)> ]>

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd" [
      <!-- Omitted for brevity --> ]>

    <!DOCTYPE recipe SYSTEM "recipe.dtd" [
      <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD.

    element recipe {
      element name { text },
      element description { text },
      element ingredientList {
        attribute serves { xsd:positiveInteger },
        element ingredient {
          attribute amount { text }, text
        }+
      },
      element stepList {
        element step { text }+
      }
    }

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.


    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
      <element name="recipe"><complexType><all>
        <element name="name" type="string" minOccurs="1"/>
        <element name="description" type="string"
                 minOccurs="1"/>
        <element
          name="ingredientList"><complexType><sequence>
          <element name="ingredient" minOccurs="1"
                   maxOccurs="unbounded">
            <complexType><simpleContent>
              <extension base="string">
                <attribute name="amount" type="string"/>
              </extension>
            </simpleContent></complexType>
          </element></sequence>
          <attribute name="serves" type="positiveInteger"
                     use="required"/>
        </complexType></element>
        <element name="stepList"><complexType><sequence>
          <element name="step" type="string" minOccurs="1"
                   maxOccurs="unbounded"/>
        </sequence></complexType></element>
      </all></complexType></element>
    </schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).

    xmllint -noout --dtdvalid recipe.dtd recipe.xml
    xmllint -noout --schema recipe.xsd recipe.xml
    trang recipe.rnc recipe.rng  # Compact -> Full Relax NG
    xmllint -noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.


A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature, CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.

Speech
    AASE: Peer, you're lying.
    PEER: No, I'm not.
    AASE: Well then, swear to me it's true.
    PEER: Swear? Why should I?
    AASE: See, you dare not. Every word of it's a lie.

Verse
    Peer, you're lying. No, I'm not.
    Well then, swear to me it's true.
    Swear? Why should I? See, you dare not.
    Every word of it's a lie.

    <(V)line>
    <(S)speech who="Aase">Peer, you're lying.</(S)speech>
    <(S)speech who="Peer">No, I'm not.</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Aase">Well then,
    swear to me it's true.</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Peer">Swear? why should I?</(S)speech>
    <(S)speech who="Aase">See, you dare not.
    </(V)line><(V)line>
    Every word of it's a lie.</(S)speech>
    </(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.

Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for the presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the contemporary leading web browsers and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

    <b>Bold <i>bold and italic</b> italic</i>
    <b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

    <font face="Verdana" size="4">
    <font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
    <br><br>There is a continuing need to show the power of
    <i>CSS</i>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The <i>HTML
    </i> remains the same, the only thing that has changed
    is the external <i>CSS</i> file. Yes, really.
    </font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

    <style>
    body {
      font: large Verdana;
      font-size: large;
    }
    h1 {
      font-size: x-large;
      text-transform: uppercase;
    }
    abbr {
      font-style: italic;
    }
    </style>
    <h1>So what is this about?</h1>
    <p>There is a continuing need to show the power of
    <abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The
    <abbr>HTML</abbr> remains the same, the only thing that
    has changed is the external <abbr>CSS</abbr> file. Yes,
    really.</p>

The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly into web documents through XML namespaces.
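The difference between lenient HTML processing and strict XML processing can be observed directly. The following is a minimal sketch, assuming Python and its standard html.parser and xml.etree.ElementTree modules; the class TagLogger and both helper functions are invented for this illustration. Note that html.parser is only a tolerant tokenizer and does not perform the error recovery prescribed by the HTML5 specification, so the sketch shows acceptance versus refusal rather than the canonical interpretation of malformed markup.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The overlapping markup discussed in the text: <b> is closed inside <i>.
MALFORMED = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Record tag events without checking that the tags nest properly."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.events.append(f"</{tag}>")


def html_events(markup: str) -> list:
    """Tokenize the markup leniently and return the tag events seen."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


def xml_accepts(markup: str) -> bool:
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(html_events(MALFORMED))
print(xml_accepts(MALFORMED))
```

The HTML tokenizer reports all four tags and carries on, whereas the XML parser stops at the first mismatched end tag, which is the behavior that makes XML parsers simpler to build.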

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.


A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.

If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
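The two interpretations of a triplet can be made concrete with a small in-memory model. The following is a minimal sketch in Python that uses plain tuples rather than any RDF library; the types Literal and Triplet and the helpers describe and objects are invented for this illustration, while the IRIs are those of the example RDF documents shown in this section.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A literal value, optionally tagged with a language."""
    value: str
    language: str = ""


class Triplet(NamedTuple):
    """A triplet (p, s, o) in the order used by the text."""
    predicate: str
    subject: str
    obj: object  # either an IRI (str) or a Literal


DC = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
DOC = "http://example.org/document.html"
JOHN = "http://example.org/john-smith"

GRAPH = {
    Triplet(DC + "title", DOC, Literal("John's Web page", "en")),
    Triplet(DC + "creator", DOC, JOHN),
    Triplet(FOAF + "name", JOHN, Literal("John Smith")),
}


def describe(triplet: Triplet) -> str:
    """Render a triplet using the interpretation that fits its object."""
    p, s, o = triplet
    if isinstance(o, Literal):
        return f"{s} has property {p} with value {o.value!r}"
    return f"{s} is in relation {p} with {o}"


def objects(graph, subject: str, predicate: str) -> list:
    """Return every object recorded for the given subject and predicate."""
    return [t.obj for t in graph
            if t.subject == subject and t.predicate == predicate]


for triplet in sorted(GRAPH, key=lambda t: (t.subject, t.predicate)):
    print(describe(triplet))
```

Because the graph is an unordered set of triplets, two documents written independently can be merged by a simple set union, which is one reason the triplet model suits a distributed Web of data.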

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually replace microformatting: the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in the documents on the Web.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be


    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/terms/"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description
          rdf:about="http://example.org/document.html">
        <dc:title xml:lang="en">John's Web page</dc:title>
        <dc:creator
            rdf:resource="http://example.org/john-smith" />
      </rdf:Description>
      <rdf:Description
          rdf:about="http://example.org/john-smith">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
        <foaf:name>John Smith</foaf:name>
      </rdf:Description>
    </rdf:RDF>

    <http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
    <http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
    <http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
    <http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dc: <http://purl.org/dc/terms/> .
    <http://example.org/document.html>
        dc:title "John's Web page"@en ;
        dc:creator <http://example.org/john-smith> .
    <http://example.org/john-smith>
        a foaf:Person ;
        foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).
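Because N-Triples writes every triplet on a single line with full IRIs, it is the easiest of the three representations to generate by hand. The following Python sketch shows the idea under simplifying assumptions: the helper names `term` and `to_ntriples` and the tuple encoding of objects are invented for this illustration, and only the most common string escapes are handled.

```python
def term(obj):
    """Render an object: ("iri", value) or ("literal", text[, language])."""
    if obj[0] == "iri":
        return f"<{obj[1]}>"
    # Escape the characters that would otherwise break the quoted literal.
    text = (obj[1].replace("\\", "\\\\").replace('"', '\\"')
                  .replace("\n", "\\n").replace("\r", "\\r"))
    lang = obj[2] if len(obj) > 2 else None
    return f'"{text}"' + (f"@{lang}" if lang else "")


def to_ntriples(triples):
    """Serialize (subject IRI, predicate IRI, object) tuples, one per line."""
    return "\n".join(f"<{s}> <{p}> {term(o)} ." for s, p, o in triples)


DOC = "http://example.org/document.html"
JOHN = "http://example.org/john-smith"

print(to_ntriples([
    (DOC, "http://purl.org/dc/terms/title", ("literal", "John's Web page", "en")),
    (DOC, "http://purl.org/dc/terms/creator", ("iri", JOHN)),
    (JOHN, "http://xmlns.com/foaf/0.1/name", ("literal", "John Smith")),
]))
```

For real documents an established RDF library is the safer choice, since the sketch does not validate IRIs or language tags.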


    <!DOCTYPE html>
    <html lang="en">
      <head>
        <link rel="meta" type="application/rdf+xml"
              href="john.rdf">
        <link rel="meta" type="text/turtle" href="john.ttl">
        <link rel="meta" type="application/n-triples"
              href="john.nt">
        <title>John's Web page</title>
      </head>
      <body>
        Hi, I'm John Smith.
      </body>
    </html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

    <!DOCTYPE html>
    <html lang="en">
      <head vocab="http://purl.org/dc/terms/"
            about="http://example.org/document.html">
        <title property="title" lang="en">John's Web
          page</title>
        <meta property="creator"
              href="http://example.org/john-smith">
      </head>
      <body vocab="http://xmlns.com/foaf/0.1/"
            about="http://example.org/john-smith"
            typeof="Person">
        Hi, I'm <span property="name">John Smith</span>.
      </body>
    </html>


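[Graph: the node http://example.org/document.html has an edge dc:title to the literal "John's Web page"@en and an edge dc:creator to the node http://example.org/john-smith, which in turn has an edge rdf:type to foaf:Person and an edge foaf:name to the literal "John Smith".]

Figure 2.11: A graph of the RDF document in Figure 2.9.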

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also called What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the gentle learning curve of interactive DPSes is twofold: more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

[Sidenote: The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].]

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by Donald Knuth, an American professor of computer science, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by Desktop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text, with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.


[Output document: the title "The Cask of Amontillado" and the byline "by Edgar Allan Poe" centered above the opening paragraph, which begins with a drop cap T spanning three lines; the page number -1- is centered at the foot of the page.]

    .TITLE "The Cask of Amontillado"
    .AUTHOR "Edgar Allan Poe"
    .PRINTSTYLE TYPESET
    .PAGE 6i 9i .75i .75i .75i .75i
    .START
    .PP
    .DROPCAP T 3
    he thousand injuries of Fortunato I had borne as I best
    could, but when he ventured upon insult, I vowed revenge.
    You, who so well know the nature of my soul, will not
    suppose, however, that I gave utterance to a threat.
    \*[IT]At length\*[PREV] I would be avenged; this was a
    point definitely settled\[em]but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].


    % Page geometry
    \pdfpagewidth=6in \pdfpageheight=9in
    % Page dimensions
    \hsize=\dimexpr\pdfpagewidth-1.5in
    \vsize=\dimexpr\pdfpageheight-1.5in
    \baselineskip=16.8pt
    \hoffset=-.25in \voffset=-.25in
    % Fonts
    \font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
    \font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
    % Logical markup definition
    \def\title#1{{\bigbf\centerline{#1}}}
    \def\author#1{{\it\centerline{by}\centerline{#1}}
      \vskip 3.9em}
    \def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
      \hbox{\llap{\dropcap#1\hskip-0.3ex}}}
      \parshape=4 3em \dimexpr\hsize-3em 3.28em
      \dimexpr\hsize-3.28em 3.28em
      \dimexpr\hsize-3.28em 0em \hsize}
    % The document
    \title{The Cask of Amontillado}
    \author{Edgar Allan Poe}
    \chapter The thousand injuries of Fortunato I had borne
    as I best could, but when he ventured upon insult, I vowed
    revenge. You, who so well know the nature of my soul,
    will not suppose, however, that I gave utterance to a
    threat. {\it At length} I would be avenged; this was a
    point definitely settled---but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
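The thresholds above lend themselves to a simple check. The following Python sketch is an illustration only: the function name `assess_measure` and the verdict strings are invented here, and the "acceptable" band for lengths that fall between the recommended range and the hard limits is an assumption rather than something the cited guideline names.

```python
def assess_measure(chars_per_line: int, columns: int = 1) -> str:
    """Classify a line length using the ranges quoted in the text."""
    # Recommended range: 45-75 characters (one column), 40-50 (several).
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "too narrow to justify; set ragged"
    if chars_per_line > 80:
        return "too wide; strains the eye"
    if low <= chars_per_line <= high:
        return "comfortable"
    return "acceptable"  # assumed label for the in-between lengths


for length, cols in [(66, 1), (85, 1), (35, 2), (60, 2)]:
    print(length, cols, assess_measure(length, cols))
```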

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


[Output document: the passage set as a single justified paragraph, with the Latin-script text in Alegreya Sans and the Greek text in GFS Neohellenic.]

    \documentclass[11pt]{article}
    \usepackage{fontspec, leading, newunicodechar}
    \usepackage[Latin, Greek]{ucharclasses}
    \setTransitionsForLatin
      {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
    \setTransitionsForGreek
      {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
        Ligatures=TeX]}{}
    \newunicodechar{·}{\raisebox{.8ex}{.}}
    \frenchspacing
    \leading{14pt}
    \begin{document}
    The second function of Soul -- knowing -- was not at
    first distinguished from motion. Aristotle says: φαμὲν
    γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι
    δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα
    δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν
    αὐτὴν κινεῖσθαι.
    ``The soul is said to feel pain and joy, confidence and
    fear, and again to be angry, to perceive, and to think;
    and all these states are held to be movements, which
    might lead one to suppose that soul itself is moved.''
    \end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


    <style>
      @font-face {
        font-family: "Alegreya Sans";
        src: url("AlegreyaSans-Regular.ttf")
             format("truetype");
        unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                       U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
      }
      @font-face {
        font-family: "GFS Neohellenic";
        src: url("GFSNeohellenic.otf") format("opentype");
        unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                       U+102E0-102FF;
      }
      p {
        font-family: "Alegreya Sans", "GFS Neohellenic",
                     sans-serif;
        line-height: 14pt;
      }
      [lang=en] {
        font-size: 11pt;
      }
      [lang=grc] {
        font-size: 13.2pt;
      }
    </style>
    <p><span lang="en">The second function of Soul – knowing
    – was not at first distinguished from motion. Aristotle
    says: </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
    λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὸργίζεσθαί
    τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα
    κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν
    κινεῖσθαι. </span><span lang="en">“The soul is said to
    feel pain and joy, confidence and fear, and again to be
    angry, to perceive, and to think; and all these states
    are held to be movements, which might lead one to suppose
    that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with a leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
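The arithmetic behind these figures can be written out as a short Python sketch. The function names `leading_range` and `lead_per_side` are invented for this illustration; the percentages are the twenty to forty-five percent quoted above.

```python
def leading_range(type_size_pt, low_pct=20, high_pct=45):
    """Return the (minimum, maximum) line height for a typeface size."""
    return (type_size_pt * (100 + low_pct) / 100,
            type_size_pt * (100 + high_pct) / 100)


def lead_per_side(type_size_pt, leading_pt):
    """Lead added above and below each line for a given line height."""
    return (leading_pt - type_size_pt) / 2


low, high = leading_range(10)
print(low, high)                  # line height bounds for 10 pt type
print(lead_per_side(10, low))     # lead per side at the lower bound
print(lead_per_side(10, high))    # lead per side at the upper bound
```

For 10 pt type the bounds come out at 12 and 14.5 pt, with 1 and 2.25 pt of lead on each side, matching the numbers in the text.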

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

              Sizes in inches    Page proportions
    A4        8.27 × 11.7        2 : √2       1.41421
    B5        6.93 × 9.84        1 : √2       0.707
    Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

[Sidenote: This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.]

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

    This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    William Shakespeare, King Lear

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
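Rounding a text height down to a whole number of lines is a one-line calculation. The Python sketch below is illustrative only; the function name `text_height` is invented here, and the figures in the example (540 pt of available height and a 12 pt line height) are assumed values, not measurements of this book.

```python
def text_height(available_pt, line_height_pt):
    """Largest text height that fits and is a whole number of lines."""
    lines = int(available_pt // line_height_pt)
    return lines, lines * line_height_pt


# Assumed example: 540 pt available, body text set with 12 pt line height.
print(text_height(540, 12))
# With 550 pt available, the remainder goes to the vertical margins.
print(text_height(550, 12))
```

Both calls yield 45 lines and a text height of 540 pt; in the second case the unused 10 pt would have to be distributed between the top and bottom margins.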

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).

[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts and Wash: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK The ACKnowledgement character
API Application Programming Interface
ASA The American Standard Association
ASCII The American Standard Code for Information Interchange
AT&T The American Telephone and Telegraph corporation
BEL The BELl character
BMP The Basic Multilingual Plane
BRE The Basic Regular Expressions
BS The BackSpace character
BSD The Berkeley Software Distribution. Also known as the Berkeley Unix
CA California
CAN The CANcel character
CERN The European Organization for Nuclear Research (la Conseil Européen pour la Recherche Nucléaire)
CLDR The Common Locale Data Repository
CLI Command Line Interface
COBOL The COmmon Business-Oriented Language
CR The Carriage Return character
CSS The Cascading Style Sheets language
DC The Dublin Core
DC1 The Device Control character No. 1
DC2 The Device Control character No. 2
DC3 The Device Control character No. 3
DC4 The Device Control character No. 4
DEL The DELete character
DLE The Data Link Escape character
DPS Document Preparation System
DTD Document Type Declaration
DTP DeskTop Publishing
EBCDIC The Extended Binary Coded Decimal Interchange Code
ECMA The European Computer Manufacturers Association
EM The End of Medium
EMACS The Eventually Munches All Computer Storage editor
ENQ The ENQuiry character
EOT The End Of Transmission
ERE The Extended Regular Expressions
ESC The ESCape character
ETB The End of Transmission Block
ETX The End of TeXt
EUC The Extended Unix Code
FF The Form Feed character
FOAF Friend Of A Friend
FORTRAN The FORmula TRANslator
FS The File Separator
FSM The Free Software Movement
GML The General Markup Language
GNU GNU is Not Unix
GS The Group Separator
GUI Graphical User Interface
HT The Horizontal Tab
HTML The HyperText Markup Language
IBM The International Business Machines Corporation
IEC The International Electrotechnical Commission
IME Input Method Editor
IRI The Internationalized Resource Identifier
ISO The International Organization for Standardization
JIS The Japanese Industrial Standards encoding
JOE The Joe's Own Editor
JSON The JavaScript Object Notation
JSON-LD JSON for LD
JTC A Joint TC
LD Linked Data
LF The Line Feed
MA Massachusetts
MATHML The Mathematical Markup Language
NAK The Negative-AcKnowledgement character
NUL The NULl character
NY New York
OCR Optical Character Recognition
ODF The Open Document Format for office applications
OOXML The Office Open XML format
OWL The Web Ontology Language
PC The IBM Personal Computer
PDF The Portable Document Format
PICO The PIne COmposer
POSIX The Portable Operating System Interface
RDF The Resource Description Framework
RDFA RDF in attributes
RELAX NG The REgular LAnguage for XML New Generation
RFC A Request For Comments
RS The Record Separator
SC A SubCommittee
SGML The Standard Generalized Markup Language
SI The Shift In character
SO The Shift Out character
SOH The Start of Heading
SR Sound Recognition
STX The Start of Text
SUB The SUBstitute character
SVG The Scalable Vector Graphics language
SVN SubVersioN
SYN The SYNchronous Idle character
TC A Technical Committee
TEI The Text Encoding Initiative
TRON The Real-time Operating system Nucleus
UCS The Universal multiple-octet coded Character Set
US The Unit Separator
USA The United States of America
UTF The UCS Transformation Format
VCS Version Control Systems
VI The Visual Interactive editor
VIM VI IMproved
VT The Vertical Tab
W3C The World Wide Web Consortium
WG A Working Group
WYSIWYG What You See Is What You Get
XHTML The eXtensible HyperText Markup Language
XML The eXtensible Markup Language

Index

ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42

Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17

sect

Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

bre 14ndash16bs 6bsd 13

sect

ca 52can 6cern 28

character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44

sect

dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35

dps 13 17 18 32 35 36 39dtd 23 25ndash27dtp 36

sect

ebcdic 5ecma 55Edgar Allen Poe 37


Elements of Style 3em 6Emacs 13endianity 10endnote 47enq 6eot 6erealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

ere 14ndash16esc 6etb 6120576-TEX 38etx 6euc 5

sectF M Cornford 43ff 6foaf 32 33footnote 47formal grammar 14fortran 4From Religion to Philosophy A Study in

the Origins of Western Speculation 43fs 6fsm 35

sectGit 17gml 22gnuLinux 13nano 13

gnu 13 14 35Google Documents 18Google Pinyin 11grep 16 17groff see troffgs 6gui 13 35

sectHan Unification 9heading 45Henrik Ibsen 27ht 6

html 28ndash32 34 39 44 55sect

ibm 5 12 22iconv 10iec 7 10 51ndash54ime 12ir i 27 28 31 32 54iso 7 10 51ndash54

sectJavaScript 29Jeffrey E F Friedl 14j is 5joe 13JScript 29json 32json-ld 32 56jtc 51ndash54justification see alignment

sectKing Lear 48

sectLATEX 36 43Latin Vulgate Bible 49ld 31 32 55leading see line spacingLeafpad 13lf 6lightweight markup language 39line height 45list 46

sectma 51MakeDoc 39Markdown 39markuplogical 21 29 30 35 36presentation 21 29 30 35 36

mathml 28 31Mercurial 17microformatting 32Microsoft Word 14 20 39

sectN-Triples 32 33nak 6Noam Chomskyhierarchy 14

Noam Chomsky 14note 46Notepad++ 13Notepad 13


nroff see troffnul 6ny 51

sectocr 12odf 13ooxml 13owl 32 56

sectparagraphblock 47indented 45outdented 45

paragraph 42paragraphsblock 45

pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8

sectQuarkXPress 14quotationblock 47run-in 47

sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31

rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6

sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22

sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-

lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6

secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff

man 36


me 36mom 36

troff 35tron 9Turtle 32 33typeface 41

sectucsblock 8ucs-4 8

ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10

us 6usa 51 52utf

utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8

utf 6 8ndash10 52sect

VBScript 29vcscentralized 17decentralized 17

vcs 17ndash20version control 13vi 13vim 13

vt 6sect

w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4

wysiwyg 35sect

XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23

xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23



Zwei Trichter wandeln durch die Nacht.
Durch ihres Rumpfs verengten Schacht
fließt weißes Mondlicht
still und heiter
auf ihren
Waldweg
u. s.
w.

Figure 1.1: Exceptions that prove the rule about the separation of text and design can sometimes be encountered in poetry. Above is Christian Morgenstern's Trichter, where the text and its form are intimately intertwined.

setting as well, making them an indispensable reference on the editorial tradition.

Above all stand the typographic rules, which specify how the resulting document should be typeset so that it doesn't disturb the eye of the reader. These, as well as the orthographic rules on hyphenation, can be left out of consideration during writing, as it is the page that should be formed around the writing and not the other way around.

1.1 Text Processing

Originally the domain of the pen, the quill, the stylus, and the more recent typewriter, manuscripts of today are produced mainly using the personal computer and stored in text files. The discipline of creating and manipulating digital text is called text processing and will be the focus of this section.

1.1.1 Character Encoding

Although computing at its most primal has no use for anything but numbers, it has nevertheless been accompanied by text from the very outset. Even the earliest computers from the 1950s were programmed with both raw machine code and the text programming language of the FORmula TRANslator (FORTRAN). The digital representation of letters, digits, and other characters was initially closely tied to each specific application and processor architecture, but with the advent of computer networking in the 1960s, mutual intelligibility became a point of concern. "We had over sixty different ways to represent characters in computers. It was a real Tower of Babel," explains Bob Bemer [1], an American computer scientist who worked at IBM during 1956–1962 and who drafted the American Standard Code for Information Interchange (ASCII) [2], a character encoding from 1963 that unified the digital representation of text across the computer industry and enabled computer networking on a large scale.

Note: EBCDIC by IBM was the default encoding on IBM's System/360 mainframes and was in active use until the introduction of the PC in 1981. In writing systems using Chinese characters, special encodings such as Big5, JIS, and EUC are used to this day. For brevity, the text focuses on the main stream of international encodings.

ASCII

In ASCII, every character is represented by a number from zero to 127, which is transformed to a seven-bit integer called a character code. These 128 codes are used to encode printable characters (spanning the letters of the English alphabet, digits, punctuation, and other symbols) and control codes, as depicted in Table 1.1. Unlike printable characters, control codes have no fixed visual representation, and they were used to implement application-specific communication protocols and text formatting; their precise semantics were defined in a much later standard from 1972 [3]. Unconstrained by the bandwidth and the storage limitations of the 1960s and 1970s, today's communication protocols and text formats gravitate towards markup constructed from printable characters, which, unlike control codes, are easy to read and write by humans.

The following properties make it easy to manipulate and reason about character strings encoded in ASCII:

• Each character is represented by exactly seven bits. This makes it easy to allocate space for character strings of fixed length, to measure the number of characters stored in a memory region, and to perform basic operations such as adjacent character retrieval or text truncation.

• Characters are alphabetically ordered. Character strings can therefore be collated by comparing character code binary values.

• Lowercase and uppercase letters, digits, and control codes form contiguous ranges of character codes. This simplifies classification.

Bits 7 6 5    000   001   010   011   100   101   110   111
              Ctrl codes  Symbols     Upper case  Lower case
Bits 4 3 2 1
0 0 0 0       NUL   DLE   SP    0     @     P     `     p
0 0 0 1       SOH   DC1   !     1     A     Q     a     q
0 0 1 0       STX   DC2   "     2     B     R     b     r
0 0 1 1       ETX   DC3   #     3     C     S     c     s
0 1 0 0       EOT   DC4   $     4     D     T     d     t
0 1 0 1       ENQ   NAK   %     5     E     U     e     u
0 1 1 0       ACK   SYN   &     6     F     V     f     v
0 1 1 1       BEL   ETB   '     7     G     W     g     w
1 0 0 0       BS    CAN   (     8     H     X     h     x
1 0 0 1       HT    EM    )     9     I     Y     i     y
1 0 1 0       LF    SUB   *     :     J     Z     j     z
1 0 1 1       VT    ESC   +     ;     K     [     k     {
1 1 0 0       FF    FS    ,     <     L     \     l     |
1 1 0 1       CR    GS    -     =     M     ]     m     }
1 1 1 0       SO    RS    .     >     N     ^     n     ~
1 1 1 1       SI    US    /     ?     O     _     o     DEL

Table 1.1: The ASCII encoding as specified in the 1986 revision of the standard [4].

Code point range    Encoding
0–127               0xxxxxxx
128–2047            110xxxxx 10xxxxxx
2048–65535          1110xxxx 10xxxxxx 10xxxxxx
65536–1114111       11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Table 1.2: The UTF-8 encoding. Each x represents one bit of the UCS code point in binary.

Character  Code point  Code point (binary)  UTF-8 encoding
Ř          344         101011000            11000101 10011000
e          101         1100101              01100101
č          269         100001101            11000100 10001101

Table 1.3: An example of the UTF-8 encoding.


• There is precisely one way to encode any printable character. The conversion between the lower- and uppercase letters is a matter of inverting one bit.

This comes at the expense of support for non-English writing systems. As a temporary workaround, a set of ASCII derivatives that replaced the less-needed characters of #, $, @, [, \, ], ^, `, {, |, }, and ~ with international characters was specified in the ISO 646 standard from 1972 [3].
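The ASCII properties listed above can be tried out in a few lines of Python. The following is only a minimal sketch; the helper names (is_ascii_upper, is_ascii_lower, swap_ascii_case, ascii_sorted) are made up for this illustration and do not come from any standard.

```python
# Hypothetical helpers illustrating the ASCII properties; not a standard API.

def is_ascii_upper(ch: str) -> bool:
    # Uppercase letters occupy the contiguous range 65-90 ('A'-'Z').
    return 65 <= ord(ch) <= 90

def is_ascii_lower(ch: str) -> bool:
    # Lowercase letters occupy the contiguous range 97-122 ('a'-'z').
    return 97 <= ord(ch) <= 122

def swap_ascii_case(ch: str) -> str:
    # Upper- and lowercase letters differ only in the bit with value 32.
    if is_ascii_upper(ch) or is_ascii_lower(ch):
        return chr(ord(ch) ^ 0b0100000)
    return ch

def ascii_sorted(words):
    # Collation by plain comparison of the character codes.
    return sorted(words, key=lambda w: w.encode("ascii"))

text = "Pocket Primer"
swapped = "".join(swap_ascii_case(c) for c in text)
print(swapped)                                   # pOCKET pRIMER
print(len(text.encode("ascii")))                 # 13 characters, 13 codes
print(ascii_sorted(["pen", "Quill", "stylus"]))  # ['Quill', 'pen', 'stylus']
```

Note that comparing raw character codes places every uppercase letter before every lowercase one, so the ordering is alphabetical only within a single letter case.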

Eight-bit Encodings

With the byte size stabilizing at eight bits, new character encodings emerged that were based on ASCII and used the additional bit to encode characters of non-English writing systems while retaining complete backwards compatibility with ASCII. Besides the numerous vendor-specific encodings (called code pages), a set of fifteen eight-bit encodings, covering all major modern writing systems whose characters fit within the space of 128 additional combinations, was standardized in the ISO/IEC 8859 series released during 1986–2001.

Compared to ASCII, eight-bit encodings introduced an additional level of complexity to text processing:

• Each character is exactly eight bits wide. String manipulation is therefore as straightforward as with ASCII.

• Character strings can no longer be collated by character code comparison. Each encoding requires separate collation tables.

• Classes of characters, such as uppercase and lowercase letters or punctuation, no longer form contiguous ranges, and their position varies among encodings. This impedes character classification.

• Idiosyncrasies such as the ligature of æ and invisible hyphenation hints are included in several encodings, which makes it more difficult to determine character string equivalence. Algorithms for case conversion vary among encodings.

• There exists no standard mechanism to detect which encoding is being used. The distinction needs to be made on the application level using either heuristics, additional metadata, or human intervention. Consequently, no standard mechanism exists to use different character encodings within a single text document.

Note: Also notable are the seven-bit encodings of UTF-7 and Punycode, which bring Unicode support to protocols that were designed with the seven-bit ASCII in mind, such as e-mail.

A portion of this complexity is inherent in the task of encoding the characters of all modern writing systems, but the overhead caused by the character encoding fragmentation proved to be unnecessary.
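The lack of a detection mechanism in eight-bit encodings can be made concrete with a short Python sketch. The byte string below is an arbitrary example chosen for this illustration; the codec names are those of Python's standard library.

```python
# Five bytes that carry no indication of their encoding.
raw = bytes([0xE8, 0x65, 0x73, 0x6B, 0x79])

# The same bytes decode without error in every ISO/IEC 8859 part,
# but the non-ASCII byte 0xE8 yields a different character each time.
decoded = {codec: raw.decode(codec)
           for codec in ("iso8859_1", "iso8859_2", "iso8859_5")}
for codec, text in decoded.items():
    print(codec, text)

# The ASCII part (the lower 128 codes) is identical in all of them.
ascii_part = {text[1:] for text in decoded.values()}
print(ascii_part)
```

Only the reading under ISO/IEC 8859-2 gives the Czech word "česky"; nothing in the bytes themselves says so, which is why the choice has to come from metadata, heuristics, or the user.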

The Universal Character Set and Unicode

In the early 1990s, the continual increase in the available bandwidth and storage led to the creation of the standards of Unicode [5, 6] and the Universal multiple-octet coded Character Set (UCS) [7] in an attempt to create a text encoding that would contain the characters of all the world's languages and succeed ASCII as the lingua franca of text interchange.

UCS is an ever-expanding catalogue of characters from writing systems both modern and ancient, and of symbols ranging from diacritical marks, punctuation, and ideograms to mahjong tiles, alchemical symbols, and the ancient Greek musical notation. Each of these characters is assigned a number called a code point, ranging from 0 to 2147483647 (7F FF FF FF in the hexadecimal notation), with the numbers of the most common characters in the range from 0 to 65535 (FF FF), called the Basic Multilingual Plane (BMP). The smallest units of division in UCS are blocks, which contain 256 thematically related characters. UCS encodings map code points to binary character codes and vice versa.

Three major encodings are specified in the UCS standard and its amendments [8, 9]:

1. UTF-32 directly encodes UCS characters by transforming their code points to four-byte integers. UTF-32 is also known as UCS-4.

2. UTF-16 directly encodes characters within the BMP by transforming their code points to two-byte integers. Code points in the range from 65536 to 1114111 (01 00 00–10 FF FF) are transformed into pairs of two-byte integers, called surrogate pairs, ranging from 55296 to 57343 (D8 00–DF FF). To enable the UTF-16 encoding, the code points in this range will never be assigned to characters [10, sec. 3.4, D15]. The same is true of code points above 1114111 (10 FF FF), which allows UTF-16 to encode any UCS character.

3. UTF-8 directly transforms code points ranging from 0 to 127 (7F) to one-byte integers. Since the first UCS block of the BMP matches ASCII, any text encoded in eight-bit ASCII is also encoded in UTF-8. Code points in the range from 128 to 1114111 (00 00 80–10 FF FF) are transformed into two to four one-byte integers ranging from 128 to 244 (80–F4). The encoding is illustrated in tables 1.2 and 1.3.

Note: One of the design goals of UCS was to avoid assigning code points to different glyphs that carry the same meaning. As a result, the visually distinctive Han characters used in the East Asian countries of China, Japan, Korea, and Vietnam were merged into a set of 75,960 ideograms in a process referred to as the Han Unification [10, sec. 18.1]. This simplifies text processing but also makes it impossible to encode a text in multiple East Asian languages without having to rely on external markup to select appropriate regional fonts. As a result, a derivative of UCS that doesn't implement the Han Unification was developed for use in operating systems based on the Real-time Operating system Nucleus (TRON) and is used in East Asia alongside UCS and region-specific encodings.

[Figure: the Han characters 餐 甑 逞 扉 牙 慨, each shown in four regional glyph variants]

Figure 1.2: Several Han characters in the traditional Chinese, Japanese, Korean, and Vietnamese variants.
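The UTF-8 and UTF-16 transformations described above can be written out by hand. The sketch below is illustrative only: utf8_encode and utf16_units are hypothetical helper names, and the result is checked against Python's built-in UTF-8 encoder rather than presented as a reference implementation.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point using the UTF-8 bit layout (1 to 4 bytes)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable code point")
    if cp <= 0x7F:                      # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                     # 110xxxxx 10xxxxxx
        return bytes([0b11000000 | (cp >> 6),
                      0b10000000 | (cp & 0b111111)])
    if cp <= 0xFFFF:                    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0b11100000 | (cp >> 12),
                      0b10000000 | ((cp >> 6) & 0b111111),
                      0b10000000 | (cp & 0b111111)])
    return bytes([0b11110000 | (cp >> 18),   # 11110xxx + three 10xxxxxx
                  0b10000000 | ((cp >> 12) & 0b111111),
                  0b10000000 | ((cp >> 6) & 0b111111),
                  0b10000000 | (cp & 0b111111)])

def utf16_units(cp: int) -> list:
    """Return the two-byte integers (code units) for one code point."""
    if cp <= 0xFFFF:
        return [cp]                     # within the BMP: one unit
    offset = cp - 0x10000               # 20 bits are left to distribute
    return [0xD800 | (offset >> 10),    # high surrogate
            0xDC00 | (offset & 0x3FF)]  # low surrogate

for ch in "Ře\U0001D11E":               # the last one lies outside the BMP
    cp = ord(ch)
    assert utf8_encode(cp) == ch.encode("utf-8")
    print(cp, utf8_encode(cp).hex(" "), [hex(u) for u in utf16_units(cp)])
```

For brevity, utf16_units does not reject the surrogate range itself; a production encoder would validate its input the way utf8_encode does.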

UTF-32 is primarily used for the fixed-space internal representation of individual UCS characters inside programs, UTF-16 fulfills a similar role in programs that only work with the BMP, and UTF-8 is used for text storage and interchange. Since 2010, the majority of text content on the Web has been encoded in ASCII and UTF-8 [11].

Unicode was a competing standard for universal text encoding that underwent a merger with UCS in version 1.1, and since then the standards have been kept closely synchronised. Unicode is a superset of UCS, which defines additional information about UCS characters (such as their general category, directionality, case, or numeric value [10, sec. 3.5 and ch. 4]), various text processing algorithms, and implementation guidelines.

Regarding text processing, Unicode and UCS represent a compromise between the simplicity of the seven-bit ASCII and the heterogeneity of eight-bit encodings:


Ǻ = Å + combining acute accent = A + combining ring above + combining acute accent

Figure 1.3: Some UCS characters can be either input as a single entity or composed from several combining characters. Regarding Unicode normalization forms, all of the above representations are canonically equivalent.

    iconv -f latin2 -t utf8 -- old.txt > new.txt

Figure 1.4: Text files can be converted between encodings using the iconv command-line tool. The sample code shows the file old.txt being converted from the ISO/IEC 8859-2 encoding to UTF-8. The result of the conversion is stored in the file new.txt.

• If simple text manipulation is preferred over space efficiency, each character can be made exactly two bytes wide (for text confined to the BMP) or four bytes wide using the UTF-16 and UTF-32 encodings, respectively.

• Although character strings cannot be collated by a simple character code comparison, a collation algorithm is defined in the Unicode specification [12], and collation tables for major locales [13] are maintained by the Unicode Consortium.

• Classes of characters (such as uppercase letters, lowercase letters, numbers, and punctuation) do not form contiguous ranges, but their position is directly specified in the standard [10, sec. 4.5].

• Although idiosyncrasies (such as ligatures, invisible hyphenation hints, and combining characters) are present in UCS, explicit normalization algorithms for character string equivalence testing are specified by the standard [10, sec. 2.12]. An algorithm for case conversion is also specified [10, sec. 3.13].

• The byte order mark (FE FF) character can be inserted at the beginning of a text as a signature of Unicode encodings. As the name suggests, the order in which the FE and FF bytes arrive also indicates the order of bytes (called endianity) that was used to encode integers. In UTF-32 and UTF-16, endianity can be chosen arbitrarily by the encoding application. In UTF-8, one-byte integers are used, and the notion of endianity is therefore meaningless.
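Several of the points above (normalization, character categories, case conversion, and the byte order mark) can be observed with Python's standard unicodedata and codecs modules. This is a small sketch with sample strings picked for the illustration, not an exhaustive treatment of the Unicode algorithms.

```python
import codecs
import unicodedata

# Three canonically equivalent spellings of the same character.
single = "\u01FA"            # precomposed A with ring above and acute
partial = "\u00C5\u0301"     # A with ring above + combining acute accent
full = "A\u030A\u0301"       # A + combining ring above + combining acute

forms = [single, partial, full]
print([len(s) for s in forms])          # [1, 2, 3]
print(single == full)                   # False: a raw comparison fails

# After normalization, all three collapse to a single representation.
nfc = {unicodedata.normalize("NFC", s) for s in forms}
nfd = {unicodedata.normalize("NFD", s) for s in forms}
print(len(nfc), len(nfd))               # 1 1

# The general category is looked up in a table, not derived from a range.
categories = [unicodedata.category(c) for c in "Ř7"]
print(categories)                       # ['Lu', 'Nd']

# Case-insensitive comparison based on Unicode case folding.
same_word = "Straße".casefold() == "STRASSE".casefold()
print(same_word)                        # True

# The byte order mark as it appears in different encodings.
print(codecs.BOM_UTF16_BE.hex(), codecs.BOM_UTF16_LE.hex())  # feff fffe
print("\ufeff".encode("utf-8").hex())                        # efbbbf
```

Normalizing both operands before a comparison is the usual way to make string equality behave as a reader would expect.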

Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. Above is the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.

Compose + O + R = ®
Compose + 3 + 4 = ¾
Compose + s + s = ß
Compose + ~ + ' + a = ấ

Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.

Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á

Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces a character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing + followed by the corresponding UCS code point in hexadecimal.
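The three Alt sequences in the figure can be cross-checked against the code pages with Python's standard codecs. The sketch only reproduces the table lookups; it does not emulate the Windows input mechanism itself.

```python
# Alt + 1 6 0: no leading zero, so 160 is looked up in IBM code page 437.
from_ibm = bytes([160]).decode("cp437")

# Alt + 0 2 2 5: leading zero, so 225 is looked up in Windows code page 1252.
from_windows = bytes([225]).decode("cp1252")

# Alt + + E 1: hexadecimal E1 is taken directly as a UCS code point.
from_ucs = chr(0xE1)

print(from_ibm, from_windows, from_ucs)

# Forgetting the leading zero changes the table and thus the character:
# 225 in code page 437 is a different letter than in code page 1252.
without_zero = bytes([225]).decode("cp437")
print(without_zero)
```

All three sequences from the figure resolve to the same character, while the same number without its leading zero does not.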

1.1.2 Text Input

To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On handheld devices, the use of either a numeric keypad or a touch-screen is more typical.

An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind, and as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn't apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.

1.1.3 Text Editors

A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free joe, GNU nano, and pico.

More advanced text editors come with support for regular expressions and version control (which will be covered in sections 1.1.5 and 1.2) and user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower the users to perform complex text editing.

1.1.4 Interactive Document Preparation Systems

Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully-formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].

The primary difference between text editors and DPSes is the fact that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files, a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.

1.1.5 Regular Expressions

The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.

Note: Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.

Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) [18, part 1, ch. 9] described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.

More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can describe formal grammars that are, according to the Chomsky hierarchy, stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
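As a rough illustration of one Perl-derived syntax, the sketch below uses the re module from the Python standard library. Python is an assumption of this example and is not discussed in the text, and the helper first_match is a hypothetical name introduced here. Note that re writes braces and parentheses unescaped, as ERE does, yet it also accepts backreferences.

```python
import re

def first_match(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    found = re.search(pattern, text)
    return found.group(0) if found else None

# Interval expression: the letter "e" repeated one or two times.
print(first_match(r"we{1,2}p", "weeps"))   # weep
print(first_match(r"we{1,2}p", "wept"))    # wep

# Anchors: "^" ties the match to the start, "$" to the end of the string.
print(first_match(r"^ar", "arrow keys"))   # ar
print(first_match(r"ore$", "iron ore"))    # ore

# Backreference: "\1" must repeat whatever the first group matched,
# which is more than a regular grammar can express.
print(first_match(r"(..).*\1", "nationality"))  # nationa
```

The last pattern is the kind of expression for which the term regular is a misnomer: the text matched by the group has to be remembered and compared later on.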

Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for the Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and to match characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (U+200B), zero width non-joiner (U+200C), zero width joiner (U+200D), and zero width no-break space (U+FEFF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at


BRE regex: we\{1,2\}p
Description: The interval expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
Matches: weeps, wept

BRE regex: e*ne
Description: Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
Matches: never, enemy, Kleene

BRE regex: \(⟨regex⟩\)
Description: A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
Matches: ⟨regex⟩

BRE regex: ^ar
Description: At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
Matches: argument, arrow keys

BRE regex: ore$
Description: At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
Matches: iron ore, dumbledore

BRE regex: be.
Description: A period (.) matches any single character.
Matches: or not to be

BRE regex: be[ea]
Description: A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity.
Matches: beehive, grizzly bear, glass beads

BRE regex: be[^ea]
Description: A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
Matches: obeah, bend, libela

BRE regex: \^\$
Description: Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
Matches: ^$

BRE regex: \(..\).*\1
Description: A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
Matches: ara ararauna, dardanelles, nationality

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below)

ERE regex: we{1,2}p
Description: Unlike in BREs, braces aren't escaped.
Matches: weeps, wept

ERE regex: pe+r?l?
Description: The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
Matches: persona, peer, speech, perl

ERE regex: (⟨regex⟩)
Description: Unlike in BREs, parentheses aren't escaped.
Matches: ⟨regex⟩

ERE regex: (on|t)
Description: Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
Matches: one, two, trophy, truth

ERE regex: (..).*\1
Description: EREs do not support backreferences.
Matches: ⟨undefined⟩


Regex: \x{⟨n⟩}
Description: Matches the UCS character with code point ⟨n⟩ in hexadecimal.

Regex: \N{⟨n⟩}
Description: Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.

Regex: \p{⟨p⟩}
Description: Matches any UCS character with property ⟨p⟩.

Regex: \P{⟨p⟩}
Description: Matches any UCS character without property ⟨p⟩.

Property: Letter
Description: This property is satisfied by any letter.

Property: Punctuation
Description: This property is satisfied by any punctuation.

Property: Symbol
Description: This property is satisfied by any symbol.

Property: Mark
Description: This property is satisfied by any mark.

Property: Number
Description: This property is satisfied by any number.

Property: Separator
Description: This property is satisfied by any separator.

Property: Other
Description: This property is satisfied by any UCS character that doesn't belong to any of the categories listed above.

Property: Block=⟨b⟩
Description: This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.

Property: Script=⟨s⟩
Description: This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.

Property: Numeric Value=⟨n⟩
Description: This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.

The authoritative resource on grep, sed, and awk is sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.

least partially implement the Unicode standard for regular expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
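The pitfalls of case conversion, normalization, and invisible characters described above can be worked around before any matching takes place. The following is a minimal sketch in Python, assuming that removing all four zero-width characters is acceptable for the text at hand (in some scripts the joiners carry meaning, so this is not a general recommendation). The helper names canonical and contains are hypothetical.

```python
import unicodedata

# Invisible characters named in the text: U+200B, U+200C, U+200D, U+FEFF.
# Mapping a code point to None makes str.translate delete it.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))

def canonical(text):
    """Normalize, case-fold, and strip zero-width characters from text."""
    text = unicodedata.normalize("NFC", text)
    text = text.casefold()
    # Case folding can leave a string denormalized, so normalize once more.
    text = unicodedata.normalize("NFC", text)
    return text.translate(ZERO_WIDTH)

def contains(haystack, needle):
    """Substring test that ignores case, encoding variants, and zero-width characters."""
    return canonical(needle) in canonical(haystack)

# "e" followed by a combining acute accent versus the precomposed capital letter.
print(contains("Cafe\u0301 au lait", "CAF\u00c9"))  # True
# A zero width space hidden inside a word.
print(contains("zero\u200bwidth", "ZeroWidth"))     # True
```

A naive byte-wise comparison would report no match in both cases, which is the kind of false negative the paragraph above warns about.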

The most elementary text-processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool presents the user with the lines that contain one or more matches. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
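To make the default behavior of grep concrete, the sketch below imitates it in Python: it reports every line that contains at least one match. It is only an approximation under stated assumptions. The function name grep, the sample lines, and the use of the Python regex syntax (rather than BRE or ERE) are choices made for this illustration.

```python
import re

def grep(pattern, lines):
    """Yield (line_number, line) for every line containing a match."""
    regex = re.compile(pattern)
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            yield number, line.rstrip("\n")

# Made-up sample input standing in for the lines of a text file.
sample = [
    "To be, or not to be,\n",
    "that is the question.\n",
    "Whether 'tis nobler\n",
]

for number, line in grep(r"be[^a-z]", sample):
    print(f"{number}:{line}")  # 1:To be, or not to be,
```

Passing an open file object instead of the list works the same way, since a text file iterates over its lines.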

1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.

VCS can be divided by their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally, and its operation is fully dependent on the availability of the server. An example of a centralized VCS is Subversion (SVN).

By comparison, there is no designated server in decentralized VCS, and the users can upload and download new versions directly to and from one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.

Figure 1.8: The basic SVN workflow (svnadmin create, svn checkout, svn update, svn commit). After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.

Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.
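The reason why VCS handle text files so well is that two versions of a text file can be compared line by line. The sketch below shows the idea with the difflib module from the Python standard library. It is not how any particular VCS is implemented, and the file names and contents are invented for the example.

```python
import difflib

# Two hypothetical versions of the same short text file.
old = ["Chapter 1\n", "Typesetting is hard.\n", "The end.\n"]
new = ["Chapter 1\n", "Typesetting is easy.\n", "The end.\n"]

def show_changes(before, after):
    """Return a unified diff between two versions of a text file."""
    diff = difflib.unified_diff(
        before, after, fromfile="draft-v1.txt", tofile="draft-v2.txt"
    )
    return "".join(diff)

print(show_changes(old, new))
```

Lines prefixed with a minus sign were removed and lines prefixed with a plus sign were added. A binary output file of an interactive DPS offers no such line structure, which is why it needs either built-in change tracking or a dedicated comparison interface.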

Figure 1.9: The diagram above depicts the basic Git workflow (git init, git clone, git pull, git push, git commit, git reset). The diagram below depicts the use of the Git program with an SVN repository (svnadmin create, git svn clone, git svn rebase, git svn dcommit, git commit, git reset); this bears all the advantages and disadvantages associated with decentralized VCS. After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.


Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom)

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file

Chapter 2

Markup

A manuscript can be a seamless current of words and still make perfect sense to its author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to compliance with the orthographic rules (such as correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.

Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

More information about the project can be found in The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

The authoritative resource on SGML is The SGML Handbook [27], which includes the full text of the standard with extensive annotations.

2.1 Meta Markup Languages

2.1.1 The Generalized Markup Language

The situation surrounding digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset by a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them consisted largely of presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that process them, as well as human-readable comments. The umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities, which can be used throughout the document in place of the original strings.


A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.

Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, is declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used instead of a DTD to check the validity of an XML document. The constraints imposed by either a DTD or a schema define an application of XML (also called a language or a format).

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
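As a small illustration of data retrieval from an XML document, the sketch below uses the xml.etree.ElementTree module from the Python standard library, which supports only a limited subset of XPath. The abbreviated recipe document and the function name ingredients are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# An abbreviated recipe document written for this example.
RECIPE = """
<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>
"""

def ingredients(document):
    """Return (amount, name) pairs selected by a simple path expression."""
    root = ET.fromstring(document)
    return [
        (node.get("amount"), node.text)
        for node in root.findall("./ingredientList/ingredient")
    ]

for amount, name in ingredients(RECIPE):
    print(amount, name)
```

The path ./ingredientList/ingredient selects every ingredient element that is a child of the ingredientList element under the root. Full XPath and XQuery processors offer far richer expressions than this subset.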


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>

Figure 2.1: An example XML document (recipe.xml)

DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD

element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.


<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
      name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)

xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng  # Compact -> Full RELAX NG
xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint


A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature called CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML

Speech

AASE: Peer, you're lying.
PEER: No, I'm not.
AASE: Well then, swear to me it's true.
PEER: Swear? Why should I?
AASE: See, you dare not. Every word of it's a lie.

Verse

Peer, you're lying. No, I'm not.
Well then, swear to me it's true.
Swear? Why should I? See, you dare not.
Every word of it's a lie.

<(V)line>
<(S)speech who="Aase">Peer, you're lying.</(S)speech>
<(S)speech who="Peer">No, I'm not.</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true.</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not.
</(V)line><(V)line>
Every word of it's a lie.</(S)speech>
</(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].


The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook, and its source code is publicly available at http://docbook.org.

Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
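How namespaces keep elements of different XML applications apart can be sketched with the xml.etree.ElementTree module from the Python standard library. The document below, which mixes XHTML with SVG, and the short prefixes h and s are made up for this example; only the two namespace IRIs are the real ones.

```python
import xml.etree.ElementTree as ET

# A made-up document that mixes two XML applications.
DOCUMENT = """
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <body>
    <p>A circle:</p>
    <svg:svg width="20" height="20">
      <svg:circle cx="10" cy="10" r="8"/>
    </svg:svg>
  </body>
</html>
"""

# Prefixes are local shorthand; only the IRIs identify the applications.
NAMESPACES = {
    "h": "http://www.w3.org/1999/xhtml",
    "s": "http://www.w3.org/2000/svg",
}

root = ET.fromstring(DOCUMENT)
paragraphs = root.findall(".//h:p", NAMESPACES)
circles = root.findall(".//s:circle", NAMESPACES)

print(root.tag)                       # {http://www.w3.org/1999/xhtml}html
print(len(paragraphs), len(circles))  # 1 1
```

The parser resolves each prefix to its IRI and stores the IRI as part of the element name. Nothing in the document tells the parser where a schema for either IRI could be found, which is the limitation described above.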

Due to its reduced complexity compared to SGML, XML was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first widely popular graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

Initially, HTML comprised only a mixture of logical and presentation markup with a fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language made it possible to specify the visual properties of any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading Web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into Web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as

<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and, as such, can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.


<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

<style>
body {
  font: large Verdana;
  font-size: large;
}
h1 {
  font-size: x-large;
  text-transform: uppercase;
}
abbr {
  font-style: italic;
}
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>


The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources and to reduce the number of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for Web documents and to include instances of other XML applications (such as MathML and SVG) directly in Web documents through XML namespaces.
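The contrast between lenient HTML processing and strict XML processing can be observed with two modules from the Python standard library. This is only a sketch: html.parser is a tokenizer and does not implement the error recovery of the HTML5 specification, so it merely shows that the overlapping tags are accepted, not how a browser would repair them. The names TagLogger, html_events, and xml_accepts are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Overlapping elements: <b> is closed while <i> is still open.
MARKUP = "<b>Bold <i>bold and italic</b> italic</i>"

class TagLogger(HTMLParser):
    """Record start and end tags in the order in which they appear."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

def html_events(markup):
    """Tokenize markup leniently and return the recorded tag events."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events

def xml_accepts(markup):
    """Return True when the XML parser accepts markup as well-formed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

print(html_events(MARKUP))
print(xml_accepts(MARKUP))  # False
```

The HTML tokenizer reports all four tags without complaint, whereas the XML parser stops at the first mismatched end tag.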

The idea was brought to fruition in the XML application called the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also considerably reduced the usefulness of XML namespaces in XHTML. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.


A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.

If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
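The two interpretations of a triplet can be sketched in a few lines of Python. The class Triplet, its field names, and the function describe are hypothetical constructs of this example and not part of any RDF library; the order of the fields follows the (p, s, o) notation used above.

```python
from typing import NamedTuple

class Triplet(NamedTuple):
    """A triplet (p, s, o) in the order used by the text."""
    predicate: str         # an IRI
    subject: str           # an IRI
    obj: str               # an IRI or a literal value
    obj_is_resource: bool  # True when obj is an IRI

def describe(t):
    """Spell out the interpretation of a triplet in plain words."""
    if t.obj_is_resource:
        return f"<{t.subject}> is in relation <{t.predicate}> with <{t.obj}>"
    return f"<{t.subject}> has property <{t.predicate}> with the value {t.obj!r}"

triplets = [
    Triplet("http://purl.org/dc/terms/creator",
            "http://example.org/document.html",
            "http://example.org/john-smith", True),
    Triplet("http://xmlns.com/foaf/0.1/name",
            "http://example.org/john-smith",
            "John Smith", False),
]

for t in triplets:
    print(describe(t))
```

The first triplet links two resources, while the second attaches a literal value to a resource.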

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing Web page, and, aside from the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually make obsolete the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in documents on the Web, a technique known as microformatting.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be


<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .

<http://example.org/document.html>
    dc:title "John's Web page"@en ;
    dc:creator <http://example.org/john-smith> .

<http://example.org/john-smith>
    a foaf:Person ;
    foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).
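The correspondence between the Turtle and the N-Triples representations can be illustrated with a small Python sketch that expands Turtle's prefixed names into full IRIs. It assumes the dc and foaf prefixes bound as in the N-Triples listing; the `expand` and `to_ntriples_line` helpers are hypothetical, and they handle only the constructs that appear in the figure rather than the full Turtle grammar.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Prefix bindings assumed for this illustration.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(term, prefixes=PREFIXES):
    """Expand one Turtle term into its N-Triples form."""
    if term == "a":
        # In Turtle, the keyword "a" abbreviates rdf:type.
        return f"<{RDF_TYPE}>"
    if term.startswith(("<", '"')):
        # Full IRIs and literals are already in N-Triples form.
        return term
    prefix, _, local = term.partition(":")
    return f"<{prefixes[prefix]}{local}>"


def to_ntriples_line(subject, predicate, obj, prefixes=PREFIXES):
    """Serialize a single triple as one N-Triples line."""
    terms = (expand(t, prefixes) for t in (subject, predicate, obj))
    return " ".join(terms) + " ."
```

For example, `to_ntriples_line("<http://example.org/john-smith>", "a", "foaf:Person")` yields the third line of the N-Triples listing.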


<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>


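[Graph: the node http://example.org/document.html has a dc:title edge to the literal "John's Web page"@en and a dc:creator edge to the node http://example.org/john-smith, which in turn has an rdf:type edge to foaf:Person and a foaf:name edge to the literal "John Smith".]

Figure 2.11: A graph of the RDF document in Figure 2.9.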

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is twofold: more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The


The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by Donald Knuth, an American professor of computer science, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.


The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that gave utterance to a threat. At length I would be avenged; this was a point definitely settled – but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].


% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}\hskip-0.3ex}}
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
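To give a flavor of such conversion tools, the following Python sketch converts a very small, Markdown-like subset (headings marked with #, emphasis marked with asterisks, and paragraphs separated by blank lines) to HTML. It is a toy under stated assumptions, not an implementation of Markdown: the `to_html` function is hypothetical and ignores lists, links, code, and every other construct of the real language.

```python
import html
import re


def to_html(text):
    """Convert a tiny Markdown-like subset to HTML."""
    blocks = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        line = " ".join(block.split())
        heading = re.match(r"(#{1,6}) (.*)", line)
        if heading:
            tag = f"h{len(heading.group(1))}"
            body = heading.group(2)
        else:
            tag, body = "p", line
        # Escape characters that are special in HTML first.
        body = html.escape(body, quote=False)
        # Strong emphasis must be handled before plain emphasis.
        body = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", body)
        body = re.sub(r"\*(.+?)\*", r"<em>\1</em>", body)
        blocks.append(f"<{tag}>{body}</{tag}>")
    return "\n".join(blocks)
```

The input `"# Title\n\nSome *light* and **bold** text."` is converted to an h1 element followed by a p element containing em and strong elements.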

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
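The thresholds above can be collected into a small Python sketch that classifies a line length. The numbers come from the paragraph above; the `assess_measure` function and the wording of its verdicts are illustrative assumptions.

```python
def assess_measure(chars_per_line, columns=1):
    """Classify a line length according to the guidelines above."""
    # Recommended ranges: 45-75 characters for a single column,
    # 40-50 characters for multiple columns.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line > 80:
        return "too wide"
    if chars_per_line < 40:
        return "too narrow to justify; set ragged"
    if low <= chars_per_line <= high:
        return "comfortable"
    return "outside the recommended range"
```

For instance, a 66-character line on a single-column page is classified as comfortable, whereas a 35-character line should be set ragged.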

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says, φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}

newunicodecharraisebox8ex

\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says, φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=gr] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says, </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
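The arithmetic behind these figures can be verified with a short Python sketch. The twenty to forty-five percent bounds come from the text; the function names and the idea of passing the bounds as parameters are illustrative assumptions.

```python
def line_height_range(size_pt, low=0.20, high=0.45):
    """Return the (minimum, maximum) line height in points."""
    # The extra space amounts to 20-45 % of the typeface size.
    return size_pt + size_pt * low, size_pt + size_pt * high


def lead_per_side(size_pt, line_height_pt):
    """Return the lead added above and below each line."""
    # The extra space is split evenly above and below the line.
    return (line_height_pt - size_pt) / 2


minimum, maximum = line_height_range(10)
print(minimum, maximum)             # line heights for a 10 pt typeface
print(lead_per_side(10, minimum))   # lead on each side at the minimum
print(lead_per_side(10, maximum))   # lead on each side at the maximum
```

For a 10 pt typeface, the sketch reproduces the line heights of 12 and 14.5 pt and the 1 to 2.25 pt of lead stated above.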

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well.

Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger


          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
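The whole-multiple rule can be sketched in Python as follows. The `heading_block` function is a hypothetical helper: given the natural height of a heading (its type together with any space above and below), it rounds the height up to the nearest whole number of body text lines and reports the padding that has to be added.

```python
import math


def heading_block(heading_height_pt, line_height_pt):
    """Fit a heading into a whole number of body text lines.

    Returns the number of lines occupied and the padding (in points)
    that brings the heading up to that height.
    """
    # Round up so the heading never overflows its allotted lines.
    lines = max(1, math.ceil(heading_height_pt / line_height_pt))
    padding = lines * line_height_pt - heading_height_pt
    return lines, padding
```

With a body text line height of 12 pt, a heading that is 20 pt tall occupies two lines and requires 4 pt of padding, which may be distributed above and below the heading as the design sees fit.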

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.


2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3]:

This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].


obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
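As a minimal Python sketch of the multiple-of-line-height rule, the hypothetical `text_block` function below takes the vertical space left after subtracting the margins and returns the largest text height that holds a whole number of lines. The example figures assume a 9 in page with 0.75 in vertical margins and 72 points to the inch; these values are chosen for illustration only.

```python
import math


def text_block(available_height_pt, line_height_pt):
    """Return the number of lines and the resulting text height."""
    # Only whole lines fit; any remainder is returned to the margins.
    lines = math.floor(available_height_pt / line_height_pt)
    return lines, lines * line_height_pt


# A 9 in page with 0.75 in margins at the top and at the bottom
# leaves 7.5 in, or 540 pt at 72 points to the inch.
available = (9 - 2 * 0.75) * 72
print(text_block(available, 12))     # body text with 12 pt leading
print(text_block(available, 16.8))   # body text with 16.8 pt leading
```

With 12 pt leading, the 540 pt of available space is filled exactly by 45 lines; with 16.8 pt leading, 32 lines fit and the small remainder goes to the margins.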

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are


Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

(ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).


[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).


[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).


[35] Tim Berners-Lee Information Management A Proposal Techrep Mar 1989 url httpwwww3orgHistory1989proposalhtml (visited on 08312015) (cit on p 28)

[36] T Berners-Lee Hypertext Markup Language ndash 20 rfc 1866rfc Editor Nov 1995 url httptoolsietforghtmlrfc1866 (visited on 07312015) (cit on p 28)

[37] Jon Postel DoD standard Transmission Control Protocol rfc761 rfc Editor Jan 1980 url httptoolsietforghtmlrfc761 (visited on 09162016) (cit on p 28)

[38] Ian Hickson et al html5 A vocabulary and associated apisfor html and xhtml Recommendation w3c Oct 2014 urlhttpwwww3orgTR2014REC-html5-20141028 (visitedon 07312015) (cit on p 29)

[39] ecma International Standard ecma-262 - ecmaScript Language Specification Tech rep June 1997 url httpwwwecma-internationalorgpublicationsfilesECMA-ST-ARCHECMA-262201st20edition20June201997pdf (visited on 07312015) (cit on p 29)

[40] Netscape Communications Netscape and Sun announce Java-Script the open cross-platform object scripting language for en-terprise networks and the Internet Dec 1995 url httpwpnetscapecomnewsrefprnewsrelease67html (visited on02132008) (cit on p 29)

[41] Dave Raggett et al Reformulating html in xml w3c Recom-mendation w3c Dec 1998 url httpwwww3orgTR1998WD-html-in-xml-19981205 (visited on 08202015)(cit on p 31)

[42] Steven Pemberton et al xhtmltrade 10 The Extensible HyperTextMarkup Language w3c Recommendation w3c Jan 2000url httpwwww3orgTR2000REC-xhtml1-20000126(visited on 08202015) (cit on p 31)

[43] T Berners-Lee Linked Data Tech rep 2006 url httpswwww3orgDesignIssuesLinkedDatahtml (visited on09172016) (cit on p 31)


[44] Ora Lassila and Ralph R Swick Resource Description Frame-work (rdf) Model and Syntax Specification w3c Recommen-dation w3c Feb 1999 url httpwwww3orgTR1999REC-rdf-syntax-19990222 (visited on 08182015) (cit onpp 31 32)

[45] Dan Brickley and R V Guha rdf Vocabulary DescriptionLanguage 10 rdf Schema w3c Recommendation w3c Feb2004 url httpwwww3orgTR2004REC-rdf-schema-20040210 (visited on 08182015) (cit on p 32)

[46] Deborah L McGuinness and Frank van Harmelen owl WebOntology Language w3c Recommendation w3c Feb 2004url httpwwww3orgTR2004REC-owl-features-20040210 (visited on 08182015) (cit on p 32)

[47] Dan Brickley and R V Guha json-ld 10 A JSON-basedSerialization for Linked Data w3c Recommendation w3cJan 2014 url httpwwww3orgTR2014REC-json-ld-20140116 (visited on 08192015) (cit on p 32)

[48] David Beckett et al rdf 11 Turtle w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-turtle-20140225 (visited on 08292015) (cit on p 32)

[49] David Beckett rdf 11 N-Triples w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-n-triples-20140225 (visited on 08192015) (cit on p 32)

[50] Ben Adida et al rdfa in xhtml Syntax and Processing w3cRecommendation w3c Oct 2008 url httpwwww3org TR 2008 REC - rdfa - syntax - 20081014 (visited on08192015) (cit on p 32)

[51] Peter Schaffter What exactly is mom 2015 url httpwwwschafftercamommom-01html (visited on 09162016)(cit on p 37)

[52] Donald Ervin Knuth Digital Typography The Center for theStudy of Language and Information Publications 1998 i sbn978-0-387-98269-4 (cit on p 36)

[53] Albert Kapr Sto a jedna věta ke knižniacute uacutepravě Trans by An-toniacuten Rambousek Lacerta 1999 url httpwwwsazbacztypoglosytypo101pdf (visited on 10202015) (cit onpp 41 46 47)


[54] Robert Bringhurst the Elements of Typographic Style PointRoberts andWashHartleyampMarks 1992 i sbn 0-88179-110-5(cit on pp 41 42 45ndash48)

[55] Matthew Butterick Butterickrsquos Practical Typography Line spac-ing url httppracticaltypographycomline-spacinghtml (visited on 11022015) (cit on p 42)

[56] Vladimiacuter Beran et al Aktualizovanyacute typografickyacute manuaacutel6th ed Kafka Design 2014 (cit on p 45)

Acronyms

ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standards Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
Emacs: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
Pico: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML, New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
vi: The Visual Interactive editor
Vim: vi IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language

Index

ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42

Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17

sect

Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

bre 14ndash16bs 6bsd 13

sect

ca 52can 6cern 28

character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44

sect

dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35

dps 13 17 18 32 35 36 39dtd 23 25ndash27dtp 36

sect

ebcdic 5ecma 55Edgar Allen Poe 37


Elements of Style 3em 6Emacs 13endianity 10endnote 47enq 6eot 6erealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

ere 14ndash16esc 6etb 6120576-TEX 38etx 6euc 5

sectF M Cornford 43ff 6foaf 32 33footnote 47formal grammar 14fortran 4From Religion to Philosophy A Study in

the Origins of Western Speculation 43fs 6fsm 35

sectGit 17gml 22gnuLinux 13nano 13

gnu 13 14 35Google Documents 18Google Pinyin 11grep 16 17groff see troffgs 6gui 13 35

sectHan Unification 9heading 45Henrik Ibsen 27ht 6

html 28ndash32 34 39 44 55sect

ibm 5 12 22iconv 10iec 7 10 51ndash54ime 12ir i 27 28 31 32 54iso 7 10 51ndash54

sectJavaScript 29Jeffrey E F Friedl 14j is 5joe 13JScript 29json 32json-ld 32 56jtc 51ndash54justification see alignment

sectKing Lear 48

sectLATEX 36 43Latin Vulgate Bible 49ld 31 32 55leading see line spacingLeafpad 13lf 6lightweight markup language 39line height 45list 46

sectma 51MakeDoc 39Markdown 39markuplogical 21 29 30 35 36presentation 21 29 30 35 36

mathml 28 31Mercurial 17microformatting 32Microsoft Word 14 20 39

sectN-Triples 32 33nak 6Noam Chomskyhierarchy 14

Noam Chomsky 14note 46Notepad++ 13Notepad 13


nroff see troffnul 6ny 51

sectocr 12odf 13ooxml 13owl 32 56

sectparagraphblock 47indented 45outdented 45

paragraph 42paragraphsblock 45

pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8

sectQuarkXPress 14quotationblock 47run-in 47

sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31

rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6

sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22

sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-

lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6

secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff

man 36


me 36mom 36

troff 35tron 9Turtle 32 33typeface 41

sectucsblock 8ucs-4 8

ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10

us 6usa 51 52utf

utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8

utf 6 8ndash10 52sect

VBScript 29vcscentralized 17decentralized 17

vcs 17ndash20version control 13vi 13vim 13

vt 6sect

w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4

wysiwyg 35sect

XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23

xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23

  • Introduction
  • Writing
    • Text Processing
      • Character Encoding
      • Text Input
      • Text Editors
      • Interactive Document Preparation Systems
      • Regular Expressions
    • Version Control
  • Markup
    • Meta Markup Languages
      • The General Markup Language
      • The Extensible Markup Language
    • Markup on the World Wide Web
      • The Hypertext Markup Language
      • The Extensible Hypertext Markup Language
      • The Semantic Web and Linked Data
    • Document Preparation Systems
      • Batch-oriented Systems
      • Interactive Systems
    • Lightweight Markup Languages
  • Design
    • Fonts
    • Structural Elements
      • Paragraphs and Stanzas
      • Headings
      • Tables and Lists
      • Notes
      • Quotations
    • Page Layout
    • Color
  • Bibliography
  • Acronyms
  • Index

EBCDIC by IBM was the default encoding on IBM's System/360 mainframes and was in active use until the introduction of the PC in 1981. In writing systems using Chinese characters, special encodings such as Big5, JIS, and EUC are used to this day. For brevity, the text focuses on the mainstream of international encodings.

tied to each specific application and processor architecture, but with the advent of computer networking in the 1960s, mutual intelligibility became a point of concern. "We had over sixty different ways to represent characters in computers. It was a real Tower of Babel," explains Bob Bemer [1], an American computer scientist who worked at IBM during 1956–1962 and who drafted the American Standard Code for Information Interchange (ASCII) [2], a character encoding from 1963 that unified the digital representation of text across the computer industry and enabled computer networking on a large scale.

ASCII

In ASCII, every character is represented by a number from zero to 127, which is transformed to a seven-bit integer called a character code. These 128 codes are used to encode printable characters (spanning the letters of the English alphabet, digits, punctuation, and other symbols) and control codes, as depicted in Table 1.1. Unlike printable characters, control codes have no fixed visual representation. They were used to implement application-specific communication protocols and text formatting, and their precise semantics were defined in a much later standard from 1972 [3]. Unconstrained by the bandwidth and storage limitations of the 1960s and 1970s, today's communication protocols and text formats gravitate towards markup constructed from printable characters, which, unlike control codes, are easy for humans to read and write.

The following properties make it easy to manipulate and reason about character strings encoded in ASCII:

• Each character is represented by exactly seven bits. This makes it easy to allocate space for character strings of fixed length, to measure the number of characters stored in a memory region, and to perform basic operations such as adjacent character retrieval or text truncation.

• Characters are alphabetically ordered. Character strings can therefore be collated by comparing character code binary values.

• Lowercase and uppercase letters, digits, and control codes form contiguous ranges of character codes. This simplifies classification.


Bits 7 6 5:    000  001   010  011   100  101   110  111
Bits 4 3 2 1   Ctrl codes   Symbols   Upper case  Lower case
0 0 0 0        NUL  DLE   SP   0     @    P     `    p
0 0 0 1        SOH  DC1   !    1     A    Q     a    q
0 0 1 0        STX  DC2   "    2     B    R     b    r
0 0 1 1        ETX  DC3   #    3     C    S     c    s
0 1 0 0        EOT  DC4   $    4     D    T     d    t
0 1 0 1        ENQ  NAK   %    5     E    U     e    u
0 1 1 0        ACK  SYN   &    6     F    V     f    v
0 1 1 1        BEL  ETB   '    7     G    W     g    w
1 0 0 0        BS   CAN   (    8     H    X     h    x
1 0 0 1        HT   EM    )    9     I    Y     i    y
1 0 1 0        LF   SUB   *    :     J    Z     j    z
1 0 1 1        VT   ESC   +    ;     K    [     k    {
1 1 0 0        FF   FS    ,    <     L    \     l    |
1 1 0 1        CR   GS    -    =     M    ]     m    }
1 1 1 0        SO   RS    .    >     N    ^     n    ~
1 1 1 1        SI   US    /    ?     O    _     o    DEL

Table 1.1: The ASCII encoding as specified in the 1986 revision of the standard [4]. SP denotes the space character.

Code point range    Encoding
0–127               0xxxxxxx
128–2047            110xxxxx 10xxxxxx
2048–65535          1110xxxx 10xxxxxx 10xxxxxx
65536–1114111       11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Table 1.2: The UTF-8 encoding. Each x represents one bit of the UCS code point in binary.

Character   Code point (decimal)   Code point (binary)   UTF-8 encoding
Ř           344                    101011000             11000101 10011000
e           101                    1100101               01100101
č           269                    100001101             11000100 10001101

Table 1.3: An example of the UTF-8 encoding.


• There is precisely one way to encode any printable character. The conversion between lower- and uppercase letters is a matter of inverting one bit.

This comes at the expense of support for non-English writing systems. As a temporary workaround, a set of ASCII derivatives that replaced the less-needed characters of #, $, @, [, \, ], ^, `, {, |, }, and ~ with international characters was specified in the ISO 646 standard from 1972 [3].
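The ASCII properties listed above can be illustrated with a minimal Python sketch. The function names are our own and purely illustrative. The sketch assumes that its input consists of ASCII characters only.

```python
def is_ascii_upper(ch: str) -> bool:
    """Classify by a range check: uppercase letters occupy codes 65-90."""
    return 0x41 <= ord(ch) <= 0x5A


def is_ascii_lower(ch: str) -> bool:
    """Classify by a range check: lowercase letters occupy codes 97-122."""
    return 0x61 <= ord(ch) <= 0x7A


def flip_case(ch: str) -> str:
    """Switch the case of an ASCII letter by inverting bit 6 (value 0x20)."""
    if is_ascii_upper(ch) or is_ascii_lower(ch):
        return chr(ord(ch) ^ 0x20)
    return ch  # digits, punctuation and control codes are left untouched


def ascii_sort(words):
    """Collate strings by comparing their raw character codes."""
    return sorted(words, key=lambda word: word.encode("ascii"))
```

Note that collation by character code places all uppercase letters before all lowercase letters, which is adequate for English text of uniform case but not for general-purpose sorting.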

Eight-bit Encodings

With the byte size stabilizing at eight bits, new character encodings emerged that were based on ASCII and used the additional bit to encode characters of non-English writing systems while retaining complete backwards compatibility with ASCII. Besides the numerous vendor-specific encodings (called code pages), a set of fifteen eight-bit encodings covering all major modern writing systems whose characters fit within the space of 128 additional combinations was standardized in the ISO/IEC 8859 series, released during 1986–2001.

Compared to ASCII, eight-bit encodings introduced an additional level of complexity to text processing:

• Each character is exactly eight bits wide. String manipulation is therefore as straightforward as with ASCII.

• Character strings can no longer be collated by character code comparison. Each encoding requires separate collation tables.

• Classes of characters, such as uppercase and lowercase letters or punctuation, no longer form contiguous ranges, and their position varies among encodings. This impedes character classification.

• Idiosyncrasies such as the ligature æ and invisible hyphenation hints are included in several encodings, which makes it more difficult to determine character string equivalence. Algorithms for case conversion vary among encodings.

• There exists no standard mechanism to detect which encoding is being used. The distinction needs to be made on the application level using either heuristics, additional metadata, or human intervention. Consequently, no standard mechanism exists to use different character encodings within a single text document.
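The last point can be made concrete with a short Python sketch: the same three bytes yield different text depending on which ISO/IEC 8859 part the reader assumes, and nothing in the bytes themselves says which reading is intended. The byte values and the helper name are our own choices for illustration.

```python
# Three bytes that spell the Czech word "Řeč" in ISO/IEC 8859-2.
RAW = bytes([0xD8, 0x65, 0xE8])


def decode_with(data: bytes, encoding: str) -> str:
    """Interpret the same bytes under a caller-supplied eight-bit encoding."""
    return data.decode(encoding)


as_latin2 = decode_with(RAW, "iso8859_2")  # Central European
as_latin1 = decode_with(RAW, "iso8859_1")  # Western European
print(as_latin2, as_latin1)
```

Only the byte 0x65 lies in the ASCII range and therefore decodes to the same letter under both encodings.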


Also notable are the seven-bit encodings of UTF-7 and Punycode, which bring Unicode support to protocols that were designed with the seven-bit ASCII in mind, such as e-mail.

A portion of this complexity is inherent in the task of encoding the characters of all modern writing systems, but the overhead caused by the fragmentation of character encodings proved to be unnecessary.

The Universal Character Set and Unicode

In the early 1990s, the continual increase in the available bandwidth and storage led to the creation of the standards of Unicode [5, 6] and the Universal multiple-octet coded Character Set (UCS) [7] in an attempt to create a text encoding that would contain the characters of all the world's languages and succeed ASCII as the lingua franca of text interchange.

UCS is an ever-expanding catalogue of characters from writing systems both modern and ancient, and of symbols ranging from diacritical marks, punctuation, and ideograms to mahjong tiles, alchemical symbols, and the ancient Greek musical notation. Each of these characters is assigned a number called a code point, ranging from 0 to 2147483647 (7F FF FF FF in the hexadecimal notation), with the numbers of the most common characters in the range from 0 to 65535 (FF FF), called the Basic Multilingual Plane (BMP). The smallest unit of division in UCS are blocks, which contain 256 thematically related characters. UCS encodings map code points to binary character codes and vice versa.

Three major encodings are specified in the UCS standard and its amendments [8, 9]:

1. UTF-32 directly encodes UCS characters by transforming their code points to four-byte integers. UTF-32 is also known as UCS-4.

2. UTF-16 directly encodes characters within the BMP by transforming their code points to two-byte integers. Code points in the range from 65536 to 1114111 (01 00 00–10 FF FF) are transformed into pairs of two-byte integers, called surrogate pairs, ranging from 55296 to 57343 (D8 00–DF FF). To enable the UTF-16 encoding, the code points in this range will never be assigned to characters [10, sec. 3.4, D15]. The same is true of code points above 1114111 (10 FF FF), which allows UTF-16 to encode any UCS character.

3. UTF-8 directly transforms code points ranging from 0 to 127 (7F) to one-byte integers. Since the first UCS block of the BMP matches ASCII, any text encoded in eight-bit ASCII is also encoded in UTF-8. Code points in the range from 128 to 1114111 (00 00 80–10 FF FF) are transformed into two to four one-byte integers ranging from 128 to 244 (80–F4). The encoding is illustrated in Tables 1.2 and 1.3.

One of the design goals of UCS was to avoid assigning code points to different glyphs that carry the same meaning. As a result, the visually distinctive Han characters used in the East Asian countries of China, Japan, Korea, and Vietnam were merged into a set of 75,960 ideograms in a process referred to as the Han Unification [10, sec. 18.1]. This simplifies text processing, but also makes it impossible to encode a text in multiple East Asian languages without having to rely on external markup to select appropriate regional fonts. As a result, a derivative of UCS that doesn't implement the Han Unification was developed for use in operating systems based on the Real-time Operating system Nucleus (TRON) and is used in East Asia alongside UCS and region-specific encodings.

餐甑逞扉牙慨

Figure 1.2: Several Han characters in the traditional Chinese, Japanese, Korean, and Vietnamese variants.
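The UTF-8 transformation can be sketched in a few lines of Python. The function `utf8_encode` is a hypothetical helper written for illustration; in practice, the built-in `str.encode("utf-8")` does the same job. For brevity, the sketch does not reject the code points reserved for UTF-16 surrogates.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode a single code point as UTF-8 by filling in the bit patterns
    for each code point range (surrogates are not rejected in this sketch)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the range encodable in UTF-8")
    if code_point < 0x80:        # 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])
```

For example, `utf8_encode(344)` yields the two bytes C5 98, the encoding of the character Ř.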

UTF-32 is primarily used for the fixed-space internal representation of individual UCS characters inside programs, UTF-16 fulfills a similar role in programs that only work with the BMP, and UTF-8 is used for text storage and interchange. Since 2010, the majority of text content on the Web has been encoded in ASCII and UTF-8 [11].
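Programs that assume the BMP break on characters that UTF-16 must split into a surrogate pair. The following Python sketch shows the arithmetic behind the split. The helper `utf16_code_units` is our own illustrative name, and the sketch assumes a valid code point that is not itself a surrogate.

```python
def utf16_code_units(code_point: int) -> list:
    """Return the 16-bit code units that represent a code point in UTF-16."""
    if code_point < 0x10000:
        return [code_point]            # BMP: one two-byte integer
    offset = code_point - 0x10000      # 20 significant bits remain
    high = 0xD800 + (offset >> 10)     # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # lower 10 bits
    return [high, low]


# U+1F600 lies outside the BMP and therefore needs a surrogate pair.
print([hex(unit) for unit in utf16_code_units(0x1F600)])
```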

Unicode was a competing standard for universal text encoding that underwent a merger with UCS in version 1.1, and since then the standards have been kept closely synchronised. Unicode is a superset of UCS, which defines additional information about UCS characters (such as their general category, directionality, case, or numeric value [10, sec. 3.5 and ch. 4]), various text processing algorithms, and implementation guidelines.

Regarding text processing, Unicode and UCS represent a compromise between the simplicity of the seven-bit ASCII and the heterogeneity of eight-bit encodings:


Ǻ = Å + ◌́ = A + ◌̊ + ◌́

Figure 1.3: Some UCS characters can be either input as a single entity or composed from several combining characters. Regarding Unicode normalization forms, all of the above representations are canonically equivalent.
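The canonical equivalence of these three representations can be checked with the `unicodedata` module from the Python standard library. This is a minimal sketch and the variable names are our own.

```python
import unicodedata

precomposed = "\u01FA"          # Ǻ as a single code point
partly = "\u00C5\u0301"         # Å followed by a combining acute accent
decomposed = "A\u030A\u0301"    # A, combining ring above, combining acute

forms = [precomposed, partly, decomposed]

# The strings differ as code point sequences ...
assert len(set(forms)) == 3

# ... but collapse to a single string after normalization.
nfc = {unicodedata.normalize("NFC", s) for s in forms}
nfd = {unicodedata.normalize("NFD", s) for s in forms}
print(len(nfc), len(nfd))
```

Comparing strings only after bringing both to the same normalization form is the usual way to make equality tests robust against such variation.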

iconv -f latin2 -t utf8 -- old.txt > new.txt

Figure 1.4: Text files can be converted between encodings using the iconv command-line tool. The sample code shows the file old.txt being converted from the ISO/IEC 8859-2 encoding to UTF-8. The result of the conversion is stored in the file new.txt.

• If simple text manipulation is preferred over space efficiency, each character can be made exactly two or four bytes wide using the UTF-16 and UTF-32 encodings. With UTF-16, this only holds for text confined to the BMP.

• Although character strings cannot be collated by a simple character code comparison, a collation algorithm is defined in the Unicode specification [12], and collation tables for major locales [13] are maintained by the Unicode Consortium.

• Classes of characters (such as uppercase letters, lowercase letters, numbers, and punctuation) do not form contiguous ranges, but their position is directly specified in the standard [10, sec. 4.5].

• Although idiosyncrasies (such as ligatures, invisible hyphenation hints, and combining characters) are present in UCS, explicit normalization algorithms for character string equivalence testing are specified by the standard [10, sec. 2.12]. An algorithm for case conversion is also specified [10, sec. 3.13].

• The byte order mark (FE FF) character can be inserted at the beginning of a text as a signature of Unicode encodings. As the name suggests, the order in which the FE and FF bytes arrive also indicates the order of bytes (called endianness) that was used to encode integers. In UTF-32 and UTF-16, endianness can be chosen arbitrarily by the encoding application. In UTF-8, one-byte integers are used and the notion of endianness is therefore meaningless.
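A reader of a text file can use the byte order mark to guess both the encoding and the byte order. The sketch below relies on the BOM constants from the `codecs` module of the Python standard library. The function `detect_utf_bom` is a hypothetical helper, and a missing mark tells us nothing, so the caller still needs a fallback.

```python
import codecs

# UTF-32 marks are tested before UTF-16 marks because the little-endian
# UTF-32 mark (FF FE 00 00) begins with the little-endian UTF-16 mark (FF FE).
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)


def detect_utf_bom(data: bytes):
    """Return the encoding suggested by a leading byte order mark, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

In UTF-8, the encoded mark (EF BB BF) serves only as a signature, since there is no byte order to indicate.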


Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. Above is the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.

Compose + O + R = ®
Compose + 3 + 4 = ¾
Compose + s + s = ß
Compose + ~ + ' + a = ấ

Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.


Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á

Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces the character with the corresponding number, either from an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale. In English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the plus sign followed by the corresponding UCS code point in hexadecimal.

1.1.2 Text Input

To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On hand-held devices, the use of either a numeric keypad or a touch-screen is more typical.

An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind, and as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn't apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.


1.1.3 Text Editors

A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free JOE, GNU nano, and Pico.

More advanced text editors come with support for regular expressions and version control (which will be covered in sections 1.1.5 and 1.2) and for user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and Vim. These CLI text editors are notorious for their steep learning curve. In exchange, they empower users to perform complex text editing.

1.1.4 Interactive Document Preparation Systems

Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully-formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].

The primary difference between text editors and DPSes is the fact that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files, a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.

Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.

1.1.5 Regular Expressions

The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.

Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) syntaxes [18, part 1, ch. 9] described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.

More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.

Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle but no less precarious can be the lack of support for Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (20 0B), zero width non-joiner (20 0C), zero width joiner (20 0D), and zero width no-break space (FE FF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at


BRE regex: we\{1,2\}p
  Description: The repetition expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
  Matches: weeps, wept

BRE regex: e*ne
  Description: Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
  Matches: never, enemy, Kleene

BRE regex: \(⟨regex⟩\)
  Description: A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
  Matches: ⟨regex⟩

BRE regex: ^ar
  Description: At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
  Matches: argument, arrow keys

BRE regex: ore$
  Description: At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
  Matches: iron ore, dumbledore

BRE regex: .be
  Description: A period (.) matches any single character.
  Matches: or not to be

BRE regex: be[ea]
  Description: A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity.
  Matches: beehive, grizzly bear, glass beads

BRE regex: be[^ea]
  Description: A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
  Matches: obeah, bend, libela

BRE regex: \^\$
  Description: Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
  Matches: ^$

BRE regex: \(⟨regex⟩\)\1
  Description: A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
  Matches: ara ararauna, dardanelles, nationality

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below)

ERE regex: we{1,2}p
  Description: Unlike in BREs, braces aren't escaped.
  Matches: weeps, wept

ERE regex: pe+rl?
  Description: The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
  Matches: persona, peer speech, perl

ERE regex: (⟨regex⟩)
  Description: Unlike in BREs, parentheses aren't escaped.
  Matches: ⟨regex⟩

ERE regex: (on|t)
  Description: Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
  Matches: one two, trophy, truth

ERE regex: (⟨regex⟩)\1
  Description: EREs do not support backreferences.
  Matches: ⟨undefined⟩
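The ERE constructs summarized in Table 1.4 can be tried out interactively. The sketch below uses Python's `re` module, which is not a POSIX ERE implementation but agrees with ERE on unescaped braces, `+`, `?`, parentheses, and `|`. The candidate strings and the helper name `matching` are assumptions made for this illustration.

```python
import re

# Patterns written in ERE style: braces and parentheses are not escaped.
EXAMPLES = {
    r"we{1,2}p": ["weeps", "wept", "wp"],
    r"pe+rl?": ["persona", "peer speech", "perl", "pearl"],
    r"(on|t)": ["one two", "trophy", "moss"],
}


def matching(pattern: str, candidates: list[str]) -> list[str]:
    """Return the candidates that contain a match for the pattern."""
    return [text for text in candidates if re.search(pattern, text)]


if __name__ == "__main__":
    for pattern, candidates in EXAMPLES.items():
        print(pattern, matching(pattern, candidates))
```

To test the same patterns against a real ERE implementation, they can be passed to a tool such as `grep -E`.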


Regex: Description
  \x{⟨n⟩}: Matches the UCS character with code point ⟨n⟩ in hexadecimal.
  \N{⟨n⟩}: Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.
  \p{⟨p⟩}: Matches any UCS character with property ⟨p⟩.
  \P{⟨p⟩}: Matches any UCS character without property ⟨p⟩.

Property: Description
  Letter: This property is satisfied by any letter.
  Punctuation: This property is satisfied by any punctuation.
  Symbol: This property is satisfied by any symbol.
  Mark: This property is satisfied by any mark.
  Number: This property is satisfied by any number.
  Separator: This property is satisfied by any separator.
  Other: This property is satisfied by any UCS character that doesn't belong to any of the above-listed categories.
  Block=⟨b⟩: This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.
  Script=⟨s⟩: This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.
  Numeric Value=⟨n⟩: This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.

The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.

least partially implement the Unicode standard for Regular Expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
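The pitfalls described above (case conversion, normalization, and invisible characters) can be worked around by preprocessing text before matching. The following is a minimal sketch using only Python's standard `unicodedata` module; the function names `canonical` and `letters` are made up for this illustration, and stripping zero-width characters is an assumption that suits simple searching but can alter meaning in scripts that rely on joiners.

```python
import unicodedata

# The invisible characters named in the text above.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}


def canonical(text: str) -> str:
    """Strip zero-width characters, then normalize and case-fold.

    Decomposing (NFD) before and after case folding makes differently
    encoded but equivalent strings compare equal.
    """
    visible = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    folded = unicodedata.normalize("NFD", visible).casefold()
    return unicodedata.normalize("NFD", folded)


def letters(text: str) -> list[str]:
    """Return the characters whose general category is Letter (L*)."""
    return [ch for ch in text if unicodedata.category(ch).startswith("L")]


if __name__ == "__main__":
    # Precomposed e-acute versus e followed by a combining acute accent.
    print(canonical("Caf\u00e9") == canonical("CAFE\u0301"))
    print(letters("a1\u03b2!"))
```

Python's built-in `re` module does not support the `\p{…}` property syntax from Table 1.5, which is why the sketch filters by general category through `unicodedata` instead.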

The most elementary text processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in default of an advanced text editor. Unless configured otherwise, the tool will present lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming


The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.

language, albeit a more advanced one than that of sed. Originally developed for the Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.
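To make the behavior of these tools concrete, the sketch below imitates the core of grep (print lines containing a match) and of a sed substitution (replace every match on each line) in Python. It is an illustration of the idea only, not a reimplementation of either program; the function names `grep` and `substitute` are chosen for this example, and Python's regex syntax differs from BRE and ERE in details.

```python
import re
from typing import Iterable, Iterator


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield every line that contains at least one match."""
    regex = re.compile(pattern)
    for line in lines:
        line = line.rstrip("\n")
        if regex.search(line):
            yield line


def substitute(pattern: str, replacement: str,
               lines: Iterable[str]) -> Iterator[str]:
    """Yield every line with all matches replaced, like sed 's/…/…/g'."""
    regex = re.compile(pattern)
    for line in lines:
        yield regex.sub(replacement, line.rstrip("\n"))


if __name__ == "__main__":
    text = ["iron ore\n", "orebody\n", "colour and color\n"]
    print(list(grep(r"ore$", text)))
    print(list(substitute(r"colou?r", "color", text)))
```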

1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.

VCS can be divided by architecture into centralized and decentralized systems. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally, and its operation is fully dependent on the availability of the server. An example of a centralized VCS is Subversion (SVN).

By comparison, there is no designated server in decentralized VCS, and the users can exchange new versions directly with one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage size requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.

Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.

[Diagram: svnadmin create, svn checkout, svn update, svn commit]

Figure 1.8: The basic SVN workflow

An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.
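The reason text files suit version control so well is that the difference between two versions can be computed and displayed line by line. The following sketch shows this with Python's standard `difflib` module; the file names and the sample text are invented for the illustration, and real VCS use their own diff implementations.

```python
import difflib


def changes(old: list[str], new: list[str]) -> list[str]:
    """Return a unified diff between two versions of a text file."""
    return list(difflib.unified_diff(
        old, new, fromfile="draft-v1.txt", tofile="draft-v2.txt"))


if __name__ == "__main__":
    version_1 = ["Chapter 1\n", "Writing is hard.\n"]
    version_2 = ["Chapter 1\n", "Writing is rewarding.\n"]
    # Removed lines are prefixed with "-", added lines with "+".
    print("".join(changes(version_1, version_2)), end="")
```

A binary output file of an interactive DPS offers no such line structure, which is why dedicated comparison tools are needed for those formats.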

After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.

[Diagram: git init, git clone, git pull, git push, git reset, git commit]

Figure 1.9: The diagram above depicts the basic Git workflow. The diagram below depicts the use of the Git program with an SVN repository; this bears all the advantages and disadvantages associated with decentralized VCS.

[Diagram: svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit]


Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom)

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.

Chapter 2

Markup

A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should be already met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.

Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result,


More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.

the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The Generalized Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.


A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.

Although the described structure is shared by all SGML documents, the actual syntax as well as the restrictions regarding the contents and the attributes of individual elements are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
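The first of the two levels, well-formedness, can be checked with any conforming XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the function name `is_well_formed` is made up for this illustration. Checking validity against a DTD or a schema is a separate step that the standard library does not provide and that is left out here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the string parses as a well-formed XML document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<recipe><name>Palatschinken</name></recipe>"))
    # The <name> element is never closed, so parsing fails.
    print(is_well_formed("<recipe><name>Palatschinken</recipe>"))
```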

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.


    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE recipe SYSTEM "recipe.dtd">
    <recipe>
      <name>Palatschinken</name>
      <description>A Slavic crêpe-like dish.</description>
      <ingredientList serves="8">
        <ingredient amount="120g">Plain flour</ingredient>
        <ingredient amount="2">Egg</ingredient>
        <ingredient amount="300ml">Milk</ingredient>
        <ingredient amount="1 tblspn">Oil</ingredient>
        <ingredient amount="1 pinch">Salt</ingredient>
      </ingredientList>
      <stepList>
        <step>Combine the ingredients and whisk until
          you have a smooth batter.</step>
        <step>Heat oil on a pan, pour in a tablespoonful
          of the batter, fry until golden brown.</step>
        <step>Repeat until there is no batter left.</step>
        <step>Serve rolled and filled with jam.</step>
      </stepList>
    </recipe>

Figure 2.1: An example XML document (recipe.xml)
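A document such as the one in Figure 2.1 can be read by a program as a tree of elements with attributes and text. The following sketch parses an abbreviated copy of the recipe with Python's standard `xml.etree.ElementTree`; the abbreviation and the variable names are choices made for this illustration.

```python
import xml.etree.ElementTree as ET

# An abbreviated copy of the recipe from Figure 2.1.
RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# Element content is available as text, attributes through get().
name = root.findtext("name")
serves = root.find("ingredientList").get("serves")
ingredients = [(item.get("amount"), item.text)
               for item in root.iter("ingredient")]

if __name__ == "__main__":
    print(name, "serves", serves)
    for amount, ingredient in ingredients:
        print(amount, ingredient)
```

Note that the parser attaches no meaning to `serves` or `amount`; as the text explains, the semantics of elements and attributes are left to the program that processes the document.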

DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd">
    <!DOCTYPE recipe SYSTEM "recipe.dtd">

    <!DOCTYPE recipe [
      <!ELEMENT recipe (name, description, ingredientList,
                        stepList)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
      <!ELEMENT ingredientList (ingredient+)>
      <!ATTLIST ingredientList serves CDATA #REQUIRED>
      <!ELEMENT ingredient (#PCDATA)>
      <!ATTLIST ingredient amount CDATA #REQUIRED>
      <!ELEMENT stepList (step+)>
      <!ELEMENT step (#PCDATA)> ]>

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd" [
      <!-- Omitted for brevity --> ]>
    <!DOCTYPE recipe SYSTEM "recipe.dtd" [
      <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD

    element recipe {
      element name { text },
      element description { text },
      element ingredientList {
        attribute serves { xsd:positiveInteger },
        element ingredient {
          attribute amount { text }, text
        }+
      },
      element stepList {
        element step { text }+
      }
    }

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.


    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
      <element name="recipe"><complexType><all>
        <element name="name" type="string" minOccurs="1"/>
        <element name="description" type="string"
                 minOccurs="1"/>
        <element
          name="ingredientList"><complexType><sequence>
          <element name="ingredient" minOccurs="1"
                   maxOccurs="unbounded">
            <complexType><simpleContent>
              <extension base="string">
                <attribute name="amount" type="string"/>
              </extension>
            </simpleContent></complexType>
          </element></sequence>
          <attribute name="serves" type="positiveInteger"
                     use="required"/>
        </complexType></element>
        <element name="stepList"><complexType><sequence>
          <element name="step" type="string" minOccurs="1"
                   maxOccurs="unbounded"/>
        </sequence></complexType></element>
      </all></complexType></element>
    </schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)

    xmllint --noout --dtdvalid recipe.dtd recipe.xml
    xmllint --noout --schema recipe.xsd recipe.xml
    trang recipe.rnc recipe.rng  # Compact -> Full Relax NG
    xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.


A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature called CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML

Speech

    AASE: Peer, you're lying!
    PEER: No, I'm not!
    AASE: Well then, swear to me it's true!
    PEER: Swear? Why should I?
    AASE: See, you dare not! Every word of it's a lie!

Verse

    Peer, you're lying! No, I'm not!
    Well then, swear to me it's true!
    Swear? Why should I? See, you dare not!
    Every word of it's a lie!

    <(V)line>
    <(S)speech who="Aase">Peer, you're lying!</(S)speech>
    <(S)speech who="Peer">No, I'm not!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Aase">Well then,
    swear to me it's true!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Peer">Swear? why should I?</(S)speech>
    <(S)speech who="Aase">See, you dare not!
    </(V)line><(V)line>
    Every word of it's a lie!</(S)speech>
    </(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].


The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org/.

Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
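How namespaces keep the elements of different XML applications apart can be seen by parsing a mixed document. The sketch below uses Python's standard `xml.etree.ElementTree`, which reports each element name together with the IRI of its namespace; the sample document combining XHTML and SVG is invented for this illustration.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

# XHTML is the default namespace; SVG elements carry the svg: prefix.
DOCUMENT = f"""<html xmlns="{XHTML}" xmlns:svg="{SVG}">
  <body>
    <p>A circle:</p>
    <svg:svg><svg:circle r="10"/></svg:svg>
  </body>
</html>"""

root = ET.fromstring(DOCUMENT)

# ElementTree expands every name to the form {namespace IRI}local-name.
tags = [element.tag for element in root.iter()]

# Queries map a prefix of our choosing to the namespace IRI.
circle = root.find(".//s:circle", {"s": SVG})

if __name__ == "__main__":
    for tag in tags:
        print(tag)
    print("radius:", circle.get("r"))
```

The prefix itself is arbitrary (`svg` in the document, `s` in the query); only the IRI identifies the application.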

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include:

DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems;

the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities;

the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae;

the Scalable Vector Graphics language (SVG), a vector graphics format.

Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely used graphical Web browser, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In


JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language made it possible to specify the visual properties of any HTML element, which separated document markup from design and effectively eliminated the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the contemporary leading web browsers and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as

    <b>Bold <i>bold and italic</b> italic</i>
    <b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.


    <font face="Verdana" size="4">
    <font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
    <br><br>There is a continuing need to show the power of
    <i>CSS</i>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The <i>HTML
    </i> remains the same, the only thing that has changed
    is the external <i>CSS</i> file. Yes, really.
    </font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com/. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

    <style>
      body {
        font: large Verdana;
        font-size: large;
      }
      h1 {
        font-size: x-large;
        text-transform: uppercase;
      }
      abbr {
        font-style: italic;
      }
    </style>
    <h1>So what is this about?</h1>
    <p>There is a continuing need to show the power of
    <abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The
    <abbr>HTML</abbr> remains the same, the only thing that
    has changed is the external <abbr>CSS</abbr> file. Yes,
    really.</p>


The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2 Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly into web documents through XML namespaces.
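The contrast between the two parsing philosophies can be observed with the overlapping markup from Figure 2.7. In the sketch below, Python's standard XML parser refuses the input outright, whereas the lenient `html.parser` tokenizer reports the tags in the order they appear and leaves the repair to the caller. The class name `TagLogger` and the helper `xml_accepts` are made up for this illustration, and `html.parser` is only a tokenizer: it does not implement the error-recovery rules of the HTML5 specification.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The first line of Figure 2.7: <b> is closed while <i> is still open.
OVERLAPPING = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Record start and end tags without checking their nesting."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def xml_accepts(text: str) -> bool:
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


logger = TagLogger()
logger.feed(OVERLAPPING)
logger.close()

if __name__ == "__main__":
    print("XML parser accepts input:", xml_accepts(OVERLAPPING))
    print("HTML tokenizer events:", logger.events)
```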

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.


A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.

If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
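The two interpretations of a triplet can be spelled out in a few lines of code. The following sketch models a triplet as a plain Python tuple in the (p, s, o) order used above, with a flag distinguishing resources from literals; the type name `Triplet` and the function `describe` are made up for this illustration and do not correspond to any RDF library. The example IRIs follow the Dublin Core terms vocabulary and the example.org placeholders.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    predicate: str             # IRI of the relation or property (p)
    subject: str               # IRI of the described resource (s)
    obj: str                   # IRI of a resource or a literal value (o)
    object_is_resource: bool   # True if obj is an IRI, False if literal


def describe(triplet: Triplet) -> str:
    """Render the interpretation of a triplet as an English sentence."""
    if triplet.object_is_resource:
        return (f"<{triplet.subject}> is in relation "
                f"<{triplet.predicate}> with <{triplet.obj}>")
    return (f"<{triplet.subject}> has property "
            f"<{triplet.predicate}> with value {triplet.obj!r}")


creator = Triplet("http://purl.org/dc/terms/creator",
                  "http://example.org/document.html",
                  "http://example.org/john-smith", True)
title = Triplet("http://purl.org/dc/terms/title",
                "http://example.org/document.html",
                "John's Web page", False)

if __name__ == "__main__":
    print(describe(creator))
    print(describe(title))
```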

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete microformatting, the loosely defined technique of using HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in the documents on the Web.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be


    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/terms/"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description
          rdf:about="http://example.org/document.html">
        <dc:title xml:lang="en">John's Web page</dc:title>
        <dc:creator
            rdf:resource="http://example.org/john-smith"/>
      </rdf:Description>
      <rdf:Description
          rdf:about="http://example.org/john-smith">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
        <foaf:name>John Smith</foaf:name>
      </rdf:Description>
    </rdf:RDF>

    <http://example.org/document.html>
      <http://purl.org/dc/terms/title> "John's Web page"@en .
    <http://example.org/document.html>
      <http://purl.org/dc/terms/creator>
      <http://example.org/john-smith> .
    <http://example.org/john-smith>
      <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
      <http://xmlns.com/foaf/0.1/Person> .
    <http://example.org/john-smith>
      <http://xmlns.com/foaf/0.1/name> "John Smith" .

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dc: <http://purl.org/dc/terms/> .
    <http://example.org/document.html>
      dc:title "John's Web page"@en ;
      dc:creator <http://example.org/john-smith> .
    <http://example.org/john-smith>
      a foaf:Person ;
      foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom)

    <!DOCTYPE html>
    <html lang="en">
      <head>
        <link rel="meta" type="application/rdf+xml"
              href="john.rdf">
        <link rel="meta" type="text/turtle" href="john.ttl">
        <link rel="meta" type="application/n-triples"
              href="john.nt">
        <title>John's Web page</title>
      </head>
      <body>
        Hi, I'm John Smith.
      </body>
    </html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

    <!DOCTYPE html>
    <html lang="en">
      <head vocab="http://purl.org/dc/terms/"
            about="http://example.org/document.html">
        <title property="title" lang="en">John's Web page</title>
        <meta property="creator"
              href="http://example.org/john-smith">
      </head>
      <body vocab="http://xmlns.com/foaf/0.1/"
            about="http://example.org/john-smith"
            typeof="Person">
        Hi, I'm <span property="name">John Smith</span>.
      </body>
    </html>

[Graph of the resources http://example.org/document.html and http://example.org/john-smith, the literals "John's Web page"@en and "John Smith", and the class foaf:Person, connected by the properties used in Figure 2.9.]

Figure 2.11: A graph of the RDF document in Figure 2.9.

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by Desktop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.

[Output document: the title "The Cask of Amontillado", the byline "by Edgar Allan Poe", and the opening paragraph typeset with a three-line drop cap "T", followed by the page number 1.]

    .TITLE "The Cask of Amontillado"
    .AUTHOR "Edgar Allan Poe"
    .PRINTSTYLE TYPESET
    .PAGE 6i 9i .75i .75i .75i .75i
    .START
    .PP
    .DROPCAP T 3
    he thousand injuries of Fortunato I had borne as I best
    could, but when he ventured upon insult, I vowed revenge.
    You, who so well know the nature of my soul, will not
    suppose, however, that I gave utterance to a threat.
    \*[IT]At length\*[PREV] I would be avenged; this was a
    point definitely settled\[em]but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].

    % Page geometry
    \pdfpagewidth=6in \pdfpageheight=9in
    % Page dimensions
    \hsize=\dimexpr\pdfpagewidth-1.5in
    \vsize=\dimexpr\pdfpageheight-1.5in
    \baselineskip=16.8pt
    \hoffset=-.25in \voffset=-.25in
    % Fonts
    \font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
    \font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
    % Logical markup definition
    \def\title#1{{\bigbf\centerline{#1}}}
    \def\author#1{{\it\centerline{by}\centerline{#1}}
      \vskip 3.9em}
    \def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
      \hbox{\llap{\dropcap#1\hskip-0.3ex}}}%
      \parshape=4 3em\dimexpr\hsize-3em 3.28em
                  \dimexpr\hsize-3.28em 3.28em
                  \dimexpr\hsize-3.28em 0em\hsize}
    % The document
    \title{The Cask of Amontillado}
    \author{Edgar Allan Poe}
    \chapter The thousand injuries of Fortunato I had borne
    as I best could, but when he ventured upon insult, I vowed
    revenge. You, who so well know the nature of my soul,
    will not suppose, however, that I gave utterance to a
    threat. {\it At length} I would be avenged; this was a
    point definitely settled---but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.

Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
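To make the idea of such a conversion tool concrete, the following Python sketch translates a tiny, Markdown-like notation into HTML. The notation (a leading "# " for a heading, asterisks for emphasis, blank lines between blocks) and the function name `to_html` are assumptions chosen for this illustration; the sketch does not implement Markdown or any other real language.

```python
import html
import re

def to_html(text):
    """Convert a tiny, hypothetical lightweight markup to HTML."""
    blocks = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        # Join wrapped lines and escape characters special to HTML.
        block = html.escape(" ".join(block.split()), quote=False)
        # *word* marks emphasis.
        block = re.sub(r"\*(.+?)\*", r"<em>\1</em>", block)
        if block.startswith("# "):
            blocks.append(f"<h1>{block[2:]}</h1>")
        else:
            blocks.append(f"<p>{block}</p>")
    return "\n".join(blocks)

print(to_html("# Design\n\nLegibility comes *first*."))
```

The input remains readable as plain text, whereas the output carries the same structure in the more general but more obtrusive HTML markup.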

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other, the design and the positioning of the structural elements of the document (such as headings, tables, figures, and lists), and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text

[Output document: the passage below typeset with the Latin text in Alegreya Sans and the Greek text in GFS Neohellenic, followed by the page number 1.]

    \documentclass[11pt]{article}
    \usepackage{fontspec, leading, newunicodechar}
    \usepackage[Latin, Greek]{ucharclasses}
    \setTransitionsForLatin
      {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
    \setTransitionsForGreek
      {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
        Ligatures=TeX]}{}
    \newunicodechar{·}{\raisebox{.8ex}{.}}
    \frenchspacing
    \leading{14pt}
    \begin{document}
    The second function of Soul -- knowing -- was not at
    first distinguished from motion. Aristotle says: φαμὲν
    γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι
    δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα
    δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν
    αὐτὴν κινεῖσθαι.
    ``The soul is said to feel pain and joy, confidence and
    fear, and again to be angry, to perceive, and to think;
    and all these states are held to be movements, which
    might lead one to suppose that soul itself is moved.''
    \end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.

    <style>
      @font-face {
        font-family: "Alegreya Sans";
        src: url("AlegreyaSans-Regular.ttf")
             format("truetype");
        unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                       U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
      }
      @font-face {
        font-family: "GFS Neohellenic";
        src: url("GFSNeohellenic.otf") format("opentype");
        unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                       U+102E0-102FF;
      }
      p {
        font-family: "Alegreya Sans", "GFS Neohellenic",
                     sans-serif;
        line-height: 14pt;
      }
      [lang=en] {
        font-size: 11pt;
      }
      [lang=gr] {
        font-size: 13.2pt;
      }
    </style>
    <p><span lang="en">The second function of Soul – knowing
    – was not at first distinguished from motion. Aristotle
    says: </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
    λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὸργίζεσθαί
    τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα
    κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν
    κινεῖσθαι. </span><span lang="en">“The soul is said to
    feel pain and joy, confidence and fear, and again to be
    angry, to perceive, and to think; and all these states
    are held to be movements, which might lead one to suppose
    that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.

line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
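The arithmetic behind this rule of thumb can be sketched in a few lines of Python. The function name `leading_range` and its parameters are assumptions made for this illustration; the fractions of 20 % and 45 % are the ones quoted above.

```python
def leading_range(type_size_pt, min_lead=0.20, max_lead=0.45):
    """Return the smallest and the largest recommended line height in points."""
    smallest = type_size_pt + type_size_pt * min_lead
    largest = type_size_pt + type_size_pt * max_lead
    return smallest, largest

low, high = leading_range(10)
print(f"{low:.1f} pt to {high:.1f} pt")  # prints "12.0 pt to 14.5 pt"
```

For an 11 pt typeface, the same function yields a range of roughly 13.2 to 16 pt.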

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

              Sizes in inches    Page proportions
    A4        8.27 × 11.7        2 : √2       1.41421
    B5        6.93 × 9.84        1 : √2       0.707
    Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
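One way to satisfy the whole-multiple rule is to pad each heading with just enough vertical space to reach the next multiple of the body text line height. The following Python sketch computes that padding; the function name `heading_block` and the example sizes are assumptions made for this illustration.

```python
import math

def heading_block(heading_height_pt, body_leading_pt):
    """Return the number of body lines a heading occupies and the
    vertical padding in points needed to fill them exactly."""
    lines = math.ceil(heading_height_pt / body_leading_pt)
    padding = lines * body_leading_pt - heading_height_pt
    return lines, padding

# An 18 pt heading in a text set with a leading of 12 pt.
print(heading_block(heading_height_pt=18, body_leading_pt=12))  # (2, 6)
```

How the padding is split between the space above and below the heading is a design decision that the sketch leaves open.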

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

    This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size (as described in Section 3.2.1), as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design (and wherever else the page height is fixed), we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
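The requirement that the text height be a multiple of the line height can be sketched as follows. The function name `text_block`, the page of 9 in with margins of 0.75 in, and the conversion of 72 points per inch are assumptions made for this illustration; TeX's own point is slightly smaller, at 72.27 points per inch.

```python
POINTS_PER_INCH = 72  # PostScript points, an assumption of this sketch

def text_block(page_height_pt, top_margin_pt, bottom_margin_pt, leading_pt):
    """Return the number of lines that fit between the vertical margins
    and the resulting text height in points."""
    available = page_height_pt - top_margin_pt - bottom_margin_pt
    lines = int(available // leading_pt)
    return lines, lines * leading_pt

page = 9 * POINTS_PER_INCH
margin = 0.75 * POINTS_PER_INCH
print(text_block(page, margin, margin, leading_pt=12))  # (45, 540)
```

When the available height is not an exact multiple of the leading, the remainder is best added to one of the vertical margins rather than spread between the lines.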

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
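The text above does not define what constitutes sufficient contrast. One common way to quantify it on screens, assumed here for illustration, is the contrast ratio of the Web Content Accessibility Guidelines (WCAG) 2, which ranges from 1 for identical colors to 21 for black on white and for which the guidelines recommend at least 4.5 for body text. The function names in the following Python sketch are hypothetical.

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to its linear-light value."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color as defined by WCAG 2."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(foreground, background):
    """WCAG 2 contrast ratio between two (R, G, B) colors."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)

# Black type on a white background reaches the maximum ratio of 21.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))
```

The ratio says nothing about the distinctness of two hues for color-blind readers, which has to be checked separately.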

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).

[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).


[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).


[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Declaration
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard General Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language



Bit 7          0     0     0     0     1     1     1     1
Bit 6          0     0     1     1     0     0     1     1
Bit 5          0     1     0     1     0     1     0     1
Bits 4 3 2 1   Ctrl codes  Symbols     Upper case  Lower case
     0 0 0 0   NUL   DLE   SP    0     @     P     `     p
     0 0 0 1   SOH   DC1   !     1     A     Q     a     q
     0 0 1 0   STX   DC2   "     2     B     R     b     r
     0 0 1 1   ETX   DC3   #     3     C     S     c     s
     0 1 0 0   EOT   DC4   $     4     D     T     d     t
     0 1 0 1   ENQ   NAK   %     5     E     U     e     u
     0 1 1 0   ACK   SYN   &     6     F     V     f     v
     0 1 1 1   BEL   ETB   '     7     G     W     g     w
     1 0 0 0   BS    CAN   (     8     H     X     h     x
     1 0 0 1   HT    EM    )     9     I     Y     i     y
     1 0 1 0   LF    SUB   *     :     J     Z     j     z
     1 0 1 1   VT    ESC   +     ;     K     [     k     {
     1 1 0 0   FF    FS    ,     <     L     \     l     |
     1 1 0 1   CR    GS    -     =     M     ]     m     }
     1 1 1 0   SO    RS    .     >     N     ^     n     ~
     1 1 1 1   SI    US    /     ?     O     _     o     DEL

Table 1.1: The ASCII encoding as specified in the 1986 revision of the standard [4]. SP denotes the space character.

Code point range    Encoding
0–127               0xxxxxxx
128–2047            110xxxxx 10xxxxxx
2048–65535          1110xxxx 10xxxxxx 10xxxxxx
65536–1114111       11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Table 1.2: The UTF-8 encoding. Each x represents one bit of the UCS code point in binary.

Character   Code point (decimal)   Code point (binary)   Encoding
Ř           344                    101011000             11000101 10011000
e           101                    1100101               01100101
č           269                    100001101             11000100 10001101

Table 1.3: An example of the UTF-8 encoding

• There is precisely one way to encode any printable character. The conversion between lower- and uppercase letters is a matter of inverting one bit.

This comes at the expense of support for non-English writing systems. As a temporary workaround, a set of ASCII derivatives that replaced the less-needed characters of #, $, @, [, \, ], ^, `, {, |, }, and ~ with international characters was specified in the ISO 646 standard from 1972 [3].
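The one-bit case conversion mentioned above can be illustrated with a short Python sketch. The function name `toggle_ascii_case` is made up for this example; the only fact it relies on is that, in Table 1.1, an ASCII letter and its other-case counterpart differ in bit 6 alone (numeric value 0x20).

```python
def toggle_ascii_case(ch: str) -> str:
    """Flip the case of a single ASCII letter by inverting bit 6 (0x20)."""
    if not (ch.isascii() and ch.isalpha()):
        return ch  # digits, punctuation and non-ASCII characters are left alone
    return chr(ord(ch) ^ 0x20)


# The two seven-bit codes differ in exactly one position.
print(format(ord("a"), "07b"), format(ord("A"), "07b"))  # 1100001 1000001
print(toggle_ascii_case("a"), toggle_ascii_case("Q"))    # A q
```

The guard clause matters: applying the same XOR to a digit or a punctuation mark would silently produce an unrelated character.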

Eight-bit Encodings

With the byte size stabilizing at eight bits, new character encodings emerged that were based on ASCII and used the additional bit to encode characters of non-English writing systems while retaining complete backwards compatibility with ASCII. Besides the numerous vendor-specific encodings (called code pages), a set of fifteen eight-bit encodings covering all major modern writing systems whose characters fit within the space of 128 additional combinations was standardized in the ISO/IEC 8859 series released during 1986–2001.

Compared to ASCII, eight-bit encodings introduced an additional level of complexity to text processing:

• Each character is exactly eight bits wide. String manipulation is therefore as straightforward as with ASCII.

• Character strings can no longer be collated by character code comparison. Each encoding requires separate collation tables.

• Classes of characters, such as uppercase and lowercase letters or punctuation, no longer form contiguous ranges, and their position varies among encodings. This impedes character classification.

• Idiosyncrasies such as the ligature of æ and invisible hyphenation hints are included in several encodings, which makes it more difficult to determine character string equivalence. Algorithms for case conversion vary among encodings.

• There exists no standard mechanism to detect which encoding is being used. The distinction needs to be made on the application level using either heuristics, additional metadata, or human intervention. Consequently, no standard mechanism exists to use different character encodings within a single text document.
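The last point can be made concrete with a small Python sketch. It decodes the same bytes under three eight-bit encodings; the helper name `decode_all` and the choice of codecs (ISO/IEC 8859-2, Windows code page 1250, and ISO/IEC 8859-1, under the names Python uses for them) are assumptions made for this illustration.

```python
def decode_all(raw: bytes, codecs=("iso8859_2", "cp1250", "latin_1")) -> dict:
    """Decode the same bytes under several eight-bit encodings."""
    return {name: raw.decode(name) for name in codecs}


# One byte, three different characters: nothing in the data itself
# says which interpretation the author intended.
for name, text in decode_all(b"\xb9").items():
    print(name, text)
```

Because every interpretation is a valid string, an application can only pick the right one from metadata, heuristics, or by asking the user.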

Notable are also the seven-bit encodings of UTF-7 and Punycode, which bring Unicode support to protocols that were designed with the seven-bit ASCII in mind, such as e-mail.

A portion of this complexity is inherent in the task of encoding the characters of all modern writing systems, but the overhead caused by the character encoding fragmentation proved to be unnecessary.

The Universal Character Set and Unicode

In the early 1990s, the continual increase in the available bandwidth and storage led to the creation of the standards of Unicode [5, 6] and the Universal multiple-octet coded Character Set (UCS) [7] in an attempt to create a text encoding that would contain the characters of all the world's languages and succeed ASCII as the lingua franca of text interchange.

UCS is an ever-expanding catalogue of characters from writing systems both modern and ancient, and of symbols ranging from diacritical marks, punctuation, and ideograms to mahjong tiles, alchemical symbols, and the ancient Greek musical notation. Each of these characters is assigned a number called a code point, ranging from 0 to 2147483647 (7F FF FF FF in the hexadecimal notation), with the numbers of the most common characters in the range from 0 to 65535 (FF FF) called the Basic Multilingual Plane (BMP). The smallest unit of division in UCS are blocks, which contain 256 thematically related characters. UCS encodings map code points to binary character codes and vice versa.

Three major encodings are specified in the UCS standard and its amendments [8, 9]:

1. UTF-32 directly encodes UCS characters by transforming their code points to four-byte integers. UTF-32 is also known as UCS-4.

2. UTF-16 directly encodes characters within the BMP by transforming their code points to two-byte integers. Code points in the range from 65536 to 1114111 (01 00 00–10 FF FF) are transformed into pairs of two-byte integers called surrogate pairs, ranging from 55296 to 57343 (D8 00–DF FF). To enable the UTF-16 encoding, the code points in this range will never be assigned to characters [10, sec. 3.4, D15]. The same is true of code points above 1114111 (10 FF FF), which allows UTF-16 to encode any UCS character.

3. UTF-8 directly transforms code points ranging from 0 to 127 (7F) to one-byte integers. Since the first UCS block of the BMP matches ASCII, any text encoded in eight-bit ASCII is also encoded in UTF-8. Code points in the range from 128 to 1114111 (00 00 80–10 FF FF) are transformed into two to four one-byte integers ranging from 128 to 244 (80–F4). The encoding is illustrated in Tables 1.2 and 1.3.

One of the design goals of UCS was to avoid assigning code points to different glyphs that carry the same meaning. As a result, the visually distinctive Han characters used in the East Asian countries of China, Japan, Korea, and Vietnam were merged into a set of 75,960 ideograms in a process referred to as the Han Unification [10, sec. 18.1]. This simplifies text processing, but it also makes it impossible to encode a text in multiple East Asian languages without having to rely on external markup to select appropriate regional fonts. As a result, a derivative of UCS that doesn't implement the Han Unification was developed for use in operating systems based on the Real-time Operating system Nucleus (TRON) and is used in East Asia alongside UCS and region-specific encodings.

Figure 1.2: Several Han characters (餐甑逞扉牙慨) in the traditional Chinese, Japanese, Korean, and Vietnamese variants
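The UTF-8 and UTF-16 transformations described in the list above can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation: the function names `utf8_encode` and `utf16_surrogates` are invented for this example, and in practice the built-in `str.encode` should be used instead.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point following the bit layout of Table 1.2."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable code point")
    if cp < 0x80:        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf16_surrogates(cp: int) -> tuple:
    """Split a code point above the BMP into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above the BMP need surrogates")
    offset = cp - 0x10000                   # a 20-bit value
    return 0xD800 | (offset >> 10), 0xDC00 | (offset & 0x3FF)


print(utf8_encode(ord("Ř")).hex())                   # c598
print([hex(u) for u in utf16_surrogates(0x1F600)])   # ['0xd83d', '0xde00']
```

The first printed value corresponds to the bytes 11000101 10011000 listed for Ř in Table 1.3.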

UTF-32 is primarily used for the fixed-space internal representation of individual UCS characters inside programs. UTF-16 fulfills a similar role in programs that only work with the BMP, and UTF-8 is used for text storage and interchange. Since 2010, the majority of text content on the Web has been encoded in ASCII and UTF-8 [11].

Unicode was a competing standard for universal text encoding that underwent a merger with UCS in version 1.1, and since then the standards have been kept closely synchronised. Unicode is a superset of UCS, which defines additional information about UCS characters (such as their general category, directionality, case, or numeric value [10, sec. 3.5 and ch. 4]), various text processing algorithms, and implementation guidelines.

Regarding text processing, Unicode and UCS represent a compromise between the simplicity of the seven-bit ASCII and the heterogeneity of eight-bit encodings.

Ǻ (U+01FA) = Å (U+00C5) + ◌́ (U+0301) = A (U+0041) + ◌̊ (U+030A) + ◌́ (U+0301)

Figure 1.3: Some UCS characters can be either input as a single entity or composed from several combining characters. With regard to the Unicode normalization forms, all of the above representations are canonically equivalent.
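Canonical equivalence of this kind can be checked with the `unicodedata` module of the Python standard library. The sketch below is only an illustration; the helper name `canonically_equivalent` is an assumption of this example, not a standard function.

```python
import unicodedata

# Three spellings of the letter A with ring above and acute.
FORMS = [
    "\u01FA",          # a single precomposed character
    "\u00C5\u0301",    # A with ring above + combining acute accent
    "A\u030A\u0301",   # A + combining ring above + combining acute accent
]


def canonically_equivalent(a: str, b: str) -> bool:
    """Compare two strings after full canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print([len(form) for form in FORMS])                              # [1, 2, 3]
print(all(canonically_equivalent(FORMS[0], f) for f in FORMS))    # True
```

A plain `==` comparison of the three strings fails, since they consist of different code points; normalizing both operands to the same form before comparing is what makes the test robust.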

    iconv -f latin2 -t utf8 -- old.txt > new.txt

Figure 1.4: Text files can be converted between encodings using the iconv command-line tool. The sample code shows the file old.txt being converted from the ISO/IEC 8859-2 encoding to UTF-8. The result of the conversion is stored in the file new.txt.
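Where iconv is not available, the same conversion can be approximated in Python. The sketch below is an assumption-laden illustration: the function name `transcode` and its default arguments are made up for this example, and the file names in the usage note simply mirror those of Figure 1.4.

```python
from pathlib import Path


def transcode(src: str, dst: str,
              from_enc: str = "iso8859_2", to_enc: str = "utf-8") -> None:
    """Re-encode a text file; bytes are used for I/O so line endings stay untouched."""
    text = Path(src).read_bytes().decode(from_enc)
    Path(dst).write_bytes(text.encode(to_enc))
```

Calling `transcode("old.txt", "new.txt")` would then correspond to the command in Figure 1.4. Unlike iconv, this sketch reads the whole file into memory, which is acceptable for ordinary documents but not for very large files.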

• If simple text manipulation is preferred over space efficiency, each character can be made exactly two bytes wide (for text limited to the BMP) or four bytes wide using the UTF-16 and UTF-32 encodings, respectively.

• Although character strings cannot be collated by a simple character code comparison, a collation algorithm is defined in the Unicode specification [12], and collation tables for major locales [13] are maintained by the Unicode Consortium.

• Classes of characters (such as uppercase letters, lowercase letters, numbers, and punctuation) do not form contiguous ranges, but their position is directly specified in the standard [10, sec. 4.5].

• Although idiosyncrasies (such as ligatures, invisible hyphenation hints, and combining characters) are present in UCS, explicit normalization algorithms for character string equivalence testing are specified by the standard [10, sec. 2.12]. An algorithm for case conversion is also specified [10, sec. 3.13].

• The BYTE ORDER MARK (FE FF) character can be inserted at the beginning of a text as a signature of Unicode encodings. As the name suggests, the order in which the FE and FF bytes arrive also indicates the order of bytes (called endianness) that was used to encode integers. In UTF-32 and UTF-16, endianness can be chosen arbitrarily by the encoding application. In UTF-8, one-byte integers are used, and the notion of endianness is therefore meaningless.
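The role of the byte order mark as a signature can be sketched with the constants from Python's `codecs` module. The function `detect_bom` is a hypothetical helper written for this illustration; it only recognizes a signature when one is present and makes no attempt to guess the encoding otherwise.

```python
import codecs

# UTF-32 must be tested before UTF-16, because the little-endian UTF-32
# signature (FF FE 00 00) begins with the little-endian UTF-16 one (FF FE).
SIGNATURES = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def detect_bom(raw: bytes):
    """Return the encoding announced by a leading byte order mark, or None."""
    for signature, name in SIGNATURES:
        if raw.startswith(signature):
            return name
    return None


print(detect_bom("\ufeffabc".encode("utf-16-be")))  # utf-16-be
print(detect_bom("\ufeffabc".encode("utf-16-le")))  # utf-16-le
print(detect_bom(b"abc"))                           # None
```

One corner case remains ambiguous even with the ordering above: a little-endian UTF-16 text whose first character after the signature is U+0000 starts with the same four bytes as the UTF-32 signature.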

Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. Above is the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.

    Compose + O + R = ®
    Compose + 3 + 4 = ¾
    Compose + s + s = ß
    Compose + ~ + ' + a = ấ

Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.

    Alt + 1 + 6 + 0 = á
    Alt + 0 + 2 + 2 + 5 = á
    Alt + + + E + 1 = á

Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces a character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the corresponding UCS code point in hexadecimal.

1.1.2 Text Input

To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On hand-held devices, the use of either a numeric keypad or a touch-screen is more typical.

An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind, and, as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn't apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.

1.1.3 Text Editors

A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free JOE, GNU nano, and Pico.

More advanced text editors come with support for regular expressions and version control (both covered in sections 1.1.5 and 1.2) and with user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and Vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower their users to perform complex text editing.

1.1.4 Interactive Document Preparation Systems

Interactive Document Preparation Systems (DPSs) are a breed of text editors that produce fully formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSs that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].

The primary difference between text editors and DPSs is the fact that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files, a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSs may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSs include the free Apache OpenOffice and Scribus and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.

1.1.5 Regular Expressions

The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.

Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.

Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) syntaxes [18, part 1, ch. 9] described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.

More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
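Several of the constructs listed in Table 1.4 can be tried out directly in Python. Note the assumption behind this sketch: Python's `re` module implements a Perl-derived regex syntax, not POSIX BRE or ERE, but the constructs used here (intervals with unescaped braces, the star, anchors, bracket expressions, and alternation) behave the same way as in ERE. The helper name `first_match` is made up for this example.

```python
import re


def first_match(pattern: str, text: str):
    """Return the first substring of text matched by pattern, or None."""
    found = re.search(pattern, text)
    return found.group(0) if found else None


EXAMPLES = [
    (r"we{1,2}p", "weeps"),         # interval expression
    (r"we{1,2}p", "wept"),
    (r"e*ne", "Kleene"),            # star: zero or more repetitions
    (r"^ar", "arrow keys"),         # anchored at the beginning of the string
    (r"ore$", "iron ore"),          # anchored at the end of the string
    (r"be[ea]", "grizzly bear"),    # matching list expression
    (r"be[^ea]", "bend"),           # non-matching list expression
    (r"on|t", "two"),               # alternation
]

for pattern, text in EXAMPLES:
    print(f"{pattern!r:14} {text!r:16} -> {first_match(pattern, text)!r}")
```

In a BRE, the braces and parentheses in these patterns would have to be escaped with a backslash, as the first part of Table 1.4 explains.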

Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the ZERO WIDTH SPACE (20 0B), ZERO WIDTH NON-JOINER (20 0C), ZERO WIDTH JOINER (20 0D), and ZERO WIDTH NO-BREAK SPACE (FE FF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at least partially implement the Unicode standard for Regular Expressions [20] (such as those of Perl 5.2 or Java 7) are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.

BRE regex: we\{1,2\}p
Description: The interval (repetition) expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
Matches: weeps, wept

BRE regex: e*ne
Description: Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
Matches: never, enemy, Kleene

BRE regex: \(⟨regex⟩\)
Description: A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
Matches: ⟨regex⟩

BRE regex: ^ar
Description: At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
Matches: argument, arrow keys

BRE regex: ore$
Description: At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
Matches: iron ore, dumbledore

BRE regex: be.
Description: A period (.) matches any single character.
Matches: or not to be

BRE regex: be[ea]
Description: A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity.
Matches: beehive, grizzly bear, glass beads

BRE regex: be[^ea]
Description: A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
Matches: obeah, bend, libela

BRE regex: \^\$
Description: Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
Matches: ^$

BRE regex: \(…\)\1
Description: A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
Matches: ara ararauna, dardanelles, nationality

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below)

ERE regex: we{1,2}p
Description: Unlike in BREs, braces aren't escaped.
Matches: weeps, wept

ERE regex: pe+rl?
Description: The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
Matches: persona, peer speech, perl

ERE regex: (⟨regex⟩)
Description: Unlike in BREs, parentheses aren't escaped.
Matches: ⟨regex⟩

ERE regex: (on|t)
Description: Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
Matches: one two, trophy truth

ERE regex: (…)\1
Description: EREs do not support backreferences.
Matches: ⟨undefined⟩

Regex        Description
\x{⟨n⟩}      Matches the UCS character with code point ⟨n⟩ in hexadecimal.
\N{⟨n⟩}      Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.
\p{⟨p⟩}      Matches any UCS character with property ⟨p⟩.
\P{⟨p⟩}      Matches any UCS character without property ⟨p⟩.

Property             Description
Letter               This property is satisfied by any letter.
Punctuation          This property is satisfied by any punctuation.
Symbol               This property is satisfied by any symbol.
Mark                 This property is satisfied by any mark.
Number               This property is satisfied by any number.
Separator            This property is satisfied by any separator.
Other                This property is satisfied by any UCS character that doesn't belong to any of the categories listed above.
Block=⟨b⟩            This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.
Script=⟨s⟩           This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.
Numeric Value=⟨n⟩    This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.

The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.
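The pitfalls named above (invisible characters, multiple encodings of the same character, and case) can be worked around by preparing both the pattern and the text before matching. The following Python sketch shows one possible approach; the names `prepare` and `contains` are invented for this example. Python's built-in `re` module does not implement the \p{…} property syntax of Table 1.5, so this sketch relies on `unicodedata` instead. Stripping the zero width joiner and non-joiner is a simplification that suits Latin-script text but can change the meaning of text in scripts that depend on them.

```python
import re
import unicodedata

# Mapping a code point to None makes str.translate delete it.
INVISIBLE = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))


def prepare(text: str) -> str:
    """Drop zero-width characters, then normalize and case-fold the text."""
    text = unicodedata.normalize("NFC", text.translate(INVISIBLE))
    return unicodedata.normalize("NFC", text.casefold())


def contains(needle: str, haystack: str) -> bool:
    """Literal, case-insensitive search that tolerates the pitfalls above."""
    return re.search(re.escape(prepare(needle)), prepare(haystack)) is not None


# "KŮŇ" written with combining marks and interrupted by a zero width space.
messy = "KU\u030A\u200bN\u030C"
print(re.search("kůň", messy, re.IGNORECASE) is not None)  # False
print(contains("kůň", messy))                              # True
```

The first search is a false negative of exactly the kind described above, even though both strings look identical when rendered.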

The most elementary text processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool will present the lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for the Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.

1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.
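The rudimentary behavior described above (recording changes with a description and an author, viewing them, and reverting them) can be sketched in a few lines. This is only an illustrative in-memory model, not how SVN or Git store data; the `Revision` and `History` classes and their method names are hypothetical, and the sketch assumes nothing beyond Python's standard `difflib` and `dataclasses` modules.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class Revision:
    author: str    # who made the change
    message: str   # description of the change
    content: str   # full text of the document at this revision


@dataclass
class History:
    revisions: list = field(default_factory=list)

    def commit(self, author: str, message: str, content: str) -> int:
        """Record a new revision and return its number."""
        self.revisions.append(Revision(author, message, content))
        return len(self.revisions) - 1

    def diff(self, old: int, new: int) -> list:
        """Return the changes between two revisions as unified diff lines."""
        a = self.revisions[old].content.splitlines()
        b = self.revisions[new].content.splitlines()
        return list(difflib.unified_diff(a, b, f"r{old}", f"r{new}", lineterm=""))

    def revert(self, to: int, author: str) -> int:
        """Undo later changes by committing the content of an older revision."""
        return self.commit(author, f"Revert to r{to}", self.revisions[to].content)


history = History()
r0 = history.commit("alice", "Initial draft", "Hello\nworld")
r1 = history.commit("bob", "Capitalize", "Hello\nWorld")
changes = history.diff(r0, r1)
r2 = history.revert(r0, "alice")
```

Reverting is modeled as a new revision rather than as a deletion, so the record of who changed what is never lost.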

VCS can be divided by architecture into centralized and decentralized systems. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally and its operation is fully dependent on the availability of the server. An example of a centralized VCS is Subversion (SVN).

By comparison, there is no designated server in decentralized VCS, and the users can exchange new versions directly with one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage requirements, and the increased opportunity for users not to share their local changes frequently enough, which increases the chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.

Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

Figure 1.8: The basic SVN workflow (svnadmin create, svn checkout, svn update, svn commit). After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.

Figure 1.9: The upper diagram depicts the basic Git workflow (git init, git clone, git pull, git push, git reset, git commit). After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own. The lower diagram depicts the use of the Git program with an SVN repository (svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit); this bears all the advantages and disadvantages associated with decentralized VCS.

Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.

Chapter 2

Markup

A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.

Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The Generalized Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.

Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, are declared within a Document Type Declaration (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
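The first of the two levels of correctness can be checked with any conforming XML parser. The sketch below assumes Python's standard `xml.etree.ElementTree` module, which is a non-validating parser: it reports whether a document is well-formed but does not check validity against a DTD or a schema. The helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(source: str) -> bool:
    """Return True if source parses as a well-formed XML document."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


# Properly nested elements: well-formed.
good = "<recipe><name>Palatschinken</name></recipe>"

# The end tags are swapped, so the elements overlap: not well-formed.
bad = "<recipe><name>Palatschinken</recipe></name>"

results = (is_well_formed(good), is_well_formed(bad))
```

Checking validity additionally requires a validating tool and the DTD or schema that defines the XML application.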

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
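To give a flavor of data retrieval with XPath, here is a minimal sketch that queries an abridged copy of the recipe document from Figure 2.1. It assumes Python's standard `xml.etree.ElementTree` module, which implements only a limited subset of XPath, so the expressions are deliberately simple.

```python
import xml.etree.ElementTree as ET

# An abridged version of the recipe document from Figure 2.1.
RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# Select every ingredient element by its path from the document element.
ingredients = [e.text for e in root.findall("./ingredientList/ingredient")]

# Select the single ingredient whose amount attribute has a given value.
milk = root.find("./ingredientList/ingredient[@amount='300ml']")

# Attributes of a selected element are read through the element API.
serves = root.find("./ingredientList").get("serves")
```

Full XPath and XQuery implementations support considerably richer expressions, such as functions and arbitrary axes.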

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>

Figure 2.1: An example XML document (recipe.xml)

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">

<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>

<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD. DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
      name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)

xmllint -noout --dtdvalid recipe.dtd recipe.xml
xmllint -noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng  # Compact -> Full RELAX NG
xmllint -noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.

Speech view:

AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!

Verse view:

Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? Why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.

Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

A notable feature of XML that is unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature, CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
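The way namespaces keep elements of different XML applications apart can be illustrated with a short sketch. The document below is a made-up example that mixes XHTML with SVG; the prefixes `h` and `s` in the lookup table are arbitrary local names chosen for this illustration. The sketch assumes Python's standard `xml.etree.ElementTree` module, which identifies every element by its namespace IRI together with its local name.

```python
import xml.etree.ElementTree as ET

# A hypothetical document combining two XML applications.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <body>
    <p>A circle:</p>
    <svg:svg><svg:circle r="10"/></svg:svg>
  </body>
</html>"""

root = ET.fromstring(DOC)

# Prefixes are local shorthands; only the IRIs identify the applications.
ns = {
    "h": "http://www.w3.org/1999/xhtml",
    "s": "http://www.w3.org/2000/svg",
}

paragraph = root.find(".//h:p", ns)
circle = root.find(".//s:circle", ns)

# The parser expands each element name to "{IRI}local-name".
expanded_name = circle.tag
```

Note that the parser only reads the documents above as text; it never dereferences the IRIs, which is consistent with the absence of a general mechanism for translating IRIs to schemata.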

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
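The liberal handling of malformed HTML can be observed with a lenient tokenizer. The sketch below assumes Python's standard `html.parser` module and feeds it the overlapping elements from the first line of Figure 2.7; the `TagLogger` class is a hypothetical helper. Note that this module only reports tags as it encounters them and does not implement the error-recovery algorithm of the HTML5 specification, so the sketch shows the absence of a parse error, not the canonical interpretation.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record start and end tags in the order in which they are encountered."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


parser = TagLogger()
# The <b> element is closed while <i> is still open, so the elements overlap.
parser.feed("<b>Bold <i>bold and italic</b> italic</i>")
parser.close()

tags = parser.events
```

No exception is raised for the overlapping elements; deciding how to repair the element tree is left to the consumer, which is the part that HTML5 standardizes for browsers.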

Initially, HTML comprised only a mixture of logical and presentation markup with a fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which made it possible to separate document markup from design, effectively eliminating the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources and to reduce the number of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly in web documents through XML namespaces.

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

<style>
  body {
    font: large Verdana;
    font-size: large;
  }
  h1 {
    font-size: x-large;
    text-transform: uppercase;
  }
  abbr {
    font-style: italic;
  }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The first document was created using the HTML presentation markup. The second document achieves the same appearance by the combination of logical markup and CSS.

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, aside from the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
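The triplet model and its two interpretations can be made concrete with a small data-structure sketch. The classes `IRI`, `Literal`, and `Triplet` and the function `interpret` are hypothetical names chosen for this illustration and do not belong to any RDF library; the fields are listed in the subject, predicate, object order used by the RDF serializations, whereas the text above writes a triplet as (p, s, o). The example data are taken from Figure 2.9.

```python
from typing import NamedTuple, Union


class IRI(NamedTuple):
    value: str


class Literal(NamedTuple):
    value: str
    lang: str = ""


class Triplet(NamedTuple):
    subject: IRI
    predicate: IRI
    obj: Union[IRI, Literal]


def interpret(t: Triplet) -> str:
    """Phrase a triplet according to whether its object is a resource."""
    if isinstance(t.obj, IRI):
        return (f"<{t.subject.value}> is in relation "
                f"<{t.predicate.value}> with <{t.obj.value}>")
    return (f"<{t.subject.value}> has property "
            f'<{t.predicate.value}> with value "{t.obj.value}"')


DOC = IRI("http://example.org/document.html")
JOHN = IRI("http://example.org/john-smith")
CREATOR = IRI("http://purl.org/dc/terms/creator")
TITLE = IRI("http://purl.org/dc/terms/title")

# A document is a set of triplets, so repeated statements collapse into one.
graph = {
    Triplet(DOC, CREATOR, JOHN),
    Triplet(DOC, TITLE, Literal("John's Web page", "en")),
    Triplet(DOC, CREATOR, JOHN),
}

relation = interpret(Triplet(DOC, CREATOR, JOHN))
prop = interpret(Triplet(DOC, TITLE, Literal("John's Web page", "en")))
```

Because the predicates are themselves IRIs, an ontology can describe them with further triplets of exactly the same shape.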

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually make obsolete the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in the documents on the Web, a technique known as microformatting.
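Of the representations listed above, N-Triples is the simplest: every line holds one statement consisting of a subject, a predicate, an object, and a terminating full stop. The following is a rough sketch of a serializer, not a complete implementation of the N-Triples grammar: the function names are hypothetical, only the most common string escapes are handled, and IRIs are assumed to contain no characters that would need escaping.

```python
def escape_literal(text: str) -> str:
    """Escape the characters that may not appear raw in a quoted string."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def iri(value: str) -> str:
    """Render an IRI term (assumed to need no escaping)."""
    return f"<{value}>"


def literal(text: str, lang: str = "") -> str:
    """Render a literal term with an optional language tag."""
    quoted = f'"{escape_literal(text)}"'
    return f"{quoted}@{lang}" if lang else quoted


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


DOC = "http://example.org/document.html"
JOHN = "http://example.org/john-smith"

triples = [
    (iri(DOC), iri("http://purl.org/dc/terms/title"),
     literal("John's Web page", "en")),
    (iri(DOC), iri("http://purl.org/dc/terms/creator"), iri(JOHN)),
    (iri(JOHN), iri("http://xmlns.com/foaf/0.1/name"), literal("John Smith")),
]

output = to_ntriples(triples)
```

Turtle expresses the same statements more compactly by declaring prefixes and grouping the statements that share a subject.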

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .

<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .

<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom)

<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>

Figure 2.10: The first listing is an HTML document linked to the RDF document from Figure 2.9. The second listing is the same HTML document with the RDF data directly embedded using the RDFa language.

Figure 2.11: A graph of the RDF document in Figure 2.9. The node http://example.org/document.html is connected by dc:title to the literal "John's Web page"@en and by dc:creator to the node http://example.org/john-smith, which is in turn connected by rdf:type to foaf:Person and by foaf:name to the literal "John Smith".

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

232 Interactive SystemsInteractive dpses come in two distinct flavors Word processors arethe digital progeny of the typewriter machine whose output docu-ments served as manuscripts to be typeset by a typographer Withthe advent of personal computing and the Web self-publishingbecame more affordable to the general public and modern wordprocessors can be used not only to write but also to design andtypeset documents although the offered functionally is typicallylimited to ensure ease of use This concern is not shared by Desk-Top Publishing (dtp) software which provides refined control overthe resulting page layout and the typesetting at the expense of asteeper learning curve

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
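The decoupling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the style attributes, and the `render` function are hypothetical and do not correspond to the API of any particular word processor or DTP application.

```python
# Hypothetical names throughout: this mirrors the idea of paragraph styles,
# not the interface of any real document preparation system.

# Logical markup: each span of text is tagged with a class, not with a look.
document = [
    ("title", "The Cask of Amontillado"),
    ("author", "Edgar Allan Poe"),
    ("body", "The thousand injuries of Fortunato I had borne as I best could."),
]

# The design of each class is set up separately and can be changed later.
stylesheet = {
    "title": {"font": "Times Bold", "size_pt": 16, "align": "center"},
    "author": {"font": "Times Italic", "size_pt": 12.5, "align": "center"},
    "body": {"font": "Times Roman", "size_pt": 12.5, "align": "justify"},
}


def render(document, stylesheet):
    """Pair every span of text with the design currently assigned to its class."""
    return [(text, stylesheet[cls]) for cls, text in document]


# Changing one entry restyles every span of that class at once;
# the marked-up document itself is left untouched.
redesigned = dict(
    stylesheet,
    body={"font": "Palatino", "size_pt": 10, "align": "justify"},
)
```

With presentation markup, the same redesign would require visiting every body paragraph individually, which is where inconsistencies tend to creep in.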


The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You who so well know the nature of my soul will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled--but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You who so well know the nature of my soul will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].


% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You who so well know the nature of my soul
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
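To make the idea of conversion concrete, the following Python sketch turns a deliberately tiny, Markdown-like subset into HTML. It is an assumption-laden toy rather than an implementation of Markdown or of any other real lightweight markup language: only `#` headings, `*emphasis*`, and paragraphs separated by blank lines are recognized, and the function name `to_html` is made up for this example.

```python
import html
import re


def to_html(source: str) -> str:
    """Convert a tiny Markdown-like subset (not real Markdown) to HTML."""
    # Blocks are separated by one or more blank lines.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", source) if b.strip()]
    output = []
    for block in blocks:
        # Join the lines of a block and escape characters special to HTML.
        text = html.escape(" ".join(block.split()), quote=False)
        # Punctuation carries the markup: *word* becomes emphasis.
        text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
        # One to six leading number signs mark a heading of that level.
        heading = re.match(r"(#{1,6}) (.*)", text)
        if heading:
            level = len(heading.group(1))
            output.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            output.append(f"<p>{text}</p>")
    return "\n".join(output)
```

Calling `to_html("# Title\n\nSome *light* text.")` yields a level-one heading followed by a paragraph with an emphasized word. Production converters handle many more constructs and edge cases, which is one reason the popular languages exist in several flavors.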

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
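The thresholds above can be turned into a small checker. The following Python sketch is illustrative only: the function name and the verdict labels are invented here, and treating lengths that fall between the recommended range and the 80-character limit as merely "acceptable" is an assumption, since the text does not name that zone.

```python
def assess_measure(chars_per_line: int, columns: int = 1) -> str:
    """Classify an average line length against the guidelines in the text."""
    # Recommended ranges: 45-75 characters (one column), 40-50 (several columns).
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line > 80:
        return "too wide"      # strains the eye of the reader
    if chars_per_line < 40:
        return "set ragged"    # justification would loosen the word spacing
    if low <= chars_per_line <= high:
        return "ideal"
    return "acceptable"        # assumption: tolerated, but outside the ideal
```

A designer might run such a check over the average line length produced by a candidate combination of type size and column width before committing to a page layout.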

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὀργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}
\newunicodechar{·}{\raisebox{.8ex}{.}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι
δὲ ὀργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
      format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
      U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
      U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
      sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=gr] {
    font-size: 13.2pt;
  }
</style>
<p><span lang=en>The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang=gr>φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὀργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang=en>“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
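The arithmetic behind these figures is simple enough to script. The Python sketch below assumes only the rule of thumb quoted above (line separation of twenty to forty-five percent of the type size); the function names are made up for this example.

```python
def leading_range(type_size_pt: float) -> tuple:
    """Return the (minimum, maximum) line height in points for a type size."""
    # The lines are separated by 20 to 45 percent of the type size.
    return (round(type_size_pt * 1.20, 2), round(type_size_pt * 1.45, 2))


def lead_per_side(type_size_pt: float, leading_pt: float) -> float:
    """Return the lead added above and below each line, in points."""
    return round((leading_pt - type_size_pt) / 2, 2)
```

For a 10 pt body text, `leading_range(10)` gives the bounds of 12 and 14.5 pt mentioned in the text, and `lead_per_side(10, 12)` gives the 1 pt of lead on either side of a line that corresponds to the lower bound.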

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well.

Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger


          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
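One way to honor the whole-multiple rule is to pad each heading with vertical space until its total height reaches the next multiple of the body text line height. The Python sketch below shows that calculation; the function name is hypothetical, and dividing the padding evenly above and below the heading by default is an assumption of this example, not a recommendation from the cited sources.

```python
import math


def heading_block(heading_leading_pt, body_leading_pt, share_above=0.5):
    """Pad a heading so that it occupies a whole number of body text lines.

    Returns (total height, space above, space below), all in points.
    """
    # Smallest whole number of body lines that can hold the heading.
    lines = math.ceil(heading_leading_pt / body_leading_pt)
    total = lines * body_leading_pt
    extra = total - heading_leading_pt
    return total, extra * share_above, extra * (1 - share_above)
```

For example, a heading set with 18 pt leading over a body text with 12 pt leading occupies two body lines (24 pt), which leaves 6 pt of padding to distribute around the heading.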

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.


2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that, when we are sick in fortune--often the surfeit of our own behavior--we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].


obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

-- William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
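The requirement that the text height be a multiple of the line height can be met by rounding the available space down to a whole number of lines. The Python sketch below illustrates this; the function name and the choice to express all inputs in points are assumptions of the example, and the leftover space is simply left for the designer to add to the margins.

```python
def text_block_height(page_height_pt, top_margin_pt, bottom_margin_pt, leading_pt):
    """Return (number of lines, text height in points) for a fixed page height."""
    available = page_height_pt - top_margin_pt - bottom_margin_pt
    # Only whole lines fit; any remainder goes back to the margins.
    lines = int(available // leading_pt)
    return lines, lines * leading_pt
```

On a page that is 9 in (648 pt) tall with 0.75 in (54 pt) margins at the top and bottom and a 12 pt leading, the text block holds 45 lines and is exactly 540 pt tall. Shrinking both margins to 50 pt would not add a line, because the extra 8 pt is less than one line height.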

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are


Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

(ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
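The text does not say how to measure "sufficient contrast". One commonly used yardstick, brought in here purely as an assumption, is the contrast ratio defined by the W3C Web Content Accessibility Guidelines (WCAG) 2.x, which compares the relative luminance of two sRGB colors. The Python sketch below follows that definition; the function names are made up for this example.

```python
def relative_luminance(rgb):
    """Relative luminance of an sRGB color given as three 0-255 integers."""
    def channel(value):
        c = value / 255
        # Undo the sRGB transfer curve (constants as given in WCAG 2.x).
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (channel(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 (identical) up to 21."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)
```

Black on white reaches the maximum ratio of 21, while a color placed on itself yields 1. WCAG suggests a minimum of 4.5 for ordinary body text on screen; whether that threshold suits a given printed design is a separate judgment call.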

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015) (cit. on p. 5).
[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015) (cit. on p. 5).
[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).
[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).
[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).
[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).
[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).
[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).
[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).
[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).
[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).
[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).
[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).
[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).
[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).

[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).
[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).
[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).
[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).
[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015) (cit. on p. 16).
[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).
[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).
[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).
[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).
[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).
[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).
[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).
[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).
[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).
[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).
[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).
[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).
[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).
[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).
[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).
[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).
[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).
[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).
[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).
[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).
[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).
[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).
[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).
[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).
[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).
[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).
[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).
[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).
[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).
[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).
[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standards Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
Emacs: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
Pico: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
vi: The Visual Interactive editor
Vim: vi IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language




• There is precisely one way to encode any printable character. The conversion between lowercase and uppercase letters is a matter of inverting one bit.

This comes at the expense of support for non-English writing systems. As a temporary workaround, a set of ASCII derivatives that replaced the less-needed characters #, $, @, [, \, ], ^, `, {, |, }, and ~ with international characters was specified in the ISO 646 standard from 1972 [3].
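The one-bit relationship between the two cases can be checked directly. The following Python sketch is only an illustration; the helper name `flip_case` is hypothetical, and it assumes its input is a single ASCII letter.

```python
def flip_case(letter: str) -> str:
    """Toggle the case of one ASCII letter by inverting the bit with value 0x20."""
    if len(letter) != 1 or not (letter.isascii() and letter.isalpha()):
        raise ValueError("expected a single ASCII letter")
    return chr(ord(letter) ^ 0x20)

print(flip_case("a"), flip_case("Q"))  # A q
```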

Eight-bit Encodings

With the byte size stabilizing at eight bits, new character encodings emerged that were based on ASCII and used the additional bit to encode characters of non-English writing systems while retaining complete backwards compatibility with ASCII. Besides the numerous vendor-specific encodings (called code pages), a set of fifteen eight-bit encodings covering all major modern writing systems whose characters fit within the space of 128 additional combinations was standardized in the ISO/IEC 8859 series released during 1986–2001.

Compared to ASCII, eight-bit encodings introduced an additional level of complexity to text processing:

• Each character is exactly eight bits wide. String manipulation is therefore as straightforward as with ASCII.

• Character strings can no longer be collated by character code comparison. Each encoding requires separate collation tables.

• Classes of characters, such as uppercase and lowercase letters or punctuation, no longer form contiguous ranges, and their position varies among encodings. This impedes character classification.

• Idiosyncrasies such as the æ ligature and invisible hyphenation hints are included in several encodings, which makes it more difficult to determine character string equivalence. Algorithms for case conversion vary among encodings.

• There exists no standard mechanism to detect which encoding is being used. The distinction needs to be made at the application level using heuristics, additional metadata, or human intervention. Consequently, no standard mechanism exists to use different character encodings within a single text document.
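The lack of a detection mechanism is easy to demonstrate: the same byte stands for a different character under each eight-bit encoding. A minimal Python sketch, using the codec names of Python's standard library; the byte E6 is chosen only for illustration.

```python
# One byte, three interpretations: nothing in the byte itself says which
# part of ISO/IEC 8859 it was encoded with.
raw = bytes([0xE6])

for codec in ("iso8859_1", "iso8859_2", "iso8859_5"):
    print(codec, raw.decode(codec))
```

A program that receives this byte without metadata has to guess among these readings.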


Notable are also the seven-bit encodings of UTF-7 and Punycode, which bring Unicode support to protocols that were designed with the seven-bit ASCII in mind, such as e-mail.

A portion of this complexity is inherent in the task of encoding the characters of all modern writing systems, but the overhead caused by the character encoding fragmentation proved to be unnecessary.

The Universal Character Set and Unicode

In the early 1990s, the continual increase in the available bandwidth and storage led to the creation of the standards of Unicode [5, 6] and the Universal multiple-octet coded Character Set (UCS) [7] in an attempt to create a text encoding that would contain the characters of all the world's languages and succeed ASCII as the lingua franca of text interchange.

UCS is an ever-expanding catalogue of characters from writing systems both modern and ancient, and of symbols ranging from diacritical marks, punctuation, and ideograms to mahjong tiles, alchemical symbols, and the ancient Greek musical notation. Each of these characters is assigned a number called a code point, ranging from 0 to 2147483647 (7F FF FF FF in hexadecimal notation), with the most common characters occupying the range from 0 to 65535 (FF FF), called the Basic Multilingual Plane (BMP). The smallest units of division in UCS are blocks, which contain 256 thematically related characters. UCS encodings map code points to binary character codes and vice versa.

Three major encodings are specified in the UCS standard and its amendments [8, 9]:

1. UTF-32 directly encodes UCS characters by transforming their code points to four-byte integers. UTF-32 is also known as UCS-4.

2. UTF-16 directly encodes characters within the BMP by transforming their code points to two-byte integers. Code points in the range from 65536 to 1114111 (01 00 00–10 FF FF) are transformed into pairs of two-byte integers called surrogate pairs, each ranging from 55296 to 57343 (D8 00–DF FF). To enable the UTF-16 encoding, the code points in this range will never be assigned to characters [10, sec. 3.4, D15]. The same is true of code points above 1114111 (10 FF FF), which allows UTF-16 to encode any UCS character.

3. UTF-8 directly transforms code points ranging from 0 to 127 (7F) to one-byte integers. Since the first UCS block of the BMP matches ASCII, any text encoded in eight-bit ASCII is also encoded in UTF-8. Code points in the range from 128 to 1114111 (00 00 80–10 FF FF) are transformed into two to four one-byte integers ranging from 128 to 253 (80–FD). The encoding is illustrated in tables 1.2 and 1.3.

One of the design goals of UCS was to avoid assigning code points to different glyphs that carry the same meaning. As a result, the visually distinctive Han characters used in the East Asian countries of China, Japan, Korea, and Vietnam were merged into a set of 75960 ideograms in a process referred to as the Han Unification [10, sec. 18.1]. This simplifies text processing, but it also makes it impossible to encode a text in multiple East Asian languages without relying on external markup to select appropriate regional fonts. As a result, a derivative of UCS that doesn't implement the Han Unification was developed for use in operating systems based on the Real-time Operating system Nucleus (TRON) and is used in East Asia alongside UCS and region-specific encodings.

餐甑逞扉牙慨

Figure 1.2: Several Han characters in the traditional Chinese, Japanese, Korean, and Vietnamese variants
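The three encodings can be compared with Python's built-in codecs. The sketch below is an illustration only; `surrogate_pair` is a hypothetical helper that spells out the UTF-16 arithmetic for code points outside the BMP, and the sample characters are arbitrary.

```python
def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above FF FF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000        # 20 significant bits remain
    high = 0xD800 + (offset >> 10)       # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # lower 10 bits
    return high, low

# Print the code point followed by its UTF-8, UTF-16 and UTF-32 bytes.
for char in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    print(
        f"U+{ord(char):04X}",
        "|", char.encode("utf-8").hex(" "),
        "|", char.encode("utf-16-be").hex(" "),
        "|", char.encode("utf-32-be").hex(" "),
    )
```

The UTF-8 column grows from one to four bytes, while the UTF-16 column switches from two to four bytes only for the last character, which lies outside the BMP.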

UTF-32 is primarily used for the fixed-width internal representation of individual UCS characters inside programs, UTF-16 fulfills a similar role in programs that only work with the BMP, and UTF-8 is used for text storage and interchange. Since 2010, the majority of text content on the Web has been encoded in ASCII and UTF-8 [11].

Unicode was a competing standard for universal text encoding that underwent a merger with UCS in version 1.1, and since then the standards have been kept closely synchronized. Unicode is a superset of UCS that additionally defines information about UCS characters (such as their general category, directionality, case, or numeric value [10, sec. 3.5 and ch. 4]), various text processing algorithms, and implementation guidelines.

Regarding text processing, Unicode and UCS represent a compromise between the simplicity of the seven-bit ASCII and the heterogeneity of eight-bit encodings:


Ǻ = Å + ◌́ = A + ◌̊ + ◌́

Figure 1.3: Some UCS characters can be either input as a single entity or composed from several combining characters. Regarding Unicode normalization forms, all of the above representations are canonically equivalent.

iconv -f latin2 -t utf8 -- old.txt > new.txt

Figure 1.4: Text files can be converted between encodings using the iconv command-line tool. The sample code shows the file old.txt being converted from the ISO/IEC 8859-2 encoding to UTF-8. The result of the conversion is stored in the file new.txt.
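Where iconv is not at hand, the same conversion can be sketched in Python. The function name `transcode` and its defaults are placeholders; `iso8859_2` and `utf_8` are the standard library's codec names for ISO/IEC 8859-2 and UTF-8.

```python
from pathlib import Path

def transcode(source: Path, target: Path,
              source_encoding: str = "iso8859_2",
              target_encoding: str = "utf_8") -> None:
    """Decode the bytes of the source file and re-encode the text into the target file."""
    text = source.read_bytes().decode(source_encoding)
    target.write_bytes(text.encode(target_encoding))
```

Calling `transcode(Path("old.txt"), Path("new.txt"))` would mirror the command in the figure, assuming that old.txt really is encoded in ISO/IEC 8859-2; as noted earlier, the bytes themselves cannot confirm that.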

• If simple text manipulation is preferred over space efficiency, each character can be made exactly two or four bytes wide using the UTF-16 and UTF-32 encodings.

• Although character strings cannot be collated by a simple character code comparison, a collation algorithm is defined in the Unicode specification [12], and collation tables for major locales [13] are maintained by the Unicode Consortium.

• Classes of characters (such as uppercase letters, lowercase letters, numbers, and punctuation) do not form contiguous ranges, but their position is directly specified in the standard [10, sec. 4.5].

• Although idiosyncrasies such as ligatures, invisible hyphenation hints, and combining characters are present in UCS, explicit normalization algorithms for character string equivalence testing are specified by the standard [10, sec. 2.12]. An algorithm for case conversion is also specified [10, sec. 3.13].

• The byte order mark (FE FF) character can be inserted at the beginning of a text as a signature of Unicode encodings. As the name suggests, the order in which the FE and FF bytes arrive also indicates the order of bytes (called endianness) that was used to encode integers. In UTF-32 and UTF-16, endianness can be chosen arbitrarily by the encoding application. In UTF-8, one-byte integers are used, and the notion of endianness is therefore meaningless.
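Several of the points above can be observed with Python's standard `unicodedata` and `codecs` modules. This is a small sketch rather than a complete treatment; the three spellings are the ones from Figure 1.3.

```python
import codecs
import unicodedata

# Precomposed, partly composed, and fully decomposed spellings of one character.
spellings = ["\u01FA", "\u00C5\u0301", "A\u030A\u0301"]
nfc = {unicodedata.normalize("NFC", s) for s in spellings}
nfd = {unicodedata.normalize("NFD", s) for s in spellings}
print(len(nfc), len(nfd))  # 1 1

# Case conversion is not always one character to one character.
print("\u00DF".upper())  # SS

# The byte order mark reveals the byte order of a UTF-16 stream.
print(codecs.BOM_UTF16_BE.hex(" "), codecs.BOM_UTF16_LE.hex(" "))  # fe ff ff fe
```

After normalization, the three spellings compare equal, which is what equivalence testing relies on.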


Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. Above is the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.

Compose + O + R = ®
Compose + 3 + 4 = ¾
Compose + s + s = ß
Compose + ~ + ' + a = ấ

Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.


Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á

Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces a character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the corresponding UCS code point in hexadecimal.
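The three key sequences in the figure agree because the three tables happen to place the same character at those numbers. A short Python check, using the standard library codecs `cp437` and `cp1252` as stand-ins for the two code pages:

```python
# Alt+160: number 160 in the IBM code page 437.
from_ibm = bytes([160]).decode("cp437")
# Alt+0225: number 225 in the Windows code page 1252.
from_windows = bytes([225]).decode("cp1252")
# Alt++E1: UCS code point E1 in hexadecimal.
from_ucs = chr(0xE1)

print(from_ibm, from_windows, from_ucs)  # á á á
```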

1.1.2 Text Input

To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On hand-held devices, the use of either a numeric keypad or a touch-screen is more typical.

An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind, and, as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn't apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.


1.1.3 Text Editors

A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free JOE, GNU nano, and Pico.

More advanced text editors come with support for regular expressions and version control, which will be covered in sections 1.1.5 and 1.2, and with user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and Vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower the users to perform complex text editing.

1.1.4 Interactive Document Preparation Systems

Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].

The primary difference between text editors and DPSes is the fact that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files, a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.

1.1.5 Regular Expressions

The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.

Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.

Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) [18, part 1, ch. 9], described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.

More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
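Such patterns can be tried out in any regex engine. The sketch below uses Python's `re` module, whose syntax is one of the Perl derivatives; like ERE, it leaves braces and parentheses unescaped, and unlike ERE, it supports backreferences. The patterns and sample words are chosen for illustration only.

```python
import re

# Interval expression: one or two occurrences of "e" between "w" and "p".
assert re.search(r"we{1,2}p", "weeps").group() == "weep"
assert re.search(r"we{1,2}p", "wept").group() == "wep"

# Anchors match the beginning and the end of a string.
assert re.search(r"^ar", "arrow keys")
assert re.search(r"ore$", "iron ore")

# Matching and non-matching list expressions.
assert re.search(r"be[ea]", "grizzly bear").group() == "bea"
assert re.search(r"be[^ea]", "bend").group() == "ben"

# Alternation, and a backreference to the first subexpression.
assert re.search(r"(on|t)", "two").group() == "t"
assert re.search(r"(ar)\1", "arara").group() == "arar"
```

The last pattern is the kind that takes a syntax beyond regular grammars: matching a repeated, previously captured string cannot be expressed by a regular expression in the formal sense.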

Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (20 0B), zero width non-joiner (20 0C), zero width joiner (20 0D), and zero width no-break space (FE FF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at least partially implement the Unicode standard for Regular Expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below)

BRE regex: we\{1,2\}p
Description: The repetition expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
Matches: weeps, wept

BRE regex: e*ne
Description: Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
Matches: never, enemy, Kleene

BRE regex: \(⟨regex⟩\)
Description: A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
Matches: ⟨regex⟩

BRE regex: ^ar
Description: At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
Matches: argument, arrow keys

BRE regex: ore$
Description: At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
Matches: iron ore, dumbledore

BRE regex: .be
Description: A period (.) matches any single character.
Matches: or not to be

BRE regex: be[ea]
Description: A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities, omitted here for brevity.
Matches: beehive, grizzly bear, glass beads

BRE regex: be[^ea]
Description: A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
Matches: obeah, bend, libela

BRE regex: \^\$
Description: Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
Matches: ^$

BRE regex: \(⟨regex⟩\)\1
Description: A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
Matches: ara ararauna, dardanelles, nationality

ERE regex: we{1,2}p
Description: Unlike in BREs, braces aren't escaped.
Matches: weeps, wept

ERE regex: pe+r?l?
Description: The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
Matches: persona, peer, speech, perl

ERE regex: (⟨regex⟩)
Description: Unlike in BREs, parentheses aren't escaped.
Matches: ⟨regex⟩

ERE regex: (on|t)
Description: Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
Matches: one, two, trophy, truth

ERE regex: (⟨regex⟩)\1
Description: EREs do not support backreferences.
Matches: ⟨undefined⟩

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.

Regex: \x{⟨n⟩}
Description: Matches the UCS character with code point ⟨n⟩ in hexadecimal.

Regex: \N{⟨n⟩}
Description: Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.

Regex: \p{⟨p⟩}
Description: Matches any UCS character with property ⟨p⟩.

Regex: \P{⟨p⟩}
Description: Matches any UCS character without property ⟨p⟩.

Property: Letter
Description: This property is satisfied by any letter.

Property: Punctuation
Description: This property is satisfied by any punctuation.

Property: Symbol
Description: This property is satisfied by any symbol.

Property: Mark
Description: This property is satisfied by any mark.

Property: Number
Description: This property is satisfied by any number.

Property: Separator
Description: This property is satisfied by any separator.

Property: Other
Description: This property is satisfied by any UCS character that doesn't belong to any of the above-listed categories.

Property: Block=⟨b⟩
Description: This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.

Property: Script=⟨s⟩
Description: This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.

Property: Numeric Value=⟨n⟩
Description: This property is satisfied by any UCS character with the numeric value ⟨n⟩.

The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.
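With a regex engine that lacks these features, the text and the pattern can be brought into a common form before matching. The following Python sketch shows one possible approach and is not a complete solution: the helper name `canonical` is hypothetical, and dropping every format character (general category Cf) is an assumption that would be wrong for scripts in which the zero width joiners carry meaning.

```python
import re
import unicodedata

def canonical(text: str) -> str:
    """Drop invisible format characters, then fold case and normalize."""
    visible = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    folded = unicodedata.normalize("NFD", visible).casefold()
    return unicodedata.normalize("NFD", folded)

# "CAFÉ" spelled with a combining accent, and a word split by a zero width space.
haystack = "CAFE\u0301 au lait, zero\u200Bwidth"

print(re.search("caf\u00E9", haystack))                                   # None
print(bool(re.search(canonical("caf\u00E9"), canonical(haystack))))       # True
print(bool(re.search(canonical("zerowidth"), canonical(haystack))))       # True

# The character properties themselves are available outside the regex engine.
print(unicodedata.category("A"), unicodedata.numeric("\u00BE"))           # Lu 0.75
```

The first search fails although a reader would consider the text a match; the next two succeed once both sides have been canonicalized.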

The most elementary text processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool presents the lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
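The default behaviour of grep, presenting every line with at least one match, fits in a few lines of Python. This is a sketch of the idea only, using Python's regex syntax rather than BRE; the function and the sample lines are illustrative.

```python
import re
from typing import Iterable, Iterator

def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield every line that contains at least one match of the pattern."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line

sample = ["iron ore", "arrow keys", "dumbledore"]
print(list(grep(r"ore$", sample)))  # ['iron ore', 'dumbledore']
```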

1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.
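The rudimentary feature set named above (recording changes with a description and an author, viewing them, and reverting them) can be modelled in a few lines. The following Python sketch is a toy for a single file, not a description of how any real VCS is implemented; the class names `Version` and `History` are hypothetical.

```python
import difflib
from dataclasses import dataclass, field

@dataclass
class Version:
    author: str
    description: str
    lines: list[str]

@dataclass
class History:
    """A toy single-file history; real VCS store far more than this."""
    versions: list[Version] = field(default_factory=list)

    def commit(self, author: str, description: str, text: str) -> int:
        self.versions.append(Version(author, description, text.splitlines()))
        return len(self.versions) - 1  # the number of the new version

    def changes(self, old: int, new: int) -> list[str]:
        a, b = self.versions[old], self.versions[new]
        return list(difflib.unified_diff(a.lines, b.lines, lineterm=""))

    def revert(self, number: int, author: str) -> int:
        old = self.versions[number]
        return self.commit(author, f"Revert to version {number}",
                           "\n".join(old.lines))

history = History()
first = history.commit("alice", "Initial draft", "Hello\nworld")
second = history.commit("bob", "Fix greeting", "Hello\nWorld")
print("\n".join(history.changes(first, second)))
third = history.revert(first, "alice")
```

Note that reverting is itself recorded as a new version, so the history never loses information.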

VCS can be divided by their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally, and its operation is fully dependent on the availability of the server. An example of a centralized VCS is Subversion (SVN).

By comparison, there is no designated server in decentralized VCS, and the users can upload and download new versions directly to and from one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage size requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.

Although VCS can be used to keep track of any kind of file, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.

svnadmin create
svn checkout
svn update
svn commit

Figure 1.8: The basic SVN workflow

After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.

git init
git clone
git pull
git push
git reset
git commit

Figure 1.9: The diagram above depicts the basic Git workflow. The diagram below depicts the use of the Git program with an SVN repository; this bears all the advantages and disadvantages associated with decentralized VCS.

svnadmin create
git svn clone
git svn rebase
git svn dcommit
git reset
git commit


Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom)

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.

Chapter 2

Markup

A manuscript can be a seamless current of words and still make perfect sense to its author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take them and reassemble them into meaningful units of linguistic thought.

Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The General Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

Sidenote: More information about the project can be found within The Roots of SGML: A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

Sidenote: The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.
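The kinds of nodes listed above can be inspected programmatically. The following is a minimal sketch that uses XML as a stand-in for SGML, since the Python standard library ships no SGML parser; the sample document, the `render` processing instruction, and the `dish` entity are made up for the illustration.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# A hypothetical document containing one node of each kind.
SAMPLE = (
    '<!DOCTYPE note [<!ENTITY dish "Palatschinken">]>'
    "<?render draft?>"
    "<!-- a human-readable comment -->"
    '<note lang="en">&dish; recipe</note>'
)

# Human-readable names for the DOM node type constants.
KINDS = {
    minidom.Node.DOCUMENT_TYPE_NODE: "document type",
    minidom.Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    minidom.Node.COMMENT_NODE: "comment",
    minidom.Node.ELEMENT_NODE: "element",
}

# List the top-level nodes of the document in order of appearance.
document = minidom.parseString(SAMPLE)
kinds = [KINDS[node.nodeType] for node in document.childNodes]

# Attributes carry additional information about an element.
language = document.documentElement.getAttribute("lang")

# The entity reference &dish; is replaced by the declared string.
expanded_text = ET.fromstring(SAMPLE).text

if __name__ == "__main__":
    print(kinds, language, expanded_text)
```

The entity is declared once in the document type declaration and expanded wherever it is referenced, which is the reuse mechanism the paragraph describes.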

Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML, such as W3C XML Schema, RELAX NG, or Schematron, that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).

Sidenote: A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
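The lower of the two levels of correctness can be checked with any XML parser. The sketch below relies on the parser bundled with Python, which is non-validating: it reports whether a document is well-formed but says nothing about validity against a DTD or a schema. The two sample strings and the function name are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents; the second one closes its tags
# in the wrong order and is therefore not well-formed.
WELL_FORMED = "<recipe><name>Palatschinken</name></recipe>"
MALFORMED = "<recipe><name>Palatschinken</recipe></name>"


def is_well_formed(document: str) -> bool:
    """Return True when the XML parser accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(WELL_FORMED))
    print(is_well_formed(MALFORMED))
```

Checking validity requires a validating tool in addition, such as the xmllint program shown in Figure 2.5.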

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
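As an illustration of data retrieval, the sketch below queries a shortened copy of the recipe document with the limited XPath subset implemented by Python's ElementTree module. A full XPath or XQuery processor offers considerably more; the variable names are assumptions of this example.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the recipe document.
RECIPE = """
<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>
"""

root = ET.fromstring(RECIPE)

# Select every ingredient element anywhere below the root.
ingredients = [node.text for node in root.findall(".//ingredient")]

# Select the ingredient whose amount attribute equals "2".
two_of = root.find(".//ingredient[@amount='2']").text

# Read an attribute of the element selected by a path.
serves = root.find("ingredientList").get("serves")

if __name__ == "__main__":
    print(ingredients, two_of, serves)
```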

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>

Figure 2.1: An example XML document (recipe.xml)

Sidenote: DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD

element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
      name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)

xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng  # Compact -> Full RELAX NG
xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.

A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature of CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.

Speech
AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!

Verse
Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? Why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

Sidenote: The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook, and its source code is publicly available at http://docbook.org.

Sidenote: Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
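To make the XML namespace mechanism concrete, the sketch below parses a document that mixes two XML applications. The report vocabulary and its IRI are invented for this example; the second IRI is the SVG namespace. The prefix mapping passed to the queries is local to the script and need not match the prefixes used in the document.

```python
import xml.etree.ElementTree as ET

# A hypothetical document mixing two XML applications. The first IRI
# is made up for this example; the second is the SVG namespace.
DOCUMENT = """
<report xmlns="http://example.org/ns/report"
        xmlns:svg="http://www.w3.org/2000/svg">
  <title>Quarterly results</title>
  <svg:svg width="10" height="10">
    <svg:title>A chart</svg:title>
  </svg:svg>
</report>
"""

# Prefixes used by the queries below, mapped to the same IRIs.
NAMESPACES = {
    "r": "http://example.org/ns/report",
    "svg": "http://www.w3.org/2000/svg",
}

root = ET.fromstring(DOCUMENT)

# The parser expands every prefix into its IRI, so the two title
# elements stay distinct although their local names are identical.
report_title = root.find("r:title", NAMESPACES).text
svg_title = root.find("svg:svg/svg:title", NAMESPACES).text
expanded_root_tag = root.tag

if __name__ == "__main__":
    print(report_title, svg_title, expanded_root_tag)
```

Note that the parser only keeps the names apart; it has no way of finding a schema for either IRI, which is the limitation described above.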

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first widely used graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for the presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

Sidenote: JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and, as such, can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

<style>
  body {
    font: large Verdana;
    font-size: large;
  }
  h1 {
    font-size: x-large;
    text-transform: uppercase;
  }
  abbr {
    font-style: italic;
  }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2 Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly into web documents through XML namespaces.

Sidenote: The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
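The difference between the two parsing philosophies can be observed with the parsers bundled with Python. In the sketch below, the overlapping markup from Figure 2.7 is fed to a lenient HTML tokenizer and to a strict XML parser. The class and function names are assumptions of this example, and `html.parser` only reports tags as they arrive; it does not implement the tree-building recovery rules of the HTML5 specification.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The overlapping markup from the first line of Figure 2.7.
MALFORMED = "<b>Bold <i>bold and italic</b> italic</i>"


class TagCollector(HTMLParser):
    """Record start and end tags without checking that they nest."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def lenient_events(markup):
    """Tokenize the markup the way a forgiving HTML parser would."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.events


def strict_accepts(markup):
    """Return True only when the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(lenient_events(MALFORMED))
    print(strict_accepts(MALFORMED))
```

The lenient tokenizer processes the whole input and leaves the repair to later stages, whereas the strict parser stops at the first mismatched tag, which is also why a partially downloaded XHTML document cannot be rendered safely.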

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.

Sidenote: A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
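The triplet model and one of its textual representations can be sketched in a few lines. The code below stores triplets as (subject, predicate, object) tuples, which is the order used by N-Triples, tells relations apart from properties by the type of the object, and prints the statements in N-Triples syntax. The class and function names are assumptions of this example, no escaping of special characters is attempted, and a real application would use a dedicated RDF library instead.

```python
from dataclasses import dataclass

DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass(frozen=True)
class Resource:
    iri: str


@dataclass(frozen=True)
class Literal:
    value: str


page = Resource("http://example.org/document.html")
john = Resource("http://example.org/john-smith")

# Triplets stored as (subject, predicate, object).
TRIPLES = [
    (page, Resource(DCTERMS + "creator"), john),
    (john, Resource(RDF_TYPE), Resource(FOAF + "Person")),
    (john, Resource(FOAF + "name"), Literal("John Smith")),
]


def is_relation(triple):
    """A resource object means a relation; a literal means a property."""
    return isinstance(triple[2], Resource)


def term(node):
    """Render one term in N-Triples syntax (no escaping is attempted)."""
    if isinstance(node, Resource):
        return f"<{node.iri}>"
    return f'"{node.value}"'


def to_ntriples(triples):
    """Render the triplets, one statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
```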

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).

<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>

[Graph: the node http://example.org/document.html has the edge dc:title to the literal "John's Web page"@en and the edge dc:creator to the node http://example.org/john-smith; the node http://example.org/john-smith has the edge rdf:type to foaf:Person and the edge foaf:name to the literal "John Smith"]

Figure 2.11: A graph of the RDF document in Figure 2.9.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Sidenote: The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.

[Typeset output: the title "The Cask of Amontillado", the byline "by Edgar Allen Poe", the opening paragraph set with a three-line drop cap T, and the page number -1-]

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allen Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].

% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.

Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
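The conversion step mentioned above can be sketched for a tiny, made-up subset of a Markdown-like language: lines starting with number signs become headings, asterisks mark emphasis, and blank lines separate paragraphs. The function below is an illustration of the principle only and follows no published specification; real converters handle many more constructs and corner cases.

```python
import html
import re


def convert(source: str) -> str:
    """Convert a tiny, made-up subset of Markdown-like markup to HTML."""
    blocks = []
    # Blank lines separate the blocks of the document.
    for block in re.split(r"\n\s*\n", source.strip()):
        # Join the lines of a block and escape HTML special characters.
        text = html.escape(" ".join(block.split()), quote=False)
        # Inline emphasis: *word* becomes <em>word</em>.
        text = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)
        # One to six number signs followed by a space mark a heading.
        heading = re.match(r"(#{1,6}) (.*)", text)
        if heading:
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            blocks.append(f"<p>{text}</p>")
    return "\n".join(blocks)


SAMPLE = """
# Palatschinken

A Slavic *crepe-like* dish
that is easy to make.
"""

if __name__ == "__main__":
    print(convert(SAMPLE))
```

The source text stays readable without any tool, while the output carries the same structure as logical HTML markup.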

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
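To see where typefaces would have to be switched in a multilingual manuscript, the text can be split into runs of a single writing system. The following Python sketch is only a rough approximation: the standard `unicodedata` module does not expose the Unicode Script property, so the script is guessed from the first word of each character name, and the function names are hypothetical.

```python
import unicodedata
from itertools import groupby

def script_of(ch: str) -> str:
    """Approximate the script of a character from the first word of its Unicode name."""
    if not ch.isalpha():
        return "COMMON"  # digits, punctuation, and spaces are shared by all scripts
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def script_runs(text: str):
    """Yield (script, substring) pairs; shared characters join the preceding run."""
    current = None
    labelled = []
    for ch in text:
        script = script_of(ch)
        if script == "COMMON" and current is not None:
            script = current
        current = script
        labelled.append((script, ch))
    for script, group in groupby(labelled, key=lambda pair: pair[0]):
        yield script, "".join(ch for _, ch in group)

for script, run in script_runs("Aristotle says: φαμὲν γὰρ τὴν ψυχὴν"):
    print(f"{script:6} {run!r}")
```

Each reported run could then be assigned its own typeface, provided that the typeface covers every character in the run.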

Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
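The ranges above translate into a simple check. The Python sketch below classifies an average line length; the function name and the wording of the verdicts are assumptions made for this illustration, while the thresholds are the ones quoted in the paragraph.

```python
def assess_measure(chars_per_line: int, columns: int = 1) -> str:
    """Classify an average line length against the ranges quoted above."""
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "too narrow to justify: set ragged"
    if chars_per_line > 80:
        return "too wide: strains the eye"
    if low <= chars_per_line <= high:
        return "within the recommended range"
    return "acceptable, but outside the recommended range"

print(assess_measure(66))             # single-column page
print(assess_measure(66, columns=2))  # the same measure in a multi-column layout
```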

The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

    \documentclass[11pt]{article}
    \usepackage{fontspec, leading, newunicodechar}
    \usepackage[Latin, Greek]{ucharclasses}
    \setTransitionsForLatin{%
      \fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
    \setTransitionsForGreek{%
      \fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
        Ligatures=TeX]}{}
    \newunicodechar{…}{\raisebox{.8ex}{…}}
    \frenchspacing
    \leading{14pt}
    \begin{document}
    The second function of Soul -- knowing -- was not at
    first distinguished from motion. Aristotle says: φαμὲν
    γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
    δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
    δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
    αὐτὴν κινεῖσθαι
    ``The soul is said to feel pain and joy, confidence and
    fear, and again to be angry, to perceive, and to think;
    and all these states are held to be movements, which
    might lead one to suppose that soul itself is moved.''
    \end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below), and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.

    <style>
    @font-face {
      font-family: "Alegreya Sans";
      src: url("AlegreyaSans-Regular.ttf")
        format("truetype");
      unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
        U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
    }
    @font-face {
      font-family: "GFS Neohellenic";
      src: url("GFSNeohellenic.otf") format("opentype");
      unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
        U+102E0-102FF;
    }
    p {
      font-family: "Alegreya Sans", "GFS Neohellenic",
        sans-serif;
      line-height: 14pt;
    }
    [lang=en] {
      font-size: 11pt;
    }
    [lang=gr] {
      font-size: 13.2pt;
    }
    </style>
    <p><span lang=en>The second function of Soul – knowing
    – was not at first distinguished from motion. Aristotle
    says: </span><span lang=gr>φαμὲν γὰρ τὴν ψυχὴν
    λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
    τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
    κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
    κινεῖσθαι </span><span lang=en>“The soul is said to
    feel pain and joy, confidence and fear, and again to be
    angry, to perceive, and to think; and all these states
    are held to be movements, which might lead one to suppose
    that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
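The leading arithmetic can be reproduced with a short Python sketch. The function names are hypothetical, and exact fractions are used only to avoid floating-point rounding in the example.

```python
from fractions import Fraction

def leading_range(type_size_pt, low_pct=20, high_pct=45):
    """Return (min, max) line height in points for a given type size."""
    size = Fraction(type_size_pt)
    return (size * (100 + low_pct) / 100, size * (100 + high_pct) / 100)

def lead_per_side(type_size_pt, line_height_pt):
    """Extra space above (and equally below) each line."""
    return (Fraction(line_height_pt) - Fraction(type_size_pt)) / 2

low, high = leading_range(10)
print(float(low), float(high))  # 12.0 14.5
print(float(lead_per_side(10, low)), float(lead_per_side(10, high)))  # 1.0 2.25
```

For a 10 pt typeface this reproduces the figures quoted in the text: a line height of 12 to 14.5 pt, that is, 1 to 2.25 pt of lead on either side of each line.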

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

         Sizes in inches   Page proportions
A4       8.27 × 11.7       2 : √2       1.41421
B5       6.93 × 9.84       1 : √2       0.707
Letter   8 1/2 × 11        1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
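The page proportions in Table 3.1 follow directly from the listed sizes. The Python sketch below recomputes the height-to-width ratios from the sizes in inches and compares them with √2; the dictionary and function names are assumptions made for this illustration.

```python
import math

# Sizes in inches as listed in Table 3.1, given as (width, height).
PAPER_SIZES = {"A4": (8.27, 11.7), "B5": (6.93, 9.84), "Letter": (8.5, 11.0)}

def proportion(width, height):
    """Height-to-width ratio of a page."""
    return height / width

for name, (w, h) in PAPER_SIZES.items():
    ratio = proportion(w, h)
    deviation = abs(ratio - math.sqrt(2))
    print(f"{name:6} 1 : {ratio:.4f}  (differs from sqrt(2) by {deviation:.4f})")
```

Because the sizes in inches are rounded, the ratios of A4 and B5 come out close to √2 rather than exactly equal to it.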

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
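The whole-multiple rule amounts to padding each heading until it occupies a whole number of body-text lines. The following Python sketch shows the arithmetic; the function name and the 17 pt heading line height are hypothetical, while the 12 pt body line height matches the leading used in this book.

```python
import math

def heading_padding(heading_line_height_pt, body_line_height_pt):
    """Return (lines_occupied, extra_space_pt) so that the heading block
    spans a whole number of body-text lines."""
    lines = math.ceil(heading_line_height_pt / body_line_height_pt)
    extra = lines * body_line_height_pt - heading_line_height_pt
    return lines, extra

print(heading_padding(17, 12))  # (2, 7)
```

The 7 pt of extra space can be distributed above and below the heading in any proportion without disturbing the alignment of the lines on facing pages.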

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface as the surrounding text, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

3.2.5 Quotations

Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks, in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs, and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

    This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
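As a minimal sketch of this derivation, the Python code below scales a text width by a chosen proportion and rounds the result down to a whole number of lines. The function name, the 300 pt text width, and the √2 proportion are assumptions made for this illustration; rounding down rather than to the nearest line is likewise just one possible choice.

```python
import math

def text_block_height(text_width_pt, proportion, line_height_pt):
    """Return (lines, height_pt): the largest whole number of lines that fits
    into text_width_pt * proportion, and the resulting text height."""
    target = text_width_pt * proportion
    lines = math.floor(target / line_height_pt)
    return lines, lines * line_height_pt

lines, height = text_block_height(300, math.sqrt(2), 12)
print(lines, height)  # 35 420
```

With a 12 pt line height, the target height of roughly 424 pt is reduced to 420 pt, so that exactly 35 lines fill the text block.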

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored type for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
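The text does not say how "sufficient contrast" should be measured. One common way to quantify it on screens, used here purely as an assumption, is the contrast ratio defined by the W3C Web Content Accessibility Guidelines (WCAG) 2, which is sketched below in Python. The function names are hypothetical.

```python
def relative_luminance(rgb):
    """Relative luminance of an sRGB color given as 0-255 integers (WCAG 2 formula)."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 (identical) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)

print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))    # black on white
print(round(contrast_ratio((255, 0, 0), (255, 255, 255)), 2))  # red on white
```

Under this measure, pure red on a white background reaches a ratio of about 4, far below the 21 of black on white, which is consistent with the advice to reserve colored type for emphasis.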

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standards Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972.

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standards Association, June 1986.

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1.

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6.

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993.

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996.

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996.

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015).

[12] Unicode Consortium. Unicode Technical Standard 10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008.

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012.

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006.

[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124.

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993.

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6.

[20] Unicode Consortium. Unicode Technical Standard 18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986.

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3.

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12.

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4.

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5.

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014.

Acronyms

ACK The ACKnowledgement character
API Application Programming Interface
ASA The American Standards Association
ASCII The American Standard Code for Information Interchange
AT&T The American Telephone and Telegraph corporation
BEL The BELl character
BMP The Basic Multilingual Plane
BRE The Basic Regular Expressions
BS The BackSpace character
BSD The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA California
CAN The CANcel character
CERN The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR The Common Locale Data Repository
CLI Command Line Interface
COBOL The COmmon Business-Oriented Language
CR The Carriage Return character
CSS The Cascading Style Sheets language
DC The Dublin Core
DC1 The Device Control character No. 1
DC2 The Device Control character No. 2
DC3 The Device Control character No. 3
DC4 The Device Control character No. 4
DEL The DELete character
DLE The Data Link Escape character
DPS Document Preparation System
DTD Document Type Declaration
DTP DeskTop Publishing
EBCDIC The Extended Binary Coded Decimal Interchange Code
ECMA The European Computer Manufacturers Association
EM The End of Medium
EMACS The Eventually Munches All Computer Storage editor
ENQ The ENQuiry character
EOT The End Of Transmission
ERE The Extended Regular Expressions
ESC The ESCape character
ETB The End of Transmission Block
ETX The End of TeXt
EUC The Extended Unix Code
FF The Form Feed character
FOAF Friend Of A Friend
FORTRAN The FORmula TRANslator
FS The File Separator
FSM The Free Software Movement
GML The General Markup Language
GNU GNU is Not Unix
GS The Group Separator
GUI Graphical User Interface
HT The Horizontal Tab
HTML The HyperText Markup Language
IBM The International Business Machines Corporation
IEC The International Electrotechnical Commission
IME Input Method Editor
IRI The Internationalized Resource Identifier
ISO The International Organization for Standardization
JIS The Japanese Industrial Standards encoding
JOE The Joe's Own Editor
JSON The JavaScript Object Notation
JSON-LD JSON for LD
JTC A Joint TC
LD Linked Data
LF The Line Feed
MA Massachusetts
MathML The Mathematical Markup Language
NAK The Negative-AcKnowledgement character
NUL The NULl character
NY New York
OCR Optical Character Recognition
ODF The Open Document Format for office applications
OOXML The Office Open XML format
OWL The Web Ontology Language
PC The IBM Personal Computer
PDF The Portable Document Format
PICO The PIne COmposer
POSIX The Portable Operating System Interface
RDF The Resource Description Framework
RDFa RDF in attributes
RELAX NG The REgular LAnguage for XML New Generation
RFC A Request For Comments
RS The Record Separator
SC A SubCommittee
SGML The Standard Generalized Markup Language
SI The Shift In character
SO The Shift Out character
SOH The Start of Heading
SR Sound Recognition
STX The Start of Text
SUB The SUBstitute character
SVG The Scalable Vector Graphics language
SVN SubVersioN
SYN The SYNchronous Idle character
TC A Technical Committee
TEI The Text Encoding Initiative
TRON The Real-time Operating system Nucleus
UCS The Universal multiple-octet coded Character Set
US The Unit Separator
USA The United States of America
UTF The UCS Transformation Format
VCS Version Control Systems
VI The Visual Interactive editor
VIM VI IMproved
VT The Vertical Tab
W3C The World Wide Web Consortium
WG A Working Group
WYSIWYG What You See Is What You Get
XHTML The eXtensible HyperText Markup Language
XML The eXtensible Markup Language




Also notable are the seven-bit encodings of UTF-7 and Punycode, which bring Unicode support to protocols that were designed with the seven-bit ASCII in mind, such as e-mail.

A portion of this complexity is inherent in the task of encoding the characters of all modern writing systems, but the overhead caused by the character encoding fragmentation proved to be unnecessary.

The Universal Character Set and Unicode

In the early 1990s, the continual increase in the available bandwidth and storage led to the creation of the standards of Unicode [5, 6] and the Universal multiple-octet coded Character Set (UCS) [7] in an attempt to create a text encoding that would contain the characters of all the world's languages and succeed ASCII as the lingua franca of text interchange.

UCS is an ever-expanding catalogue of characters from writing systems both modern and ancient, and of symbols ranging from diacritical marks, punctuation, and ideograms to mahjong tiles, alchemical symbols, and the ancient Greek musical notation. Each of these characters is assigned a number called a code point, ranging from 0 to 2147483647 (7F FF FF FF in the hexadecimal notation), with the numbers of the most common characters in the range from 0 to 65535 (FF FF), called the Basic Multilingual Plane (BMP). The smallest unit of division in UCS are blocks, which contain 256 thematically related characters. UCS encodings map code points to binary character codes and vice versa.

Three major encodings are specified in the UCS standard and its amendments [8, 9]:

1. UTF-32 directly encodes UCS characters by transforming their code points to four-byte integers. UTF-32 is also known as UCS-4.

2. UTF-16 directly encodes characters within BMP by transforming their code points to two-byte integers. Code points in the range from 65536 to 1114111 (01 00 00–10 FF FF) are transformed into pairs of two-byte integers, called surrogate pairs, ranging from 55296 to 57343 (D8 00–DF FF). To enable the UTF-16 encoding, the code points in this range will never be assigned to characters [10, sec. 3.4, D15]. The same is true of code points above 1114111 (10 FF FF), which allows UTF-16 to encode any UCS character.

3. UTF-8 directly transforms code points ranging from 0 to 127 (7F) to one-byte integers. Since the first UCS block of the BMP matches ASCII, any text encoded in eight-bit ASCII is also encoded in UTF-8. Code points in the range from 128 to 1114111 (00 00 80–10 FF FF) are transformed into two to four one-byte integers ranging from 128 to 244 (80–F4). The encoding is illustrated in tables 1.2 and 1.3.

One of the design goals of UCS was to avoid assigning code points to different glyphs that carry the same meaning. As a result, the visually distinctive Han characters used in the East Asian countries of China, Japan, Korea, and Vietnam were merged into a set of 75960 ideograms in a process referred to as the Han Unification [10, sec. 18.1]. This simplifies text processing, but also makes it impossible to encode a text in multiple East Asian languages without having to rely on external markup to select appropriate regional fonts. As a result, a derivative of UCS that doesn't implement the Han Unification was developed for use in operating systems based on the Real-time Operating system Nucleus (TRON) and is used in East Asia alongside UCS and region-specific encodings.

Figure 1.2: Several Han characters in the traditional Chinese, Japanese, Korean, and Vietnamese variants.

UTF-32 is primarily used for the fixed-space internal representation of individual UCS characters inside programs, UTF-16 fulfills a similar role in programs that only work with BMP, and UTF-8 is used for text storage and interchange. Since 2010, the majority of text content on the Web has been encoded in ASCII and UTF-8 [11].
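The three encodings can be compared with a short sketch. The following Python snippet is an illustration only, not part of the UCS standard; it relies on Python's built-in codecs, and the helper names `encoded_lengths` and `surrogate_pair` are hypothetical. The big-endian codec variants are used so that no byte order mark is prepended to the output.

```python
# Compare how many bytes one character occupies in each UCS encoding.
ENCODINGS = ("utf-8", "utf-16-be", "utf-32-be")


def encoded_lengths(code_point):
    """Return the encoded size in bytes for each of the three encodings."""
    character = chr(code_point)
    return {name: len(character.encode(name)) for name in ENCODINGS}


def surrogate_pair(code_point):
    """Split a code point above the BMP into its two UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP use surrogates")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # upper ten bits
    low = 0xDC00 + (offset & 0x3FF)   # lower ten bits
    return high, low


if __name__ == "__main__":
    # A (ASCII), ř (two bytes in UTF-8), € (three bytes), 𝄞 (outside the BMP)
    for code_point in (0x41, 0x159, 0x20AC, 0x1D11E):
        print(f"U+{code_point:04X}", encoded_lengths(code_point))
    print([f"{unit:04X}" for unit in surrogate_pair(0x1D11E)])
```

Only the last sample character lies outside the BMP, which is why it is the only one that needs four bytes in UTF-16.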

Unicode was a competing standard for universal text encoding that underwent a merger with UCS in version 1.1, and since then, the standards have been kept closely synchronised. Unicode is a superset of UCS, which defines additional information about UCS characters (such as their general category, directionality, case, or numeric value [10, sec. 3.5 and ch. 4]), various text processing algorithms, and implementation guidelines.
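The character properties mentioned above can be inspected programmatically. The sketch below assumes Python's standard `unicodedata` module, which exposes a subset of the Unicode character database bundled with the interpreter; the helper name `describe` is hypothetical.

```python
import unicodedata


def describe(character):
    """Collect a few Unicode properties of a single character."""
    return {
        "name": unicodedata.name(character),
        "category": unicodedata.category(character),            # general category
        "bidirectional": unicodedata.bidirectional(character),  # directionality
        "numeric": unicodedata.numeric(character, None),        # numeric value, if any
        "lowercase": character.lower(),                         # simple case mapping
    }


if __name__ == "__main__":
    for character in ("A", "½", "א"):
        print(character, describe(character))
```

The property values come from the version of the Unicode database shipped with the Python interpreter, so they may lag behind the latest release of the standard.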

Ǻ = Å + ◌́ = A + ◌̊ + ◌́

Figure 1.3: Some UCS characters can be either input as a single entity or composed from several combining characters. Regarding Unicode normalization forms, all of the above representations are canonically equivalent.

iconv -f latin2 -t utf8 -- old.txt > new.txt

Figure 1.4: Text files can be converted between encodings using the iconv command-line tool. The sample code shows the file old.txt being converted from the ISO/IEC 8859-2 encoding to UTF-8. The result of the conversion is stored in the file new.txt.

Regarding text processing, Unicode and UCS represent a compromise between the simplicity of the seven-bit ASCII and the heterogeneity of eight-bit encodings:

• If simple text manipulation is preferred over space efficiency, each character can be made exactly two or four bytes wide using the UTF-16 and UTF-32 encodings.

• Although character strings cannot be collated by a simple character code comparison, a collation algorithm is defined in the Unicode specification [12], and collation tables for major locales [13] are maintained by the Unicode Consortium.

• Classes of characters (such as uppercase letters, lowercase letters, numbers, and punctuation) do not form contiguous ranges, but their position is directly specified in the standard [10, sec. 4.5].

• Although idiosyncrasies (such as ligatures, invisible hyphenation hints, and combining characters) are present in UCS, explicit normalization algorithms for character string equivalence testing are specified by the standard [10, sec. 2.12]. An algorithm for case conversion is also specified [10, sec. 3.13].

• The byte order mark (FE FF) character can be inserted at the beginning of a text as a signature of Unicode encodings. As the name suggests, the order in which the FE and FF bytes arrive also indicates the order of bytes (called endianness) that was used to encode integers. In UTF-32 and UTF-16, endianness can be chosen arbitrarily by the encoding application. In UTF-8, one-byte integers are used, and the notion of endianness is therefore meaningless.
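The normalization and case conversion algorithms mentioned in the list above can be tried out with a short sketch. It assumes Python's standard `unicodedata` module and the built-in `str.casefold` method; the helper name `equivalent` is hypothetical, and the three strings correspond to the representations of the letter Ǻ from Figure 1.3.

```python
import unicodedata

PRECOMPOSED = "\u01FA"             # Ǻ as a single code point
PARTLY_COMPOSED = "\u00C5\u0301"   # Å followed by a combining acute accent
DECOMPOSED = "A\u030A\u0301"       # A followed by a combining ring and acute


def equivalent(first, second, ignore_case=False):
    """Test two strings for canonical equivalence via the NFD normal form."""
    first = unicodedata.normalize("NFD", first)
    second = unicodedata.normalize("NFD", second)
    if ignore_case:
        # Case folding may change the composition of a string,
        # so the result is normalized once more before the comparison.
        first = unicodedata.normalize("NFD", first.casefold())
        second = unicodedata.normalize("NFD", second.casefold())
    return first == second


if __name__ == "__main__":
    print(PRECOMPOSED == DECOMPOSED)                 # code point comparison
    print(equivalent(PRECOMPOSED, DECOMPOSED))       # canonical equivalence
    print(equivalent("STRASSE", "straße", ignore_case=True))
```

A plain comparison of code points treats the three representations as different strings, whereas the comparison of their normal forms does not.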


Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. Above is the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.

Compose + O + R = ®
Compose + 3 + 4 = ¾
Compose + s + s = ß
Compose + ~ + ' + a = ấ

Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.


Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á

Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces a character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the corresponding UCS code point in hexadecimal.

1.1.2 Text Input

To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On hand-held devices, the use of either a numeric keypad or a touch-screen is more typical.

An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind, and as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn't apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.


1.1.3 Text Editors

A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free joe, GNU nano, and pico.

More advanced text editors come with support for regular expressions and version control (which will be covered in sections 1.1.5 and 1.2) and with user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom, and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower the users to perform complex text editing.

1.1.4 Interactive Document Preparation Systems

Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully-formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].

The primary difference between text editors and DPSes is the fact that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files, a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus, and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.

1.1.5 Regular Expressions

The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.

Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.

Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) [18, part 1, ch. 9], described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.

More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.

Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within BMP and exhibit the same problem with characters outside BMP. More subtle, but no less precarious, can be the lack of support for Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text (such as the zero width space (20 0B), zero width non-joiner (20 0C), zero width joiner (20 0D), and zero width no-break space (FE FF)) is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at


Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below).

BRE regex: we\{1,2\}p
Description: The interval expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
Matches: weeps, wept

BRE regex: ene
Description: Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
Matches: never, enemy, Kleene

BRE regex: \(⟨regex⟩\)
Description: A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
Matches: ⟨regex⟩

BRE regex: ^ar
Description: At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
Matches: argument, arrow keys

BRE regex: ore$
Description: At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
Matches: iron ore, dumbledore

BRE regex: be
Description: A period (.) matches any single character.
Matches: or not to be

BRE regex: be[ea]
Description: A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity.
Matches: beehive, grizzly bear, glass beads

BRE regex: be[^ea]
Description: A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
Matches: obeah, bend, libela

BRE regex: ^$
Description: Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
Matches: ^$

BRE regex: ()1
Description: A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
Matches: ara ararauna, dardanelles, nationality

ERE regex: we{1,2}p
Description: Unlike in BREs, braces aren't escaped.
Matches: weeps, wept

ERE regex: pe+rl
Description: The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
Matches: persona, peer, speech, perl

ERE regex: (⟨regex⟩)
Description: Unlike in BREs, parentheses aren't escaped.
Matches: ⟨regex⟩

ERE regex: (on|t)
Description: Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
Matches: one, two, trophy, truth

ERE regex: ()1
Description: EREs do not support backreferences.
Matches: ⟨undefined⟩
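Several of the constructs above can be tried out interactively. The sketch below uses Python's standard `re` module as a stand-in, which is an assumption on our part: its syntax is derived from Perl rather than from POSIX, but braces and parentheses are written unescaped as in EREs, and the constructs shown here behave the same way. The helper name `first_match` is hypothetical.

```python
import re


def first_match(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


if __name__ == "__main__":
    examples = [
        (r"we{1,2}p", "weeps"),        # interval expression
        (r"we{1,2}p", "wept"),
        (r"^ar", "arrow keys"),        # anchor at the beginning of a string
        (r"ore$", "iron ore"),         # anchor at the end of a string
        (r"be[ea]", "grizzly bear"),   # matching list
        (r"be[^ea]", "bend"),          # non-matching list
        (r"(on|t)", "one"),            # alternation
    ]
    for pattern, text in examples:
        print(f"{pattern!r} in {text!r}: {first_match(pattern, text)!r}")
```

Command-line tools such as grep may need an option to select the ERE syntax, so the same patterns are not guaranteed to work there without adjusting the escaping.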


Regex: \x{⟨n⟩}
Description: Matches the UCS character with code point ⟨n⟩ in hexadecimal.

Regex: \N{⟨n⟩}
Description: Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.

Regex: \p{⟨p⟩}
Description: Matches any UCS character with property ⟨p⟩.

Regex: \P{⟨p⟩}
Description: Matches any UCS character without property ⟨p⟩.

Property: Letter
Description: This property is satisfied by any letter.

Property: Punctuation
Description: This property is satisfied by any punctuation.

Property: Symbol
Description: This property is satisfied by any symbol.

Property: Mark
Description: This property is satisfied by any mark.

Property: Number
Description: This property is satisfied by any number.

Property: Separator
Description: This property is satisfied by any separator.

Property: Other
Description: This property is satisfied by any UCS character that doesn't belong to any of the above-listed categories.

Property: Block=⟨b⟩
Description: This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.

Property: Script=⟨s⟩
Description: This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.

Property: Numeric Value=⟨n⟩
Description: This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.

The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.

least partially implement the Unicode standard for Regular Expressions [20] (such as those of Perl 5.2 or Java 7) are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
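Where a regex engine lacks Unicode awareness, some of the pitfalls described in this section can be worked around by preprocessing both the pattern and the text. The sketch below is one possible approach, not a complete solution: it assumes Python's standard `re` and `unicodedata` modules, the helper names `clean` and `robust_search` are hypothetical, and only fixed strings (not full regexes) are searched for.

```python
import re
import unicodedata

# The invisible characters listed above, mapped to None so that
# str.translate() deletes them.
INVISIBLE = dict.fromkeys([0x200B, 0x200C, 0x200D, 0xFEFF])


def clean(text):
    """Remove invisible characters, normalize, and fold the case of a text."""
    text = text.translate(INVISIBLE)
    return unicodedata.normalize("NFC", text).casefold()


def robust_search(needle, haystack):
    """Case-insensitive fixed-string search that tolerates the pitfalls above."""
    return re.search(re.escape(clean(needle)), clean(haystack)) is not None


if __name__ == "__main__":
    text = "type\u200bsetting"   # contains a zero width space
    print(re.search("typesetting", text) is not None)   # naive search
    print(robust_search("typesetting", text))           # preprocessed search
    print(robust_search("STRASSE", "Hauptstraße"))      # full case folding
```

Deleting the zero width joiner and non-joiner can change the meaning of a text in some scripts, so this kind of preprocessing is best limited to the search itself, with the stored text left untouched.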

The most elementary text processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool will present lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for the Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.

1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.
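The kind of record described above can be illustrated with a toy sketch. It is not how any particular VCS stores its data; it merely pairs a line-based difference, computed with Python's standard `difflib` module, with a description and an author. The function name `record_change` and the sample values are hypothetical.

```python
import difflib


def record_change(old_lines, new_lines, author, description):
    """Describe one change the way a rudimentary VCS might."""
    diff = list(difflib.unified_diff(
        old_lines, new_lines, fromfile="previous", tofile="current"))
    return {"author": author, "description": description, "diff": diff}


if __name__ == "__main__":
    change = record_change(
        ["Draft one.\n", "Shared line.\n"],
        ["Draft two.\n", "Shared line.\n"],
        author="A. Author",
        description="Reword the opening sentence",
    )
    print(change["author"], "-", change["description"])
    print("".join(change["diff"]), end="")
```

Lines prefixed with a minus sign were removed and lines prefixed with a plus sign were added; applying the difference in the opposite direction reverts the change.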

VCS can be dichotomized based on their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally, and its operation is fully dependent on the availability of the server. An example of a centralized VCS is SubVersioN (SVN).

By comparison, there is no designated server in decentralized VCS, and the users can upload and download new versions directly from one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage size requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.

Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration (such as Word Online or Google Documents) form a category of their own.

An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.

[Diagram: svnadmin create, svn checkout, svn update, svn commit]

Figure 1.8: The basic SVN workflow.

After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.

[Diagram: git init, git clone, git pull, git push, git reset, git commit]

Figure 1.9: The diagram above depicts the basic Git workflow. The diagram below depicts the use of the Git program with an SVN repository; this bears all the advantages and disadvantages associated with decentralized VCS.

[Diagram: svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit]

Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.

Chapter 2

Markup

A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.

Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The General Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the General Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

More information about the project can be found within the Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.

The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.

Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).

A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>

Figure 2.1: An example XML document (recipe.xml).
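The difference between a well-formed document and a malformed one, and the retrieval of data from a document, can be sketched with Python's standard `xml.etree.ElementTree` module. This is an illustration under several assumptions: the recipe is shortened, the DOCTYPE declaration is left out because the module does not validate documents against a DTD, and the helper names `is_well_formed` and `ingredients` are hypothetical.

```python
import xml.etree.ElementTree as ET

# A shortened version of the recipe without the DOCTYPE declaration.
RECIPE = b"""<?xml version="1.0" encoding="UTF-8"?>
<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
  </ingredientList>
</recipe>"""


def is_well_formed(document):
    """Report whether the parser accepts the document as well-formed XML."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


def ingredients(document):
    """List the (amount, name) pairs using a simple path expression."""
    root = ET.fromstring(document)
    return [(element.get("amount"), element.text)
            for element in root.findall("./ingredientList/ingredient")]


if __name__ == "__main__":
    print(is_well_formed(RECIPE))
    print(is_well_formed(b"<recipe><name>Palatschinken</recipe>"))
    print(ingredients(RECIPE))
```

The path expression accepted by `findall` is only a small subset of XPath, and a successful parse says nothing about validity; checking a document against a DTD or a schema requires a validating tool.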

DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD.

element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.


<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
        name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).

xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng   # Compact -> Full Relax NG
xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.


A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of CONCUR, a more expressive SGML feature that makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML

Speech

AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!

Verse

Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!

<(V)line>
  <(S)speech who="Aase">Peer, you're lying!</(S)speech>
  <(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
  <(S)speech who="Aase">Well then,
  swear to me it's true!</(S)speech>
</(V)line><(V)line>
  <(S)speech who="Peer">Swear? why should I?</(S)speech>
  <(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
  Every word of it's a lie!</(S)speech>
</(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].


The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org/.

Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
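The way namespaces identify the XML application of each element can be sketched with the parser from the Python standard library. The document below is a hypothetical XHTML fragment with an embedded SVG drawing, and the function name is an assumption made for this illustration; the sketch only lists the namespace IRIs and does not attempt any validation.

```python
import xml.etree.ElementTree as ET

# A hypothetical XHTML fragment that embeds an SVG circle.
DOCUMENT = """\
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <body>
    <p>A circle:</p>
    <svg:svg width="20" height="20">
      <svg:circle cx="10" cy="10" r="8"/>
    </svg:svg>
  </body>
</html>
"""


def namespaces_used(xml_text):
    """Return the set of namespace IRIs of all elements in the document."""
    root = ET.fromstring(xml_text)
    found = set()
    for element in root.iter():
        # ElementTree spells a qualified name as "{namespace-IRI}local-name".
        if element.tag.startswith("{"):
            found.add(element.tag[1:].split("}", 1)[0])
    return found


if __name__ == "__main__":
    for iri in sorted(namespaces_used(DOCUMENT)):
        print(iri)
```

The parser reports the IRIs, but it has no way of finding a schema for either of them, which is the limitation described above.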

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first popular graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed the W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In


JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

an attempt to unify the way malformed HTML documents were rendered across Web browsers, the W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Initially, HTML only comprised a mixture of logical and presentation markup with a fixed visual interpretation. This changed with the specification of CSS, which was introduced by the W3C in 1996. The language enabled the specification of the visual properties of any HTML element, which made it possible to separate document markup from design, effectively eliminating the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, the W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as

<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.


<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com/. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

<style>
body {
  font: large Verdana;
  font-size: large; }
h1 {
  font-size: x-large;
  text-transform: uppercase; }
abbr {
  font-style: italic; }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>


The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2 Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly in web documents through XML namespaces.
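The difference between the two parsing disciplines can be sketched with the Python standard library, using the overlapping markup from Figure 2.7 as the input. The helper names are assumptions made for this illustration. Note that html.parser is only a tolerant tokenizer; it does not implement the error recovery of the HTML5 specification, so the sketch shows acceptance of malformed input rather than its canonical interpretation.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The first line of Figure 2.7: the <b> and <i> elements overlap.
OVERLAPPING = "<b>Bold <i>bold and italic</b> italic</i>"


def is_well_formed(xml_text):
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


class TagCollector(HTMLParser):
    """Record the start and end tags in the order they are encountered."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.events.append("</" + tag + ">")


def html_events(html_text):
    """Tokenize the text leniently and return the list of tags seen."""
    collector = TagCollector()
    collector.feed(html_text)
    collector.close()
    return collector.events


if __name__ == "__main__":
    print(is_well_formed(OVERLAPPING))  # the XML parser refuses the input
    print(html_events(OVERLAPPING))     # the HTML tokenizer carries on
```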

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, the W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.


A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.

If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
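The two readings of a triplet can be sketched as a small data structure. The class and function names below are assumptions made for this illustration and do not come from any RDF library; the IRIs are those used in the examples of this section.

```python
from typing import NamedTuple, Union


class Resource(NamedTuple):
    """A resource identified by an IRI."""
    iri: str


class Literal(NamedTuple):
    """A literal value, such as a string."""
    value: str


class Triplet(NamedTuple):
    """A triplet (p, s, o), in the order used in the text."""
    predicate: Resource
    subject: Resource
    obj: Union[Resource, Literal]


def interpret(triplet):
    """Describe a triplet as a relation or as a property."""
    p, s, o = triplet
    if isinstance(o, Resource):
        return f"{s.iri} is in the relation {p.iri} with {o.iri}"
    return f'{s.iri} has the property {p.iri} with the value "{o.value}"'


DOCUMENT = Resource("http://example.org/document.html")
JOHN = Resource("http://example.org/john-smith")

if __name__ == "__main__":
    print(interpret(Triplet(
        Resource("http://purl.org/dc/terms/creator"), DOCUMENT, JOHN)))
    print(interpret(Triplet(
        Resource("http://purl.org/dc/terms/title"), DOCUMENT,
        Literal("John's Web page"))))
```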

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete microformatting, the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in documents on the Web.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be


<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type
        rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).


<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>


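[Graph: the node http://example.org/document.html has a dc:title edge to the literal "John's Web page"@en and a dc:creator edge to the node http://example.org/john-smith, which in turn has an rdf:type edge to foaf:Person and a foaf:name edge to the literal "John Smith".]

Figure 2.11: A graph of the RDF document in Figure 2.9.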

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The


The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by Donald Knuth, an American professor of computer science, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.


The Cask of Amontillado
by
Edgar Allen Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that gave utterance to a threat. At length I would be avenged; this was a point definitely settled -- but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allen Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].


% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}\hskip-0.3ex}
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
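The conversion from a lightweight markup language to HTML can be sketched in a few lines. The toy language below is an assumption made for this illustration: it borrows two conventions from Markdown (a leading run of # characters for headings and a pair of asterisks for emphasis) and is not an implementation of Markdown or of any other language named above.

```python
import html
import re


def convert(text):
    """Convert a tiny Markdown-like subset to HTML.

    Blocks are separated by blank lines. A block that starts with one
    to six '#' characters and a space becomes a heading; any other
    block becomes a paragraph. Text between asterisks is emphasized.
    """
    blocks = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Join the lines of the block and escape &, < and >.
        block = html.escape(" ".join(block.split()), quote=False)
        block = re.sub(r"\*(.+?)\*", r"<em>\1</em>", block)
        heading = re.match(r"(#{1,6}) (.+)", block)
        if heading:
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            blocks.append(f"<p>{block}</p>")
    return "\n".join(blocks)


if __name__ == "__main__":
    print(convert("# Design\n\nChoose a *legible*\ntypeface."))
```

The input stays readable as plain text, which is the property that the lightweight markup languages trade expressive power for.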

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
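The guidelines above can be turned into a simple check. This is a minimal sketch: the function names and the three labels are assumptions made for this illustration, while the thresholds are the ones quoted in the paragraph.

```python
def assess_measure(chars_per_line, columns=1):
    """Classify a line length against the ranges quoted in the text.

    45-75 characters are recommended on a single-column page and
    40-50 characters on a multi-column page.
    """
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < low:
        return "narrow"
    if chars_per_line > high:
        return "wide"
    return "comfortable"


def should_set_ragged(chars_per_line):
    """Lines that cannot hold 40 characters should not be justified."""
    return chars_per_line < 40


if __name__ == "__main__":
    for length in (30, 66, 90):
        print(length, assess_measure(length), should_set_ragged(length))
```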

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
\newunicodechar{·}{\raisebox{.8ex}{.}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


<style>
@font-face {
  font-family: "Alegreya Sans";
  src: url(AlegreyaSans-Regular.ttf)
       format("truetype");
  unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                 U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F; }
@font-face {
  font-family: "GFS Neohellenic";
  src: url(GFSNeohellenic.otf) format("opentype");
  unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                 U+102E0-102FF; }
p {
  font-family: "Alegreya Sans", "GFS Neohellenic",
               sans-serif;
  line-height: 14pt; }
[lang=en] {
  font-size: 11pt; }
[lang=grc] {
  font-size: 13.2pt; }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with a leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
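The arithmetic behind these figures can be sketched as follows. The function names and the default bounds are assumptions made for this illustration; the bounds mirror the twenty to forty-five percent quoted above.

```python
def leading_range(type_size_pt, low=0.20, high=0.45):
    """Return the smallest and the largest recommended line height.

    The lines are separated by 20 to 45 percent of the typeface size,
    so the line height is the typeface size plus that separation.
    """
    return (type_size_pt * (1 + low), type_size_pt * (1 + high))


def lead_per_side(type_size_pt, line_height_pt):
    """Return the lead added above and below each line."""
    return (line_height_pt - type_size_pt) / 2


if __name__ == "__main__":
    smallest, largest = leading_range(10)
    print(f"line height: {smallest:.2f} to {largest:.2f} pt")
    print(f"lead per side: {lead_per_side(10, smallest):.2f} to "
          f"{lead_per_side(10, largest):.2f} pt")
```

For a 10 pt typeface, the sketch reproduces the range of 12 to 14.5 pt and the 1 to 2.25 pt of lead on either side of the line.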

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger


          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
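The whole-multiple rule can be sketched as a small calculation: given the height a heading naturally takes up and the body line height, round up to the next whole number of body lines and distribute the difference as space around the heading. The function name and the sample values are hypothetical.

```python
import math


def heading_box(heading_height_pt, body_line_height_pt):
    """Return (lines, padding_pt) for a heading on the body text grid.

    lines      -- number of body text lines the heading occupies
    padding_pt -- extra vertical space to add around the heading so
                  that its total height is a whole multiple of the
                  body line height
    """
    lines = math.ceil(heading_height_pt / body_line_height_pt)
    padding = lines * body_line_height_pt - heading_height_pt
    return lines, padding


# An 18 pt heading over 12 pt body lines occupies two lines (24 pt),
# which leaves 6 pt to distribute above and below the heading.
print(heading_box(18, 12))
```

How the padding is split above and below the heading is a design decision that the sketch leaves open.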

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the number of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer’s viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].


obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

– William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size (as described in Section 3.2.1), as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we need to also decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
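A minimal sketch of this constraint follows: a target text height (for example, one derived from the text width and a chosen proportion) is snapped down to the nearest whole number of body text lines. The function name and the sample numbers are hypothetical.

```python
def snap_text_height(target_height_pt, line_height_pt):
    """Snap a target text height down to a whole number of lines.

    Returns (lines, text_height_pt), where text_height_pt is the
    largest multiple of line_height_pt that fits the target.
    """
    lines = int(target_height_pt // line_height_pt)
    return lines, lines * line_height_pt


# A 500 pt target with 12 pt leading holds 41 full lines (492 pt);
# the remaining 8 pt goes to the vertical margins.
print(snap_text_height(500, 12))
```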

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). url: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. url: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. url: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. url: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. url: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. url: http://cldr.unicode.org/ (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).


[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O’Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. url: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O’Reilly Media, 1997. ISBN: 1565922255. url: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2002. url: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). url: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). url: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). url: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).


[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. url: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. url: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. url: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. url: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. url: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. url: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).


[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. url: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. url: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. url: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. url: http://www.w3.org/TR/2014/REC-html5-20141028/ (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. url: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. url: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. url: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. url: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. url: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).


[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. url: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. url: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. url: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. url: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. url: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. url: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. url: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. url: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. url: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).


[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. url: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK The ACKnowledgement character
API Application Programming Interface
ASA The American Standard Association
ASCII The American Standard Code for Information Interchange
AT&T The American Telephone and Telegraph corporation
BEL The BELl character
BMP The Basic Multilingual Plane
BRE The Basic Regular Expressions
BS The BackSpace character
BSD The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA California
CAN The CANcel character
CERN The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR The Common Locale Data Repository
CLI Command Line Interface
COBOL The COmmon Business-Oriented Language
CR The Carriage Return character
CSS The Cascading Style Sheets language
DC The Dublin Core
DC1 The Device Control character No. 1
DC2 The Device Control character No. 2
DC3 The Device Control character No. 3
DC4 The Device Control character No. 4
DEL The DELete character
DLE The Data Link Escape character
DPS Document Preparation System
DTD Document Type Definition
DTP DeskTop Publishing
EBCDIC The Extended Binary Coded Decimal Interchange Code
ECMA The European Computer Manufacturers Association
EM The End of Medium
EMACS The Eventually Munches All Computer Storage editor
ENQ The ENQuiry character
EOT The End Of Transmission
ERE The Extended Regular Expressions
ESC The ESCape character
ETB The End of Transmission Block
ETX The End of TeXt
EUC The Extended Unix Code
FF The Form Feed character
FOAF Friend Of A Friend
FORTRAN The FORmula TRANslator
FS The File Separator
FSM The Free Software Movement
GML The General Markup Language
GNU GNU is Not Unix
GS The Group Separator
GUI Graphical User Interface
HT The Horizontal Tab
HTML The HyperText Markup Language
IBM The International Business Machines Corporation
IEC The International Electrotechnical Commission
IME Input Method Editor
IRI The Internationalized Resource Identifier
ISO The International Organization for Standardization
JIS The Japanese Industrial Standards encoding
JOE Joe’s Own Editor
JSON The JavaScript Object Notation
JSON-LD JSON for LD
JTC A Joint TC
LD Linked Data
LF The Line Feed
MA Massachusetts
MATHML The Mathematical Markup Language
NAK The Negative-AcKnowledgement character
NUL The NULl character
NY New York
OCR Optical Character Recognition
ODF The Open Document Format for office applications
OOXML The Office Open XML format
OWL The Web Ontology Language
PC The IBM Personal Computer
PDF The Portable Document Format
PICO The PIne COmposer
POSIX The Portable Operating System Interface
RDF The Resource Description Framework
RDFA RDF in attributes
RELAX NG The REgular LAnguage for XML New Generation
RFC A Request For Comments
RS The Record Separator
SC A SubCommittee
SGML The Standard Generalized Markup Language
SI The Shift In character
SO The Shift Out character
SOH The Start of Heading
SR Sound Recognition
STX The Start of Text
SUB The SUBstitute character
SVG The Scalable Vector Graphics language
SVN SubVersioN
SYN The SYNchronous Idle character
TC A Technical Committee
TEI The Text Encoding Initiative
TRON The Real-time Operating system Nucleus
UCS The Universal multiple-octet coded Character Set
US The Unit Separator
USA The United States of America
UTF The UCS Transformation Format
VCS Version Control Systems
VI The Visual Interactive editor
VIM VI IMproved
VT The Vertical Tab
W3C The World Wide Web Consortium
WG A Working Group
WYSIWYG What You See Is What You Get
XHTML The eXtensible HyperText Markup Language
XML The eXtensible Markup Language

Index

ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42

Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17

sect

Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

bre 14ndash16bs 6bsd 13

sect

ca 52can 6cern 28

character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44

sect

dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35

dps 13 17 18 32 35 36 39dtd 23 25ndash27dtp 36

sect

ebcdic 5ecma 55Edgar Allen Poe 37


Elements of Style 3em 6Emacs 13endianity 10endnote 47enq 6eot 6erealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

ere 14ndash16esc 6etb 6120576-TEX 38etx 6euc 5

sectF M Cornford 43ff 6foaf 32 33footnote 47formal grammar 14fortran 4From Religion to Philosophy A Study in

the Origins of Western Speculation 43fs 6fsm 35

sectGit 17gml 22gnuLinux 13nano 13

gnu 13 14 35Google Documents 18Google Pinyin 11grep 16 17groff see troffgs 6gui 13 35

sectHan Unification 9heading 45Henrik Ibsen 27ht 6

html 28ndash32 34 39 44 55sect

ibm 5 12 22iconv 10iec 7 10 51ndash54ime 12ir i 27 28 31 32 54iso 7 10 51ndash54

sectJavaScript 29Jeffrey E F Friedl 14j is 5joe 13JScript 29json 32json-ld 32 56jtc 51ndash54justification see alignment

sectKing Lear 48

sectLATEX 36 43Latin Vulgate Bible 49ld 31 32 55leading see line spacingLeafpad 13lf 6lightweight markup language 39line height 45list 46

sectma 51MakeDoc 39Markdown 39markuplogical 21 29 30 35 36presentation 21 29 30 35 36

mathml 28 31Mercurial 17microformatting 32Microsoft Word 14 20 39

sectN-Triples 32 33nak 6Noam Chomskyhierarchy 14

Noam Chomsky 14note 46Notepad++ 13Notepad 13


nroff see troffnul 6ny 51

sectocr 12odf 13ooxml 13owl 32 56

sectparagraphblock 47indented 45outdented 45

paragraph 42paragraphsblock 45

pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8

sectQuarkXPress 14quotationblock 47run-in 47

sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31

rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6

sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22

sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-

lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6

secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff

man 36


me 36mom 36

troff 35tron 9Turtle 32 33typeface 41

sectucsblock 8ucs-4 8

ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10

us 6usa 51 52utf

utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8

utf 6 8ndash10 52sect

VBScript 29vcscentralized 17decentralized 17

vcs 17ndash20version control 13vi 13vim 13

vt 6sect

w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4

wysiwyg 35sect

XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23

xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23


One of the design goals of UCS was to avoid assigning code points to different glyphs that carry the same meaning. As a result, the visually distinctive Han characters used in the East Asian countries of China, Japan, Korea, and Vietnam were merged into a set of 75,960 ideograms in a process referred to as the Han Unification [10, sec. 18.1]. This simplifies text processing, but also makes it impossible to encode a text in multiple East Asian languages without having to rely on external markup to select appropriate regional fonts. As a result, a derivative of UCS that doesn’t implement the Han Unification was developed for use in operating systems based on the Real-time Operating system Nucleus (TRON) and is used in East Asia alongside UCS and region-specific encodings.

餐甑逞扉牙慨

Figure 1.2: Several Han characters in the traditional Chinese, Japanese, Korean, and Vietnamese variants

are transformed into two to four one-byte integers ranging from 128 to 253 (80–FD). The encoding is illustrated in tables 1.2 and 1.3.

UTF-32 is primarily used for the fixed-width internal representation of individual UCS characters inside programs. UTF-16 fulfills a similar role in programs that only work with the BMP, and UTF-8 is used for text storage and interchange. Since 2010, the majority of text content on the Web has been encoded in ASCII and UTF-8 [11].
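The difference between the three encodings can be observed directly by encoding single characters and counting the resulting bytes. The following Python sketch uses only the standard codecs; the helper name and the sample characters are illustrative.

```python
def encoded_lengths(text):
    """Return the number of bytes text occupies in each UTF encoding."""
    encodings = ("utf-8", "utf-16-be", "utf-32-be")
    return {name: len(text.encode(name)) for name in encodings}


# An ASCII letter, an accented Latin letter, a Han character from
# the BMP, and a character outside the BMP.
for character in ("A", "é", "餐", "😀"):
    print(repr(character), encoded_lengths(character))
```

UTF-32 uses four bytes for every character, UTF-16 uses two bytes only for characters inside the BMP, and UTF-8 keeps ASCII characters at one byte each.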

Unicode was a competing standard for universal text encoding that underwent a merger with UCS in version 1.1, and since then the standards have been kept closely synchronised. Unicode is a superset of UCS, which defines additional information about UCS characters (such as their general category, directionality, case, or numeric value [10, sec. 3.5 and ch. 4]), various text processing algorithms, and implementation guidelines.
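Many programming languages expose these character properties. As a minimal sketch, Python's standard unicodedata module can report the name, general category, directionality, and numeric value of a character; the helper name and the sample characters are illustrative.

```python
import unicodedata


def describe(character):
    """Collect a few Unicode properties of a single character."""
    return {
        "name": unicodedata.name(character),
        "category": unicodedata.category(character),   # e.g. Lu, Ll, No
        "bidi": unicodedata.bidirectional(character),  # e.g. L, R
        "numeric": unicodedata.numeric(character, None),
    }


for character in ("A", "a", "¾", "א"):
    print(character, describe(character))
```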

Regarding text processing, Unicode and UCS represent a compromise between the simplicity of the seven-bit ASCII and the heterogeneity of eight-bit encodings:


Ǻ = Å + ◌́ = A + ◌̊ + ◌́

Figure 1.3: Some UCS characters can be either input as a single entity or composed from several combining characters. Regarding Unicode normalization forms, all of the above representations are canonically equivalent.
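Canonical equivalence can be checked by bringing both strings to the same normalization form before comparing them. The sketch below does this with Python's standard unicodedata module for the three spellings of the letter Ǻ (U+01FA); the constant and function names are illustrative.

```python
import unicodedata

# Three spellings of the same letter: one precomposed character,
# A with ring plus a combining acute accent, and a plain A followed
# by a combining ring and a combining acute accent.
FORMS = ["\u01FA", "\u00C5\u0301", "A\u030A\u0301"]


def canonically_equivalent(first, second):
    """Compare two strings after canonical decomposition (NFD)."""
    decompose = lambda text: unicodedata.normalize("NFD", text)
    return decompose(first) == decompose(second)


print(FORMS[0] == FORMS[2])                        # raw comparison
print(canonically_equivalent(FORMS[0], FORMS[2]))  # after normalization
```

The raw comparison fails because the code point sequences differ, while the normalized comparison succeeds.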

iconv -f latin2 -t utf8 -- old.txt > new.txt

Figure 1.4: Text files can be converted between encodings using the iconv command-line tool. The sample code shows the file old.txt being converted from the ISO/IEC 8859-2 encoding to UTF-8. The result of the conversion is stored in the file new.txt.
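The same conversion can be sketched in Python for situations where iconv is not available. The function names are hypothetical; the codec names iso8859_2 and utf-8 are those of the Python standard library.

```python
from pathlib import Path


def recode(data, source="iso8859_2", target="utf-8"):
    """Decode bytes from one encoding and re-encode them in another."""
    return data.decode(source).encode(target)


def recode_file(source_path, target_path, source="iso8859_2", target="utf-8"):
    """Convert a whole file, reading and writing raw bytes."""
    converted = recode(Path(source_path).read_bytes(), source, target)
    Path(target_path).write_bytes(converted)


# The byte B9 stands for the letter š in ISO/IEC 8859-2;
# in UTF-8 the same letter takes two bytes.
print(recode(b"\xb9"))
```

Unlike iconv, this sketch loads the whole file into memory, which is acceptable for ordinary documents but not for very large files.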

• If simple text manipulation is preferred over space efficiency, each character can be made exactly four bytes wide using the UTF-32 encoding, or exactly two bytes wide using the UTF-16 encoding as long as only BMP characters are used.

• Although character strings cannot be collated by a simple character code comparison, a collation algorithm is defined in the Unicode specification [12], and collation tables for major locales [13] are maintained by the Unicode Consortium.

• Classes of characters (such as uppercase letters, lowercase letters, numbers, and punctuation) do not form contiguous ranges, but their position is directly specified in the standard [10, sec. 4.5].

• Although idiosyncrasies (such as ligatures, invisible hyphenation hints, and combining characters) are present in UCS, explicit normalization algorithms for character string equivalence testing are specified by the standard [10, sec. 2.12]. An algorithm for case conversion is also specified [10, sec. 3.13].

• The byte order mark (FE FF) character can be inserted at the beginning of a text as a signature of Unicode encodings. As the name suggests, the order in which the FE and FF bytes arrive also indicates the order of bytes (called endianity) that was used to encode integers. In UTF-32 and UTF-16, endianity can be chosen arbitrarily by the encoding application. In UTF-8, one-byte integers are used, and the notion of endianity is therefore meaningless.
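The last point can be made concrete with a short sketch that inspects the first bytes of a text for a byte order mark. It relies on the BOM constants of Python's standard codecs module; the function name and the returned labels are illustrative.

```python
import codecs

# UTF-32 marks are tested first because the little-endian UTF-32 mark
# (FF FE 00 00) begins with the little-endian UTF-16 mark (FF FE).
SIGNATURES = (
    ("utf-32-le", codecs.BOM_UTF32_LE),
    ("utf-32-be", codecs.BOM_UTF32_BE),
    ("utf-8", codecs.BOM_UTF8),
    ("utf-16-le", codecs.BOM_UTF16_LE),
    ("utf-16-be", codecs.BOM_UTF16_BE),
)


def detect_bom(data):
    """Return the encoding suggested by a leading byte order mark."""
    for name, mark in SIGNATURES:
        if data.startswith(mark):
            return name
    return None  # no byte order mark present


print(detect_bom("\ufeffA".encode("utf-16-be")))  # bytes FE FF 00 41
print(detect_bom("\ufeffA".encode("utf-16-le")))  # bytes FF FE 41 00
```

A UTF-16 text that begins with a NUL character and a little-endian mark would be misread as UTF-32 by this sketch; real detectors have to live with the same ambiguity.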


Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. Above is the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.

Compose + O + R = ®
Compose + 3 + 4 = ¾
Compose + s + s = ß
Compose + ~ + ' + a = ấ

Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.


Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á

Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces a character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the corresponding UCS code point in hexadecimal.
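The code page lookup behind the first two sequences can be imitated with Python's cp437 and cp1252 codecs. This is only a sketch of the mapping for an English locale and for numbers from 32 to 255, not a description of how Windows implements Alt codes; the function name is hypothetical.

```python
def alt_code(digits):
    """Map a decimal Alt sequence to a character (English locale).

    A leading zero selects the Windows code page 1252; otherwise
    the IBM code page 437 is used.
    """
    codepage = "cp1252" if digits.startswith("0") else "cp437"
    return bytes([int(digits)]).decode(codepage)


print(alt_code("160"))   # looked up in the IBM code page 437
print(alt_code("0225"))  # looked up in the Windows code page 1252
print(chr(0xE1))         # the UCS code point E1 entered directly
```

All three lines print the same letter á, which matches the three sequences shown in the figure.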

1.1.2 Text Input

To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On hand-held devices, the use of either a numeric keypad or a touch-screen is more typical.

An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind and, as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn't apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.


1.1.3 Text Editors

A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free joe, GNU nano, and pico.

More advanced text editors come with support for regular expressions and version control, which will be covered in sections 1.1.5 and 1.2, and for user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower the users to perform complex text editing.

1.1.4 Interactive Document Preparation Systems

Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully-formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].

The primary difference between text editors and DPSes is that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.

1.1.5 Regular Expressions

The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.

Margin note: Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.

Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) syntaxes [18, part 1, ch. 9], described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.

More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
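The claim that regexes are stronger than regular expressions can be illustrated with a backreference. The sketch below uses the `re` module of the Python standard library, whose syntax is Perl-like rather than BRE or ERE; the pattern and the sample strings are illustrative and do not come from the book. The pattern matches a run of the letter a, the letter b, and then exactly the same run of a again, a language that no regular grammar can describe because it requires counting.

```python
import re

# \1 must match exactly the text that the first group matched, so the pattern
# accepts "b", "aba", "aabaa", ... but never an unbalanced string.
balanced = re.compile(r"(a*)b\1")

accepted = [s for s in ["b", "aba", "aabaa"] if balanced.fullmatch(s)]
rejected = [s for s in ["ab", "aaba", "abaa"] if not balanced.fullmatch(s)]

print(accepted)  # ['b', 'aba', 'aabaa']
print(rejected)  # ['ab', 'aaba', 'abaa']
```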

Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (U+200B), zero width non-joiner (U+200C), zero width joiner (U+200D), and zero width no-break space (U+FEFF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at least partially implement the Unicode standard for Regular Expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.

BRE regex | Description | Matches
we\{1,2\}p | The interval expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m. | weeps, wept
e*ne | Star (*) is a repetition operator equivalent to the interval expression of \{0,\}. | never, enemy, Kleene
\(⟨regex⟩\) | A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex. | ⟨regex⟩
^ar | At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string. | argument, arrow keys
ore$ | At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string. | iron ore, dumbledore
.be | A period (.) matches any single character. | or not to be
be[ea] | A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity. | beehive, grizzly bear, glass beads
be[^ea] | A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match. | obeah, bend, libela
\^\$ | Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character. | ^$
\(..\).*\1 | A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched. | ara ararauna, dardanelles, nationality

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below).

ERE regex | Description | Matches
we{1,2}p | Unlike in BREs, braces aren't escaped. | weeps, wept
pe+rl? | The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}. | persona, peer speech, perl
(⟨regex⟩) | Unlike in BREs, parentheses aren't escaped. | ⟨regex⟩
(on|t) | Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes. | one, two, trophy, truth
(..).*\1 | EREs do not support backreferences. | ⟨undefined⟩

Regex | Description
\x{⟨n⟩} | Matches the UCS character with code point ⟨n⟩ in hexadecimal.
\N{⟨n⟩} | Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.
\p{⟨p⟩} | Matches any UCS character with property ⟨p⟩.
\P{⟨p⟩} | Matches any UCS character without property ⟨p⟩.

Property | Description
Letter | This property is satisfied by any letter.
Punctuation | This property is satisfied by any punctuation.
Symbol | This property is satisfied by any symbol.
Mark | This property is satisfied by any mark.
Number | This property is satisfied by any number.
Separator | This property is satisfied by any separator.
Other | This property is satisfied by any UCS character that doesn't belong to any of the categories listed above.
Block=⟨b⟩ | This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.
Script=⟨s⟩ | This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.
Numeric Value=⟨n⟩ | This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.

Margin note: The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.
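The false negatives caused by normalization, case, and invisible characters can be reproduced with the Python standard library. The following is only a sketch under stated assumptions: the helper `canonical`, the constant `INVISIBLE`, and the sample strings are illustrative names, and the combination of NFC normalization with `str.casefold` is a simplification rather than a full implementation of Unicode caseless matching.

```python
import re
import unicodedata

# The zero-width characters named in the text; they are invisible when rendered.
INVISIBLE = "\u200b\u200c\u200d\ufeff"


def canonical(text: str) -> str:
    """Drop the invisible characters, normalize to NFC, and case-fold."""
    stripped = text.translate({ord(c): None for c in INVISIBLE})
    return unicodedata.normalize("NFC", stripped).casefold()


# "é" is encoded as e + combining acute accent; "database" hides a U+200B.
haystack = "Cafe\u0301 data\u200bbase"
needle = "CAF\u00c9"  # precomposed capital É

# A naive case-insensitive search fails: the two encodings of é differ.
naive = re.search(needle, haystack, re.IGNORECASE)
# After canonicalization of both sides, the same search succeeds.
robust = re.search(re.escape(canonical(needle)), canonical(haystack))
# The hidden zero width space no longer splits the word either.
hidden = re.search("database", canonical(haystack))

print(naive is None, robust is not None, hidden is not None)  # True True True
```

Stripping invisible characters changes the text, so in practice it would be applied only to a copy used for searching, not to the document itself.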

The most elementary text processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool will present lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

Margin note: The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
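The default behavior of grep described above, presenting every line that contains at least one match, can be approximated in a few lines of Python. This is a rough sketch and not a reimplementation of grep: the function name `grep` and the sample lines are illustrative, and the `re` module uses a Perl-like syntax rather than BRE or ERE.

```python
import re
from typing import Iterable, Iterator


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield each line that contains at least one match of the pattern."""
    compiled = re.compile(pattern)
    for line in lines:
        if compiled.search(line):
            yield line.rstrip("\n")


sample = ["iron ore\n", "argument\n", "dumbledore\n", "arrow keys\n"]
matches = list(grep(r"ore$", sample))

print(matches)  # ['iron ore', 'dumbledore']
```

Any iterable of lines can be passed in, including an open text file, which makes the sketch usable on the same kind of input that grep reads.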

1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCSs) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCSs are a convenient alternative to manual version archival. With several contributors, VCSs become an essential tool.
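The rudimentary functionality described above (recording changes with a description and an author, viewing them, and reverting them) can be sketched as a toy, in-memory history of a single text file. The classes `Revision` and `History` are hypothetical and exist only for illustration; real VCSs store changes far more efficiently and handle many files and contributors.

```python
import difflib
from dataclasses import dataclass
from typing import List


@dataclass
class Revision:
    author: str
    message: str
    content: str


class History:
    """A toy history of one text file; not how any real VCS stores data."""

    def __init__(self, initial: str = "") -> None:
        self.revisions: List[Revision] = [
            Revision("nobody", "initial version", initial)
        ]

    def commit(self, author: str, message: str, content: str) -> None:
        """Record a new version along with its author and description."""
        self.revisions.append(Revision(author, message, content))

    def diff(self, old: int, new: int) -> List[str]:
        """Return the line-by-line changes between two recorded versions."""
        a = self.revisions[old].content.splitlines()
        b = self.revisions[new].content.splitlines()
        return list(difflib.unified_diff(a, b, lineterm=""))

    def revert(self) -> str:
        """Discard the latest version and return the one before it."""
        if len(self.revisions) > 1:
            self.revisions.pop()
        return self.revisions[-1].content


history = History("Hello\n")
history.commit("alice", "greet the world", "Hello\nWorld\n")
changes = history.diff(0, 1)
restored = history.revert()
```

The line-oriented diff is also the reason why text files are so convenient to keep under version control: the changes between two versions can be shown in a form that a person can read.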

VCSs can be divided by their architecture, which is either centralized or decentralized. Centralized VCSs store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally and its operation is fully dependent on the availability of the server. An example of a centralized VCS is Subversion (SVN).

By comparison, there is no designated server in decentralized VCSs, and the users can upload new versions to and download them from one another directly. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage size requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCSs include Git, Mercurial, and Bazaar.

Although VCSs can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCSs to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

Margin note: An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

Margin note: After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.

[Diagram labels: svnadmin create, svn checkout, svn update, svn commit]

Figure 1.8: The basic SVN workflow.

Margin note: After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.

[Diagram labels: git init, git clone, git pull, git push, git reset, git commit]

Figure 1.9: The diagram above depicts the basic Git workflow. The diagram below depicts the use of the Git program with an SVN repository; this bears all the advantages and disadvantages associated with decentralized VCSs.

[Diagram labels: svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit]


Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.

Chapter 2

Markup

A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.

Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

Margin note: More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

Margin note: The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.

2.1 Meta Markup Languages

2.1.1 The Generalized Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.

Margin note: A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.

Although the described structure is shared by all SGML documents, the actual syntax as well as the restrictions regarding the contents and the attributes of individual elements are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML, such as W3C XML Schema, RELAX NG, or Schematron, that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
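The first of the two levels of correctness, well-formedness, can be checked with the XML parser from the Python standard library. The sketch below is illustrative: the function name `is_well_formed` and the sample documents are assumptions, and the `xml.etree.ElementTree` parser does not validate documents against a DTD or a schema, so the second level (validity) is outside the scope of this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True when the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Properly nested elements form a well-formed document.
good = "<recipe><name>Palatschinken</name></recipe>"
# The end tags are swapped, so the elements overlap instead of nesting.
bad = "<recipe><name>Palatschinken</recipe></name>"

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```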

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.


    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE recipe SYSTEM "recipe.dtd">
    <recipe>
      <name>Palatschinken</name>
      <description>A Slavic crêpe-like dish</description>
      <ingredientList serves="8">
        <ingredient amount="120g">Plain flour</ingredient>
        <ingredient amount="2">Egg</ingredient>
        <ingredient amount="300ml">Milk</ingredient>
        <ingredient amount="1 tblspn">Oil</ingredient>
        <ingredient amount="1 pinch">Salt</ingredient>
      </ingredientList>
      <stepList>
        <step>Combine the ingredients and whisk until
          you have a smooth batter.</step>
        <step>Heat oil on a pan, pour in a tablespoonful
          of the batter, fry until golden brown.</step>
        <step>Repeat until there is no batter left.</step>
        <step>Serve rolled and filled with jam.</step>
      </stepList>
    </recipe>

Figure 2.1: An example XML document (recipe.xml).

Margin note: DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd">
    <!DOCTYPE recipe SYSTEM "recipe.dtd">

    <!DOCTYPE recipe [
      <!ELEMENT recipe (name, description, ingredientList,
                        stepList)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
      <!ELEMENT ingredientList (ingredient+)>
      <!ATTLIST ingredientList serves CDATA #REQUIRED>
      <!ELEMENT ingredient (#PCDATA)>
      <!ATTLIST ingredient amount CDATA #REQUIRED>
      <!ELEMENT stepList (step+)>
      <!ELEMENT step (#PCDATA)> ]>

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd" [
      <!-- Omitted for brevity --> ]>
    <!DOCTYPE recipe SYSTEM "recipe.dtd" [
      <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD.

    element recipe {
      element name { text },
      element description { text },
      element ingredientList {
        attribute serves { xsd:positiveInteger },
        element ingredient {
          attribute amount { text }, text
        }+
      },
      element stepList {
        element step { text }+
      }
    }

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.


    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
      <element name="recipe"><complexType><all>
        <element name="name" type="string" minOccurs="1"/>
        <element name="description" type="string"
                 minOccurs="1"/>
        <element
          name="ingredientList"><complexType><sequence>
          <element name="ingredient" minOccurs="1"
                   maxOccurs="unbounded">
            <complexType><simpleContent>
              <extension base="string">
                <attribute name="amount" type="string"/>
              </extension>
            </simpleContent></complexType>
          </element></sequence>
          <attribute name="serves" type="positiveInteger"
                     use="required"/>
        </complexType></element>
        <element name="stepList"><complexType><sequence>
          <element name="step" type="string" minOccurs="1"
                   maxOccurs="unbounded"/>
        </sequence></complexType></element>
      </all></complexType></element>
    </schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).

    xmllint -noout --dtdvalid recipe.dtd recipe.xml
    xmllint -noout --schema recipe.xsd recipe.xml
    trang recipe.rnc recipe.rng  # Compact -> Full Relax NG
    xmllint -noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.


A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature called CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.

Speech

    AASE: Peer, you're lying.
    PEER: No, I'm not.
    AASE: Well then, swear to me it's true.
    PEER: Swear? Why should I?
    AASE: See, you dare not. Every word of it's a lie.

Verse

    Peer, you're lying. No, I'm not.
    Well then, swear to me it's true.
    Swear? Why should I? See, you dare not.
    Every word of it's a lie.

Markup

    <(V)line>
    <(S)speech who="Aase">Peer, you're lying.</(S)speech>
    <(S)speech who="Peer">No, I'm not.</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Aase">Well then,
    swear to me it's true.</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Peer">Swear? why should I?</(S)speech>
    <(S)speech who="Aase">See, you dare not.
    </(V)line><(V)line>
    Every word of it's a lie.</(S)speech>
    </(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

Margin note: The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.

Margin note: Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
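The way namespaces tie each element to an application through an identifier can be observed with the XML parser from the Python standard library. The sketch below is only an illustration: the sample document is made up, and it relies on the namespace names that W3C assigns to XHTML and SVG. The `xml.etree.ElementTree` parser expands every element name into the form `{namespace}local-name`, regardless of the prefix used in the document.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

# A made-up document mixing elements of two XML applications.
document = f"""
<html xmlns="{XHTML}" xmlns:svg="{SVG}">
  <body>
    <p>A circle:</p>
    <svg:svg width="20" height="20">
      <svg:circle cx="10" cy="10" r="8"/>
    </svg:svg>
  </body>
</html>
"""

root = ET.fromstring(document)

# Look an element up by its namespace and local name, not by its prefix.
circle = root.find(f".//{{{SVG}}}circle")

# Collect the namespace of every element in the document.
namespaces = {element.tag.split("}")[0][1:] for element in root.iter()}

print(root.tag)            # {http://www.w3.org/1999/xhtml}html
print(circle.get("r"))     # 8
print(sorted(namespaces))
```

The parser only records which namespace each element belongs to. As the text notes, nothing in the document tells it where a schema for either namespace could be found.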

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Margin note: JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the contemporary leading web browsers and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

    <b>Bold <i>bold and italic</b> italic</i>
    <b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

    <font face="Verdana" size="4">
    <font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
    <br><br>There is a continuing need to show the power of
    <i>CSS</i>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The <i>HTML
    </i> remains the same, the only thing that has changed
    is the external <i>CSS</i> file. Yes, really.
    </font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

    <style>
      body {
        font: large Verdana;
        font-size: large;
      }
      h1 {
        font-size: x-large;
        text-transform: uppercase;
      }
      abbr {
        font-style: italic;
      }
    </style>
    <h1>So what is this about?</h1>
    <p>There is a continuing need to show the power of
    <abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The
    <abbr>HTML</abbr> remains the same, the only thing that
    has changed is the external <abbr>CSS</abbr> file. Yes,
    really.</p>

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2 Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly in web documents through XML namespaces.

Margin note: The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.

A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.

If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
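The two readings of a triplet can be made concrete with a small data model. The following is a minimal sketch, not an RDF library: the class names Resource, Literal, and Triplet and the function interpret are hypothetical, and the IRIs are the example ones used in the figures of this section.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    """A resource identified by an IRI."""
    iri: str


@dataclass(frozen=True)
class Literal:
    """A literal value, such as a string."""
    value: str


@dataclass(frozen=True)
class Triplet:
    """A triplet (p, s, o), in the order used by the text."""
    predicate: Resource
    subject: Resource
    obj: Union[Resource, Literal]


def interpret(t: Triplet) -> str:
    """Return the reading of a triplet described in the text."""
    if isinstance(t.obj, Resource):
        # Object is a resource: s is in a relation p with o.
        return (f"{t.subject.iri} is in relation {t.predicate.iri} "
                f"with {t.obj.iri}")
    # Object is a literal: s has a property p with the value o.
    return (f"{t.subject.iri} has property {t.predicate.iri} "
            f"with value {t.obj.value!r}")


creator = Triplet(
    Resource("http://purl.org/dc/terms/creator"),
    Resource("http://example.org/document.html"),
    Resource("http://example.org/john-smith"),
)
name = Triplet(
    Resource("http://xmlns.com/foaf/0.1/name"),
    Resource("http://example.org/john-smith"),
    Literal("John Smith"),
)
print(interpret(creator))
print(interpret(name))
```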

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in Attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
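Of the representations listed above, N-Triples is simple enough to emit by hand: one statement per line, IRIs in angle brackets, literals in double quotes, and a closing period. The sketch below assumes a hypothetical in-memory form, where every term is a (kind, value) pair and every statement is a (subject, predicate, object) tuple; it omits language tags, datatypes, and blank nodes.

```python
def nt_term(kind: str, value: str) -> str:
    """Format one term in N-Triples syntax."""
    if kind == "iri":
        return f"<{value}>"
    if kind == "literal":
        # Escape the characters that would break a quoted string.
        escaped = (value.replace("\\", "\\\\")
                        .replace('"', '\\"')
                        .replace("\n", "\\n")
                        .replace("\r", "\\r"))
        return f'"{escaped}"'
    raise ValueError(f"unknown term kind: {kind}")


def to_ntriples(statements) -> str:
    """Serialize (subject, predicate, object) statements, one per line."""
    lines = []
    for subject, predicate, obj in statements:
        lines.append(
            f"{nt_term(*subject)} {nt_term(*predicate)} {nt_term(*obj)} .\n"
        )
    return "".join(lines)


statements = [
    (("iri", "http://example.org/john-smith"),
     ("iri", "http://xmlns.com/foaf/0.1/name"),
     ("literal", "John Smith")),
]
print(to_ntriples(statements), end="")
```

Note that N-Triples writes statements in subject, predicate, object order, whereas the (p, s, o) notation used in the text puts the predicate first.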

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith" />
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .

<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .

<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).

<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>

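[Graph: the node http://example.org/document.html has an edge dc:title to the literal "John's Web page"@en and an edge dc:creator to the node http://example.org/john-smith, which in turn has an edge rdf:type to the node foaf:Person and an edge foaf:name to the literal "John Smith".]

Figure 2.11: A graph of the RDF document in Figure 2.9.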

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is twofold: more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and reduced flexibility stemming from the use of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The

The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by Desktop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.

The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled—but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below), and the output document (above). The marked up text was borrowed from the web page of mom [51].

% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.

Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
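The thresholds above lend themselves to a quick check of a planned line length. The following is a rough sketch: the function name assess_measure and its four verdict strings are hypothetical, and only the numeric limits come from the text.

```python
def assess_measure(chars_per_line: int, columns: int = 1) -> str:
    """Classify an average line length against the limits in the text."""
    if chars_per_line > 80:
        # Long lines strain the eye of the reader.
        return "too wide"
    if chars_per_line < 40:
        # Justification would make the word spacing too loose.
        return "set ragged"
    low, high = (45, 75) if columns == 1 else (40, 50)
    if low <= chars_per_line <= high:
        return "recommended"
    # Inside the 40-80 band, but outside the recommended range.
    return "tolerable"


for length, columns in [(66, 1), (90, 1), (35, 2), (60, 2)]:
    print(length, columns, assess_measure(length, columns))
```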

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text

The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

1

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
\newunicodechar{}{\raisebox{.8ex}{}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below), and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.

<style>
@font-face {
  font-family: "Alegreya Sans";
  src: url("AlegreyaSans-Regular.ttf")
       format("truetype");
  unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                 U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
}
@font-face {
  font-family: "GFS Neohellenic";
  src: url("GFSNeohellenic.otf") format("opentype");
  unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                 U+102E0-102FF;
}
p {
  font-family: "Alegreya Sans", "GFS Neohellenic",
               sans-serif;
  line-height: 14pt;
}
[lang=en] {
  font-size: 11pt;
}
[lang=grc] {
  font-size: 13.2pt;
}
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.

line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
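The arithmetic behind these figures is easy to reproduce for other typeface sizes. The sketch below assumes the twenty to forty-five percent guideline quoted above; the function names leading_range and lead_per_side are hypothetical.

```python
from typing import Tuple


def leading_range(type_size_pt: float) -> Tuple[float, float]:
    """Return the smallest and largest suggested line height in points.

    The line height is the typeface size plus 20 to 45 percent.
    """
    return (type_size_pt * 120 / 100, type_size_pt * 145 / 100)


def lead_per_side(type_size_pt: float, leading_pt: float) -> float:
    """Return the lead added above and below each line in points."""
    return (leading_pt - type_size_pt) / 2


low, high = leading_range(10)
print(low, high)               # line height limits for 10 pt type
print(lead_per_side(10, low))  # lead per side at the lower limit
print(lead_per_side(10, high)) # lead per side at the upper limit
```

For 10 pt type this gives a line height of 12 to 14.5 pt and 1 to 2.25 pt of lead on each side, which matches the values stated in the text.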

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger

        Sizes in inches   Page proportions
A4      8.27 × 11.7       2 : √2       1.41421
B5      6.93 × 9.84       1 : √2       0.707
Letter  8 1/2 × 11        1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
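One way to satisfy the whole-multiple rule is to round the natural height of a heading up to the next multiple of the body line height and distribute the difference as extra space around the heading. The sketch below illustrates the arithmetic; the function name heading_block and the sample values are hypothetical.

```python
import math


def heading_block(heading_height_pt: float, body_leading_pt: float):
    """Fit a heading into a whole number of body text lines.

    Returns the number of body lines the heading occupies and the
    extra vertical space in points that pads the heading to that height.
    """
    # The small tolerance guards against floating-point noise when the
    # heading height already is an exact multiple of the leading.
    lines = max(1, math.ceil(heading_height_pt / body_leading_pt - 1e-9))
    padding = lines * body_leading_pt - heading_height_pt
    return lines, padding


# A heading that is 17 pt tall on a 12 pt body grid takes two lines
# and needs 7 pt of padding; a 24 pt heading fits with none.
print(heading_block(17, 12))
print(heading_block(24, 12))
```

How the padding is split above and below the heading is a design decision that the rule itself leaves open.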

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.[1] Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs, and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced

[1] This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
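The requirement that the text height be a multiple of the line height amounts to rounding the available height down to a whole number of lines. The sketch below shows the calculation; the function name text_block and the sample figure of 540 pt of available height are assumptions made for the sake of the example.

```python
import math


def text_block(available_height_pt: float, leading_pt: float):
    """Return the number of lines and the resulting text height.

    The available height is rounded down to a whole number of lines,
    so the text block can be filled completely.
    """
    # The small tolerance guards against floating-point noise when the
    # available height already is an exact multiple of the leading.
    lines = math.floor(available_height_pt / leading_pt + 1e-9)
    return lines, lines * leading_pt


# With 540 pt available, a 12 pt leading fits exactly 45 lines, while
# a 16.8 pt leading fits 32 lines and leaves a small remainder that
# has to be moved into the vertical margins.
print(text_block(540, 12))
print(text_block(540, 16.8))
```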

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

(ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).

BIBLIOGRAPHY 53

[17] Noam Chomsky lsquolsquoThree models for the description of lan-guagersquorsquo In Information Theory IEEE Transactions on 23 (1956)pp 113ndash124 (cit on p 14)

[18] isoiec jtc1sc22 Information technology ndash the Portable Op-erating System Interface ndash Part 2 Shell and Utilities isoiec9945-21993 Geneva Switzerland the International Organi-zation for Standardization Dec 1993 (cit on p 14)

[19] Jeffrey E F Friedl Mastering Regular Expressions 3rd edOrsquoReilly Media 2006 p 544 isbn 978-0-596-52812-6 (citon p 14)

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. "The Roots of SGML – A Personal Recollection". In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. "SGML: The Reason Why and the First Published Hint". In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. "Introduction to Generalized Markup". In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986 (cit. on p. 22).

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. "GODDAG: A Data Structure for Overlapping Hierarchies". In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028/ (visited on 07/31/2015) (cit. on p. 29).

[39] Ecma International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standard Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (la Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Declaration
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
Emacs: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Or A Foe
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
joe: The Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
pico: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard General Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
vi: The Visual Interactive editor
vim: vi IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language



Ǻ = Å + ◌́ = A + ◌̊ + ◌́

Figure 1.3: Some UCS characters can be either input as a single entity or composed from several combining characters. As far as the Unicode normalization forms are concerned, all of the above representations are canonically equivalent.
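The canonical equivalence described in the figure above can be checked programmatically. The following is a minimal sketch that assumes Python and its standard unicodedata module; the variable names are illustrative only and do not come from the original text.

```python
import unicodedata

# Three ways to write LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE.
precomposed = "\u01FA"        # a single code point
partial = "\u00C5\u0301"      # A with ring above + combining acute accent
decomposed = "A\u030A\u0301"  # A + combining ring above + combining acute accent

forms = [precomposed, partial, decomposed]

# Compared code point by code point, the three strings are all different ...
assert len(set(forms)) == 3

# ... but each normalization form maps all of them to one representation.
nfc = {unicodedata.normalize("NFC", s) for s in forms}
nfd = {unicodedata.normalize("NFD", s) for s in forms}

print(nfc == {precomposed})  # True
print(nfd == {decomposed})   # True
```

Normalizing both operands to the same form before comparing them is therefore one way to test strings for canonical equivalence.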

iconv -f latin2 -t utf8 -- old.txt > new.txt

Figure 1.4: Text files can be converted between encodings using the iconv command-line tool. The sample code shows the file old.txt being converted from the ISO/IEC 8859-2 encoding to UTF-8. The result of the conversion is stored in the file new.txt.
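The same conversion can be sketched without iconv. The example below is an illustration only, assuming Python and its built-in codecs; the transcode helper is a hypothetical name, and the sample word was chosen merely because it contains characters outside ASCII.

```python
def transcode(data: bytes, source: str = "iso8859_2", target: str = "utf-8") -> bytes:
    """Decode bytes from the source encoding and re-encode them in the target."""
    return data.decode(source).encode(target)


# A Czech word as it would be stored in an ISO/IEC 8859-2 text file.
latin2_bytes = "Příliš".encode("iso8859_2")
utf8_bytes = transcode(latin2_bytes)

print(len(latin2_bytes))  # 6: one byte per character
print(len(utf8_bytes))    # 9: the three accented letters take two bytes each
```

Decoding must use the encoding the file was actually written in; decoding with a wrong encoding usually succeeds silently for single-byte encodings and produces the wrong characters.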

• If simple text manipulation is preferred over space efficiency, each character can be made exactly four bytes wide using the UTF-32 encoding, or exactly two bytes wide using the UTF-16 encoding as long as the text stays within the BMP.

• Although character strings cannot be collated by a simple character code comparison, a collation algorithm is defined in the Unicode specification [12], and collation tables for major locales [13] are maintained by the Unicode Consortium.

• Classes of characters (such as uppercase letters, lowercase letters, numbers, and punctuation) do not form contiguous ranges, but their position is directly specified in the standard [10, sec. 4.5].

• Although idiosyncrasies (such as ligatures, invisible hyphenation hints, and combining characters) are present in UCS, explicit normalization algorithms for character string equivalence testing are specified by the standard [10, sec. 2.12]. An algorithm for case conversion is also specified [10, sec. 3.13].

• The byte order mark (U+FEFF) character can be inserted at the beginning of a text as a signature of Unicode encodings. As the name suggests, the order in which the FE and FF bytes arrive also indicates the order of bytes (called endianness) that was used to encode integers. In UTF-32 and UTF-16, endianness can be chosen arbitrarily by the encoding application. In UTF-8, one-byte integers are used and the notion of endianness is therefore meaningless.
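The character widths and the role of the byte order mark can be observed directly. This is a small sketch assuming Python's built-in codecs; the utf16_byte_order helper is hypothetical and only inspects the first two bytes, assuming the input is UTF-16.

```python
text = "A\U0001F600"  # one BMP character and one character outside the BMP

utf8 = text.encode("utf-8")
utf16_be = text.encode("utf-16-be")
utf32_be = text.encode("utf-32-be")

print(len(utf8))      # 5: one byte + four bytes
print(len(utf16_be))  # 6: two bytes + a four-byte surrogate pair
print(len(utf32_be))  # 8: four bytes per character


def utf16_byte_order(data: bytes) -> str:
    """Guess the byte order of UTF-16 data from a leading byte order mark."""
    if data.startswith(b"\xfe\xff"):
        return "big-endian"
    if data.startswith(b"\xff\xfe"):
        return "little-endian"
    return "unknown"


bom = "\uFEFF"
print(utf16_byte_order((bom + text).encode("utf-16-be")))  # big-endian
print(utf16_byte_order((bom + text).encode("utf-16-le")))  # little-endian
```

Without a byte order mark, the byte order has to be known from elsewhere, which is why the helper returns "unknown" for such data.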

Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. Above is the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.

Compose + O + R = ®
Compose + 3 + 4 = ¾
Compose + s + s = ß
Compose + ~ + ' + a = ấ

Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.

Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á

Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces a character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the corresponding UCS code point in hexadecimal.
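The three Alt sequences in the figure above can be cross-checked against the code pages they refer to. The sketch below assumes Python's built-in cp437 and cp1252 codecs; it only looks the numbers up and does not simulate keyboard input.

```python
# Alt + 160: the number is looked up in the IBM code page 437.
from_ibm_437 = bytes([160]).decode("cp437")

# Alt + 0225: the leading zero selects the Windows code page 1252.
from_windows_1252 = bytes([225]).decode("cp1252")

# Alt + "+E1": the hexadecimal number is a UCS code point.
from_code_point = chr(0xE1)

print(from_ibm_437, from_windows_1252, from_code_point)  # á á á
```

All three lookups yield the same character, although the numbers differ because each table assigns its own position to it.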

1.1.2 Text Input

To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On hand-held devices, the use of either a numeric keypad or a touch-screen is more typical.

An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind and, as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn't apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.

1.1.3 Text Editors

A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free joe, GNU nano, and pico.

More advanced text editors come with support for regular expressions and version control, which will be covered in sections 1.1.5 and 1.2, and for user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower the users to perform complex text editing.

1.1.4 Interactive Document Preparation Systems

Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully-formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].

The primary difference between text editors and DPSes is the fact that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files, a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.

1.1.5 Regular Expressions

The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.

Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.

Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) syntaxes [18, part 1, ch. 9] described in Table 1.4, which are supported by most text-processing programs on Unix and Unix-like operating systems.

More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
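The difference between regular expressions and the more powerful regexes can be made concrete. The sketch below uses Python's re module, whose syntax derives from Perl rather than from POSIX; the first two patterns happen to be written the same way in ERE, while the third relies on a backreference. The pattern names are illustrative assumptions.

```python
import re

# An interval expression and an alternation, both expressible in ERE.
interval = re.compile(r"we{1,2}p")
alternation = re.compile(r"(on|t)")

print(bool(interval.search("wept")))    # True
print(bool(interval.search("weeeps")))  # False: three e's exceed the interval
print(bool(alternation.search("two")))  # True

# A backreference matches whatever the first subexpression matched. The set
# of strings made of some text immediately repeated is not a regular language.
doubled = re.compile(r"^(.+)\1$")

print(bool(doubled.match("abcabc")))  # True
print(bool(doubled.match("abcabd")))  # False
```

Patterns of the last kind are convenient, but they may force an implementation to backtrack, which is one practical consequence of leaving the class of regular grammars.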

BRE regex: we\{1,2\}p
Description: The interval expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
Matches: weeps, wept

BRE regex: e*ne
Description: Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
Matches: never, enemy, Kleene

BRE regex: \(⟨regex⟩\)
Description: A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
Matches: ⟨regex⟩

BRE regex: ^ar
Description: At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
Matches: argument, arrow keys

BRE regex: ore$
Description: At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
Matches: iron ore, dumbledore

BRE regex: be.
Description: A period (.) matches any single character.
Matches: or not to be.

BRE regex: be[ea]
Description: A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity.
Matches: beehive, grizzly bear, glass beads

BRE regex: be[^ea]
Description: A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
Matches: obeah, bend, libela

BRE regex: \^\$
Description: Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
Matches: ^$

BRE regex: \(..\).*\1
Description: A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
Matches: ara ararauna, dardanelles, nationality

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below).

ERE regex: we{1,2}p
Description: Unlike in BREs, braces aren't escaped.
Matches: weeps, wept

ERE regex: pe+r?l?
Description: The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
Matches: persona, peer, speech, perl

ERE regex: (⟨regex⟩)
Description: Unlike in BREs, parentheses aren't escaped.
Matches: ⟨regex⟩

ERE regex: (on|t)
Description: Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
Matches: one, two, trophy, truth

ERE regex: (..).*\1
Description: EREs do not support backreferences.
Matches: ⟨undefined⟩

The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.

Regex: \x{⟨n⟩}
Description: Matches the UCS character with code point ⟨n⟩ in hexadecimal.

Regex: \N{⟨n⟩}
Description: Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.

Regex: \p{⟨p⟩}
Description: Matches any UCS character with property ⟨p⟩.

Regex: \P{⟨p⟩}
Description: Matches any UCS character without property ⟨p⟩.

Property: Letter
Description: This property is satisfied by any letter.

Property: Punctuation
Description: This property is satisfied by any punctuation.

Property: Symbol
Description: This property is satisfied by any symbol.

Property: Mark
Description: This property is satisfied by any mark.

Property: Number
Description: This property is satisfied by any number.

Property: Separator
Description: This property is satisfied by any separator.

Property: Other
Description: This property is satisfied by any UCS character that doesn't belong to any of the above-listed categories.

Property: Block=⟨b⟩
Description: This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.

Property: Script=⟨s⟩
Description: This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.

Property: Numeric Value=⟨n⟩
Description: This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.

Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (U+200B), zero width non-joiner (U+200C), zero width joiner (U+200D), and zero width no-break space (U+FEFF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at least partially implement the Unicode standard for Regular Expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
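The pitfalls named above can be worked around by preprocessing text before matching. The following is a rough sketch under stated assumptions: it uses Python's re and unicodedata modules, the fold and contains helpers are hypothetical names, and removing the zero-width characters is acceptable only when they carry no meaning in the text at hand. Python's re does not support the \p{…} notation, so the general category is queried through unicodedata instead.

```python
import re
import unicodedata

# Zero-width characters; mapping a code point to None deletes it in translate().
INVISIBLE = dict.fromkeys(map(ord, "\u200B\u200C\u200D\uFEFF"))


def fold(text: str) -> str:
    """Drop invisible characters, then case-fold and normalize the text."""
    visible = text.translate(INVISIBLE)
    folded = unicodedata.normalize("NFD", visible).casefold()
    return unicodedata.normalize("NFD", folded)


def contains(haystack: str, needle: str) -> bool:
    """Search for a literal needle after folding both operands."""
    return re.search(re.escape(fold(needle)), fold(haystack)) is not None


# A zero width space hides inside the first word; the second word is decomposed.
text = "Stra\u200Bße, A\u030Angstro\u0308m"

naive = re.search("strasse", text, re.IGNORECASE)
print(naive is None)                          # True: a false negative
print(contains(text, "STRASSE"))              # True
print(contains(text, "\u00C5ngstr\u00F6m"))   # True

# Selecting characters by general category, here any kind of letter.
letters = [ch for ch in "aΩ7!" if unicodedata.category(ch).startswith("L")]
print(letters)  # ['a', 'Ω']
```

A regex engine that implements the Unicode standard for regular expressions can express much of this inside the pattern itself; the sketch merely shows what such an engine has to take care of.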

The most elementary text-processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in default of an advanced text editor. Unless configured otherwise, the tool will present lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for the Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.

The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.

VCS can be dichotomized based on their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally, and its operation is fully dependent on the availability of the server. An example of a centralized VCS is SubVersioN (SVN).

By comparison, there is no designated server in decentralized VCS, and the users can upload and download new versions directly to and from one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage size requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.
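The architectural difference can be illustrated with a toy model. The sketch below is a deliberately simplified assumption written in Python: the Repository class and its methods are hypothetical, history is strictly linear, and nothing here reflects how SVN or Git are actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """A toy linear history; each version is (author, message, content)."""
    versions: list = field(default_factory=list)

    def commit(self, author: str, message: str, content: str) -> None:
        self.versions.append((author, message, content))

    def latest(self) -> str:
        return self.versions[-1][2]

    def clone(self) -> "Repository":
        return Repository(list(self.versions))

    def pull(self, other: "Repository") -> None:
        # Naive fast-forward: assumes our history is a prefix of the other's.
        self.versions.extend(other.versions[len(self.versions):])


# Centralized: one shared repository; a thin client keeps a single version.
server = Repository()
server.commit("alice", "first draft", "Hello")
working_copy = server.latest()

# Decentralized: every user holds the complete history and commits locally.
alice = server.clone()
bob = alice.clone()
bob.commit("bob", "extend greeting", "Hello, world")
before_pull = alice.latest()  # Bob's change is not visible to Alice yet
alice.pull(bob)

print(before_pull)     # Hello
print(alice.latest())  # Hello, world
```

The window between Bob's commit and Alice's pull is where the collisions mentioned above can arise if Alice commits a change of her own in the meantime.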

Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.

[Diagram: svnadmin create, svn checkout, svn update, svn commit]

Figure 1.8: The basic SVN workflow.

An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.

[Diagram: git init, git clone, git pull, git push, git reset, git commit]

Figure 1.9: The diagram above depicts the basic Git workflow. The diagram below depicts the use of the Git program with an SVN repository; this bears all the advantages and disadvantages associated with decentralized VCS.

[Diagram: svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit]


Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.

Chapter 2

Markup

A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.

Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.

2.1 Meta Markup Languages

2.1.1 The Generalized Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.

A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.

Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also called a language or format).
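The two levels of correctness can be illustrated with the Python standard library. The sketch below rests on two assumptions: `xml.etree.ElementTree` is a non-validating parser, so it can only decide well-formedness, and the function `is_valid_recipe` is a hypothetical, hand-written stand-in for a real DTD or schema that merely demands a name and at least one step.

```python
import xml.etree.ElementTree as ET


def is_well_formed(source: str) -> bool:
    """Return True when the string parses as XML at all."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


def is_valid_recipe(source: str) -> bool:
    """Hand-written stand-in for a schema (an assumption, not a DTD)."""
    if not is_well_formed(source):
        return False  # validity presupposes well-formedness
    root = ET.fromstring(source)
    return (
        root.tag == "recipe"
        and root.find("name") is not None
        and len(root.findall("stepList/step")) >= 1
    )


overlapping = "<recipe><name>Palatschinken</recipe></name>"
no_steps = "<recipe><name>Palatschinken</name></recipe>"
complete = ("<recipe><name>Palatschinken</name>"
            "<stepList><step>Whisk.</step></stepList></recipe>")

for document in (overlapping, no_steps, complete):
    print(is_well_formed(document), is_valid_recipe(document))
```

The first document is not even well-formed, the second is well-formed but breaks the constraints of the application, and only the third satisfies both levels.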

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
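As a rough illustration of data retrieval, the following sketch queries a shortened recipe document. It assumes the limited XPath subset implemented by Python's `xml.etree.ElementTree`, which is far from a complete XPath or XQuery engine; the embedded document is a cut-down variant written for this example.

```python
import xml.etree.ElementTree as ET

RECIPE = """\
<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>
"""

root = ET.fromstring(RECIPE)

# Every <ingredient> element anywhere below the root.
names = [element.text for element in root.findall(".//ingredient")]

# Only the ingredient whose amount attribute equals "2".
eggs = root.findall(".//ingredient[@amount='2']")

# The value of an attribute on a specific element.
serves = root.find("ingredientList").get("serves")

print(names, eggs[0].text, serves)
```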

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE recipe SYSTEM "recipe.dtd">
    <recipe>
      <name>Palatschinken</name>
      <description>A Slavic crêpe-like dish.</description>
      <ingredientList serves="8">
        <ingredient amount="120g">Plain flour</ingredient>
        <ingredient amount="2">Egg</ingredient>
        <ingredient amount="300ml">Milk</ingredient>
        <ingredient amount="1 tblspn">Oil</ingredient>
        <ingredient amount="1 pinch">Salt</ingredient>
      </ingredientList>
      <stepList>
        <step>Combine the ingredients and whisk until
          you have a smooth batter.</step>
        <step>Heat oil on a pan, pour in a tablespoonful
          of the batter, fry until golden brown.</step>
        <step>Repeat until there is no batter left.</step>
        <step>Serve rolled and filled with jam.</step>
      </stepList>
    </recipe>

Figure 2.1: An example XML document (recipe.xml).

DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd">
    <!DOCTYPE recipe SYSTEM "recipe.dtd">

    <!DOCTYPE recipe [
      <!ELEMENT recipe (name, description, ingredientList,
                        stepList)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
      <!ELEMENT ingredientList (ingredient+)>
      <!ATTLIST ingredientList serves CDATA #REQUIRED>
      <!ELEMENT ingredient (#PCDATA)>
      <!ATTLIST ingredient amount CDATA #REQUIRED>
      <!ELEMENT stepList (step+)>
      <!ELEMENT step (#PCDATA)> ]>

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd" [
      <!-- Omitted for brevity --> ]>
    <!DOCTYPE recipe SYSTEM "recipe.dtd" [
      <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD.

    element recipe {
      element name { text },
      element description { text },
      element ingredientList {
        attribute serves { xsd:positiveInteger },
        element ingredient {
          attribute amount { text }, text
        }+
      },
      element stepList {
        element step { text }+
      }
    }

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.

    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
      <element name="recipe"><complexType><all>
        <element name="name" type="string" minOccurs="1"/>
        <element name="description" type="string"
                 minOccurs="1"/>
        <element
            name="ingredientList"><complexType><sequence>
          <element name="ingredient" minOccurs="1"
                   maxOccurs="unbounded">
            <complexType><simpleContent>
              <extension base="string">
                <attribute name="amount" type="string"/>
              </extension>
            </simpleContent></complexType>
          </element></sequence>
          <attribute name="serves" type="positiveInteger"
                     use="required"/>
        </complexType></element>
        <element name="stepList"><complexType><sequence>
          <element name="step" type="string" minOccurs="1"
                   maxOccurs="unbounded"/>
        </sequence></complexType></element>
      </all></complexType></element>
    </schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).

    xmllint --noout --dtdvalid recipe.dtd recipe.xml
    xmllint --noout --schema recipe.xsd recipe.xml
    trang recipe.rnc recipe.rng  # Compact -> Full RELAX NG
    xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.


A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature called CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.

Speech

    AASE: Peer, you're lying!
    PEER: No, I'm not!
    AASE: Well then, swear to me it's true!
    PEER: Swear? Why should I?
    AASE: See, you dare not! Every word of it's a lie!

Verse

    Peer, you're lying! No, I'm not!
    Well then, swear to me it's true!
    Swear? Why should I? See, you dare not!
    Every word of it's a lie!

    <(V)line>
    <(S)speech who="Aase">Peer, you're lying!</(S)speech>
    <(S)speech who="Peer">No, I'm not!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Aase">Well then,
    swear to me it's true!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Peer">Swear? Why should I?</(S)speech>
    <(S)speech who="Aase">See, you dare not!
    </(V)line><(V)line>
    Every word of it's a lie!</(S)speech>
    </(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook, and its source code is publicly available at http://docbook.org.

Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
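How namespaces keep the elements of different XML applications apart can be sketched with Python's `xml.etree.ElementTree`. The embedded document is a made-up example that mixes XHTML with SVG; the two namespace IRIs are the ones published by W3C for these applications.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

DOCUMENT = f"""\
<html xmlns="{XHTML}" xmlns:svg="{SVG}">
  <body>
    <p>A circle:</p>
    <svg:svg width="20" height="20">
      <svg:circle cx="10" cy="10" r="8"/>
    </svg:svg>
  </body>
</html>
"""

root = ET.fromstring(DOCUMENT)

# ElementTree expands every prefix into "{namespace IRI}local-name".
tags = [element.tag for element in root.iter()]

# Searching by namespace finds the elements of one application only.
circles = root.findall(".//svg:circle", {"svg": SVG})

print(tags)
print(circles[0].get("r"))
```

The parser only pairs each element with an IRI; it does not know which schema, if any, belongs to that IRI, which is the limitation described above.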

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely popular graphical Web browser, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language made it possible to specify the visual properties of any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

    <b>Bold <i>bold and italic</b> italic</i>
    <b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

    <font face="Verdana" size="4">
    <font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
    <br><br>There is a continuing need to show the power of
    <i>CSS</i>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The <i>HTML
    </i> remains the same, the only thing that has changed
    is the external <i>CSS</i> file. Yes, really.
    </font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

    <style>
    body {
      font: large Verdana;
      font-size: large;
    }
    h1 {
      font-size: x-large;
      text-transform: uppercase;
    }
    abbr {
      font-style: italic;
    }
    </style>
    <h1>So what is this about?</h1>
    <p>There is a continuing need to show the power of
    <abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The
    <abbr>HTML</abbr> remains the same, the only thing that
    has changed is the external <abbr>CSS</abbr> file. Yes,
    really.</p>

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2 Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications (such as MathML and SVG) directly into web documents through XML namespaces.

The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
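The difference between the strict processing of XML and the tolerant processing of HTML can be observed with the Python standard library. The sketch assumes that `html.parser.HTMLParser` is a fair stand-in for a tolerant HTML tokenizer; it does not implement the tree construction rules of the HTML5 specification, so it only reports the tags it encounters without repairing their nesting.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

OVERLAPPING = "<p><b>Bold <i>bold and italic</b> italic</i></p>"


class TagLogger(HTMLParser):
    """Collects the tags reported by Python's tolerant HTML tokenizer."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.events.append(f"</{tag}>")


def parses_as_xml(source: str) -> bool:
    """An XML parser must reject input that is not well-formed."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


logger = TagLogger()
logger.feed(OVERLAPPING)
logger.close()

print(logger.events)               # the HTML tokenizer carries on
print(parses_as_xml(OVERLAPPING))  # the XML parser gives up
```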

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.

A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.

If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
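The two readings of a triplet can be made concrete with a small data model. The class names `Resource`, `Literal`, and `Triplet` and the function `describe` are hypothetical and chosen only for this sketch; the IRIs are the example ones used in Figure 2.9.

```python
from typing import NamedTuple, Union


class Resource(NamedTuple):
    """A resource identified by an IRI."""
    iri: str


class Literal(NamedTuple):
    """A plain value such as a string."""
    value: str


class Triplet(NamedTuple):
    predicate: Resource
    subject: Resource
    obj: Union[Resource, Literal]


def describe(triplet: Triplet) -> str:
    """Read a triplet as a relation or as a property."""
    p, s, o = triplet
    if isinstance(o, Resource):
        return f"{s.iri} is in relation {p.iri} with {o.iri}"
    return f"{s.iri} has property {p.iri} with value {o.value!r}"


page = Resource("http://example.org/document.html")
john = Resource("http://example.org/john-smith")

creator = Triplet(Resource("http://purl.org/dc/terms/creator"), page, john)
title = Triplet(Resource("http://purl.org/dc/terms/title"), page,
                Literal("John's Web page"))

print(describe(creator))
print(describe(title))
```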

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page, and, besides the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually make obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
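Of the listed representations, N-Triples is simple enough to be produced by hand. The following sketch serializes statements in the subject, predicate, object order that N-Triples uses. The helper names `iri`, `literal`, and `n_triple` are hypothetical, and only the most common string escapes are handled, so this is not a complete implementation of the N-Triples grammar.

```python
def escape(value: str) -> str:
    """Escape characters that may not appear raw in an N-Triples string."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets."""
    return f"<{value}>"


def literal(value: str, language: str = "") -> str:
    """Quote a literal and append an optional language tag."""
    tag = f"@{language}" if language else ""
    return f'"{escape(value)}"{tag}'


def n_triple(subject: str, predicate: str, obj: str) -> str:
    """One statement per line: subject, predicate, object, full stop."""
    return f"{subject} {predicate} {obj} ."


line = n_triple(
    iri("http://example.org/document.html"),
    iri("http://purl.org/dc/terms/title"),
    literal("John's Web page", "en"),
)
print(line)
```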

    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/terms/"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description
          rdf:about="http://example.org/document.html">
        <dc:title xml:lang="en">John's Web page</dc:title>
        <dc:creator
            rdf:resource="http://example.org/john-smith"/>
      </rdf:Description>
      <rdf:Description
          rdf:about="http://example.org/john-smith">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
        <foaf:name>John Smith</foaf:name>
      </rdf:Description>
    </rdf:RDF>

    <http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
    <http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
    <http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
    <http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dc: <http://purl.org/dc/terms/> .

    <http://example.org/document.html>
      dc:title "John's Web page"@en ;
      dc:creator <http://example.org/john-smith> .

    <http://example.org/john-smith>
      a foaf:Person ;
      foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).

    <!DOCTYPE html>
    <html lang="en">
      <head>
        <link rel="meta" type="application/rdf+xml"
              href="john.rdf">
        <link rel="meta" type="text/turtle" href="john.ttl">
        <link rel="meta" type="application/n-triples"
              href="john.nt">
        <title>John's Web page</title>
      </head>
      <body>
        Hi, I'm John Smith.
      </body>
    </html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

    <!DOCTYPE html>
    <html lang="en">
      <head vocab="http://purl.org/dc/terms/"
            about="http://example.org/document.html">
        <title property="title" lang="en">John's Web page</title>
        <meta property="creator"
              href="http://example.org/john-smith">
      </head>
      <body vocab="http://xmlns.com/foaf/0.1/"
            about="http://example.org/john-smith"
            typeof="Person">
        Hi, I'm <span property="name">John Smith</span>.
      </body>
    </html>

[Graph: the node http://example.org/document.html has the edge dc:title to the literal "John's Web page"@en and the edge dc:creator to the node http://example.org/john-smith, which has the edge rdf:type to the node foaf:Person and the edge foaf:name to the literal "John Smith".]

Figure 2.11: A graph of the RDF document in Figure 2.9.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be categorized into batch-oriented systems, which process text files into printable output documents on demand, and interactive systems (also known as What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is paid in more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and in the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, a de facto standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by Desktop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.

[Output document: the title "The Cask of Amontillado", the byline "by Edgar Allan Poe", the opening paragraph of the story set with a three-line drop cap T, and the page number 1.]

    .TITLE "The Cask of Amontillado"
    .AUTHOR "Edgar Allan Poe"
    .PRINTSTYLE TYPESET
    .PAGE 6i 9i .75i .75i .75i .75i
    .START
    .PP
    .DROPCAP T 3
    he thousand injuries of Fortunato I had borne as I best
    could, but when he ventured upon insult, I vowed revenge.
    You, who so well know the nature of my soul, will not
    suppose, however, that I gave utterance to a threat.
    \*[IT]At length\*[PREV] I would be avenged; this was a
    point definitely settled\[em]but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].


% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1\hskip-0.3ex}}}%
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
    \dimexpr\hsize-3.28em 3.28em
    \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
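To make the idea of conversion to a more general markup language concrete, the following Python sketch converts a deliberately tiny, Markdown-like subset (a `#` heading, `*emphasis*`, and blank-line-separated paragraphs) to HTML. The function name `to_html` and the chosen subset are assumptions made for illustration; real converters handle far more syntax and many edge cases that this sketch ignores.

```python
import html
import re


def to_html(text):
    """Convert a tiny Markdown-like subset to HTML (illustration only)."""
    blocks = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        line = " ".join(block.split())          # join hard-wrapped lines
        line = html.escape(line, quote=False)   # protect <, > and &
        line = re.sub(r"\*(.+?)\*", r"<em>\1</em>", line)
        if line.startswith("# "):
            blocks.append(f"<h1>{line[2:]}</h1>")
        else:
            blocks.append(f"<p>{line}</p>")
    return "\n".join(blocks)


sample = """# Design

Legibility should be of *foremost* concern
when choosing typefaces."""

print(to_html(sample))
```

The sample prints an `<h1>` element followed by a single `<p>` element containing an `<em>` span. Note how the source stays readable as plain text, which is the main selling point of lightweight markup.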

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
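Whether a typeface covers a manuscript can be checked mechanically before the design is committed to. The Python sketch below assumes that the set of characters supported by the typeface is already known as a string (`latin_only` is a hypothetical stand-in; reading the coverage out of an actual font file is outside the scope of this sketch). It lists the missing characters and shows how an accented letter decomposes into a base letter and a combining mark, which is the basis for constructing it from two sources.

```python
import unicodedata


def missing_characters(manuscript, supported):
    """Return the non-whitespace characters of the manuscript that the typeface lacks."""
    needed = {ch for ch in manuscript if not ch.isspace()}
    return sorted(needed - set(supported))


def decompose(ch):
    """Split an accented character into its base letter and combining marks."""
    return list(unicodedata.normalize("NFD", ch))


# Hypothetical coverage of a body text typeface: unaccented Latin letters only.
latin_only = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

# "Vit Novotny" written with the accented letters i-acute and y-acute.
missing = missing_characters("V\u00edt Novotn\u00fd", latin_only)
parts = decompose("\u00ed")  # base letter "i" followed by a combining acute accent
```

Here `missing` contains exactly the two accented letters, and `parts` shows that the first of them can be assembled from an unaccented "i" and a combining acute accent taken from another font family.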

Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
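The line-length guidelines above translate directly into a small check. The following Python sketch is one possible encoding of the quoted ranges; the function name `assess_measure` and the wording of the returned messages are assumptions made for illustration.

```python
def assess_measure(chars_per_line, columns=1):
    """Classify a line length using the ranges quoted in the text above."""
    # 45-75 characters on a single-column page, 40-50 on a multi-column page.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "too narrow to justify: set ragged"
    if chars_per_line > 80:
        return "too wide: strains the eye"
    if low <= chars_per_line <= high:
        return "within the recommended range"
    return "acceptable, but outside the recommended range"


single = assess_measure(66)             # a typical book page
double = assess_measure(66, columns=2)  # the same measure in two columns
```

The same measure can be fine on a single-column page and too generous in a multi-column layout, which is why the number of columns is a parameter of the check.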

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὀργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

1

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
\newunicodechar{·}{\raisebox{.8ex}{.}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι
δὲ ὀργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


<style>
@font-face {
  font-family: "Alegreya Sans";
  src: url(AlegreyaSans-Regular.ttf)
    format("truetype");
  unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
    U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
}
@font-face {
  font-family: "GFS Neohellenic";
  src: url(GFSNeohellenic.otf) format("opentype");
  unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
    U+102E0-102FF;
}
p {
  font-family: "Alegreya Sans", "GFS Neohellenic",
    sans-serif;
  line-height: 14pt;
}
[lang=en] {
  font-size: 11pt;
}
[lang=el] {
  font-size: 13.2pt;
}
</style>
<p><span lang=en>The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang=el>φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὀργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang=en>“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
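The leading arithmetic above can be reproduced for any type size. The Python sketch below is a minimal illustration; the function names and the default shares of 20 % and 45 % are taken from the rule of thumb quoted in the text, not from any typesetting library.

```python
def leading_range(type_size_pt, min_share=0.20, max_share=0.45):
    """Return the (minimum, maximum) line height in points for a type size."""
    return (
        type_size_pt + type_size_pt * min_share,
        type_size_pt + type_size_pt * max_share,
    )


def lead_per_side(type_size_pt, line_height_pt):
    """Return the lead added above and below each line, in points."""
    return (line_height_pt - type_size_pt) / 2


low, high = leading_range(10)          # 10 pt body text
tightest = lead_per_side(10, low)      # lead per side at the minimum leading
loosest = lead_per_side(10, high)      # lead per side at the maximum leading
```

For 10 pt type this reproduces the figures given above: a line height between 12 and 14.5 pt, with 1 to 2.25 pt of lead on either side of the line.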

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well.

Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
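The whole-multiple rule for headings amounts to rounding the space a heading occupies up to the next full body line. The Python sketch below illustrates this; the function name `heading_space` and the example measurements are assumptions, and how the surplus white space is split above and below the heading is left to the designer.

```python
import math


def heading_space(heading_height_pt, line_height_pt):
    """Fit a heading into a whole number of body text lines.

    Returns the number of body lines reserved, the total height in points,
    and the surplus white space to distribute around the heading.
    """
    lines = math.ceil(heading_height_pt / line_height_pt)
    total = lines * line_height_pt
    return lines, total, total - heading_height_pt


# A heading that needs 20 pt of vertical space over body text with 12 pt leading.
lines, total, surplus = heading_space(20, 12)
```

In this example the heading reserves two body lines (24 pt), leaving 4 pt of surplus white space, so the text below the heading falls back onto the same grid as the text on the facing page.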

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the number of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.


2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs, and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

    This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    – William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
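Deriving the text height from the text width and then snapping it to whole lines can be sketched as follows. The function name `text_block`, the choice of reusing the page proportion for the text block, and the example measurements (a 4.5 in wide text block on a 6 × 9 in page, converted at an assumed 72 points per inch) are illustrative assumptions rather than a prescribed method.

```python
def text_block(text_width_pt, proportion, line_height_pt):
    """Derive a text height from the text width and snap it to whole lines.

    The target height is width * proportion; it is rounded down so that the
    text block holds a whole number of lines without exceeding the target.
    """
    target = text_width_pt * proportion
    lines = int(target // line_height_pt)
    return lines, lines * line_height_pt


# A 4.5 in (324 pt) wide text block on a 6 x 9 in page, whose proportion is
# 9 / 6 = 1.5, set with a 12 pt line height.
lines, height = text_block(324, 9 / 6, 12)
```

The target height of 486 pt is not a multiple of 12 pt, so the text block is reduced to 40 lines, or 480 pt, which can be filled completely with body text.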

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
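The text does not define "sufficient contrast". One widely used way to quantify it on screens, brought in here as an outside assumption, is the contrast ratio of the Web Content Accessibility Guidelines (WCAG) 2.x, which ranges from 1:1 to 21:1 and for which 4.5:1 is the usual minimum for normal-sized text. The Python sketch below computes it for 8-bit sRGB colors; the function names are illustrative.

```python
def relative_luminance(rgb):
    """Relative luminance of an 8-bit sRGB color, as defined in WCAG 2.x."""
    def linear(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG contrast ratio between two colors, from 1.0 up to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


black_on_white = contrast_ratio((0, 0, 0), (255, 255, 255))
red_on_white = contrast_ratio((255, 0, 0), (255, 255, 255))
```

Black on white reaches the maximum ratio, whereas pure red on white comes out at roughly 4:1, which supports reserving colored type for emphasis rather than for the body text.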

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015).

[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972.

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986.

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1.

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6.

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993.

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996.

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996.

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016).

[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008.

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012.

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006.

[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124.

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993.

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6.

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986.

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3.

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12.

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4.

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: httpwwwsazbacztypoglosytypo101pdf (visited on 10/20/2015).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5.

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014.

Acronyms

ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Declaration
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language



Figure 1.5: Text input methods are not limited to keyboard layouts. Software that enables the input of non-Latin characters on a keyboard through reversed romanization can often be the best option for writing systems with a large number of characters. Above is the Google Pinyin input method for the Android operating system, which makes it possible to input Chinese characters using the pinyin phonetic system.

Compose + O + R = ®
Compose + 3 + 4 = ¾
Compose + s + s = ß
Compose + ~ + ' + a = ấ

Figure 1.6: The Compose key followed by a mnemonic sequence of ASCII characters produces a UCS character. Although originally a physical key, Compose is not available on modern PC and Apple keyboards and is usually mapped to the right Ctrl or Super key in software. Compose is natively supported on Unix and Unix-like operating systems using the X Window System. On other operating systems, support can be added by third-party software.

Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á

Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces a character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the corresponding UCS code point in hexadecimal.
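The code page lookup described in Figure 1.7 can be imitated in a few lines of Python. This is only a sketch under simplifying assumptions: the function names `alt_code` and `alt_plus_hex` are ours, the numbers are assumed to fall between 0 and 255, and the code pages are fixed to those of an English locale.

```python
def alt_code(digits: str, oem: str = "cp437", ansi: str = "cp1252") -> str:
    """Return the character an Alt+<digits> sequence would select.

    A leading zero selects the Windows code page; otherwise the
    IBM code page is used. Only values 0-255 are handled here.
    """
    codec = ansi if digits.startswith("0") else oem
    return bytes([int(digits)]).decode(codec)


def alt_plus_hex(hex_digits: str) -> str:
    """Return the UCS character for an Alt + '+' + <hex digits> sequence."""
    return chr(int(hex_digits, 16))


if __name__ == "__main__":
    # All three sequences from Figure 1.7 select the same code point.
    for char in (alt_code("160"), alt_code("0225"), alt_plus_hex("E1")):
        print(f"U+{ord(char):04X}")
```

The three sequences differ only in which table the number is looked up in, which is why the same letter carries the number 160 in one code page and 225 in the other.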

1.1.2 Text Input

To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On handheld devices, the use of either a numeric keypad or a touchscreen is more typical.

An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind and, as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn't apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.

1.1.3 Text Editors

A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free JOE, GNU nano, and Pico.

More advanced text editors come with support for regular expressions and version control, which will be covered in Sections 1.1.5 and 1.2, and user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and Vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower the users to perform complex text editing.

1.1.4 Interactive Document Preparation Systems

Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully-formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].

The primary difference between text editors and DPSes is the fact that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files, a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.

1.1.5 Regular Expressions

The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.

Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.

Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) syntaxes [18, part 1, ch. 9] described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.

More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
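A backreference is a convenient illustration of a regex that goes beyond regular grammars, since it has to remember what an earlier part of the pattern matched. The sketch below uses the `re` module of Python, whose syntax is a Perl derivative rather than BRE or ERE; the pattern and the function name `doubled_words` are our own illustrative choices.

```python
import re

# \1 must repeat exactly what the first group matched, which is
# something no regular grammar can express in general.
DOUBLED_WORD = re.compile(r"\b(\w+) \1\b")


def doubled_words(text: str) -> list[str]:
    """Return every word that is immediately repeated in the text."""
    return [match.group(1) for match in DOUBLED_WORD.finditer(text)]


if __name__ == "__main__":
    print(doubled_words("it is is a test of the the tool"))
```

A pattern of this kind is a common proofreading aid, as accidentally doubled words are easy to overlook when reading.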

Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (U+200B), zero width non-joiner (U+200C), zero width joiner (U+200D), and zero width no-break space (U+FEFF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at

BRE regex, description, and matches:

we\{1,2\}p
    The interval expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
    Matches: weeps, wept

ene
    Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
    Matches: never, enemy, Kleene

\(⟨regex⟩\)
    A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
    Matches: ⟨regex⟩

^ar
    At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
    Matches: argument, arrow keys

ore$
    At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
    Matches: iron ore, dumbledore

.be
    A period (.) matches any single character.
    Matches: or not to be

be[ea]
    A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity.
    Matches: beehive, grizzly bear, glass beads

be[^ea]
    A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
    Matches: obeah, bend, libela

\^\$
    Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
    Matches: ^$

()1
    A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
    Matches: ara ararauna, dardanelles, nationality

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below).

ERE regex, description, and matches:

we{1,2}p
    Unlike in BREs, braces aren't escaped.
    Matches: weeps, wept

pe+rl
    The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
    Matches: persona, peer speech, perl

(⟨regex⟩)
    Unlike in BREs, parentheses aren't escaped.
    Matches: ⟨regex⟩

(on|t)
    Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
    Matches: one two, trophy, truth

()1
    EREs do not support backreferences.
    Matches: ⟨undefined⟩

Regex and description:

\x{⟨n⟩}
    Matches the UCS character with code point ⟨n⟩ in hexadecimal.
\N{⟨n⟩}
    Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.
\p{⟨p⟩}
    Matches any UCS character with property ⟨p⟩.
\P{⟨p⟩}
    Matches any UCS character without property ⟨p⟩.

Property and description:

Letter
    This property is satisfied by any letter.
Punctuation
    This property is satisfied by any punctuation.
Symbol
    This property is satisfied by any symbol.
Mark
    This property is satisfied by any mark.
Number
    This property is satisfied by any number.
Separator
    This property is satisfied by any separator.
Other
    This property is satisfied by any UCS character that doesn't belong to any of the categories listed above.
Block=⟨b⟩
    This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.
Script=⟨s⟩
    This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.
Numeric Value=⟨n⟩
    This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.

The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.

least partially implement the Unicode standard for Regular Expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
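Where a regex engine offers no such support, the text can be brought into a canonical form before matching. The following Python sketch shows one possible approach using the standard `unicodedata` module; the function names `canonical` and `contains` are ours, and the choice of the NFC normalization form together with full case folding is an assumption that may not suit every application.

```python
import re
import unicodedata

# Invisible characters that would otherwise cause false negative matches;
# mapping a code point to None makes str.translate() delete it.
ZERO_WIDTH = dict.fromkeys([0x200B, 0x200C, 0x200D, 0xFEFF])


def canonical(text: str) -> str:
    """Remove zero width characters, normalize, and case-fold the text."""
    text = unicodedata.normalize("NFC", text.translate(ZERO_WIDTH))
    # Case folding may denormalize the text, so normalize once more.
    return unicodedata.normalize("NFC", text.casefold())


def contains(haystack: str, needle: str) -> bool:
    """Search for a fixed string regardless of case and encoding variants."""
    pattern = re.escape(canonical(needle))
    return re.search(pattern, canonical(haystack)) is not None


if __name__ == "__main__":
    # "e" followed by a combining acute accent versus a precomposed letter.
    print(contains("Cafe\u0301 au lait", "CAF\u00c9"))
    print(contains("zero\u200bwidth", "zerowidth"))
```

Both the searched text and the pattern pass through the same function, so it does not matter which of the equivalent encodings either of them originally used.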

The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.

The most elementary text processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool will present lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.
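The default behavior of grep, presenting every line that contains at least one match, is simple enough to sketch in Python. The function below is our own illustration rather than a reimplementation of the tool, and it uses the regex syntax of the Python `re` module, which differs from the BRE syntax that grep uses by default.

```python
import re
from typing import Iterable, Iterator


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield the lines that contain at least one match of the pattern."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line.rstrip("\n")


if __name__ == "__main__":
    sample = ["argument\n", "iron ore\n", "arrow keys\n"]
    for line in grep(r"^ar", sample):
        print(line)
```

Since the function accepts any iterable of lines, an open text file can be passed to it directly.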

1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.

VCS can be divided based on their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally and its operation is fully dependent on the availability of the server. An example of a centralized VCS is SubVersioN (SVN).

By comparison, there is no designated server in decentralized VCS, and the users can upload and download new versions directly from one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage size requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.

After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.

Figure 1.8: The basic SVN workflow (svnadmin create, svn checkout, svn update, svn commit).

Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

An example would be the graphical SVN client Tortoise SVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.
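The reason why text files are so convenient for version control is that two versions can be compared line by line. The following Python sketch uses the standard `difflib` module to display such a change; the function name `show_changes` and the labels `previous` and `current` are our own, and actual VCS use their own, more elaborate comparison algorithms.

```python
import difflib


def show_changes(old: str, new: str) -> list[str]:
    """Return the line-by-line difference between two versions of a text."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="previous", tofile="current", lineterm=""))


if __name__ == "__main__":
    previous = "Combine the ingredients.\nFry until brown.\n"
    current = "Combine the ingredients.\nFry until golden brown.\n"
    print("\n".join(show_changes(previous, current)))
```

Lines prefixed with a minus sign were removed and lines prefixed with a plus sign were added. No such comparison is readily available for the binary output of most interactive DPSes.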

After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.

Diagram (above): git init, git clone, git pull, git push, git reset, git commit.

Diagram (below): svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit.

Figure 1.9: The diagram above depicts the basic Git workflow. The diagram below depicts the use of the Git program with an SVN repository; this bears all the advantages and disadvantages associated with decentralized VCS.

Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).

Figure 1.11: Tortoise SVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.

Chapter 2

Markup

A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules, such as the correct spelling, capitalization, word breaks, and punctuation, that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.

Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The Generalized Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.

The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.

Although the described structure is shared by all SGML documents, the actual syntax as well as the restrictions regarding the contents and the attributes of individual elements are declared within a Document Type Declaration (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML, such as W3C XML Schema, RELAX NG, or Schematron, that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).

A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
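The lower of the two levels of correctness can be checked with nothing more than an XML parser. The Python sketch below relies on the standard `xml.etree.ElementTree` module, which reports documents that are not well-formed but does not validate them against a DTD or a schema; the function name `is_well_formed` is our own.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True when the document can be parsed as XML."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<recipe><name>Palatschinken</name></recipe>"))
    # The <name> element is never closed in the second document.
    print(is_well_formed("<recipe><name>Palatschinken</recipe>"))
```

A well-formed document may still be invalid, so a check of validity requires a validating tool in addition to the parser.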

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish.</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>

Figure 2.1: An example XML document (recipe.xml).
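Elements and attributes of such a document are easy to retrieve programmatically. The sketch below parses an abridged copy of the recipe with the standard Python module `xml.etree.ElementTree`; the constant `RECIPE` and the function name `ingredients` are our own illustrative choices.

```python
import xml.etree.ElementTree as ET

# An abridged copy of the recipe document from Figure 2.1.
RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
  </ingredientList>
</recipe>"""


def ingredients(document: str) -> list[tuple[str, str]]:
    """Return (amount, name) pairs for every ingredient element."""
    root = ET.fromstring(document)
    return [(item.get("amount"), item.text)
            for item in root.iter("ingredient")]


if __name__ == "__main__":
    for amount, name in ingredients(RECIPE):
        print(amount, name)
```

Note that the parser only hands over the elements and attributes; what an amount of "120g" means is left to the program, in line with the division between syntax and semantics described earlier.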

DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD.

element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
      name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).

xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng  # Compact -> Full RELAX NG
xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.

21 META MARKUP LANGUAGES 27

A notable feature of xml unavailable in sgml are namespaceswhich were added to the xml specification [32] in 1999 Name-spaces enable the inclusion of elements and attributes from differ-ent xml applications within a single xml document each applica-tion is uniquely identified through an the Internationalized ResourceIdentifiers (ir is) [33] Namespaces in xml are a spiritual successorof a more expressive sgml feature of CONCUR which makes it pos-sible to mark up several structural views of a single documentUnlike with CONCUR which ties each view to an sgml dtd thereexists no general mechanism for the translation of the ir is to xml

Speech

AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!

Verse

Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? Why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].


The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.

Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
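The way a parser sees namespaced content can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the sample document, which embeds a MathML fragment in an XHTML page, and the helper name `namespaces_used` are hypothetical and do not come from this book. The standard `xml.etree.ElementTree` module expands every element name to the form `{IRI}local-name`, which makes the IRIs easy to collect.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: an XHTML document that embeds a MathML fragment.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:m="http://www.w3.org/1998/Math/MathML">
  <body><p>Euler: <m:math><m:mi>e</m:mi></m:math></p></body>
</html>"""


def namespaces_used(root):
    """Collect the namespace IRIs of all elements below (and including) root."""
    found = set()
    for element in root.iter():
        # ElementTree stores qualified names as "{IRI}local-name".
        if element.tag.startswith("{"):
            found.add(element.tag[1:].split("}", 1)[0])
    return found


if __name__ == "__main__":
    for iri in sorted(namespaces_used(ET.fromstring(DOC))):
        print(iri)
```

The parser only checks well-formedness here. Nothing in the IRIs tells it where a schema for either vocabulary could be found, which is the limitation described above.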

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook (a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems), the Text Encoding Initiative (TEI, a general text encoding markup language for use in the academic field of digital humanities), the Mathematical Markup Language (MathML, a markup language for the description of mathematical formulae), and the Scalable Vector Graphics language (SVG, a vector graphics format). Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely adopted graphical Web browser, which paved the way for the Web browsers of today. In 1994, Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers, faced with a malformed document, would act in accordance with Postel's law and try to render the document despite its deficiencies. In


JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

an attempt to unify the way malformed HTML documents were rendered across Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Initially, HTML only comprised a mixture of logical and presentation markup with a fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language made it possible to specify the visual properties of any HTML element, which separated document markup from design and effectively eliminated the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as

<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.


<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

<style>
body {
  font: large Verdana;
  font-size: large;
}
h1 {
  font-size: x-large;
  text-transform: uppercase;
}
abbr {
  font-style: italic;
}
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>


The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly in web documents through XML namespaces.
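The difference between the two parsing philosophies can be observed with the Python standard library. The following is a minimal sketch rather than a model of a real browser: `html.parser` is only a lenient tokenizer and does not implement the error recovery prescribed by HTML5, and the names `is_well_formed_xml`, `TagCollector`, and `html_events` are hypothetical helpers introduced for this illustration. The input is the overlapping markup shown in Figure 2.7, wrapped in a root element for the XML parser because an XML document needs exactly one.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Overlapping <b> and <i> elements, as in the first line of Figure 2.7.
MALFORMED = "<b>Bold <i>bold and italic</b> italic</i>"


def is_well_formed_xml(fragment):
    """Return True if the fragment parses as XML inside a single root element."""
    try:
        ET.fromstring(f"<root>{fragment}</root>")
    except ET.ParseError:
        return False
    return True


class TagCollector(HTMLParser):
    """Record start and end tags without rejecting malformed input."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def html_events(fragment):
    """Tokenize the fragment leniently and return the recorded tag events."""
    collector = TagCollector()
    collector.feed(fragment)
    collector.close()
    return collector.events


if __name__ == "__main__":
    print(is_well_formed_xml(MALFORMED))  # the XML parser refuses the input
    print(html_events(MALFORMED))         # the HTML tokenizer carries on
```

The XML parser stops at the first mismatched end tag, whereas the HTML tokenizer reports all four tags and leaves the repair to whatever consumes them.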

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that did not support the XML applications instantiated in XHTML documents also considerably reduced the usefulness of XML namespaces in XHTML. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.


A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.

If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
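The two readings of a triplet can be sketched in Python. This is a toy model under stated assumptions, not an RDF library: the classes `IRI`, `Literal`, and `Triple` and the function `interpret` are hypothetical names introduced here, and the field order (predicate, subject, object) simply mirrors the notation used above. The sample IRIs follow the example.org document used in this chapter's RDF example.

```python
from typing import NamedTuple, Union


class IRI(str):
    """A resource identified by an IRI."""


class Literal(NamedTuple):
    """A literal value with an optional language tag."""
    value: str
    lang: str = ""


class Triple(NamedTuple):
    """A triplet (p, s, o) in the order used by the text."""
    predicate: IRI
    subject: IRI
    obj: Union[IRI, Literal]


def interpret(triple):
    """Render a triplet as a relation or as a property, depending on its object."""
    if isinstance(triple.obj, IRI):
        return (f"<{triple.subject}> is in relation "
                f"<{triple.predicate}> with <{triple.obj}>")
    return (f"<{triple.subject}> has property "
            f"<{triple.predicate}> with value {triple.obj.value!r}")


DOCUMENT = IRI("http://example.org/document.html")
JOHN = IRI("http://example.org/john-smith")
TRIPLES = [
    Triple(IRI("http://purl.org/dc/terms/creator"), DOCUMENT, JOHN),
    Triple(IRI("http://purl.org/dc/terms/title"), DOCUMENT,
           Literal("John's Web page", "en")),
]

if __name__ == "__main__":
    for triple in TRIPLES:
        print(interpret(triple))
```

Only the type of the object decides which interpretation applies; the predicate and the subject are always resources.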

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, besides the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually make obsolete the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in documents on the Web, a technique known as microformatting.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be


<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
    dc:title "John's Web page"@en ;
    dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
    a foaf:Person ;
    foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).


<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>


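[Graph: the node http://example.org/document.html is connected to the literal "John's Web page"@en by the edge dc:title and to the node http://example.org/john-smith by the edge dc:creator; the node http://example.org/john-smith is connected to the node foaf:Person by the edge rdf:type and to the literal "John Smith" by the edge foaf:name.]

Figure 2.11: A graph of the RDF document in Figure 2.9.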

categorized into batch-oriented systems, which process text files into printable output documents on demand, and interactive systems (also called What You See Is What You Get, WYSIWYG), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the gentle learning curve of interactive DPSes is their more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The


The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by Donald Knuth, an American professor of computer science, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.


The Cask of Amontillado
by
Edgar Allen Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that gave utterance to a threat. At length I would be avenged; this was a point definitely settled--but the very definitiveness with which it was resolved, precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allen Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved, precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allen Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].


Page geometry

pdfpagewidth=6in pdfpageheight=9in

Page dimensions

hsize=dimexprpdfpagewidth-15in

vsize=dimexprpdfpageheight-15in

baselineskip=168pt

hoffset=-25in voffset=-25in

Fonts

fontrm=ptmr8t at 125ptrm fontbigbf=ptmb8t at 16pt

fontdropcap=ptmr8t at 62pt fontit=ptmri8r at 125pt

Logical markup definition

deftitle1bigbfcenterline1

defauthor1itcenterlinebycenterline1

vskip 39em

defchapter1noindentsmashhskip01exlower58ex

hboxllapdropcap1hskip-03ex

parshape=4 3emdimexprhsize-3em 328em

dimexprhsize-328em 328em

dimexprhsize-328em 0emhsize

The document

titleThe Cask of Amontillado

authorEdgar Allen Poe

chapter The thousand injuries of Fortunato I had borne

as I best could but when he ventured upon insult I vowed

revenge You who so well know the nature of my soul

will not suppose however that gave utterance to a

threat it At length I would be avenged this was a

point definitely settled---but the very definitiveness

with which it was resolved precluded the idea of risk I

must not only punish but punish with impunity A wrong is

unredressed when retribution overtakes its redresserbye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce markup that is comparatively weak and domain-specific, but also humane, highly intuitive, and easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that reflect their use cases.

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

1

documentclass[11pt]article

usepackagefontspec leading newunicodechar

usepackage[Latin Greek]ucharclasses

setTransitionsForLatin

fontspecAlegreyaSans-Regularttf[Ligatures=TeX]

setTransitionsForGreek

fontspecGFSNeohellenicotf[Scale=12 WordSpace=05

Ligatures=TeX]

newunicodecharraisebox8ex

frenchspacing

leading14pt

begindocument

The second function of Soul -- knowing -- was not at

first distinguished from motion Aristotle says φαμὲν

γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι

δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα

δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν

αὐτὴν κινεῖσθαι

``The soul is said to feel pain and joy confidence and

fear and again to be angry to perceive and to think

and all these states are held to be movements which

might lead one to suppose that soul itself is moved

enddocument

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


<style>
@font-face {
  font-family: "Alegreya Sans";
  src: url("AlegreyaSans-Regular.ttf")
       format("truetype");
  unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                 U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
}
@font-face {
  font-family: "GFS Neohellenic";
  src: url("GFSNeohellenic.otf") format("opentype");
  unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                 U+102E0-102FF;
}
p {
  font-family: "Alegreya Sans", "GFS Neohellenic",
               sans-serif;
  line-height: 14pt;
}
[lang=en] {
  font-size: 11pt;
}
[lang=gr] {
  font-size: 13.2pt;
}
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
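The arithmetic behind these figures is simple enough to sketch in Python. The function names `leading_range` and `lead_per_side` are hypothetical, and the 20 % and 45 % bounds are the rule of thumb quoted above rather than fixed constants.

```python
def leading_range(type_size_pt, low=0.20, high=0.45):
    """Return the (minimum, maximum) line height in points.

    The line height is the typeface size plus 20-45 % of extra lead.
    """
    return (type_size_pt * (1 + low), type_size_pt * (1 + high))


def lead_per_side(type_size_pt, leading_pt):
    """Return the lead added above and below each line, in points."""
    return (leading_pt - type_size_pt) / 2


if __name__ == "__main__":
    minimum, maximum = leading_range(10)
    print(f"leading: {minimum:.2f} to {maximum:.2f} pt")
    print(f"lead per side: {lead_per_side(10, minimum):.2f} "
          f"to {lead_per_side(10, maximum):.2f} pt")
```

For a 10 pt typeface, the sketch reproduces the range of 12 to 14.5 pt and the 1 to 2.25 pt of lead on each side of a line.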

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger


          Sizes in inches    Page proportions
A4        8.27 × 11.7        1 : √2
B5        6.93 × 9.84        1 : √2
Letter    8½ × 11            1 : 1.294

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
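The page proportions in Table 3.1 follow directly from the page sizes, which the following Python sketch verifies. The dictionary `PAPER_SIZES_IN` and the function `proportion` are hypothetical names, and the sizes are the rounded inch values from the table, so the results for A4 and B5 only approximate √2.

```python
import math

# Page sizes in inches (width, height), as listed in Table 3.1.
PAPER_SIZES_IN = {
    "A4": (8.27, 11.7),
    "B5": (6.93, 9.84),
    "Letter": (8.5, 11.0),
}


def proportion(width, height):
    """Return x for a page whose proportions are 1 : x."""
    return height / width


if __name__ == "__main__":
    print(f"sqrt(2) = {math.sqrt(2):.4f}")
    for name, (width, height) in PAPER_SIZES_IN.items():
        print(f"{name}: 1 : {proportion(width, height):.4f}")
```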

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
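One way to satisfy the whole-multiple rule is to round the space occupied by a heading up to the next multiple of the body text line height and to distribute the difference as white space around the heading. The Python sketch below shows the rounding step only; the function name `heading_block_height` and the sample measurements are hypothetical.

```python
import math


def heading_block_height(content_height_pt, leading_pt):
    """Round the height of a heading up to a whole multiple of the leading.

    content_height_pt is the height of the heading text including the
    minimum white space around it; leading_pt is the body text line height.
    """
    # The small tolerance keeps exact multiples from being rounded up a line.
    lines = max(1, math.ceil(content_height_pt / leading_pt - 1e-9))
    return lines * leading_pt


if __name__ == "__main__":
    # A heading that needs 26 pt on a 12 pt grid occupies three lines.
    print(heading_block_height(26, 12))
```

A heading that already fits the grid is left untouched, so a 24 pt heading on a 12 pt grid still occupies exactly two lines.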

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

32 STRUCTURAL ELEMENTS 47

2 Footnotes are delegated to the bottom of the page and linked to therelevant passage of the main text through symbols or superscriptnumbers1 Compared to side notes they are more difficult for thereader to find Footnotes should align with the bottom of the textblock not stick out into the bottom margin [53 para 48]

3 Endnotes are delegated to the end of a section or the entire doc-ument and are linked to the relevant passage of the body textthrough superscript numbers They are the easiest of the three totypeset but also the hardest for the reader to find

Notes are typically typeset in sizes from 8pt up to the body texttypeface size depending on their frequency importance and aver-age length [54 sec 43] If several categories of notes are presentin the document it may be desirable to give each a different form

3.2.5 Quotations

Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer’s viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

    This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star.

    William Shakespeare, King Lear

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
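The derivation of the text height can be sketched as follows. This is an illustrative calculation rather than a prescribed method: it assumes that the designer has already chosen a text width, a target height-to-width proportion, and a body line height (all in points), and it rounds the ideal height to the nearest whole number of lines. The function name and the sample values are hypothetical.

```python
def text_height(text_width_pt, proportion, line_height_pt):
    """Return (number of lines, text height in points).

    The ideal height is text_width_pt * proportion; it is then
    rounded to the nearest whole number of body text lines, so
    that the text block can be filled completely with text.
    """
    ideal_height = text_width_pt * proportion
    lines = round(ideal_height / line_height_pt)
    return lines, lines * line_height_pt


if __name__ == "__main__":
    # A 300 pt wide text block, a proportion close to the golden
    # section, and a 12 pt line height.
    lines, height = text_height(300, 1.618, 12)
    print(lines, height)
```

For the sample values, the ideal height of 485.4 pt is adjusted to 40 lines, that is 480 pt, which changes the proportion only slightly while keeping the text block a whole number of lines tall.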

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). url: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. url: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. isbn: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. isbn: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. url: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. url: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. url: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. url: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).

[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O’Reilly Media, 2006, p. 544. isbn: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. url: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O’Reilly Media, 1997. isbn: 1565922255. url: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2002. url: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). url: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). url: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). url: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. isbn: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. url: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. url: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. url: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. isbn: 978-3-540-39916-2. doi: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. url: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. url: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. url: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. url: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. url: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. url: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. url: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. url: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. url: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. url: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. url: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. url: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. url: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. url: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. url: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. url: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. url: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. url: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. url: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. url: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. isbn: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. url: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. isbn: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. url: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Declaration
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
Emacs  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe’s Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
Pico  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
vi  The Visual Interactive editor
Vim  vi IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language

Index

ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42

Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17

sect

Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

bre 14ndash16bs 6bsd 13

sect

ca 52can 6cern 28

character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44

sect

dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35

dps 13 17 18 32 35 36 39dtd 23 25ndash27dtp 36

sect

ebcdic 5ecma 55Edgar Allen Poe 37


Elements of Style 3em 6Emacs 13endianity 10endnote 47enq 6eot 6erealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

ere 14ndash16esc 6etb 6120576-TEX 38etx 6euc 5

sectF M Cornford 43ff 6foaf 32 33footnote 47formal grammar 14fortran 4From Religion to Philosophy A Study in

the Origins of Western Speculation 43fs 6fsm 35

sectGit 17gml 22gnuLinux 13nano 13

gnu 13 14 35Google Documents 18Google Pinyin 11grep 16 17groff see troffgs 6gui 13 35

sectHan Unification 9heading 45Henrik Ibsen 27ht 6

html 28ndash32 34 39 44 55sect

ibm 5 12 22iconv 10iec 7 10 51ndash54ime 12ir i 27 28 31 32 54iso 7 10 51ndash54

sectJavaScript 29Jeffrey E F Friedl 14j is 5joe 13JScript 29json 32json-ld 32 56jtc 51ndash54justification see alignment

sectKing Lear 48

sectLATEX 36 43Latin Vulgate Bible 49ld 31 32 55leading see line spacingLeafpad 13lf 6lightweight markup language 39line height 45list 46

sectma 51MakeDoc 39Markdown 39markuplogical 21 29 30 35 36presentation 21 29 30 35 36

mathml 28 31Mercurial 17microformatting 32Microsoft Word 14 20 39

sectN-Triples 32 33nak 6Noam Chomskyhierarchy 14

Noam Chomsky 14note 46Notepad++ 13Notepad 13


nroff see troffnul 6ny 51

sectocr 12odf 13ooxml 13owl 32 56

sectparagraphblock 47indented 45outdented 45

paragraph 42paragraphsblock 45

pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8

sectQuarkXPress 14quotationblock 47run-in 47

sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31

rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6

sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22

sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-

lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6

secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff

man 36


me 36mom 36

troff 35tron 9Turtle 32 33typeface 41

sectucsblock 8ucs-4 8

ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10

us 6usa 51 52utf

utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8

utf 6 8ndash10 52sect

VBScript 29vcscentralized 17decentralized 17

vcs 17ndash20version control 13vi 13vim 13

vt 6sect

w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4

wysiwyg 35sect

XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23

xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23


Alt + 1 + 6 + 0 = á
Alt + 0 + 2 + 2 + 5 = á
Alt + + + E + 1 = á

Figure 1.7: On the Windows operating system, holding the Alt key and typing a sequence of numbers produces a character with the corresponding number from either an IBM code page, if the number has no leading zero, or from a Windows code page otherwise. The code pages vary depending on the current locale; in English locales, the IBM code page 437 and the Windows code page 1252 are used. After a Windows Registry modification, it is also possible to directly produce UCS characters by holding the Alt key and typing the plus sign followed by the corresponding UCS code point in hexadecimal.
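The three key sequences in Figure 1.7 can be checked against the code pages with a few lines of Python. This is only a sketch of the lookup rule described in the caption for English locales, not of how Windows implements it; the function names are hypothetical, and the codec names cp437 and cp1252 are those of Python's standard library.

```python
def alt_code(digits):
    """Decode a decimal Alt code the way the caption describes:
    a leading zero selects the Windows code page 1252, otherwise
    the IBM code page 437 is used (English locales assumed)."""
    code_page = "cp1252" if digits.startswith("0") else "cp437"
    return bytes([int(digits)]).decode(code_page)


def alt_plus_hex(hex_digits):
    """Decode the Alt, plus sign, hexadecimal code point sequence
    that is available after the Registry modification."""
    return chr(int(hex_digits, 16))


if __name__ == "__main__":
    print(alt_code("160"), alt_code("0225"), alt_plus_hex("E1"))
```

All three calls yield the same character, á (U+00E1), which is what the figure shows. The sketch only covers numbers that fit into a single byte and that the chosen code page defines.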

1.1.2 Text Input

To insert text into a document, it is necessary to use an input device. In the case of personal computers, this is typically a computer keyboard and a mouse, although the ongoing research in the areas of Sound Recognition (SR) and Optical Character Recognition (OCR) makes it possible to use a microphone or a tablet as well. On hand-held devices, the use of either a numeric keypad or a touch-screen is more typical.

An operating system will typically provide one or more input methods for each input device through a component commonly referred to as the Input Method Editor (IME). The ASCII encoding was developed with typewriters and teleprinters in mind and, as their direct descendant, the standard computer keyboard provides support for all ASCII characters. This doesn’t apply to the much larger UCS, and it is the task of an IME to provide a mechanism for the creation and selection of keyboard layouts that will allow the user to input any UCS character. Some programs may provide input methods of their own that are independent of the IME.

1.1.3 Text Editors

A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free JOE, GNU nano, and Pico.

More advanced text editors come with support for regular expressions and version control, which will be covered in Sections 1.1.5 and 1.2, and user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and Vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower the users to perform complex text editing.

1.1.4 Interactive Document Preparation Systems

Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully-formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].

The primary difference between text editors and DPSes is the fact that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files, a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.

Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.

1.1.5 Regular Expressions

The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.

Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) syntaxes [18, part 1, ch. 9] described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.

More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
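A short illustration of why the term regular is a misnomer: the language of "squares", that is strings consisting of some non-empty word repeated twice, is a standard example of a language that no regular grammar can describe, yet a regex with a backreference matches it. The sketch below uses Python's re module, whose syntax is one of the Perl derivatives; the helper name is hypothetical.

```python
import re

# (.+) captures a non-empty word; \1 demands that the very same
# text is repeated immediately afterwards.
SQUARE = re.compile(r"(.+)\1")


def is_square(word):
    """Return True if the whole word is some string repeated twice."""
    return SQUARE.fullmatch(word) is not None


if __name__ == "__main__":
    for word in ("abcabc", "abcabd", "aa", "aba"):
        print(word, is_square(word))
```

Recognizing such words requires remembering an arbitrarily long prefix, which a finite automaton cannot do. Regex engines usually pay for this power with backtracking, so patterns of this kind can become slow on long inputs.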

Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (U+200B), zero width non-joiner (U+200C), zero width joiner (U+200D), and zero width no-break space (U+FEFF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at

11 TEXT PROCESSING 15

BRE regex: we\{1,2\}p
Description: The repetition expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
Matches: weeps wept

BRE regex: e*ne
Description: Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
Matches: never enemy; Kleene

BRE regex: \(⟨regex⟩\)
Description: A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
Matches: ⟨regex⟩

BRE regex: ^ar
Description: At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
Matches: argument; arrow keys

BRE regex: ore$
Description: At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
Matches: iron ore; dumbledore

BRE regex: .be
Description: A period (.) matches any single character.
Matches: or not to be

BRE regex: be[ea]
Description: A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity.
Matches: beehive; grizzly bear; glass beads

BRE regex: be[^ea]
Description: A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
Matches: obeah bend; libela

BRE regex: \^\$
Description: Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
Matches: ^$

BRE regex: \(..\).*\1
Description: A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
Matches: ara ararauna; dardanelles; nationality

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below)

ERE regex: we{1,2}p
Description: Unlike in BREs, braces aren't escaped.
Matches: weeps wept

ERE regex: pe+rl?
Description: The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
Matches: persona; peer speech; perl

ERE regex: (⟨regex⟩)
Description: Unlike in BREs, parentheses aren't escaped.
Matches: ⟨regex⟩

ERE regex: (on|t)
Description: Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
Matches: one two; trophy truth

ERE regex: (..).*\1
Description: EREs do not support backreferences.
Matches: ⟨undefined⟩
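The constructs listed in Table 1.4 can be tried out in most regex engines. The following is a minimal sketch that assumes Python's re module, which is not a POSIX BRE or ERE implementation: it leaves braces and parentheses unescaped, as ERE does, but differs from both syntaxes in other details. The helper name matching and the word lists are illustrative only.

```python
import re


def matching(pattern: str, words: list[str]) -> list[str]:
    """Return the words in which the pattern matches at least once."""
    return [word for word in words if re.search(pattern, word)]


# Interval expression: "e" repeated one or two times between "w" and "p".
print(matching(r"we{1,2}p", ["weeps", "wept", "wp"]))

# Anchors: a caret for the beginning and a dollar sign for the end of a string.
print(matching(r"^ar", ["argument", "bar"]))
print(matching(r"ore$", ["iron ore", "ores"]))

# Matching and non-matching list expressions.
print(matching(r"be[ea]", ["beehive", "grizzly bear", "bend"]))
print(matching(r"be[^ea]", ["beehive", "grizzly bear", "bend"]))

# Alternation between two regexes inside a subexpression.
print(matching(r"^(on|t)", ["one", "two", "four"]))
```

Because re.search looks for a match anywhere in the string, the helper mirrors the way grep reports lines that contain at least one match.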


Regex: \x{⟨n⟩}
Description: Matches the UCS character with code point ⟨n⟩ in hexadecimal.

Regex: \N{⟨n⟩}
Description: Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.

Regex: \p{⟨p⟩}
Description: Matches any UCS character with property ⟨p⟩.

Regex: \P{⟨p⟩}
Description: Matches any UCS character without property ⟨p⟩.

Property: Letter
Description: This property is satisfied by any letter.

Property: Punctuation
Description: This property is satisfied by any punctuation.

Property: Symbol
Description: This property is satisfied by any symbol.

Property: Mark
Description: This property is satisfied by any mark.

Property: Number
Description: This property is satisfied by any number.

Property: Separator
Description: This property is satisfied by any separator.

Property: Other
Description: This property is satisfied by any UCS character that doesn't belong to any of the above-listed categories.

Property: Block=⟨b⟩
Description: This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.

Property: Script=⟨s⟩
Description: This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.

Property: Numeric Value=⟨n⟩
Description: This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.

The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.

least partially implement the Unicode standard for Regular Expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
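The pitfalls described above can be reproduced with a few lines of code. The following is a simplified sketch that assumes Python and its standard re and unicodedata modules; the helper names normalize and contains are illustrative, and the standard re module itself does not implement the property syntax of Table 1.5. The sketch strips the zero width characters and applies Unicode normalization and case folding to both operands before matching.

```python
import re
import unicodedata

# The zero width characters named in the text, mapped to None for deletion.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))


def normalize(text: str) -> str:
    """Strip zero width characters, then apply NFC and full case folding."""
    stripped = text.translate(ZERO_WIDTH)
    folded = unicodedata.normalize("NFC", stripped).casefold()
    # Case folding may denormalize the text, so normalize once more.
    return unicodedata.normalize("NFC", folded)


def contains(needle: str, haystack: str) -> bool:
    """Search for a fixed string after normalizing both operands."""
    pattern = re.escape(normalize(needle))
    return re.search(pattern, normalize(haystack)) is not None


# "E" followed by a combining acute accent versus the precomposed letter.
decomposed = "CAFE\u0301"
naive = re.search("caf\u00e9", decomposed, re.IGNORECASE)
print(naive is None, contains("caf\u00e9", decomposed))

# A zero width space hidden inside a word.
print(contains("typesetting", "type\u200bsetting"))
```

The naive search fails even with case-insensitive matching enabled, because the two encodings of the accented letter differ at the level of code points; the normalized search succeeds in both cases.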

The most elementary text-processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool will present lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming


The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.

language, albeit a more advanced one than that of sed. Originally developed for Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.
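The default behavior of grep described above, presenting the lines that contain one or more matches, can be sketched in a few lines. The following is a minimal illustration that assumes Python and its re module rather than the actual grep implementation; the function name grep and its fixed parameter are hypothetical and only mimic the distinction between regexes and fixed strings.

```python
import re
from typing import Iterable, Iterator


def grep(pattern: str, lines: Iterable[str], fixed: bool = False) -> Iterator[str]:
    """Yield the lines that contain at least one match of the pattern."""
    # A fixed string is searched for literally by escaping its special characters.
    regex = re.compile(re.escape(pattern) if fixed else pattern)
    for line in lines:
        if regex.search(line):
            yield line.rstrip("\n")


text = ["iron ore\n", "argument\n", "a.b\n", "acb\n"]
print(list(grep(r"^ar", text)))
print(list(grep("a.b", text)))
print(list(grep("a.b", text, fixed=True)))
```

When the pattern is treated as a regex, the period matches any character and both of the last two lines are reported; when it is treated as a fixed string, only the literal occurrence is.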

1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.

VCS can be divided by their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally, and its operation is fully dependent on the availability of the server. An example of a centralized VCS is Subversion (SVN).

By comparison, there is no designated server in decentralized VCS, and the users can exchange new versions directly with one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage size requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.

Although VCS can be used to keep track of any kind of file, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.

[Diagram: svnadmin create, svn checkout, svn update, svn commit]

Figure 1.8: The basic SVN workflow
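The reason why text files lend themselves to version control is that the changes between two versions can be computed and displayed line by line. The following sketch assumes Python and its standard difflib module; the file names draft-v1.txt and draft-v2.txt and the sample lines are hypothetical. It produces output in the unified diff format, which resembles what the diff commands of SVN and Git print.

```python
import difflib

# Two versions of the same text file, one line per list item.
old = [
    "Markup can be created using markup languages.\n",
    "Logical markup captures structure.\n",
]
new = [
    "Markup can be created using a variety of markup languages.\n",
    "Logical markup captures structure.\n",
]

# Lines prefixed with "-" were removed and lines prefixed with "+" were added.
diff = list(
    difflib.unified_diff(old, new, fromfile="draft-v1.txt", tofile="draft-v2.txt")
)
print("".join(diff))
```

No comparable line-based view exists for binary output files, which is why interactive DPSes need either built-in change tracking or a dedicated comparison interface.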

After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.

[Diagram above: git init, git clone, git pull, git push, git reset, git commit]

[Diagram below: svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit]

Figure 1.9: The diagram above depicts the basic Git workflow. The diagram below depicts the use of the Git program with an SVN repository; this bears all the advantages and disadvantages associated with decentralized VCS.


Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom)

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.

Chapter 2

Markup

A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules, such as the correct spelling, capitalization, word breaks, and punctuation, that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.

Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result,


More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.

the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The General Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. The umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.


A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.

Although the described structure is shared by all SGML documents, the actual syntax as well as the restrictions regarding the contents and the attributes of individual elements are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML, such as W3C XML Schema, RELAX NG, or Schematron, that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
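The lower of the two levels of correctness can be checked with any conforming XML parser. The following is a minimal sketch that assumes Python and its standard xml.etree.ElementTree module; the function name is_well_formed is illustrative. The module only checks well-formedness; checking validity against a DTD or a schema requires a validating tool.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True when the parser accepts the document as well-formed XML."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Every start tag has a matching end tag.
print(is_well_formed("<recipe><name>Palatschinken</name></recipe>"))
# The name element is never closed, so the document is not well-formed.
print(is_well_formed("<recipe><name>Palatschinken</recipe>"))
```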

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
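To give an impression of how data retrieval from XML documents works, the following sketch queries a shortened version of the recipe document with path expressions. It assumes Python and its standard xml.etree.ElementTree module, which supports only a limited subset of XPath; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# Select every ingredient element by its location in the document tree.
names = [node.text for node in root.findall("./ingredientList/ingredient")]
# Select the single ingredient whose amount attribute has a given value.
milk = root.find("./ingredientList/ingredient[@amount='300ml']")
# Read the value of an attribute of the selected element.
serves = root.find("ingredientList").get("serves")

print(names, milk.text, serves)
```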


    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE recipe SYSTEM "recipe.dtd">
    <recipe>
      <name>Palatschinken</name>
      <description>A Slavic crêpe-like dish</description>
      <ingredientList serves="8">
        <ingredient amount="120g">Plain flour</ingredient>
        <ingredient amount="2">Egg</ingredient>
        <ingredient amount="300ml">Milk</ingredient>
        <ingredient amount="1 tblspn">Oil</ingredient>
        <ingredient amount="1 pinch">Salt</ingredient>
      </ingredientList>
      <stepList>
        <step>Combine the ingredients and whisk until
          you have a smooth batter.</step>
        <step>Heat oil on a pan, pour in a tablespoonful
          of the batter, fry until golden brown.</step>
        <step>Repeat until there is no batter left.</step>
        <step>Serve rolled and filled with jam.</step>
      </stepList>
    </recipe>

Figure 2.1: An example XML document (recipe.xml)

DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd">
    <!DOCTYPE recipe SYSTEM "recipe.dtd">

    <!DOCTYPE recipe [
      <!ELEMENT recipe (name, description, ingredientList,
                        stepList)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
      <!ELEMENT ingredientList (ingredient+)>
      <!ATTLIST ingredientList serves CDATA #REQUIRED>
      <!ELEMENT ingredient (#PCDATA)>
      <!ATTLIST ingredient amount CDATA #REQUIRED>
      <!ELEMENT stepList (step+)>
      <!ELEMENT step (#PCDATA)> ]>

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd" [
      <!-- Omitted for brevity --> ]>
    <!DOCTYPE recipe SYSTEM "recipe.dtd" [
      <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD

    element recipe {
      element name { text },
      element description { text },
      element ingredientList {
        attribute serves { xsd:positiveInteger },
        element ingredient {
          attribute amount { text }, text
        }+
      },
      element stepList {
        element step { text }+
      }
    }

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.


    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
      <element name="recipe"><complexType><all>
        <element name="name" type="string" minOccurs="1"/>
        <element name="description" type="string"
                 minOccurs="1"/>
        <element
          name="ingredientList"><complexType><sequence>
          <element name="ingredient" minOccurs="1"
                   maxOccurs="unbounded">
            <complexType><simpleContent>
              <extension base="string">
                <attribute name="amount" type="string"/>
              </extension>
            </simpleContent></complexType>
          </element></sequence>
          <attribute name="serves" type="positiveInteger"
                     use="required"/>
        </complexType></element>
        <element name="stepList"><complexType><sequence>
          <element name="step" type="string" minOccurs="1"
                   maxOccurs="unbounded"/>
        </sequence></complexType></element>
      </all></complexType></element>
    </schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)

    xmllint --noout --dtdvalid recipe.dtd recipe.xml
    xmllint --noout --schema recipe.xsd recipe.xml
    trang recipe.rnc recipe.rng  # Compact -> Full RELAX NG
    xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.


A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature called CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML

Speech

    AASE: Peer, you're lying!
    PEER: No, I'm not!
    AASE: Well then, swear to me it's true!
    PEER: Swear? Why should I?
    AASE: See, you dare not! Every word of it's a lie.

Verse

    Peer, you're lying! No, I'm not!
    Well then, swear to me it's true!
    Swear? Why should I? See, you dare not!
    Every word of it's a lie.

    <(V)line>
    <(S)speech who="Aase">Peer, you're lying!</(S)speech>
    <(S)speech who="Peer">No, I'm not!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Aase">Well then,
    swear to me it's true!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Peer">Swear? Why should I?</(S)speech>
    <(S)speech who="Aase">See, you dare not!
    </(V)line><(V)line>
    Every word of it's a lie.</(S)speech>
    </(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].


The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.

Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
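The way a parser tells the elements of different XML applications apart can be sketched as follows. The example assumes Python and its standard xml.etree.ElementTree module, which reports each element name together with the IRI of its namespace; the prefixes h and m are arbitrary local choices, whereas the two IRIs are the namespace names of XHTML and MathML.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:m="http://www.w3.org/1998/Math/MathML">
  <body>
    <p>The unknown is <m:math><m:mi>x</m:mi></m:math>.</p>
  </body>
</html>"""

root = ET.fromstring(DOCUMENT)

# Each tag is reported as "{namespace IRI}local-name".
tags = [element.tag for element in root.iter()]
print(tags)

# Queries map freely chosen prefixes to the namespace IRIs.
NS = {
    "h": "http://www.w3.org/1999/xhtml",
    "m": "http://www.w3.org/1998/Math/MathML",
}
identifier = root.find("./h:body/h:p/m:math/m:mi", NS)
print(identifier.text)
```

The parser resolves the prefixes to IRIs but makes no attempt to locate a schema for either namespace, which illustrates why validation requires the schemata to be supplied separately.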

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In


JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties of any HTML element, which made it possible to separate document markup from design, effectively eliminating the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as

    <b>Bold <i>bold and italic</b> italic</i>
    <b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and, as such, can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.


    <font face="Verdana" size="4">
    <font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
    <br><br>There is a continuing need to show the power of
    <i>CSS</i>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The <i>HTML
    </i> remains the same, the only thing that has changed
    is the external <i>CSS</i> file. Yes, really.
    </font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

    <style>
    body {
      font: large Verdana;
      font-size: large;
    }
    h1 {
      font-size: x-large;
      text-transform: uppercase;
    }
    abbr {
      font-style: italic;
    }
    </style>
    <h1>So what is this about?</h1>
    <p>There is a continuing need to show the power of
    <abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The
    <abbr>HTML</abbr> remains the same, the only thing that
    has changed is the external <abbr>CSS</abbr> file. Yes,
    really.</p>


The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly in web documents through XML namespaces.
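The difference in strictness between the two families of parsers can be observed directly. The following sketch assumes Python and its standard html.parser and xml.etree.ElementTree modules; the class name TagLogger and the function name xml_accepts are illustrative. Note that html.parser is a lenient tokenizer and does not implement the full error-recovery algorithm of the HTML5 specification.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Markup with overlapping elements, which is not well-formed XML.
MALFORMED = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Record the tags that the lenient HTML tokenizer reports."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.events.append(f"</{tag}>")


def xml_accepts(document: str) -> bool:
    """Return True when the XML parser accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


logger = TagLogger()
logger.feed(MALFORMED)
logger.close()
print(logger.events)
print(xml_accepts(MALFORMED))
```

The HTML tokenizer reports all four tags without complaint, whereas the XML parser rejects the input at the first mismatched end tag.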

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44], a language for the description of resources on the Web, in 1999.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.


A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.

If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
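The two interpretations of a triplet can be made concrete with a small data structure. The following is an illustrative sketch in Python rather than an implementation of any RDF library; the class names IRI, Literal, and Triple and the function name describe are hypothetical, and the field order follows the (p, s, o) notation used above.

```python
from typing import NamedTuple, Union


class IRI(str):
    """A resource identified by an IRI."""


class Literal(str):
    """A literal value."""


class Triple(NamedTuple):
    predicate: IRI
    subject: IRI
    obj: Union[IRI, Literal]


def describe(triple: Triple) -> str:
    """Interpret a triplet according to the kind of its object."""
    if isinstance(triple.obj, IRI):
        return f"{triple.subject} is in relation {triple.predicate} with {triple.obj}"
    return f"{triple.subject} has property {triple.predicate} with value {triple.obj}"


john = IRI("http://example.org/john-smith")
page = IRI("http://example.org/document.html")

relation = Triple(IRI("http://purl.org/dc/terms/creator"), page, john)
attribute = Triple(IRI("http://xmlns.com/foaf/0.1/name"), john, Literal("John Smith"))

print(describe(relation))
print(describe(attribute))
```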

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, besides the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata in the documents on the Web, a technique known as microformatting.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be


    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/terms/"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description
          rdf:about="http://example.org/document.html">
        <dc:title xml:lang="en">John's Web page</dc:title>
        <dc:creator
            rdf:resource="http://example.org/john-smith"/>
      </rdf:Description>
      <rdf:Description
          rdf:about="http://example.org/john-smith">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
        <foaf:name>John Smith</foaf:name>
      </rdf:Description>
    </rdf:RDF>

    <http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
    <http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
    <http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
    <http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dc: <http://purl.org/dc/terms/> .
    <http://example.org/document.html>
      dc:title "John's Web page"@en ;
      dc:creator <http://example.org/john-smith> .
    <http://example.org/john-smith>
      a foaf:Person ;
      foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom)


    <!DOCTYPE html>
    <html lang="en">
      <head>
        <link rel="meta" type="application/rdf+xml"
              href="john.rdf">
        <link rel="meta" type="text/turtle" href="john.ttl">
        <link rel="meta" type="application/n-triples"
              href="john.nt">
        <title>John's Web page</title>
      </head>
      <body>
        Hi, I'm John Smith.
      </body>
    </html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

    <!DOCTYPE html>
    <html lang="en">
      <head vocab="http://purl.org/dc/terms/"
            about="http://example.org/document.html">
        <title property="title" lang="en">John's Web page</title>
        <meta property="creator"
              href="http://example.org/john-smith">
      </head>
      <body vocab="http://xmlns.com/foaf/0.1/"
            about="http://example.org/john-smith"
            typeof="Person">
        Hi, I'm <span property="name">John Smith</span>.
      </body>
    </html>

Figure 2.11: A graph of the RDF document in Figure 2.9

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also known as What You See Is What You Get, or WYSIWYG), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is twofold: more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

[Sidenote: The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].]

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph, The Art of Computer Programming, and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the de facto standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text, with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.


[Output document: a typeset page carrying the title "The Cask of Amontillado", the byline "by Edgar Allen Poe", and the opening paragraph of the story set with a three-line drop cap T and "At length" in italics, followed by the page number 1.]

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allen Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].


% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt

% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}

% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that reflect their use cases.
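To make the idea of conversion tangible, the following Python sketch translates a tiny, Markdown-like subset into HTML. It is an illustration only: the function name lightweight_to_html and the supported subset (ATX-style headings, paragraphs separated by blank lines, and *emphasis*) are assumptions made here, and real converters handle far more syntax and many more edge cases.

```python
import html
import re


def lightweight_to_html(text: str) -> str:
    """Convert headings, paragraphs and *emphasis* to HTML (toy subset)."""
    blocks = []
    # Blank lines separate blocks, as in most lightweight markup languages.
    for block in re.split(r"\n\s*\n", text.strip()):
        # Join wrapped lines, then escape HTML-significant characters
        # before any tags of our own are added.
        body = html.escape(" ".join(block.split()), quote=False)
        # *word* becomes <em>word</em>.
        body = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", body)
        # One to six leading '#' characters mark a heading of that level.
        heading = re.match(r"(#{1,6}) (.*)", body)
        if heading:
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            blocks.append(f"<p>{body}</p>")
    return "\n".join(blocks)
```

Note how the source stays legible as plain text: the punctuation that carries the markup is the same punctuation a reader would use informally in an e-mail.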

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab serifs or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
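The guideline above can be restated as a small decision procedure. The Python sketch below is one possible reading of it, not a rule taken from the cited literature: the function name assess_measure and the wording of the returned verdicts are assumptions introduced for illustration.

```python
def assess_measure(chars_per_line: int, columns: int = 1) -> str:
    """Classify a line length (in characters) against the guideline above."""
    # Recommended ranges from the text:
    # 45-75 characters (single column), 40-50 characters (multi-column).
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line > 80:
        return "too wide: strains the eye"
    if chars_per_line < 40:
        return "too narrow to justify: set ragged"
    if low <= chars_per_line <= high:
        return "recommended"
    return "tolerable, but outside the recommended range"
```

For instance, a single-column measure of 66 characters falls within the recommended range, whereas a 35-character column is better set ragged than justified.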

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


[Output document: a single typeset paragraph in which the English text is set in Alegreya Sans and the Greek quotation from Aristotle is set in GFS Neohellenic, followed by the page number 1.]

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin{%
  \fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek{%
  \fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}

newunicodecharraisebox8ex

\frenchspacing
\leading{14pt}
\begin{document}

The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says, φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below), and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url(AlegreyaSans-Regular.ttf)
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url(GFSNeohellenic.otf) format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=grc] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says, </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with a leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
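The arithmetic behind these figures can be sketched in a few lines of Python. The function names leading_range and lead_per_side are assumptions introduced here; the default ratios are the twenty and forty-five percent quoted above.

```python
def leading_range(font_size_pt: float,
                  min_ratio: float = 0.20,
                  max_ratio: float = 0.45) -> tuple[float, float]:
    """Return the (minimum, maximum) line height in points."""
    # Line height = typeface size + extra lead (a fraction of the size).
    return (font_size_pt + font_size_pt * min_ratio,
            font_size_pt + font_size_pt * max_ratio)


def lead_per_side(font_size_pt: float, line_height_pt: float) -> float:
    """Return the lead added above and below each line, in points."""
    # The extra space is split evenly between the top and the bottom.
    return (line_height_pt - font_size_pt) / 2
```

For a 10 pt typeface, leading_range(10) reproduces the 12 to 14.5 pt interval from the text, and lead_per_side gives the corresponding 1 to 2.25 pt of lead on each side of a line.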

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by between one half (1 en) and three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well.

Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and set ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

[Sidenote: This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.]

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
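One way to honor this constraint is to round the natural height of a heading up to the next whole number of body text lines and to distribute the difference as space around the heading. The Python sketch below assumes that approach; the function name heading_space is hypothetical, and how the extra space is split above and below the heading remains a design decision.

```python
import math


def heading_space(heading_height_pt: float,
                  line_height_pt: float) -> tuple[int, float]:
    """Return (lines occupied, extra vertical space in points)."""
    # Round up so the heading always fits into a whole number of lines.
    lines = math.ceil(heading_height_pt / line_height_pt)
    # Whatever the heading does not fill is added as space around it.
    return lines, lines * line_height_pt - heading_height_pt
```

With a 12 pt line height, a heading that is 16 pt tall would occupy two lines and require 8 pt of additional space to keep the facing pages aligned.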

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else, and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks, in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs, and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3]:

    This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
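The requirement that the text height be a multiple of the line height can be sketched as follows. This is a minimal Python illustration, assuming that the vertical margins are treated as lower bounds and that any leftover space is returned to them; the function name text_block and its parameters are hypothetical.

```python
def text_block(page_height_pt: float,
               top_margin_pt: float,
               bottom_margin_pt: float,
               line_height_pt: float) -> tuple[int, float]:
    """Return (number of lines, text height in points)."""
    # Space left once the minimum vertical margins are reserved.
    available = page_height_pt - top_margin_pt - bottom_margin_pt
    # Only whole lines count; the remainder goes back to the margins.
    lines = int(available // line_height_pt)
    return lines, lines * line_height_pt
```

As an example, assuming 72 points per inch, a page that is 9 in (648 pt) tall with 0.75 in (54 pt) vertical margins and a 12 pt line height yields 45 lines and a text height of 540 pt; with a 14 pt line height, the same page holds 38 lines and the text height shrinks to 532 pt.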

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).


[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).


[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).


[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).


[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: httpwwwsazbacztypoglosytypo101pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).


[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (la Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The Generalized Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
Pico  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML, New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
vi  The Visual Interactive editor
Vim  vi IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language



11 TEXT PROCESSING 13

1.1.3 Text Editors

A text editor is an application that can be used to create and modify text files. Entry-level text editors are often distributed with an operating system and offer little beyond the ability to load, modify, and save text files in a text encoding of choice. Entry-level text editors with a Graphical User Interface (GUI) include the free Leafpad for GNU/Linux and the Berkeley Software Distribution (BSD) family of operating systems, and the proprietary Notepad for Windows and TextEdit for Mac OS. Entry-level text editors with a Command Line Interface (CLI) include the free JOE, GNU nano, and Pico.

More advanced text editors come with support for regular expressions and version control (covered in Sections 1.1.5 and 1.2) and with user modules that extend the base functionality. Advanced GUI text editors include the free Notepad++ and Atom and the proprietary Sublime Text. Advanced CLI text editors include the free Emacs, vi, and Vim. These CLI text editors are notorious for their steep learning curve; in exchange, they empower users to perform complex text editing.

1.1.4 Interactive Document Preparation Systems

Interactive Document Preparation Systems (DPSes) are a breed of text editors that produce fully formatted text documents instead of (or along with) text files. The reader is advised to avoid interactive DPSes that use proprietary, undocumented, or obscure file formats, which lock the user into using the respective DPS. Well-defined interactive DPS file formats include the Portable Document Format (PDF) [14], the Office Open XML format (OOXML) [15], and the Open Document Format for office applications (ODF) [16].

The primary difference between text editors and DPSes is that the user is expected to use the DPS to mark up, design, and typeset the resulting text document, whereas with plain text files a multitude of choices is available at each step of the document preparation process. The self-sufficient nature of DPSes may be a time-saving feature for simpler documents, but in the case of more complex documents, the markup and typesetting capabilities of a DPS may not be up to par with those of a dedicated tool. Interactive DPSes include the free Apache OpenOffice and Scribus and the proprietary TextEdit, Microsoft Word, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.

1.1.5 Regular Expressions

The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.

Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.

Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) [18, part 1, ch. 9], described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.

More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
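As a rough illustration of pattern matching with a Perl-derived syntax, the sketch below uses Python's standard re module. Python is not mentioned in the text and is chosen here only for convenience; the helper name first_match is hypothetical. For the first two patterns the notation coincides with ERE, while the third relies on a backreference, which is one of the extensions that go beyond regular grammars.

```python
import re

# Python's `re` module uses a Perl-derived regex syntax (an assumption of this
# sketch; the text itself only discusses BRE, ERE, GNU and Perl syntaxes).
INTERVAL = re.compile(r"we{1,2}p")       # "e" repeated one or two times
ALTERNATION = re.compile(r"(on|t)")      # either "on" or "t"
BACKREFERENCE = re.compile(r"(..).*\1")  # two characters that recur later on


def first_match(pattern, text):
    """Return the first substring of `text` matched by `pattern`, or None."""
    found = pattern.search(text)
    return found.group(0) if found else None


if __name__ == "__main__":
    for word in ("weeps", "wept", "wp"):
        print(word, "->", first_match(INTERVAL, word))
    for word in ("one", "two"):
        print(word, "->", first_match(ALTERNATION, word))
    for word in ("dardanelles", "nationality", "abc"):
        print(word, "->", first_match(BACKREFERENCE, word))
```

The backreference pattern cannot be expressed by a regular grammar, which is why the more cautious term regex fits such syntaxes better.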

Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for the Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (U+200B), zero width non-joiner (U+200C), zero width joiner (U+200D), and zero width no-break space (U+FEFF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at least partially implement the Unicode standard for Regular Expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below).

BRE regex:    we\{1,2\}p
Description:  The repetition (interval) expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
Matches:      weeps, wept

BRE regex:    e*ne
Description:  Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
Matches:      never, enemy, Kleene

BRE regex:    \(⟨regex⟩\)
Description:  A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
Matches:      ⟨regex⟩

BRE regex:    ^ar
Description:  At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
Matches:      argument, arrow keys

BRE regex:    ore$
Description:  At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
Matches:      iron ore, dumbledore

BRE regex:    .be
Description:  A period (.) matches any single character.
Matches:      or not to be

BRE regex:    be[ea]
Description:  A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity.
Matches:      beehive, grizzly bear, glass beads

BRE regex:    be[^ea]
Description:  A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
Matches:      obeah, bend, libela

BRE regex:    \^\$
Description:  Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
Matches:      ^$

BRE regex:    \(..\).*\1
Description:  A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
Matches:      ara ararauna, dardanelles, nationality

ERE regex:    we{1,2}p
Description:  Unlike in BREs, braces aren't escaped.
Matches:      weeps, wept

ERE regex:    pe+r?l?
Description:  The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
Matches:      persona, peer, speech, perl

ERE regex:    (⟨regex⟩)
Description:  Unlike in BREs, parentheses aren't escaped.
Matches:      ⟨regex⟩

ERE regex:    (on|t)
Description:  Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
Matches:      one, two, trophy, truth

ERE regex:    (..).*\1
Description:  EREs do not support backreferences.
Matches:      ⟨undefined⟩

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.

Regex:        \x{⟨n⟩}
Description:  Matches the UCS character with code point ⟨n⟩ in hexadecimal.

Regex:        \N{⟨n⟩}
Description:  Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.

Regex:        \p{⟨p⟩}
Description:  Matches any UCS character with property ⟨p⟩.

Regex:        \P{⟨p⟩}
Description:  Matches any UCS character without property ⟨p⟩.

Property:     Letter
Description:  This property is satisfied by any letter.

Property:     Punctuation
Description:  This property is satisfied by any punctuation.

Property:     Symbol
Description:  This property is satisfied by any symbol.

Property:     Mark
Description:  This property is satisfied by any mark.

Property:     Number
Description:  This property is satisfied by any number.

Property:     Separator
Description:  This property is satisfied by any separator.

Property:     Other
Description:  This property is satisfied by any UCS character that doesn't belong to any of the categories listed above.

Property:     Block=⟨b⟩
Description:  This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.

Property:     Script=⟨s⟩
Description:  This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.

Property:     Numeric Value=⟨n⟩
Description:  This property is satisfied by any UCS character with the numeric value ⟨n⟩.

The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.
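The pitfalls of matching UCS text can be made concrete with a short sketch. It uses Python's standard re and unicodedata modules, which the text does not mention and which are assumed here purely for illustration; the function names normalize_for_matching and robust_search are hypothetical. The idea is to strip the zero-width characters, normalize, and case-fold both the pattern and the text before searching.

```python
import re
import unicodedata

# The four invisible characters named in the text:
# U+200B, U+200C, U+200D and U+FEFF.
ZERO_WIDTH = "\u200b\u200c\u200d\ufeff"


def normalize_for_matching(text):
    """Strip zero-width characters, then normalize (NFC) and case-fold."""
    stripped = text.translate({ord(ch): None for ch in ZERO_WIDTH})
    folded = unicodedata.normalize("NFC", stripped).casefold()
    # Case folding may produce unnormalized text, so normalize once more.
    return unicodedata.normalize("NFC", folded)


def robust_search(needle, haystack):
    """Report whether `needle` occurs in `haystack` after normalization."""
    pattern = re.escape(normalize_for_matching(needle))
    return re.search(pattern, normalize_for_matching(haystack)) is not None


if __name__ == "__main__":
    composed, decomposed = "caf\u00e9", "cafe\u0301"  # two encodings of "café"
    print(re.search(composed, decomposed) is not None)   # naive search
    print(robust_search(composed, decomposed))            # normalized search
    print(len("\u00e9".encode("utf-8")))                  # bytes in one character
    print(len("\U0001F600".encode("utf-16-le")) // 2)     # UTF-16 code units
```

This is only a sketch: it ignores locale-specific case rules and compatibility normalization forms, both of which a production matcher may need to consider.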

The most elementary text processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool presents the lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.

VCS can be divided by their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally and its operation is fully dependent on the availability of the server. An example of a centralized VCS is SubVersioN (SVN).

The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.

By comparison, there is no designated server in decentralized VCS, and the users can exchange new versions directly with one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage size requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.
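The rudimentary bookkeeping that every VCS performs, regardless of its architecture, can be sketched in a few lines. The toy classes below are hypothetical and in no way reflect how SVN or Git store their data; they only show versions being recorded with an author and a description, compared, and reverted, using Python's standard difflib module.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class Revision:
    author: str
    message: str
    text: str


@dataclass
class Repository:
    """A toy in-memory history of one text file (not SVN, not Git)."""
    revisions: list = field(default_factory=list)

    def commit(self, author, message, text):
        """Record a new version with its author and description."""
        self.revisions.append(Revision(author, message, text))
        return len(self.revisions) - 1  # the new revision number

    def diff(self, old, new):
        """Return the changes between two revisions as unified-diff lines."""
        before = self.revisions[old].text.splitlines()
        after = self.revisions[new].text.splitlines()
        return list(difflib.unified_diff(
            before, after, f"r{old}", f"r{new}", lineterm=""))

    def revert(self, author, to):
        """Undo later changes by committing an earlier version again."""
        return self.commit(author, f"Revert to r{to}", self.revisions[to].text)


if __name__ == "__main__":
    repo = Repository()
    first = repo.commit("alice", "First draft", "Hello\nworld\n")
    second = repo.commit("bob", "Capitalize", "Hello\nWorld\n")
    print("\n".join(repo.diff(first, second)))
    repo.revert("alice", first)
    print([(rev.author, rev.message) for rev in repo.revisions])
```

Storing every version in full keeps the sketch short; real systems typically store differences or compressed objects instead.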

Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.

[Diagram; commands shown: svnadmin create, svn checkout, svn update, svn commit]

Figure 1.8: The basic SVN workflow

After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.

[Diagram; commands shown: git init, git clone, git pull, git push, git reset, git commit]

Figure 1.9: The diagram above depicts the basic Git workflow. The diagram below depicts the use of the Git program with an SVN repository; this bears all the advantages and disadvantages associated with decentralized VCS.

[Diagram; commands shown: svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit]

Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom)

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file

Chapter 2

Markup

A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.

Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The use of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The Generalized Markup Language

The situation surrounding digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset by a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the use of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

More information about the project can be found in The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24]. The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard with extensive annotations.

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.

21 META MARKUP LANGUAGES 23

Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, is declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML, such as W3C XML Schema, RELAX NG, or Schematron, that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).

A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
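The lower of the two levels of correctness, well-formedness, can be checked with any XML parser. The sketch below assumes Python's standard xml.etree.ElementTree module, which is not mentioned in the text; the function name is_well_formed is hypothetical. Since this parser is non-validating, it says nothing about validity against a DTD or a schema, for which a validating tool is needed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True when `document` parses as XML, False otherwise."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    # Properly nested tags: well-formed.
    print(is_well_formed("<recipe><name>Palatschinken</name></recipe>"))
    # The <name> element is never closed: not well-formed.
    print(is_well_formed("<recipe><name>Palatschinken</recipe>"))
    # Well-formed, although a schema could still reject it as invalid,
    # for instance because a required attribute is missing.
    print(is_well_formed("<ingredientList><ingredient>Egg</ingredient>"
                         "</ingredientList>"))
```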


    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE recipe SYSTEM "recipe.dtd">
    <recipe>
      <name>Palatschinken</name>
      <description>A Slavic crêpe-like dish</description>
      <ingredientList serves="8">
        <ingredient amount="120g">Plain flour</ingredient>
        <ingredient amount="2">Egg</ingredient>
        <ingredient amount="300ml">Milk</ingredient>
        <ingredient amount="1 tblspn">Oil</ingredient>
        <ingredient amount="1 pinch">Salt</ingredient>
      </ingredientList>
      <stepList>
        <step>Combine the ingredients and whisk until
          you have a smooth batter.</step>
        <step>Heat oil on a pan, pour in a tablespoonful
          of the batter, fry until golden brown.</step>
        <step>Repeat until there is no batter left.</step>
        <step>Serve rolled and filled with jam.</step>
      </stepList>
    </recipe>

Figure 2.1: An example XML document (recipe.xml)
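To show how a program might read the elements and attributes of such a document, here is a minimal sketch that again assumes Python's standard xml.etree.ElementTree module. The function names are hypothetical, and the recipe is abridged, with the XML declaration and the DOCTYPE left out. The path passed to findall belongs to the limited XPath subset that this module supports.

```python
import xml.etree.ElementTree as ET

# An abridged copy of the recipe from Figure 2.1.
RECIPE = """\
<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>
"""


def list_ingredients(document):
    """Return (amount, name) pairs for every <ingredient> element."""
    root = ET.fromstring(document)
    return [(item.get("amount"), item.text)
            for item in root.findall("./ingredientList/ingredient")]


def servings(document):
    """Return the value of the `serves` attribute as an integer."""
    root = ET.fromstring(document)
    return int(root.find("ingredientList").get("serves"))


if __name__ == "__main__":
    for amount, name in list_ingredients(RECIPE):
        print(amount, name)
    print("Serves", servings(RECIPE))
```

Note that the meaning of the amount attribute is supplied by the program, not by the markup, in line with the observation that a DTD declares syntax only.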

DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd">
    <!DOCTYPE recipe SYSTEM "recipe.dtd">

    <!DOCTYPE recipe [
      <!ELEMENT recipe (name, description, ingredientList,
                        stepList)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
      <!ELEMENT ingredientList (ingredient+)>
      <!ATTLIST ingredientList serves CDATA #REQUIRED>
      <!ELEMENT ingredient (#PCDATA)>
      <!ATTLIST ingredient amount CDATA #REQUIRED>
      <!ELEMENT stepList (step+)>
      <!ELEMENT step (#PCDATA)> ]>

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd" [
      <!-- Omitted for brevity --> ]>
    <!DOCTYPE recipe SYSTEM "recipe.dtd" [
      <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD

    element recipe {
      element name { text },
      element description { text },
      element ingredientList {
        attribute serves { xsd:positiveInteger },
        element ingredient {
          attribute amount { text }, text
        }+
      },
      element stepList {
        element step { text }+
      }
    }

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.


<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
      name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).

xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng  # Compact -> Full Relax NG
xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.


A notable feature of XML that is unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature, CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML

Speech
AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!

Verse
Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].


The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.

Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
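The way a namespace ties each element to an IRI can be observed with any namespace-aware parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the sample document, which mixes XHTML with SVG, and the function name namespaces_used are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical document that mixes two XML applications (XHTML and SVG).
DOCUMENT = """\
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <body>
    <p>A circle:</p>
    <svg:svg width="20" height="20">
      <svg:circle cx="10" cy="10" r="8"/>
    </svg:svg>
  </body>
</html>
"""


def namespaces_used(source):
    """Return the set of namespace names (IRIs) of all elements in `source`."""
    root = ET.fromstring(source)
    names = set()
    for element in root.iter():
        # ElementTree expands qualified names to the form "{namespace}local".
        if element.tag.startswith("{"):
            names.add(element.tag[1:].split("}", 1)[0])
    return names


print(sorted(namespaces_used(DOCUMENT)))
```

Note that the parser only reports the IRIs; it has no way of locating a schema for either of them, which is the limitation described above.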

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed the W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In


JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

an attempt to unify the way malformed HTML documents were rendered across Web browsers, the W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Initially, HTML only comprised a mixture of logical and presentation markup with a fixed visual interpretation. This changed with the specification of CSS, which was introduced by the W3C in 1996. The language enabled the specification of the visual properties of any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, the W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as

<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.


<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

<style>
  body {
    font: large Verdana;
    font-size: large;
  }
  h1 {
    font-size: x-large;
    text-transform: uppercase;
  }
  abbr {
    font-style: italic;
  }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>


The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2: Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources and to reduce the number of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly in web documents through XML namespaces.
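The difference in strictness can be illustrated with the two parsers that ship with Python. This is only a sketch: html.parser is a lenient tokenizer and does not implement the error-recovery rules of the HTML5 specification, and the names is_well_formed, TagLogger and html_events are hypothetical helpers introduced for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def is_well_formed(source):
    """Return True if an XML parser accepts `source`, False if it refuses it."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


class TagLogger(HTMLParser):
    """Record the start and end tags that the lenient HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.events.append("</" + tag + ">")


def html_events(source):
    """Feed `source` to the HTML tokenizer and return the tags it saw."""
    logger = TagLogger()
    logger.feed(source)
    logger.close()
    return logger.events


# The overlapping elements from the first line of Figure 2.7.
OVERLAPPING = "<b>Bold <i>bold and italic</b> italic</i>"

print(is_well_formed(OVERLAPPING))  # the XML parser refuses the input
print(html_events(OVERLAPPING))     # the HTML tokenizer carries on regardless
```

The XML parser can stop at the first mismatched tag, whereas an HTML processor has to decide what the author most likely meant, which is where the additional complexity comes from.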

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, the W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.


A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.

If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
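The two readings of a triplet can be made concrete with a small data model. The sketch below is not an RDF library; the classes Resource, Literal and Triplet and the function interpret are hypothetical names chosen for this illustration, and the fields follow the (predicate, subject, object) order used above.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    """A resource identified by an IRI."""
    iri: str


@dataclass(frozen=True)
class Literal:
    """A literal value, such as a string."""
    value: str


@dataclass(frozen=True)
class Triplet:
    predicate: Resource
    subject: Resource
    obj: Union[Resource, Literal]


def interpret(triplet):
    """Describe a triplet as either a relation or a property."""
    p, s, o = triplet.predicate, triplet.subject, triplet.obj
    if isinstance(o, Resource):
        return f"<{s.iri}> is in relation <{p.iri}> with <{o.iri}>"
    return f"<{s.iri}> has property <{p.iri}> with value {o.value!r}"


john = Resource("http://example.org/john-smith")
page = Resource("http://example.org/document.html")
creator = Triplet(Resource("http://purl.org/dc/terms/creator"), page, john)
name = Triplet(Resource("http://xmlns.com/foaf/0.1/name"), john,
               Literal("John Smith"))

print(interpret(creator))
print(interpret(name))
```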

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete microformatting: the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in documents on the Web.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be


<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .

<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .

<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).


<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>


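[Graph: the node http://example.org/document.html has a dc:title edge to the literal "John's Web page"@en and a creator edge to the node http://example.org/john-smith, which in turn has an rdf:type edge to foaf:Person and a foaf:name edge to the literal "John Smith".]

Figure 2.11: A graph of the RDF document in Figure 2.9.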

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also known as What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the gentle learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The


The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by Donald Knuth, an American professor of computer science, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.


[Output document: a page with the centered title "The Cask of Amontillado" and the byline "by Edgar Allen Poe", followed by the paragraph marked up below, set with a three-line drop cap T, and the page number 1 in the footer.]

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allen Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could; but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].


% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could; but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
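To give a flavor of such conversion tools, the sketch below translates a tiny, made-up subset of Markdown-like syntax to HTML. It is an assumption-laden toy rather than an implementation of Markdown or of any other language listed above: it only understands blank-line-separated blocks, headings introduced by one to six # characters, and *emphasis*, and the function name to_html is hypothetical.

```python
import html
import re


def to_html(source):
    """Convert a tiny Markdown-like subset (headings, paragraphs, emphasis)."""
    # Blocks are separated by one or more blank lines.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", source) if b.strip()]
    output = []
    for block in blocks:
        # Join the lines of a block and escape characters special to HTML.
        text = html.escape(" ".join(block.split()), quote=False)
        # *emphasis* becomes logical markup.
        text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
        heading = re.match(r"(#{1,6}) (.*)", text)
        if heading:
            level = len(heading.group(1))
            output.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            output.append(f"<p>{text}</p>")
    return "\n".join(output)


print(to_html("# Design\n\nLegibility *first*,\nornament later."))
```

The input stays readable as plain text, while the output carries the same structure in a more general markup language.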

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
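Whether a typeface covers a manuscript can be checked mechanically once the covered code point ranges are known. The following sketch assumes that the coverage is given as a list of inclusive Unicode ranges; the ranges in LATIN_ONLY and the function name missing_characters are hypothetical and do not describe any particular font.

```python
def missing_characters(text, covered_ranges):
    """Return the sorted characters of `text` outside all covered ranges."""
    def covered(character):
        return any(low <= ord(character) <= high
                   for low, high in covered_ranges)

    return sorted({c for c in text if not c.isspace() and not covered(c)})


# Hypothetical coverage of a Latin-only body text typeface.
LATIN_ONLY = [(0x0000, 0x024F), (0x1E00, 0x1EFF), (0x2000, 0x206F)]

# "The soul" followed by a Greek word written with escapes (psi, upsilon,
# chi, eta with tonos) so that the example does not depend on the encoding
# of this file.
SAMPLE = "The soul, \u03c8\u03c5\u03c7\u03ae, is moved"

print(missing_characters(SAMPLE, LATIN_ONLY))
```

Any character the function reports has to be supplied by a second typeface, as in the multilingual case described above.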

Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
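The rules of thumb above can be condensed into a small decision helper. This is merely one possible reading of the guideline: the function name suggest_setting and the wording of the returned labels are assumptions, and line lengths that fall between the ideal range and the hard limits are flagged rather than rejected.

```python
def suggest_setting(chars_per_line, columns=1):
    """Suggest how to set body text with the given average line length."""
    # Ideal measure: 45-75 characters in one column, 40-50 in several.
    ideal_low, ideal_high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line > 80:
        return "narrow the column"   # long lines strain the eye
    if chars_per_line < 40:
        return "set ragged"          # justification would loosen word spacing
    if ideal_low <= chars_per_line <= ideal_high:
        return "justify"
    return "justify with care"       # tolerable, but outside the ideal range


for length in (35, 66, 78, 90):
    print(length, suggest_setting(length))
```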

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


[Output document: the paragraph marked up below, typeset with the Latin text in Alegreya Sans and the Greek text in GFS Neohellenic, followed by the page number 1.]

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}

newunicodecharraisebox8ex

\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=gr] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with a leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
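The arithmetic behind these figures is simple enough to sketch in a few lines. The function names leading_range and lead_per_side are hypothetical; the default percentages are the twenty to forty-five percent quoted above.

```python
def leading_range(type_size_pt, min_percent=20, max_percent=45):
    """Return the (smallest, largest) recommended line height in points."""
    low = type_size_pt * (100 + min_percent) / 100
    high = type_size_pt * (100 + max_percent) / 100
    return low, high


def lead_per_side(type_size_pt, line_height_pt):
    """Return the lead in points added above and below each line."""
    return (line_height_pt - type_size_pt) / 2


low, high = leading_range(10)
print(low, high)                                      # 12.0 14.5
print(lead_per_side(10, low), lead_per_side(10, high))  # 1.0 2.25
```

For a 10 pt typeface, the sketch reproduces the 12 to 14.5 pt line height and the 1 to 2.25 pt of lead on either side of the line.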

Two adjacent paragraphs should be visibly separated withoutdistracting the reader from the text A predominant method is toindent the initial line of a paragraph with one half (1 en) to threetimes (3 em) the typeface size The indent is unnecessary whenthere is no ambiguitymdashsuch as in the first paragraph following aheading [54 sec 23]

If the margins are ample outdented paragraphs are an intriguingoption as well iexcl Paragraphs can also be separated by graphicalsymbols such as pilcrows bullets or boxes A plain horizon-tal space that is at least 3 em wide can likewise act as a paragraphseparator [56 ch 2 p 16]Block paragraphs exchange indentation and horizontal separatorsfor additional vertical space above and below the paragraph Injustified block paragraphs this space can be omitted as well al-though the typesetter then has to manually ensure that the lastline of each paragraph offers enough horizontal space to act asa separator In short documents and limited spans of text blockparagraphs are an attractive option [54 sec 232]

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

          Size in inches    Page proportions
A4        8.27 × 11.7       2 : √2       1.41421
B5        6.93 × 9.84       1 : √2       0.707
Letter    8 1/2 × 11        1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing

Sidenote: This is a side-note. Sidenotes enliven the page and are easy for the reader to find.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface as the surrounding text, treat the columns of tables the same way you treat columns in the text, and keep the number of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks, in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer’s viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding vertical space above and below the block paragraphs, and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3]:

    This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    William Shakespeare, King Lear

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
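The rule that the text height must be a whole multiple of the line height can be sketched in a few lines of Python. The 300 pt text width and the 2:3 proportion below are hypothetical values chosen for illustration; only the 12 pt leading comes from the text.

```python
import math


def text_height(target_pt: float, line_height_pt: float) -> float:
    """Round a desired text height down to a whole number of lines."""
    lines = math.floor(target_pt / line_height_pt)
    return lines * line_height_pt


# Hypothetical text block: 300 pt wide with 2:3 proportions,
# set with a 12 pt line height.
width_pt = 300.0
target_pt = width_pt * 3 / 2           # desired height before rounding
height_pt = text_height(target_pt, 12.0)
print(height_pt, height_pt / 12.0)     # final height and number of lines
```

Rounding down rather than up is a design choice made here so that the text block never exceeds the intended proportions; rounding to the nearest line count would serve equally well.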

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to use colored typefaces only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).

[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O’Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O’Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typo/glosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standard Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (la Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Declaration
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
EMACS: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Or A Foe
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe’s Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
PICO: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard General Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
VI: The Visual Interactive editor
VIM: VI IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language



proprietary TextEdit, Microsoft Word, Scribus, Adobe InDesign, Adobe FrameMaker, and QuarkXPress.

Sidenote: Mastering Regular Expressions [19] by Jeffrey E. F. Friedl is an extensive resource on regexes.

1.1.5 Regular Expressions

The Chomsky hierarchy is a classification of text production rule sets (called formal grammars), which was proposed [17] in 1956 by the American linguist Noam Chomsky in his endeavor to discover a good formal model for the description of natural languages. The class of regular grammars, which is the least powerful of the proposed classes, and the related formal model of regular expressions enable the writer to match patterns within text.

Since regular expressions are just a formal model, a software implementation needs to settle on a concrete syntax. Among the earliest standard syntaxes are the Basic Regular Expressions (BRE) and the Extended Regular Expressions (ERE) [18, part 1, ch. 9] described in Table 1.4, which are supported by most text processing programs on Unix and Unix-like operating systems.

More extensive syntaxes include the GNU extensions of BRE and ERE, the regex syntax of the Perl programming language, and their derivatives. For these syntaxes, the term regular is a misnomer, as they can be used to describe formal grammars that, according to the Chomsky hierarchy, are stronger than regular. To disambiguate the term, expressions in these syntaxes are often called regexes.
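Several of the constructs listed in Table 1.4 can be tried out with the `re` module of the Python standard library. This is only a sketch under an assumption worth stating: Python's syntax is a Perl derivative rather than POSIX BRE or ERE, so braces, parentheses, and the vertical line are written unescaped (as in ERE) while backreferences are still available. The helper name `first_match` is hypothetical.

```python
import re


def first_match(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


examples = [
    (r"we{1,2}p", "weeps"),       # interval expression: one or two e's
    (r"we{1,2}p", "wept"),
    (r"^ar", "arrow keys"),       # caret anchors the match at the start
    (r"ore$", "iron ore"),        # dollar sign anchors the match at the end
    (r"be[ea]", "grizzly bear"),  # matching list expression
    (r"be[^ea]", "bend"),         # non-matching list expression
    (r"(on|t)", "two"),           # alternation inside a subexpression
    (r"(..)\1", "ararauna"),      # backreference to the first subexpression
]

for pattern, text in examples:
    print(pattern, "on", repr(text), "->", first_match(pattern, text))
```

The same patterns would need escaped braces and parentheses in a BRE-based tool, and a strict ERE implementation would not accept the last pattern at all.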

Many regex syntaxes and the software that implements them were designed for the processing of ASCII text and may behave in surprising ways when confronted with UCS characters. The software may assume that each character is exactly one byte wide and fail to recognize any character that occupies several bytes. It may also assume that all UCS characters fall within the BMP and exhibit the same problem with characters outside the BMP. More subtle, but no less precarious, can be the lack of support for the Unicode case conversion and normalization algorithms, which makes it difficult to perform robust case-insensitive matching and the matching of characters that can be encoded in several different ways. The lack of awareness of the invisible characters that can appear in UCS text, such as the zero width space (U+200B), zero width non-joiner (U+200C), zero width joiner (U+200D), and zero width no-break space (U+FEFF), is also problematic and can lead to false negative matches. Conversely, modern regex syntaxes that at

BRE regex: we\{1,2\}p
Description: The repetition expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
Matches: weeps, wept

BRE regex: e*ne*
Description: Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
Matches: never, enemy, Kleene

BRE regex: \(⟨regex⟩\)
Description: A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
Matches: ⟨regex⟩

BRE regex: ^ar
Description: At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
Matches: argument, arrow keys

BRE regex: ore$
Description: At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
Matches: iron ore, dumbledore

BRE regex: be.
Description: A period (.) matches any single character.
Matches: or not to be

BRE regex: be[ea]
Description: A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity.
Matches: beehive, grizzly bear, glass beads

BRE regex: be[^ea]
Description: A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
Matches: obeah, bend, libela

BRE regex: \^\$
Description: Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
Matches: ^$

BRE regex: \(…\)\1
Description: A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
Matches: ara ararauna, dardanelles, nationality

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below)

ERE regex: we{1,2}p
Description: Unlike in BREs, braces aren’t escaped.
Matches: weeps, wept

ERE regex: pe+r?l?
Description: The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
Matches: persona, peer speech, perl

ERE regex: (⟨regex⟩)
Description: Unlike in BREs, parentheses aren’t escaped.
Matches: ⟨regex⟩

ERE regex: (on|t)
Description: Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
Matches: one two, trophy truth

ERE regex: (…)\1
Description: EREs do not support backreferences.
Matches: ⟨undefined⟩

Regex: Description
\x{⟨n⟩}: Matches the UCS character with code point ⟨n⟩ in hexadecimal.
\N{⟨n⟩}: Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.
\p{⟨p⟩}: Matches any UCS character with property ⟨p⟩.
\P{⟨p⟩}: Matches any UCS character without property ⟨p⟩.

Property: Description
Letter: This property is satisfied by any letter.
Punctuation: This property is satisfied by any punctuation.
Symbol: This property is satisfied by any symbol.
Mark: This property is satisfied by any mark.
Number: This property is satisfied by any number.
Separator: This property is satisfied by any separator.
Other: This property is satisfied by any UCS character that doesn’t belong to any of the above-listed categories.
Block=⟨b⟩: This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.
Script=⟨s⟩: This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.
Numeric Value=⟨n⟩: This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.

The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.

least partially implement the Unicode standard for Regular Expressions [20], such as those of Perl 5.2 or Java 7, are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
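As a rough illustration of property-based matching, the following Python sketch emulates \p{⟨p⟩} for the general categories of Table 1.5. It is only an approximation under stated assumptions: Python's built-in re module does not implement the \p syntax, so the sketch consults the standard unicodedata module instead, and the helper names has_property and find_all are hypothetical.

```python
import unicodedata

# Long property names from Table 1.5 mapped to the first letter of the
# two-letter Unicode general category codes (e.g. "Lu" and "Ll" are letters).
GENERAL_CATEGORIES = {
    "Letter": "L",
    "Mark": "M",
    "Number": "N",
    "Punctuation": "P",
    "Symbol": "S",
    "Separator": "Z",
    "Other": "C",
}


def has_property(char, prop):
    """Return True if the character belongs to the general category prop."""
    return unicodedata.category(char).startswith(GENERAL_CATEGORIES[prop])


def find_all(text, prop):
    """Emulate matching \\p{prop} against every character of text."""
    return [char for char in text if has_property(char, prop)]


# "Dvořák: 9 symphonies + €5", written with escapes to keep the file ASCII.
sample = "Dvo\u0159\u00e1k: 9 symphonies + \u20ac5"
print(find_all(sample, "Number"))
print(find_all(sample, "Symbol"))
print(find_all(sample, "Punctuation"))
```

Regex engines with native support, such as the ones named above, perform the same category lookup internally and expose it through the \p and \P escapes.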

The most elementary text-processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool will present lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for the Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
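The line-oriented search performed by grep can be sketched in a few lines of Python. This is a minimal illustration rather than a reimplementation of the tool: the function name grep and its fixed parameter are hypothetical, and Python's re module uses its own regex dialect rather than the BRE or ERE syntax, although the bracket expressions used below behave the same way.

```python
import re


def grep(pattern, lines, fixed=False):
    """Return the lines that contain at least one match of pattern.

    With fixed=True the pattern is treated as a fixed string
    rather than as a regex.
    """
    regex = re.compile(re.escape(pattern) if fixed else pattern)
    return [line for line in lines if regex.search(line)]


text = [
    "beehive",
    "grizzly bear",
    "bend",
    "libel",
    "regex: be[ea]",
]

print(grep(r"be[ea]", text))              # matching list expression
print(grep(r"be[^ea]", text))             # non-matching list expression
print(grep("be[ea]", text, fixed=True))   # fixed-string search
```

As with the real tool, whole lines are reported rather than the matched substrings.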

1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.
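The rudimentary functionality described above (recording changes with descriptions and authorship, viewing them, and reverting them) can be modeled with a small data structure. The sketch below is purely illustrative: the Change and Repository classes are hypothetical, and real VCS store differences between versions far more efficiently than by keeping full copies.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    author: str
    description: str
    content: str  # full text of the document after the change


@dataclass
class Repository:
    history: list = field(default_factory=list)

    def commit(self, author, description, content):
        """Record a new version together with its metadata."""
        self.history.append(Change(author, description, content))

    def log(self):
        """List the authorship and description of every change."""
        return [(c.author, c.description) for c in self.history]

    def revert(self):
        """Discard the latest change and return the restored text."""
        self.history.pop()
        return self.history[-1].content if self.history else ""


repo = Repository()
repo.commit("Alice", "Initial draft", "Hello")
repo.commit("Bob", "Extend the greeting", "Hello, world")
print(repo.log())
print(repo.revert())
```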

VCS can be dichotomized based on their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally and its operation is fully dependent on the availability of the server. An example of a centralized VCS is Subversion (SVN).

By comparison, there is no designated server in decentralized VCS, and the users can upload and download new versions directly from one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage size requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.

Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

An example of an external VCS using such an interface is the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

Figure 1.8: The basic SVN workflow (svnadmin create, svn checkout, svn update, svn commit). After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.
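The ease with which changes between two versions of a text file can be displayed is illustrated by the following sketch, which uses the difflib module of the Python standard library. The file names and the two versions of the text are made up for the example.

```python
import difflib

# Two hypothetical versions of the same text file, one list item per line.
old = ["Beautiful documents can be crafted", "using free software."]
new = ["Beautiful text documents can be crafted", "using free software."]

diff = list(difflib.unified_diff(
    old, new,
    fromfile="draft-v1.txt", tofile="draft-v2.txt",
    lineterm="",
))
print("\n".join(diff))
```

Lines prefixed with a minus sign were removed, lines prefixed with a plus sign were added, and lines prefixed with a space are unchanged context. No comparable line-based view exists for binary output formats, which is why the interactive DPSes discussed above need dedicated comparison interfaces.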

Figure 1.9: The diagram above depicts the basic Git workflow (git init, git clone, git pull, git push, git reset, git commit). The diagram below depicts the use of the Git program with an SVN repository (svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit); this bears all the advantages and disadvantages associated with decentralized VCS.

After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.


Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.

Chapter 2

Markup

A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules, such as the correct spelling, capitalization, word breaks, and punctuation, that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should be already met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.

Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The General Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.

A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.

Although the described structure is shared by all SGML documents, the actual syntax as well as the restrictions regarding the contents and the attributes of individual elements are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML, such as W3C XML Schema, RELAX NG, or Schematron, that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
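The first of the two levels of correctness, well-formedness, can be checked with any conforming XML parser. The sketch below uses the xml.etree.ElementTree module of the Python standard library; the helper name is_well_formed is hypothetical. Note that this module does not validate documents against a DTD or a schema, so the second level of correctness is outside the scope of the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if the string parses as a well-formed XML document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Properly nested elements.
good = "<recipe><name>Palatschinken</name></recipe>"
# The closing tags are swapped, so the elements overlap.
bad = "<recipe><name>Palatschinken</recipe></name>"

print(is_well_formed(good))
print(is_well_formed(bad))
```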

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
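To give an impression of how data are retrieved from XML documents, the following sketch queries a shortened version of the recipe document with the xml.etree.ElementTree module of the Python standard library. The module supports only a limited subset of XPath, so the example should be read as an approximation of what a full XPath or XQuery processor offers.

```python
import xml.etree.ElementTree as ET

# A shortened version of the recipe document used in this section.
RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# Select every ingredient element by its path from the root element.
names = [e.text for e in root.findall("./ingredientList/ingredient")]

# Select the ingredient whose amount attribute has a given value.
two_of = root.find("./ingredientList/ingredient[@amount='2']").text

# Read the value of an attribute.
serves = root.find("ingredientList").get("serves")

print(names, two_of, serves)
```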


    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE recipe SYSTEM "recipe.dtd">
    <recipe>
      <name>Palatschinken</name>
      <description>A Slavic crêpe-like dish</description>
      <ingredientList serves="8">
        <ingredient amount="120g">Plain flour</ingredient>
        <ingredient amount="2">Egg</ingredient>
        <ingredient amount="300ml">Milk</ingredient>
        <ingredient amount="1 tblspn">Oil</ingredient>
        <ingredient amount="1 pinch">Salt</ingredient>
      </ingredientList>
      <stepList>
        <step>Combine the ingredients and whisk until
          you have a smooth batter.</step>
        <step>Heat oil on a pan, pour in a tablespoonful
          of the batter, fry until golden brown.</step>
        <step>Repeat until there is no batter left.</step>
        <step>Serve rolled and filled with jam.</step>
      </stepList>
    </recipe>

Figure 2.1: An example XML document (recipe.xml).

DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd">

    <!DOCTYPE recipe SYSTEM "recipe.dtd">

    <!DOCTYPE recipe [
      <!ELEMENT recipe (name, description, ingredientList,
                        stepList)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
      <!ELEMENT ingredientList (ingredient+)>
      <!ATTLIST ingredientList serves CDATA #REQUIRED>
      <!ELEMENT ingredient (#PCDATA)>
      <!ATTLIST ingredient amount CDATA #REQUIRED>
      <!ELEMENT stepList (step+)>
      <!ELEMENT step (#PCDATA)> ]>

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd" [
      <!-- Omitted for brevity --> ]>

    <!DOCTYPE recipe SYSTEM "recipe.dtd" [
      <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD.

    element recipe {
      element name { text },
      element description { text },
      element ingredientList {
        attribute serves { xsd:positiveInteger },
        element ingredient {
          attribute amount { text }, text
        }+
      },
      element stepList {
        element step { text }+
      }
    }

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.


    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
      <element name="recipe"><complexType><all>
        <element name="name" type="string" minOccurs="1"/>
        <element name="description" type="string"
                 minOccurs="1"/>
        <element
          name="ingredientList"><complexType><sequence>
          <element name="ingredient" minOccurs="1"
                   maxOccurs="unbounded">
            <complexType><simpleContent>
              <extension base="string">
                <attribute name="amount" type="string"/>
              </extension>
            </simpleContent></complexType>
          </element></sequence>
          <attribute name="serves" type="positiveInteger"
                     use="required"/>
        </complexType></element>
        <element name="stepList"><complexType><sequence>
          <element name="step" type="string" minOccurs="1"
                   maxOccurs="unbounded"/>
        </sequence></complexType></element>
      </all></complexType></element>
    </schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).

    xmllint --noout --dtdvalid recipe.dtd recipe.xml
    xmllint --noout --schema recipe.xsd recipe.xml
    trang recipe.rnc recipe.rng  # Compact -> Full RELAX NG
    xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.


A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature called CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.

Speech

    AASE: Peer, you're lying!
    PEER: No, I'm not!
    AASE: Well then, swear to me it's true!
    PEER: Swear? Why should I?
    AASE: See, you dare not! Every word of it's a lie!

Verse

    Peer, you're lying! No, I'm not!
    Well then, swear to me it's true!
    Swear? Why should I? See, you dare not!
    Every word of it's a lie!

Markup

    <(V)line>
    <(S)speech who="Aase">Peer, you're lying!</(S)speech>
    <(S)speech who="Peer">No, I'm not!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Aase">Well then,
    swear to me it's true!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Peer">Swear, why should I?</(S)speech>
    <(S)speech who="Aase">See, you dare not!
    </(V)line><(V)line>
    Every word of it's a lie!</(S)speech>
    </(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.

Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
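The namespace mechanism introduced earlier in this section can be illustrated with a document that mixes two of the XML applications named above. The sketch below is an assumption-laden example rather than a canonical one: the document is made up, the prefixes h and m are arbitrary, and the parser used is the xml.etree.ElementTree module of the Python standard library, which represents a namespaced element name as the IRI in braces followed by the local name.

```python
import xml.etree.ElementTree as ET

# A made-up XHTML document with an embedded MathML formula.
DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:m="http://www.w3.org/1998/Math/MathML">
  <body>
    <p>The sum is
      <m:math><m:mi>a</m:mi><m:mo>+</m:mo><m:mi>b</m:mi></m:math>.
    </p>
  </body>
</html>"""

# The prefixes are local shorthands; only the IRIs identify the applications.
NAMESPACES = {
    "h": "http://www.w3.org/1999/xhtml",
    "m": "http://www.w3.org/1998/Math/MathML",
}

root = ET.fromstring(DOCUMENT)
print(root.tag)

identifiers = [e.text for e in root.findall(".//m:mi", NAMESPACES)]
paragraphs = root.findall(".//h:p", NAMESPACES)
unqualified = root.findall(".//p")  # matches only elements without a namespace

print(identifiers, len(paragraphs), len(unqualified))
```

The parser resolves the IRIs but makes no attempt to locate a schema for either of them, which mirrors the validation problem described above.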

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for the presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

    <b>Bold <i>bold and italic</b> italic</i>
    <b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and, as such, can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

    <font face="Verdana" size="4">
    <font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
    <br><br>There is a continuing need to show the power of
    <i>CSS</i>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The <i>HTML
    </i> remains the same, the only thing that has changed
    is the external <i>CSS</i> file. Yes, really.
    </font>

    <style>
    body {
      font: large Verdana;
      font-size: large;
    }
    h1 {
      font-size: x-large;
      text-transform: uppercase;
    }
    abbr {
      font-style: italic;
    }
    </style>
    <h1>So what is this about?</h1>
    <p>There is a continuing need to show the power of
    <abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The
    <abbr>HTML</abbr> remains the same, the only thing that
    has changed is the external <abbr>CSS</abbr> file. Yes,
    really.</p>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly into web documents through XML namespaces.

The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
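The difference between the tolerant processing of HTML and the strict processing of XML discussed in this section can be observed with the Python standard library. The sketch below feeds the markup with overlapping elements to both kinds of parsers. It is a simplified illustration: html.parser only tokenizes its input and does not implement the error recovery rules of the HTML5 specification, and the EventCollector class is a hypothetical helper.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# The <b> and <i> elements overlap instead of being properly nested.
MALFORMED = "<b>Bold <i>bold and italic</b> italic</i>"


class EventCollector(HTMLParser):
    """Record the tags and text reported by the tolerant HTML tokenizer."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


collector = EventCollector()
collector.feed(MALFORMED)
collector.close()
print(collector.events)  # the HTML tokenizer reports no error

try:
    ET.fromstring(MALFORMED)
    xml_accepted = True
except ET.ParseError as error:
    xml_accepted = False
    print("XML parser refused the document:", error)
```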

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.

A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
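The two interpretations of a triplet can be made concrete with a small sketch. The Triplet type and the interpret function below are hypothetical and serve only as an illustration; the IRIs are taken from the example RDF document used in this section, and the explicit object_is_resource flag stands in for the distinction that real RDF syntaxes make between IRIs and literals.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    predicate: str
    subject: str
    obj: str
    object_is_resource: bool


def interpret(t):
    """Describe a triplet as either a relation or a property."""
    if t.object_is_resource:
        return (f"<{t.subject}> is in the relation <{t.predicate}> "
                f"with <{t.obj}>")
    return (f"<{t.subject}> has the property <{t.predicate}> "
            f"with the value {t.obj!r}")


creator = Triplet(
    predicate="http://purl.org/dc/terms/creator",
    subject="http://example.org/document.html",
    obj="http://example.org/john-smith",
    object_is_resource=True,
)
name = Triplet(
    predicate="http://xmlns.com/foaf/0.1/name",
    subject="http://example.org/john-smith",
    obj="John Smith",
    object_is_resource=False,
)

print(interpret(creator))
print(interpret(name))
```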

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually make obsolete the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.

    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dc="http://purl.org/dc/terms/"
        xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description
          rdf:about="http://example.org/document.html">
        <dc:title xml:lang="en">John's Web page</dc:title>
        <dc:creator
            rdf:resource="http://example.org/john-smith"/>
      </rdf:Description>
      <rdf:Description
          rdf:about="http://example.org/john-smith">
        <rdf:type
            rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
        <foaf:name>John Smith</foaf:name>
      </rdf:Description>
    </rdf:RDF>

    <http://example.org/document.html>
      <http://purl.org/dc/terms/title> "John's Web page"@en .
    <http://example.org/document.html>
      <http://purl.org/dc/terms/creator>
      <http://example.org/john-smith> .
    <http://example.org/john-smith>
      <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
      <http://xmlns.com/foaf/0.1/Person> .
    <http://example.org/john-smith>
      <http://xmlns.com/foaf/0.1/name> "John Smith" .

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dc: <http://purl.org/dc/terms/> .
    <http://example.org/document.html>
      dc:title "John's Web page"@en ;
      dc:creator <http://example.org/john-smith> .
    <http://example.org/john-smith>
      a foaf:Person ;
      foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom). The N-Triples statements are wrapped here to fit the page; in an actual N-Triples file, each statement occupies a single line.

    <!DOCTYPE html>
    <html lang="en">
      <head>
        <link rel="meta" type="application/rdf+xml"
              href="john.rdf">
        <link rel="meta" type="text/turtle" href="john.ttl">
        <link rel="meta" type="application/n-triples"
              href="john.nt">
        <title>John's Web page</title>
      </head>
      <body>
        Hi, I'm John Smith.
      </body>
    </html>

    <!DOCTYPE html>
    <html lang="en">
      <head vocab="http://purl.org/dc/terms/"
            about="http://example.org/document.html">
        <title property="title" lang="en">John's Web page</title>
        <meta property="creator"
              href="http://example.org/john-smith">
      </head>
      <body vocab="http://xmlns.com/foaf/0.1/"
            about="http://example.org/john-smith"
            typeof="Person">
        Hi, I'm <span property="name">John Smith</span>.
      </body>
    </html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

Figure 2.11: A graph of the RDF document in Figure 2.9. The labels in the graph are http://example.org/document.html, "John's Web page"@en, dc:title, http://example.org/john-smith, foaf:Person, rdf:type, "John Smith", foaf:name, and foaf:creator.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU is Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

[Sidenote: The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].]

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by Donald Knuth, an American professor of computer science, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of the mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text, with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
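The decoupling described above can be illustrated with a minimal Python sketch. It is not how any particular DPS works internally: the `STYLES` table, the class names, and the `render` function are hypothetical names chosen for this illustration. Each section of text carries only a logical class, and the design of each class lives in one place.

```python
from html import escape

# Hypothetical stylesheet: maps each logical class to its presentation.
STYLES = {
    "title": "font-size: 16pt; font-weight: bold; text-align: center",
    "author": "font-style: italic; text-align: center",
    "body": "font-size: 12pt",
}

# The document itself only classifies its sections of text.
DOCUMENT = [
    ("title", "The Cask of Amontillado"),
    ("author", "Edgar Allen Poe"),
    ("body", "The thousand injuries of Fortunato I had borne as I best could."),
]


def render(document, styles):
    """Render (logical_class, text) pairs as HTML paragraphs styled per class."""
    parts = []
    for logical_class, text in document:
        style = styles[logical_class]
        parts.append(
            f'<p class="{logical_class}" style="{style}">{escape(text)}</p>'
        )
    return "\n".join(parts)


if __name__ == "__main__":
    print(render(DOCUMENT, STYLES))
    # Redesign: one change to the stylesheet restyles every body paragraph.
    print(render(DOCUMENT, {**STYLES, "body": "font-size: 10pt"}))
```

Changing the design of the whole document amounts to editing a single entry of the stylesheet; the text and its classification stay untouched.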


The Cask of Amontillado
by
Edgar Allen Poe

The thousand injuries of Fortunato I had borne as I best could; but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that gave utterance to a threat. At length I would be avenged; this was a point definitely settled--but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allen Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could; but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allen Poe’s Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].


% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}\hskip-0.3ex}}
  \parshape=4 3em \dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em \hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could; but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.

24 LIGHTWEIGHT MARKUP LANGUAGES 39

Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that reflect their use cases.
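To make the idea of such a conversion tool concrete, the following Python sketch converts a tiny Markdown-like subset to HTML. The subset is an assumption made for this illustration ("# " headings, paragraphs separated by blank lines, and *emphasis*); it does not implement Markdown or any other real lightweight markup language, and the function name `to_html` is hypothetical.

```python
import html
import re


def to_html(source):
    """Convert a tiny Markdown-like subset to HTML.

    Supported by assumption: '# ' headings, paragraphs separated by
    blank lines, and *emphasis* within a block.
    """
    blocks = []
    for block in re.split(r"\n\s*\n", source.strip()):
        # Join the lines of a block and escape HTML special characters.
        text = html.escape(" ".join(line.strip() for line in block.splitlines()))
        # Punctuation acts as markup: *word* becomes emphasized text.
        text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
        if text.startswith("# "):
            blocks.append(f"<h1>{text[2:]}</h1>")
        else:
            blocks.append(f"<p>{text}</p>")
    return "\n".join(blocks)


SAMPLE = """# Design

After a manuscript has been *written* and marked up,
it is time to create a visual system."""

if __name__ == "__main__":
    print(to_html(SAMPLE))
```

The source text stays readable as plain text, while the converter supplies the more general markup.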

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
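The coverage check described above can be sketched in Python with the standard `unicodedata` module. This is only an illustration under stated assumptions: the character repertoires `BODY_FACE` and `MARKS_FACE` are made-up stand-ins for what real font files would report, and the function names are hypothetical. The sketch lists the characters a typeface lacks and tests whether a missing accented letter could be composed from a base letter and borrowed combining marks.

```python
import unicodedata


def missing_characters(text, supported):
    """Return the sorted characters of `text` that the typeface cannot render."""
    return sorted({ch for ch in text if not ch.isspace() and ch not in supported})


def composable(ch, base_supported, marks_supported):
    """Check whether `ch` can be built from a base letter in the body typeface
    and combining diacritical marks borrowed from another font family."""
    decomposed = unicodedata.normalize("NFD", ch)
    if len(decomposed) < 2:
        return False  # no canonical decomposition, nothing to combine
    base, marks = decomposed[0], decomposed[1:]
    return base in base_supported and all(
        unicodedata.combining(m) and m in marks_supported for m in marks
    )


# Hypothetical repertoires; a real check would read them from the font files.
BODY_FACE = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
MARKS_FACE = {"\u0301", "\u030c"}  # combining acute accent and caron

if __name__ == "__main__":
    for ch in missing_characters("V\u00edt Novotn\u00fd", BODY_FACE):
        print(ch, composable(ch, BODY_FACE, MARKS_FACE))
```

Characters for which `composable` returns `False`, such as letters with no canonical decomposition, would have to be borrowed whole from another font family.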

Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
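The line-length guidelines above lend themselves to a simple check. The following Python sketch encodes the ranges quoted in the text; the function names and the return labels are hypothetical, and the character count per line is assumed to be an average measured on a sample of the typeset text.

```python
def check_measure(chars_per_line, columns=1):
    """Classify a line length against the ranges quoted in the text:
    45-75 characters on a single-column page, 40-50 on a multi-column page."""
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < low:
        return "too narrow"
    if chars_per_line > high:
        return "too wide"
    return "ok"


def alignment(chars_per_line):
    """Lines too narrow to hold 40 characters should be set ragged."""
    return "justified" if chars_per_line >= 40 else "ragged"


def average_measure(lines):
    """Average number of characters per non-empty line of a text sample."""
    lengths = [len(line) for line in lines if line.strip()]
    return sum(lengths) / len(lengths) if lengths else 0.0


if __name__ == "__main__":
    print(check_measure(66), alignment(66))
    print(check_measure(66, columns=2), alignment(66))
    print(check_measure(30, columns=2), alignment(30))
```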

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text

32 STRUCTURAL ELEMENTS 43

The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

1

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}

newunicodecharraisebox8ex

\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford’s From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


<style>
@font-face {
  font-family: Alegreya Sans;
  src: url(AlegreyaSans-Regular.ttf)
    format("truetype");
  unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
    U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F; }
@font-face {
  font-family: GFS Neohellenic;
  src: url(GFSNeohellenic.otf) format("opentype");
  unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
    U+102E0-102FF; }
p {
  font-family: Alegreya Sans, GFS Neohellenic,
    sans-serif;
  line-height: 14pt; }
[lang=en] {
  font-size: 11pt; }
[lang=gr] {
  font-size: 13.2pt; }
</style>
<p><span lang=en>The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang=gr>φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang=en>“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with a leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
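The arithmetic behind these figures can be written out as a short Python sketch. The function names are hypothetical; the 20–45 % range is the guideline quoted above, and the results are rounded to two decimal places to keep floating-point noise out of the output.

```python
def leading_range(type_size_pt, low=0.20, high=0.45):
    """Return the (minimum, maximum) line height in points for a typeface size,
    adding 20 to 45 percent of the size as separation between lines."""
    return (
        round(type_size_pt * (1 + low), 2),
        round(type_size_pt * (1 + high), 2),
    )


def lead_per_side(type_size_pt, leading_pt):
    """Lead added above and below each line for a given line height."""
    return round((leading_pt - type_size_pt) / 2, 2)


if __name__ == "__main__":
    minimum, maximum = leading_range(10)
    print(minimum, maximum)
    print(lead_per_side(10, minimum), lead_per_side(10, maximum))
```

For a 10 pt typeface, this reproduces the line height of 12 to 14.5 pt and the 1 to 2.25 pt of lead on each side of a line.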

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

[Sidenote: This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.]

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
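The whole-multiple rule above comes down to a small calculation, sketched here in Python. The function name `heading_block` and its parameters are hypothetical; the sketch assumes the heading occupies its own line height plus any space set before it, and returns how many body lines the heading should span together with the extra vertical space needed to reach that height.

```python
import math


def heading_block(heading_leading_pt, body_leading_pt, space_before_pt=0.0):
    """Return (lines, padding): the number of body lines a heading spans and
    the extra space in points that makes its height a whole multiple of the
    body text line height."""
    natural = heading_leading_pt + space_before_pt
    # The small epsilon keeps exact multiples from being rounded up a line.
    lines = max(1, math.ceil(natural / body_leading_pt - 1e-9))
    padding = lines * body_leading_pt - natural
    return lines, padding


if __name__ == "__main__":
    # A heading set with a 16 pt line height above body text with 12 pt leading.
    print(heading_block(16, 12))
    # A heading that already spans exactly two body lines needs no padding.
    print(heading_block(24, 12))
```

With 12 pt body leading, a heading with a 16 pt line height would span two body lines and require 8 pt of additional space, split above and below the heading as the design sees fit.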

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.


2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer’s viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

William Shakespeare, King Lear

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
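The requirement that the text height be a multiple of the line height can be checked with a few lines of Python. The function name `text_block` is hypothetical, and the example assumes 72 points per inch (the PostScript point; TeX’s own point is slightly smaller), so a 9 in page is 648 pt high and a 0.75 in margin is 54 pt.

```python
def text_block(page_height_pt, top_margin_pt, bottom_margin_pt, leading_pt):
    """Return (lines, text_height): the number of whole body text lines that
    fit between the vertical margins and the resulting text height."""
    available = page_height_pt - top_margin_pt - bottom_margin_pt
    lines = int(available // leading_pt)
    return lines, lines * leading_pt


if __name__ == "__main__":
    # 9 in page with 0.75 in vertical margins, assuming 72 pt per inch.
    print(text_block(648, 54, 54, 12))
    # With 14 pt leading, the leftover space has to go to the margins.
    print(text_block(648, 54, 54, 14))
```

In the second case, 38 lines occupy 532 pt, so the remaining 8 pt of the available 540 pt would be added to the vertical margins.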

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).


[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O’Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O’Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).


[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).


[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).


[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typo/glosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).


[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standards Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
EMACS: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
PICO: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
VI: The Visual Interactive editor
VIM: VI IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language



BRE regex, description, and example matches:

we\{1,2\}p
    The repetition expression in the form of c\{m,n\} matches the character c repeated k ∈ ⟨m, n⟩ times. Other forms include c\{m,\} for k ∈ ⟨m, ∞) and c\{m\} for k = m.
    Matches: weeps, wept

e*ne
    Star (*) is a repetition operator equivalent to the interval expression of \{0,\}.
    Matches: never, enemy, Kleene

\(⟨regex⟩\)
    A subexpression is a parenthesized regex. Any interval expression or repetition operator used immediately after a subexpression applies to the entire parenthesized regex.
    Matches: ⟨regex⟩

^ar
    At the beginning of a regex or a subexpression, a caret (^) matches the beginning of a string.
    Matches: argument, arrow keys

ore$
    At the end of a regex or a subexpression, the dollar sign ($) matches the end of a string.
    Matches: iron ore, dumbledore

.be
    A period (.) matches any single character.
    Matches: or not to be

be[ea]
    A matching list expression is enclosed in square brackets ([ ]) and contains a list of characters that the bracket expression matches. It may contain other entities omitted here for brevity.
    Matches: beehive, grizzly bear, glass beads

be[^ea]
    A non-matching list expression contains a caret (^) as its first character and matches any character that the corresponding matching list expression would not match.
    Matches: obeah, bend, libela

\^\$
    Backslash (\) is an escape character that either suppresses or activates the special meaning of the following character.
    Matches: ^$

\(..\).*\1
    A backreference in the form of an escaped number n ∈ ⟨1, 9⟩ (\1, \2, …, \9) matches anything the nth subexpression matched.
    Matches: ara ararauna, dardanelles, nationality

Table 1.4: An informal description of the BRE syntax (above) and the differences in the ERE syntax (below).

ERE regex, description, and example matches:

we{1,2}p
    Unlike in BREs, braces aren't escaped.
    Matches: weeps, wept

pe+r?l?
    The plus sign (+) and the question mark (?) are repetition operators equivalent to the interval expressions of {1,} and {0,1}.
    Matches: persona, peer, speech, perl

(⟨regex⟩)
    Unlike in BREs, parentheses aren't escaped.
    Matches: ⟨regex⟩

(on|t)
    Vertical line (|) is an alternation operator that separates multiple regexes. The whole regex matches any of the alternative regexes.
    Matches: one, two, trophy, truth

(..).*\1
    EREs do not support backreferences.
    Matches: ⟨undefined⟩
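The operators from Table 1.4 can be tried out interactively. The following is only a sketch that uses Python's `re` module as a stand-in: its syntax resembles ERE (braces, parentheses, and the vertical line need no backslash), but it is not a POSIX implementation and, unlike the EREs described above, it does accept backreferences. The helper `matched_parts` and the sample words are illustrative assumptions, not part of the original text.

```python
import re


def matched_parts(pattern, words):
    """Map every word that matches `pattern` to its first matched substring."""
    found = {}
    for word in words:
        match = re.search(pattern, word)
        if match:
            found[word] = match.group(0)
    return found


# Interval expression: "e" repeated once or twice.
interval = matched_parts(r"we{1,2}p", ["weeps", "wept", "wp"])
# Plus (one or more) and question mark (zero or one).
optional = matched_parts(r"pe+r?l", ["peel", "perl", "pl"])
# Alternation between two regexes.
alternation = matched_parts(r"(on|t)", ["one", "two", "xyz"])
# Caret anchors the match to the beginning of the string.
anchored = matched_parts(r"^ar", ["argument", "bar"])
# Backreference: \1 repeats whatever the first subexpression matched.
repeated = matched_parts(r"(..).*\1", ["dardanelles", "nationality", "bend"])
```

Each result is a dictionary from the matching word to the matched substring, which mirrors the highlighted portions a regex table would typically show.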

Regex and description:

\x{⟨n⟩}
    Matches the UCS character with code point ⟨n⟩ in hexadecimal.

\N{⟨n⟩}
    Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩.

\p{⟨p⟩}
    Matches any UCS character with property ⟨p⟩.

\P{⟨p⟩}
    Matches any UCS character without property ⟨p⟩.

Property and description:

Letter
    This property is satisfied by any letter.

Punctuation
    This property is satisfied by any punctuation.

Symbol
    This property is satisfied by any symbol.

Mark
    This property is satisfied by any mark.

Number
    This property is satisfied by any number.

Separator
    This property is satisfied by any separator.

Other
    This property is satisfied by any UCS character that doesn't belong to any of the categories listed above.

Block=⟨b⟩
    This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.

Script=⟨s⟩
    This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.

Numeric Value=⟨n⟩
    This property is satisfied by any UCS character with the numeric value ⟨n⟩.

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.

The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program, as well as the BRE and ERE syntaxes, in full detail.

least partially implement the Unicode standard for Regular Expressions [20] (such as those of Perl 5.2 or Java 7) are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
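To make the idea of property-based matching concrete, here is a minimal sketch in Python. The standard `re` module does not implement the `\p{…}` syntax from Table 1.5, so the sketch emulates it for the major general categories with the standard `unicodedata` module; the names `MAJOR_CATEGORIES`, `has_property`, and `find_all` are assumptions made for this illustration.

```python
import unicodedata

# Major General_Category classes, keyed by the first letter of the
# two-letter code returned by unicodedata.category().
MAJOR_CATEGORIES = {
    "L": "Letter", "M": "Mark", "N": "Number", "P": "Punctuation",
    "S": "Symbol", "Z": "Separator", "C": "Other",
}


def has_property(char, prop):
    """Emulate \\p{prop} for one character and a major general category."""
    return MAJOR_CATEGORIES[unicodedata.category(char)[0]] == prop


def find_all(text, prop):
    """Return every character of `text` that satisfies `prop`."""
    return [char for char in text if has_property(char, prop)]


# "Věta č. 101!" written with escapes so the code points are unambiguous.
sample = "V\u011bta \u010d. 101!"
letters = find_all(sample, "Letter")
numbers = find_all(sample, "Number")
punctuation = find_all(sample, "Punctuation")
# The Numeric Value property is available directly.
half = unicodedata.numeric("\u00bd")  # VULGAR FRACTION ONE HALF
```

Note that the non-ASCII letters are classified as letters without any special handling, which is exactly what a UCS-unaware regex engine working on bytes cannot guarantee.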

The most elementary text processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool presents the user with the lines that contain one or more matches. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
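The line-oriented behavior of grep and the search-and-replace use of sed described above can be approximated in a few lines. This is a rough sketch in Python rather than a reimplementation of either tool: the functions `grep` and `substitute` are hypothetical helpers, and the patterns follow Python's `re` syntax, not POSIX BRE.

```python
import re


def grep(pattern, lines, fixed=False):
    """Return the lines that contain at least one match of `pattern`."""
    if fixed:
        pattern = re.escape(pattern)  # treat the pattern as a fixed string
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


def substitute(pattern, replacement, lines):
    """Replace every match on every line, similar in spirit to s/…/…/g."""
    return [re.sub(pattern, replacement, line) for line in lines]


text = ["iron ore", "dumbledore", "a.b", "axb"]
ending_in_ore = grep(r"ore$", text)
as_regex = grep("a.b", text)              # the period matches any character
as_fixed = grep("a.b", text, fixed=True)  # the period is taken literally
shouted = substitute(r"ore$", "ORE", text)
```

The `fixed` switch shows why the distinction between fixed strings and regexes matters: the same pattern selects two lines in one mode and a single line in the other.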

1.2 Version Control

When writing a text document, it is often useful to keep backups of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.
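The rudimentary record keeping described above (changes with descriptions and authorship that can be viewed and reverted) can be modeled with a toy data structure. The sketch below is purely illustrative: the classes `Change` and `History` are invented for this example, and, unlike real VCS, it stores the full text of every version instead of the differences between versions.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    author: str
    description: str
    content: str  # full text of the document after this change


@dataclass
class History:
    changes: list = field(default_factory=list)

    def commit(self, author, description, content):
        """Record a new version with its description and author."""
        self.changes.append(Change(author, description, content))

    def current(self):
        """Return the latest recorded text, or an empty document."""
        return self.changes[-1].content if self.changes else ""

    def log(self):
        """List (author, description) pairs, oldest first."""
        return [(c.author, c.description) for c in self.changes]

    def revert(self, author):
        """Undo the latest change by recording the previous text again."""
        if not self.changes:
            return
        undone = self.changes[-1]
        previous = self.changes[-2].content if len(self.changes) > 1 else ""
        self.commit(author, "Revert: " + undone.description, previous)


history = History()
history.commit("alice", "Add the title", "Pocket Primer\n")
history.commit("bob", "Rename the title", "Pocket Prime\n")
history.revert("alice")
```

Reverting is modeled as a new change rather than as a deletion, so the log still shows who introduced the unwanted edit and who undid it.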

VCS can be divided by their architecture into centralized and decentralized systems. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally, and its operation is fully dependent on the availability of the server. An example of a centralized VCS is SubVersioN (SVN).

By comparison, there is no designated server in decentralized VCS, and the users can exchange new versions directly with one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.

Although VCS can be used to keep track of any kind of file, they are especially geared towards text files, whose changes they can easily display. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

Figure 1.8: The basic SVN workflow (svnadmin create, svn checkout, svn update, svn commit). After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.
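The reason VCS handle text files so well is that two versions of a text file can be compared line by line. As a small sketch of that comparison, the example below uses Python's standard `difflib` module; the two sample versions and the labels `v1` and `v2` are made up for the illustration, and actual VCS use their own diff implementations.

```python
import difflib

old = ["Chapter 1\n", "Writing is hard.\n"]
new = ["Chapter 1\n", "Writing is rewarding.\n", "Markup helps.\n"]

# A unified diff marks removed lines with "-" and added lines with "+".
diff = list(difflib.unified_diff(old, new, fromfile="v1", tofile="v2"))

# Skip the "---" and "+++" file header lines when collecting changes.
removed = [line for line in diff
           if line.startswith("-") and not line.startswith("---")]
added = [line for line in diff
         if line.startswith("+") and not line.startswith("+++")]
```

No comparably simple line-based comparison exists for the binary output of most interactive DPSes, which is why they need the dedicated interfaces mentioned above.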

Figure 1.9: The first diagram depicts the basic Git workflow (git init, git clone, git pull, git push, git commit, git reset). The second diagram depicts the use of the Git program with an SVN repository (svnadmin create, git svn clone, git svn rebase, git svn dcommit, git commit, git reset); this bears all the advantages and disadvantages associated with decentralized VCS. After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.

Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.

Chapter 2

Markup

A manuscript can be a seamless current of words and still make perfect sense to its author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At the more fundamental level, this means complying with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect the manuscript to meet this basic requirement already. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, the two activities are conceptually distinct. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.

Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.

2.1 Meta Markup Languages

2.1.1 The General Markup Language

The situation in digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset by a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository, to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the General Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.

A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.

Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also called a language or a format).
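The two levels of correctness can be illustrated with a short sketch. It assumes Python's standard `xml.etree.ElementTree` parser, which checks well-formedness only and does not validate against DTDs or schemata. The function `is_valid_recipe` is therefore a hypothetical, hand-written stand-in for a schema with just two rules (the root element is `recipe` and it has exactly one `name` child); it is not how validation is performed in practice.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True when the parser accepts the document's syntax."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


def is_valid_recipe(document):
    """Hypothetical stand-in for a schema: a well-formed document whose
    root is <recipe> with exactly one <name> child."""
    if not is_well_formed(document):
        return False
    root = ET.fromstring(document)
    return root.tag == "recipe" and len(root.findall("name")) == 1


well_formed_and_valid = "<recipe><name>Palatschinken</name></recipe>"
well_formed_only = "<recipe><title>Palatschinken</title></recipe>"
malformed = "<recipe><name>Palatschinken</recipe></name>"
```

The second document shows the gap between the two levels: its tags nest correctly, so it is well-formed, yet it breaks the constraints of the (toy) application and is therefore not valid.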

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
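As a taste of path-based data retrieval, the sketch below queries a shortened version of a recipe document. It relies on Python's standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath; the abbreviated `RECIPE` string is an assumption made to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# A path selects all <ingredient> children of <ingredientList>.
names = [e.text for e in root.findall("./ingredientList/ingredient")]
# A predicate narrows the selection by attribute value.
eggs = root.find(".//ingredient[@amount='2']").text
# Attribute values are always strings and need explicit conversion.
serves = int(root.find("ingredientList").get("serves"))
```

Full XPath and XQuery processors offer far richer expressions (functions, axes, joins), but the principle of addressing nodes by their position in the document tree is the same.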

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE recipe SYSTEM "recipe.dtd">
    <recipe>
      <name>Palatschinken</name>
      <description>A Slavic crêpe-like dish.</description>
      <ingredientList serves="8">
        <ingredient amount="120g">Plain flour</ingredient>
        <ingredient amount="2">Egg</ingredient>
        <ingredient amount="300ml">Milk</ingredient>
        <ingredient amount="1 tblspn">Oil</ingredient>
        <ingredient amount="1 pinch">Salt</ingredient>
      </ingredientList>
      <stepList>
        <step>Combine the ingredients and whisk until
          you have a smooth batter.</step>
        <step>Heat oil on a pan, pour in a tablespoonful
          of the batter, fry until golden brown.</step>
        <step>Repeat until there is no batter left.</step>
        <step>Serve rolled and filled with jam.</step>
      </stepList>
    </recipe>

Figure 2.1: An example XML document (recipe.xml).

DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd">
    <!DOCTYPE recipe SYSTEM "recipe.dtd">

    <!DOCTYPE recipe [
      <!ELEMENT recipe (name, description, ingredientList,
                        stepList)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
      <!ELEMENT ingredientList (ingredient+)>
      <!ATTLIST ingredientList serves CDATA #REQUIRED>
      <!ELEMENT ingredient (#PCDATA)>
      <!ATTLIST ingredient amount CDATA #REQUIRED>
      <!ELEMENT stepList (step+)>
      <!ELEMENT step (#PCDATA)> ]>

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd" [
      <!-- Omitted for brevity --> ]>
    <!DOCTYPE recipe SYSTEM "recipe.dtd" [
      <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD.

    element recipe {
      element name { text },
      element description { text },
      element ingredientList {
        attribute serves { xsd:positiveInteger },
        element ingredient {
          attribute amount { text }, text
        }+
      },
      element stepList {
        element step { text }+
      }
    }

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.

    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
      <element name="recipe"><complexType><all>
        <element name="name" type="string" minOccurs="1"/>
        <element name="description" type="string"
                 minOccurs="1"/>
        <element
          name="ingredientList"><complexType><sequence>
            <element name="ingredient" minOccurs="1"
                     maxOccurs="unbounded">
              <complexType><simpleContent>
                <extension base="string">
                  <attribute name="amount" type="string"/>
                </extension>
              </simpleContent></complexType>
            </element></sequence>
            <attribute name="serves" type="positiveInteger"
                       use="required"/>
        </complexType></element>
        <element name="stepList"><complexType><sequence>
          <element name="step" type="string" minOccurs="1"
                   maxOccurs="unbounded"/>
        </sequence></complexType></element>
      </all></complexType></element>
    </schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).

    xmllint --noout --dtdvalid recipe.dtd recipe.xml
    xmllint --noout --schema recipe.xsd recipe.xml
    trang recipe.rnc recipe.rng  # Compact -> Full Relax NG
    xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.

A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature called CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.

    Speech
      AASE  Peer, you're lying!
      PEER  No, I'm not!
      AASE  Well then, swear to me it's true!
      PEER  Swear? Why should I?
      AASE  See, you dare not! Every word of it's a lie!

    Verse
      Peer, you're lying! No, I'm not!
      Well then, swear to me it's true!
      Swear? Why should I? See, you dare not!
      Every word of it's a lie!

    <(V)line>
    <(S)speech who="Aase">Peer, you're lying!</(S)speech>
    <(S)speech who="Peer">No, I'm not!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Aase">Well then,
    swear to me it's true!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Peer">Swear? why should I?</(S)speech>
    <(S)speech who="Aase">See, you dare not!
    </(V)line><(V)line>
    Every word of it's a lie!</(S)speech>
    </(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.

Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the basic principles for building robust communication protocols.
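Returning to XML namespaces, the following sketch shows how elements from two XML applications can coexist in one document and how a parser tells them apart. It uses Python's standard `xml.etree.ElementTree` module; the tiny document is invented for the illustration, while the two IRIs are the namespace names commonly associated with XHTML and SVG.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

# The default namespace covers XHTML; the "svg" prefix marks SVG elements.
DOCUMENT = f"""<html xmlns="{XHTML}" xmlns:svg="{SVG}">
  <body>
    <p>A circle:</p>
    <svg:svg width="20" height="20">
      <svg:circle cx="10" cy="10" r="8"/>
    </svg:svg>
  </body>
</html>"""

root = ET.fromstring(DOCUMENT)

# ElementTree expands every prefix into the "{IRI}local-name" form,
# so the prefix chosen by the author is irrelevant to the application.
tags = [element.tag for element in root.iter()]
svg_tags = [tag for tag in tags if tag.startswith("{" + SVG + "}")]
circle = root.find(f".//{{{SVG}}}circle")
```

The parser recognizes which application each element belongs to, but, as noted above, nothing in the document tells it where to find a schema for either IRI.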

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely popular graphical Web browser, which paved the way for the Web browsers of today. In 1994, Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language made it possible to specify the visual properties of any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

<b>Bold <i>bold and italic</b> italic</i>

<b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

<style>
body {
  font: large Verdana;
  font-size: large;
}
h1 {
  font-size: x-large;
  text-transform: uppercase;
}
abbr {
  font-style: italic;
}
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>

The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2 Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly into web documents through XML namespaces.

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
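The difference in strictness between the two families of parsers can be observed with the Python standard library. The following is only a sketch: `html.parser` is a lenient tokenizer, not an implementation of the HTML5 error-recovery algorithm, and the names `TagLogger`, `tokenize_as_html`, and `is_well_formed_xml` are made up for this illustration. The input is the overlapping markup from the first line of Figure 2.7.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# The overlapping elements from the first line of Figure 2.7.
MALFORMED = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Records start and end tags in the order the tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tokenize_as_html(markup):
    """Feed the markup to the lenient HTML tokenizer and return its events."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


def is_well_formed_xml(markup):
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(tokenize_as_html(MALFORMED))    # the tokenizer does not complain
    print(is_well_formed_xml(MALFORMED))  # the XML parser refuses the input
```

The HTML tokenizer reports the tags without objecting to their overlap, whereas the XML parser rejects the whole input. This is the behavior the well-formedness requirement asks for.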

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in Attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually replace the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
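The triplet model and one of its textual representations can be sketched in a few lines of Python. This is a minimal illustration rather than a conforming RDF implementation: the class names `IRI`, `Literal`, and `Triple` and the helper functions are assumptions made for this example, and the literal escaping covers only backslashes, double quotes, and newlines.

```python
from typing import NamedTuple, Union


class IRI(NamedTuple):
    """A resource identified by an IRI."""
    value: str


class Literal(NamedTuple):
    """A literal value with an optional language tag."""
    value: str
    lang: str = ""


class Triple(NamedTuple):
    subject: IRI
    predicate: IRI
    obj: Union[IRI, Literal]


def interpretation(triple):
    """A resource object denotes a relation, a literal object a property."""
    return "relation" if isinstance(triple.obj, IRI) else "property"


def term(t):
    """Serialize a single term in the N-Triples notation (simplified)."""
    if isinstance(t, IRI):
        return f"<{t.value}>"
    escaped = (t.value.replace("\\", "\\\\")
                      .replace('"', '\\"')
                      .replace("\n", "\\n"))
    suffix = f"@{t.lang}" if t.lang else ""
    return f'"{escaped}"{suffix}'


def to_ntriples(triples):
    """Serialize the triples, one statement per line."""
    return "\n".join(
        f"{term(t.subject)} {term(t.predicate)} {term(t.obj)} ."
        for t in triples
    )


if __name__ == "__main__":
    doc = IRI("http://example.org/document.html")
    john = IRI("http://example.org/john-smith")
    data = [
        Triple(doc, IRI("http://purl.org/dc/terms/title"),
               Literal("John's Web page", "en")),
        Triple(doc, IRI("http://purl.org/dc/terms/creator"), john),
    ]
    print(to_ntriples(data))
```

In practice, a dedicated RDF library would be a safer choice than hand-written serialization; the sketch only makes the structure of a triplet and the distinction between relations and properties concrete.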

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html>
  <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html>
  <http://purl.org/dc/terms/creator>
  <http://example.org/john-smith> .
<http://example.org/john-smith>
  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
  <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith>
  <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).

<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>

[Graph: the nodes http://example.org/document.html, "John's Web page"@en, http://example.org/john-smith, foaf:Person, and "John Smith", connected by edges labeled dc:title, foaf:creator, rdf:type, and foaf:name.]

Figure 2.11: A graph of the RDF document in Figure 2.9.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also known as What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.

[Typeset output of the markup below: the centered title, the byline, and the opening paragraph of the story set with a three-line drop cap T, followed by the page number.]

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allen Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].

% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.

Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
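To give a feel for how such a conversion tool works, here is a toy converter for a very small Markdown-like subset, written in Python. It is a sketch under stated assumptions, not an implementation of any real Markdown flavor: it only understands `#` headings, blank-line-separated paragraphs, `*emphasis*`, and `**strong emphasis**`, and the function names `inline` and `to_html` are invented for this example.

```python
import html
import re


def inline(text):
    """Escape HTML-sensitive characters, then convert inline emphasis."""
    escaped = html.escape(text, quote=False)
    # Strong emphasis first, so that ** is not consumed as two single *.
    escaped = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", escaped)
    escaped = re.sub(r"\*(.+?)\*", r"<em>\1</em>", escaped)
    return escaped


def to_html(source):
    """Convert blank-line-separated blocks into headings and paragraphs."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", source) if b.strip()]
    output = []
    for block in blocks:
        heading = re.match(r"(#{1,6})\s+(.*)", block)
        if heading and "\n" not in block:
            level = len(heading.group(1))
            output.append(f"<h{level}>{inline(heading.group(2))}</h{level}>")
        else:
            # Line breaks inside a paragraph are treated as plain spaces.
            output.append("<p>" + inline(" ".join(block.split())) + "</p>")
    return "\n".join(output)


if __name__ == "__main__":
    print(to_html("# Title\n\nSome *light* and **heavy** text."))
```

Real converters handle many more constructs (lists, links, code, nested emphasis) and resolve the ambiguities that this regular-expression approach ignores.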

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

[Typeset output of the markup below: the English text set in Alegreya Sans with the Greek quotation set in GFS Neohellenic, followed by the page number.]

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
\newunicodechar{·}{\raisebox{.8ex}{.}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says, φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι
δὲ ὀργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.

<style>
@font-face {
  font-family: "Alegreya Sans";
  src: url(AlegreyaSans-Regular.ttf)
       format("truetype");
  unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                 U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
}
@font-face {
  font-family: "GFS Neohellenic";
  src: url(GFSNeohellenic.otf) format("opentype");
  unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                 U+102E0-102FF;
}
p {
  font-family: "Alegreya Sans", "GFS Neohellenic",
               sans-serif;
  line-height: 14pt;
}
[lang=en] {
  font-size: 11pt;
}
[lang=gr] {
  font-size: 13.2pt;
}
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says, </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὀργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
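The arithmetic behind the leading guideline can be written down as a short Python sketch. The function names and the default ratios (twenty and forty-five percent, taken from the paragraph above) are choices made for this illustration only.

```python
def leading_range(type_size_pt, min_ratio=0.20, max_ratio=0.45):
    """Return the (smallest, largest) recommended line height in points."""
    return (type_size_pt + type_size_pt * min_ratio,
            type_size_pt + type_size_pt * max_ratio)


def lead_per_side(type_size_pt, line_height_pt):
    """Return the extra space added above and below each line, in points."""
    return (line_height_pt - type_size_pt) / 2


if __name__ == "__main__":
    low, high = leading_range(10)
    print(f"line height: {low:.2f} to {high:.2f} pt")
    print(f"lead per side: {lead_per_side(10, low):.2f} "
          f"to {lead_per_side(10, high):.2f} pt")
```

For a 10 pt typeface, the sketch reproduces the figures given in the text: a line height between 12 and 14.5 pt, that is, 1 to 2.25 pt of lead on each side of a line.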

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
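The whole-multiple rule amounts to rounding the height of a heading up to the next multiple of the body text line height and distributing the difference as extra space around the heading. A minimal Python sketch follows; the function names are hypothetical and all measurements are assumed to be in points.

```python
import math


def heading_block_height(heading_height_pt, body_line_height_pt):
    """Round the heading height up to a whole multiple of the line height."""
    lines = math.ceil(heading_height_pt / body_line_height_pt)
    return lines * body_line_height_pt


def extra_space(heading_height_pt, body_line_height_pt):
    """Space to add around the heading so that the grid stays aligned."""
    return (heading_block_height(heading_height_pt, body_line_height_pt)
            - heading_height_pt)


if __name__ == "__main__":
    # A 20 pt heading on a 12 pt grid occupies two body text lines.
    print(heading_block_height(20, 12), extra_space(20, 12))
```

How the extra space is split above and below the heading is a design decision that the sketch leaves open.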

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
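The requirement that the text height be a multiple of the body text line height can be illustrated with a short Python sketch. It takes the page height and the vertical margins as given and returns the number of lines that fit together with the resulting text height. The function name and the sample margins are assumptions of this example, and the sketch does not address the choice of pleasing page proportions.

```python
def text_block_height(page_height_pt, top_margin_pt, bottom_margin_pt,
                      line_height_pt):
    """Return (number of lines, text height) that fit between the margins."""
    available = page_height_pt - top_margin_pt - bottom_margin_pt
    lines = int(available // line_height_pt)
    return lines, lines * line_height_pt


if __name__ == "__main__":
    # A 9 in (648 pt) page with 0.75 in (54 pt) margins and 12 pt leading.
    print(text_block_height(648, 54, 54, 12))
```

Whatever is left over after rounding down is added to the vertical margins, which keeps the last line of every full page on the same baseline.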

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

Bibliography

[1] Mary Brandel. "1963: The debut of ASCII". In: Computerworld (July 1999). url: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. url: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015) (cit. on p. 5).

[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. url: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. url: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. url: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. url: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).

[17] Noam Chomsky. "Three models for the description of language". In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. url: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. url: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. url: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. "The Roots of SGML – A Personal Recollection". In: (1996). url: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. "SGML: The Reason Why and the First Published Hint". In: Journal of the American Society for Information Science 48 (7 July 1997). url: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. "Introduction to Generalized Markup". In: (1981). url: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. url: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. url: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. url: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. "GODDAG: A Data Structure for Overlapping Hierarchies". In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. doi: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. url: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. url: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. url: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. url: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. url: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. url: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. url: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. url: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. url: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. url: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. url: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. url: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. url: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. url: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. url: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. url: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. url: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. url: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. url: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. url: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. url: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. url: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (la Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Or A Foe
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML, New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language

Index

ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42

Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17

sect

Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

bre 14ndash16bs 6bsd 13

sect

ca 52can 6cern 28

character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44

sect

dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35

dps 13 17 18 32 35 36 39dtd 23 25ndash27dtp 36

sect

ebcdic 5ecma 55Edgar Allen Poe 37


Elements of Style 3em 6Emacs 13endianity 10endnote 47enq 6eot 6erealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

ere 14ndash16esc 6etb 6120576-TEX 38etx 6euc 5

sectF M Cornford 43ff 6foaf 32 33footnote 47formal grammar 14fortran 4From Religion to Philosophy A Study in

the Origins of Western Speculation 43fs 6fsm 35

sectGit 17gml 22gnuLinux 13nano 13

gnu 13 14 35Google Documents 18Google Pinyin 11grep 16 17groff see troffgs 6gui 13 35

sectHan Unification 9heading 45Henrik Ibsen 27ht 6

html 28ndash32 34 39 44 55sect

ibm 5 12 22iconv 10iec 7 10 51ndash54ime 12ir i 27 28 31 32 54iso 7 10 51ndash54

sectJavaScript 29Jeffrey E F Friedl 14j is 5joe 13JScript 29json 32json-ld 32 56jtc 51ndash54justification see alignment

sectKing Lear 48

sectLATEX 36 43Latin Vulgate Bible 49ld 31 32 55leading see line spacingLeafpad 13lf 6lightweight markup language 39line height 45list 46

sectma 51MakeDoc 39Markdown 39markuplogical 21 29 30 35 36presentation 21 29 30 35 36

mathml 28 31Mercurial 17microformatting 32Microsoft Word 14 20 39

sectN-Triples 32 33nak 6Noam Chomskyhierarchy 14

Noam Chomsky 14note 46Notepad++ 13Notepad 13


nroff see troffnul 6ny 51

sectocr 12odf 13ooxml 13owl 32 56

sectparagraphblock 47indented 45outdented 45

paragraph 42paragraphsblock 45

pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8

sectQuarkXPress 14quotationblock 47run-in 47

sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31

rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6

sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22

sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-

lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6

secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff

man 36


me 36mom 36

troff 35tron 9Turtle 32 33typeface 41

sectucsblock 8ucs-4 8

ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10

us 6usa 51 52utf

utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8

utf 6 8ndash10 52sect

VBScript 29vcscentralized 17decentralized 17

vcs 17ndash20version control 13vi 13vim 13

vt 6sect

w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4

wysiwyg 35sect

XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23

xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23


Regex      Description
\x{⟨n⟩}    Matches the UCS character with code point ⟨n⟩ in hexadecimal
\N{⟨n⟩}    Matches the UCS character whose Name property, Name_Alias property, or code point label tag equals ⟨n⟩
\p{⟨p⟩}    Matches any UCS character with property ⟨p⟩
\P{⟨p⟩}    Matches any UCS character without property ⟨p⟩

Property            Description
Letter              This property is satisfied by any letter
Punctuation         This property is satisfied by any punctuation
Symbol              This property is satisfied by any symbol
Mark                This property is satisfied by any mark
Number              This property is satisfied by any number
Separator           This property is satisfied by any separator
Other               This property is satisfied by any UCS character that doesn't belong to any of the categories listed above
Block=⟨b⟩           This property is satisfied by characters that reside in the UCS block ⟨b⟩. UCS blocks include Basic Latin, Greek, Arabic, etc.
Script=⟨s⟩          This property is satisfied by characters that belong to the writing system ⟨s⟩. Writing systems include Latin, Korean, Chinese, etc.
Numeric Value=⟨n⟩   This property is satisfied by any UCS character with the numeric value ⟨n⟩

Table 1.5: The elements of the Unicode regex syntax implemented by Perl 5.2 and Java 7. The list of properties is not exhaustive.

The authoritative resource on grep, sed, and awk is Sed & awk [21], which explains each program as well as the BRE and ERE syntaxes in full detail.

least partially implement the Unicode standard for Regular Expressions [20] (such as those of Perl 5.2 or Java 7) are actively aware of UCS and provide features that enable the matching of characters based on their general category, numeric value, directionality, and other properties defined by Unicode, as shown in Table 1.5.
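Matching by Unicode property can also be approximated outside of a regex engine. The sketch below is only an illustration under stated assumptions: it uses Python's standard unicodedata module rather than the Perl or Java regex syntax described above (the standard Python re module does not offer property escapes), and the helper name and sample string are made up for the example.

```python
import unicodedata


def find_by_category(text, prefix):
    """Return the characters of text whose general category starts with prefix.

    The prefix "L" selects letters, "N" numbers, "P" punctuation, "S" symbols,
    "M" marks, and "Z" separators, mirroring the properties listed in the table.
    """
    return [c for c in text if unicodedata.category(c).startswith(prefix)]


sample = "Řeka, 42 €"  # hypothetical sample text
print(find_by_category(sample, "L"))     # letters
print(find_by_category(sample, "N"))     # numbers
print(find_by_category(sample, "S"))     # symbols
print(unicodedata.lookup("EURO SIGN"))   # look a character up by its Name property
print(unicodedata.numeric("½"))          # numeric value of a character
```

The lookup by name corresponds roughly to the name-based escape, and the numeric value to the Numeric Value property.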

The most elementary text-processing CLI program is grep, which makes it possible to search text files for fixed strings and regexes in the absence of an advanced text editor. Unless configured otherwise, the tool will present the lines that contain one or more matches to the user. A more advanced text-processing CLI program is sed, which features a simple programming language that can be used to arbitrarily search and transform text files. Awk is a CLI program that also features a text-processing programming language, albeit a more advanced one than that of sed. Originally developed for the Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.
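The default behavior of grep, presenting every line that contains at least one match, can be sketched in a few lines. This is a simplified illustration, not the actual grep implementation: it uses Python's re syntax instead of BRE or ERE, and the function name and sample lines are assumptions.

```python
import re


def grep(pattern, lines):
    """Return the lines that contain at least one match of the regex pattern."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


# Hypothetical sample input standing in for the lines of a text file.
lines = ["alpha beta", "gamma", "beta 42"]
print(grep(r"beta", lines))   # fixed string
print(grep(r"\d+", lines))    # regex: one or more digits
```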

1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.
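The rudimentary functionality described above (recording versions with a description and an author, viewing them, and reverting) can be modeled as a toy data structure. This is a deliberately naive sketch for illustration; the class and method names are made up, and real VCS such as SVN or Git store and exchange changes very differently.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    content: str       # full text of the document at this version
    author: str        # who recorded the version
    description: str   # why the change was made


@dataclass
class History:
    versions: list = field(default_factory=list)

    def commit(self, content, author, description):
        """Record a new version together with its authorship information."""
        self.versions.append(Version(content, author, description))

    def log(self):
        """List (index, author, description) for every recorded version."""
        return [(i, v.author, v.description) for i, v in enumerate(self.versions)]

    def revert(self, index, author):
        """Restore an earlier version by recording it again as the newest one."""
        old = self.versions[index]
        self.commit(old.content, author, f"Revert to version {index}")

    def current(self):
        return self.versions[-1].content


history = History()
history.commit("Draft", "alice", "Initial draft")
history.commit("Draft v2", "bob", "Reword introduction")
history.revert(0, "alice")
print(history.current())
print(history.log())
```

Recording a revert as a new version, rather than deleting the unwanted one, keeps the full history intact.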

VCS can be divided according to their architecture, which is either centralized or decentralized. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally, and its operation is fully dependent on the availability of the server. An example of a centralized VCS is SubVersioN (SVN).

By comparison, there is no designated server in decentralized VCS, and the users can upload and download new versions directly to and from one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage requirements, and the increased opportunity for the users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.

Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.

Figure 1.8: The basic SVN workflow (svnadmin create, svn checkout, svn update, svn commit).

After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.

Figure 1.9: The diagram above depicts the basic Git workflow (git init, git clone, git pull, git push, git reset, git commit). The diagram below depicts the use of the Git program with an SVN repository (svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit); this bears all the advantages and disadvantages associated with decentralized VCS.

Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.

Chapter 2

Markup

A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.

Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The General Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the General Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.

The authoritative resource on SGML is The SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.

Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).

A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
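The lower of the two levels of correctness, well-formedness, can be checked with nothing more than an XML parser. The following minimal sketch assumes Python's standard xml.etree.ElementTree module; the function name and the sample documents are made up for illustration. Note that this parser does not validate against a DTD or a schema, so checking validity would require a validating tool instead.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if the string parses as XML, i.e. if it is well-formed."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# The closing tag of <name> is missing in the second document.
print(is_well_formed("<recipe><name>Palatschinken</name></recipe>"))
print(is_well_formed("<recipe><name>Palatschinken</recipe>"))
```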

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
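To give a flavor of data retrieval with XPath, the sketch below queries a shortened recipe document. It assumes Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath; the embedded document is an abbreviated, hypothetical variant of a recipe, and the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# An abbreviated recipe document, embedded here to keep the example self-contained.
RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# Select every ingredient element by its path from the root element.
names = [e.text for e in root.findall("./ingredientList/ingredient")]

# Select the ingredient whose amount attribute has a particular value.
milk = root.find("./ingredientList/ingredient[@amount='300ml']")

# Read an attribute of the ingredient list.
serves = root.find("ingredientList").get("serves")

print(names, milk.text, serves)
```

A full XPath or XQuery processor offers far richer expressions (axes, functions, joins) than the subset used here.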


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish.</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>

Figure 2.1: An example XML document (recipe.xml)

DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD

element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.


<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
        name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)

xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng  # Compact -> Full Relax NG
xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.


A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor to a more expressive SGML feature, CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML

Speech

AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!

Verse

Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].


The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook, and its source code is publicly available at http://docbook.org/.

Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
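The way a parser exposes namespaces can be sketched as follows, assuming Python's standard xml.etree.ElementTree module; the sample document and the helper split_tag are hypothetical. After parsing, every element name is qualified by the IRI of its namespace. The parser neither dereferences the IRIs nor looks up any schema for them, which is consistent with the limitation described above.

```python
import xml.etree.ElementTree as ET

# A hypothetical document mixing two XML applications.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <body><svg:svg><svg:circle r="1"/></svg:svg></body>
</html>"""


def split_tag(tag):
    """Split ElementTree's '{iri}local' notation into (iri, local)."""
    if tag.startswith("{"):
        iri, local = tag[1:].split("}", 1)
        return iri, local
    return None, tag


root = ET.fromstring(DOC)
qualified = [split_tag(el.tag) for el in root.iter()]
namespaces = sorted({iri for iri, _ in qualified})

for iri, local in qualified:
    print(f"{local:8} belongs to {iri}")
```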

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In


JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for the presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as

<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.


<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

<style>
  body {
    font: large Verdana;
    font-size: large;
  }
  h1 {
    font-size: x-large;
    text-transform: uppercase;
  }
  abbr {
    font-style: italic;
  }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>


The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly into web documents through XML namespaces.
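The contrast between the two parsing philosophies can be sketched with Python's standard library, which is assumed here purely for illustration. The class TagLogger and the helper xml_accepts are hypothetical names. Note that html.parser is only a lenient tokenizer: it reports the tags it sees without raising an error, but it does not implement the HTML5 tree construction rules that produce the canonical interpretation of a malformed document.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Overlapping elements: <b> is closed while <i> is still open.
MISNESTED = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Record the start and end tags in the order they are seen."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def xml_accepts(text: str) -> bool:
    """Return True when a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


logger = TagLogger()
logger.feed(MISNESTED)
logger.close()

print(logger.events)           # the lenient tokenizer carries on
print(xml_accepts(MISNESTED))  # the XML parser refuses the input
```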

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.


A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.

If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
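The two interpretations can be made concrete with a small data structure. The following is a minimal sketch in Python; the Triple type, its field names, and the describe function are hypothetical and serve only to mirror the (p, s, o) notation used above, not the data model of any particular RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A triplet in the (p, s, o) order used in the text."""
    predicate: str          # always an IRI
    subject: str            # always an IRI
    obj: str                # an IRI or a literal value
    obj_is_resource: bool   # True when obj is an IRI


def describe(t: Triple) -> str:
    """Render a triplet according to its interpretation."""
    if t.obj_is_resource:
        return (f"<{t.subject}> is in relation <{t.predicate}> "
                f"with <{t.obj}>")
    return (f"<{t.subject}> has property <{t.predicate}> "
            f'with value "{t.obj}"')


relation = Triple("http://purl.org/dc/terms/creator",
                  "http://example.org/document.html",
                  "http://example.org/john-smith", True)
attribute = Triple("http://xmlns.com/foaf/0.1/name",
                   "http://example.org/john-smith",
                   "John Smith", False)

print(describe(relation))
print(describe(attribute))
```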

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, besides the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
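Of the listed representations, N-Triples is the simplest: each statement occupies one line consisting of a subject, a predicate, an object, and a terminating full stop. The sketch below, written in Python with the hypothetical helpers iri, literal, and triple_line, shows the general shape of such a line. It handles only the most common string escapes and is an illustration under those assumptions, not a complete serializer.

```python
def iri(value: str) -> str:
    """Wrap an IRI in angle brackets."""
    return f"<{value}>"


def literal(value: str, lang: str = "") -> str:
    """Quote a literal, escaping a few common characters only."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def triple_line(subject: str, predicate: str, obj: str) -> str:
    """Join already formatted terms into one N-Triples statement."""
    return f"{subject} {predicate} {obj} ."


title = triple_line(iri("http://example.org/document.html"),
                    iri("http://purl.org/dc/terms/title"),
                    literal("John's Web page", "en"))
creator = triple_line(iri("http://example.org/document.html"),
                      iri("http://purl.org/dc/terms/creator"),
                      iri("http://example.org/john-smith"))
print(title)
print(creator)
```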

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be


<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type
        rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom)


<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>


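[Figure: a directed graph. The node http://example.org/document.html is connected by the edge dc:title to the literal "John's Web page"@en and by the edge dc:creator to the node http://example.org/john-smith, which is in turn connected by the edge rdf:type to the node foaf:Person and by the edge foaf:name to the literal "John Smith".]

Figure 2.11: A graph of the RDF document in Figure 2.9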

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Among the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The


The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by Desktop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.


[Output document: the title "The Cask of Amontillado" and the byline "by Edgar Allen Poe", followed by the opening paragraph set with a three-line drop cap "T" and the page number -1-.]

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allen Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allen Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].


% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right)

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
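The principle behind such conversion tools can be sketched with a toy converter. The Python code below is a hypothetical illustration that recognizes only blank-line separated paragraphs, headings introduced by number signs, and emphasis marked with asterisks. It does not implement Markdown or any other of the languages named above, whose real grammars are considerably richer.

```python
import html
import re


def inline(text: str) -> str:
    """Escape HTML special characters, then convert *emphasis*."""
    escaped = html.escape(text, quote=False)
    return re.sub(r"\*([^*\n]+)\*", r"<em>\1</em>", escaped)


def to_html(source: str) -> str:
    """Convert a tiny, hypothetical lightweight markup to HTML."""
    parts = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", source.strip()):
        heading = re.fullmatch(r"(#{1,6}) +(.+)", block)
        if heading:
            level = len(heading.group(1))
            parts.append(f"<h{level}>{inline(heading.group(2))}</h{level}>")
        else:
            # Line breaks inside a paragraph are treated as spaces.
            parts.append(f"<p>{inline(' '.join(block.split()))}</p>")
    return "\n".join(parts)


print(to_html("# Design\n\nLegibility is of *foremost*\nconcern."))
```

The punctuation carries the structure, so the source remains readable even when no conversion tool is at hand.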

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. At low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure unless the text would benefit from the dissonance [54, sec. 5.1.2].
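Whether a manuscript stays within the repertoire of a typeface can be checked mechanically. The sketch below, in Python, assumes that the coverage of the typeface is known as a list of inclusive Unicode code point ranges; the ranges in LATIN_COVERAGE and the function name missing_characters are hypothetical. In practice, the coverage would be read from the font file itself.

```python
def missing_characters(text, covered_ranges):
    """Return the characters of text that fall outside every range.

    covered_ranges is a list of inclusive (low, high) code point pairs;
    whitespace is ignored because it needs no glyph outline.
    """
    return sorted({
        ch for ch in text
        if not ch.isspace()
        and not any(low <= ord(ch) <= high for low, high in covered_ranges)
    })


# Hypothetical coverage of a Latin-only typeface.
LATIN_COVERAGE = [(0x0000, 0x024F), (0x1E00, 0x1EFF), (0x2000, 0x206F)]

# "Soul: " followed by the Greek word psi, upsilon, chi, eta with tonos.
SAMPLE = "Soul: \u03c8\u03c5\u03c7\u03ae"

print(missing_characters(SAMPLE, LATIN_COVERAGE))
```

Every character reported by the function would have to be supplied by a second typeface.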

Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
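The guidelines on line length lend themselves to a simple check. The following Python sketch encodes the thresholds quoted above; the function name assess_measure and the wording of its verdicts are hypothetical, and the result is a rule of thumb rather than a substitute for the judgment of a designer.

```python
def assess_measure(chars_per_line: int, columns: int = 1) -> str:
    """Classify a line length against the guidelines in the text."""
    # Recommended range: 45-75 characters for a single column,
    # 40-50 characters for a multi-column page.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "too narrow to justify; set ragged"
    if chars_per_line > 80:
        return "too wide; strains the eye"
    if low <= chars_per_line <= high:
        return "within the recommended range"
    return "outside the recommended range"


for length, columns in [(66, 1), (35, 1), (90, 1), (60, 2)]:
    print(length, columns, assess_measure(length, columns))
```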

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


[Output document: a single paragraph with the English text set in Alegreya Sans and the Greek quotation set in GFS Neohellenic, followed by the page number 1.]

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
\newunicodechar{...}{\raisebox{.8ex}{...}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url(AlegreyaSans-Regular.ttf)
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url(GFSNeohellenic.otf) format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=gr] {
    font-size: 13.2pt;
  }
</style>
<p><span lang=en>The second function of Soul &ndash; knowing
&ndash; was not at first distinguished from motion. Aristotle
says </span><span lang=gr>φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang=en>&ldquo;The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.&rdquo;</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.

32 STRUCTURAL ELEMENTS 45

line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with a leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
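The relationship between type size, line height, and lead can be checked with a little arithmetic. The sketch below is only an illustration: the function name is hypothetical, and the 10 pt type size is an assumption, chosen because it is the size for which the figures quoted above work out.

```python
def lead_per_side(type_size_pt, line_height_pt):
    """Return the lead added above and below each line, in points.

    The difference between the line height and the type size is the
    total lead; half of it falls above the line and half below.
    """
    return (line_height_pt - type_size_pt) / 2


# Assumption: 10 pt type, set on the two extreme line heights from the text.
for line_height in (12.0, 14.5):
    lead = lead_per_side(10.0, line_height)
    print(f"10 pt type on {line_height} pt: {lead} pt of lead on each side")
```

For 10 pt type, a line height of 12 pt yields 1 pt of lead on each side and a line height of 14.5 pt yields 2.25 pt, which matches the range given in the text.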

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically justified only when the individual lines are long enough to fill up the column, and set ragged otherwise. Much as with prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8½ × 11            1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

The hierarchy of the headings can be expressed through variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
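The whole-multiple rule for headings can be sketched as a small calculation. This is a minimal illustration under stated assumptions, not a prescribed method: the function name and the example sizes (an 18 pt heading on a 12 pt line height) are hypothetical.

```python
import math


def heading_padding(heading_height_pt, line_height_pt):
    """Return (lines, padding) for a heading placed on the body text grid.

    `lines` is the smallest whole number of body text lines that can hold
    the heading; `padding` is the extra vertical space, in points, that
    must be distributed above and below the heading to fill those lines.
    """
    lines = math.ceil(heading_height_pt / line_height_pt)
    padding = lines * line_height_pt - heading_height_pt
    return lines, padding


# Assumption: an 18 pt heading in a document with a 12 pt line height.
lines, padding = heading_padding(18, 12)
print(f"The heading occupies {lines} lines with {padding} pt of extra space")
```

How the padding is split above and below the heading is a design decision; placing more of it above the heading ties the heading visually to the text that follows.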

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface as the surrounding text, treat the columns of tables the same way you treat columns in the text, and keep the number of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks, in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

    This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
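The requirement that the text height be a multiple of the line height can be sketched as follows. The function name and the example dimensions (a 610 pt target height) are hypothetical and serve only to illustrate the rounding.

```python
def text_block_height(target_height_pt, line_height_pt):
    """Return (lines, height) for the tallest text block that fits the target.

    The target height is rounded down to a whole number of body text
    lines, so that the text block can be filled completely with text.
    """
    lines = int(target_height_pt // line_height_pt)
    return lines, lines * line_height_pt


# Assumption: a target text height of 610 pt, tried with two line heights.
for line_height in (12, 14):
    lines, height = text_block_height(610, line_height)
    print(f"{line_height} pt line height: {lines} lines, {height} pt tall")
```

Rounding down keeps the text block within the space derived from the text width and page proportions; the leftover points go to the vertical margins.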

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to use colored type only for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible, printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

Bibliography

[1] Mary Brandel lsquolsquo1963 The debut of asci irsquorsquo InComputerworld(July 1999) url httpeditioncnncomTECHcomputing9907061963idg (visited on 09062015) (cit on p 5)

[2] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1963 10 East 40th Street New York 16 nyusa the American Standard Association June 1963 urlhttp worldpowersystems com J codes X3 4 - 1963

(visited on 01282015) (cit on p 5)[3] i so tc97sc2 Information technology ndash iso 7-bit coded character

set for information interchange i so 6461972 Geneva Switzer-land the International Organization for Standardization1972 (cit on pp 5 7)

[4] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1986 10 East 40th Street New York 16 ny usathe American Standard Association June 1986 (cit on p 6)

[5] Unicode Consortium the Unicode Standard Version 10 Vol 1Reading ma usa Addison-Wesley Developers Press Oct1991 isbn 0-201-56788-1 (cit on p 8)

[6] Unicode Consortium the Unicode Standard Version 10 Vol 2Reading ma usa Addison-Wesley Developers Press June1992 isbn 0-201-60845-6 (cit on p 8)

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] i soiec jtc1sc2 Transformation Format for 16 planes of group00 (utf-16) isoiec 10646-11993Amd 11996 GenevaSwitzerland the International Organization for Standard-ization Oct 1996 (cit on p 8)

[9] isoiec jtc1sc2 ucs Transformation Format 8 (utf-8)isoiec 10646-11993Amd 21996 Geneva Switzerlandthe International Organization for Standardization Oct1996 (cit on p 8)

[10] Unicode Consortium the Unicode Standard Version 90 ndash CoreSpecification Tech rep Mountain View ca usa July 2016url httpwwwunicodeorgversionsUnicode900UnicodeStandard-90pdf (visited on 09172015) (cit onpp 8ndash10)

[11] Q-Success Usage of character encodings for websites urlhttpw3techscomtechnologiesoverviewcharacter_

encodingall (visited on 09102015) (cit on p 9)[12] Unicode Consortium Unicode Technical Standard 10 Version

900 Unicode Collation Algorithm Tech rep May 2016 urlhttpwwwunicodeorgreportstr10tr10-34html

(visited on 09172016) (cit on p 10)[13] Unicode Consortium Unicode cldr Project Tech rep url

httpcldrunicodeorg (visited on 09172016) (cit onp 10)

[14] iso tc171sc2 Document management ndash Portable documentformat iso 320002008 Geneva Switzerland the Interna-tional Organization for Standardization July 2008 (cit onp 13)

[15] isoiec jtc1sc34 Document description and processing lan-guages ndash Office Open XML File Formats isoiec 295002012Geneva Switzerland the International Organization forStandardization Oct 2012 (cit on p 13)

[16] isoiec jtc1sc34 Information technology ndash Open DocumentFormat for Office Applications (OpenDocument) v10 isoiec263002006 Geneva Switzerland the International Organi-zation for Standardization Dec 2006 (cit on p 13)


[17] Noam Chomsky lsquolsquoThree models for the description of lan-guagersquorsquo In Information Theory IEEE Transactions on 23 (1956)pp 113ndash124 (cit on p 14)

[18] isoiec jtc1sc22 Information technology ndash the Portable Op-erating System Interface ndash Part 2 Shell and Utilities isoiec9945-21993 Geneva Switzerland the International Organi-zation for Standardization Dec 1993 (cit on p 14)

[19] Jeffrey E F Friedl Mastering Regular Expressions 3rd edOrsquoReilly Media 2006 p 544 isbn 978-0-596-52812-6 (citon p 14)

[20] Unicode Consortium Unicode Technical Standard 18 Version17 Unicode Regular Expressions Tech rep Nov 2013 urlhttpwwwunicodeorgreportstr18tr18-17html

(visited on 09262015) (cit on p 16)[21] Dale Dougherty and Arnold Robbins Sed amp awk Second

Edition OrsquoReilly Media 1997 i sbn 1565922255 url http docstore mik ua orelly unix sedawk (visited on09262015) (cit on p 16)

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F Goldfarb lsquolsquothe Roots of sgml ndash A Personal Rec-ollectionrsquorsquo In (1996) url httpwwwsgmlsourcecomhistoryrootshtm (visited on 07292015) (cit on p 22)

[24] Charles F Goldfarb lsquolsquosgml The Reason Why and the FirstPublishedHintrsquorsquo In Journal of the American Society for Informa-tion Science 48 (7 July 1997) url httpwwwsgmlsourcecomhistoryjasishtm (visited on 07292015) (cit onp 22)

[25] Charles F Goldfarb lsquolsquoIntroduction to Generalized MarkuprsquorsquoIn (1981) url http www sgmlsource com history AnnexAhtm (visited on 07292015) (cit on p 22)

[26] i soiecjtc1sc34 Information processing ndash Text and office sys-tems ndash Standard Generalized Markup Language (sgml) i soiec88791986 Geneva Switzerland the International Organi-zation for Standardization Oct 1986 (cit on p 22)


[27] Charles F Goldfarb the sgml Handbook New York NY USAOxford University Press Inc 1990 i sbn 978-0-198-53737-3(cit on p 22)

[28] Jean Paoli Tim Bray and Michael Sperberg-McQueen Ex-tensible Markup Language (xml) 10 w3c Recommendationw3c Feb 1998 url httpwwww3orgTR1998REC-xml-19980210 (visited on 07312015) (cit on pp 23 31)

[29] isoiec jtc1sc18wg8 Proposed TC for Web sgml Adap-tations for sgml isoiec N1929 the International Organi-zation for Standardization June 1997 url httpxmlcoverpagesorgwg8-n1929-ghtml (visited on 07312015)(cit on p 23)

[30] Haringkon Wium Lie and Bert Bos Cascading Style Sheets level1 Recommendation w3c Dec 1996 url httpwwww3orgTRREC-CSS1-961217 (visited on 07312015) (cit onpp 23 29)

[31] C M Sperberg-McQueen and Claus Huitfeldt lsquolsquogoddagA Data Structure for Overlapping Hierarchiesrsquorsquo In DigitalDocuments Systems and Principles 8th International Confer-ence on Digital Documents and Electronic Publishing DDEP2000 5th International Workshop on the Principles of DigitalDocument Processing PODDP 2000 Munich Germany Sep-tember 13-15 2000 Revised Papers Ed by Peter King andEthan V Munson Berlin Heidelberg Springer Berlin Hei-delberg 2004 pp 139ndash160 isbn 978-3-540-39916-2 doi101007978-3-540-39916-2_12 (cit on p 27)

[32] TimBray DaveHollander andAndrewLaymanNamespacesin xml w3c Recommendation w3c Jan 1999 url httpwwww3orgTR1999REC-xml-names-19990114 (visitedon 08212015) (cit on p 27)

[33] M Duerst the Internationalized Resource Identifiers (iris) rfc3987 rfc Editor Jan 2005 url httptoolsietforghtmlrfc3987 (visited on 08312015) (cit on p 27)

[34] Norman Walsh DocBook 5 The Definitive Guide Apr 2010url httpwwwdocbookorgtdgenhtmldocbookhtml(visited on 08182015) (cit on p 28)


[35] Tim Berners-Lee Information Management A Proposal Techrep Mar 1989 url httpwwww3orgHistory1989proposalhtml (visited on 08312015) (cit on p 28)

[36] T Berners-Lee Hypertext Markup Language ndash 20 rfc 1866rfc Editor Nov 1995 url httptoolsietforghtmlrfc1866 (visited on 07312015) (cit on p 28)

[37] Jon Postel DoD standard Transmission Control Protocol rfc761 rfc Editor Jan 1980 url httptoolsietforghtmlrfc761 (visited on 09162016) (cit on p 28)

[38] Ian Hickson et al html5 A vocabulary and associated apisfor html and xhtml Recommendation w3c Oct 2014 urlhttpwwww3orgTR2014REC-html5-20141028 (visitedon 07312015) (cit on p 29)

[39] ecma International Standard ecma-262 - ecmaScript LanguageSpecification Tech rep June 1997 url httpwwwecma-internationalorgpublicationsfilesECMA-ST-ARCH

ECMA-262201st20edition20June201997pdf (visitedon 07312015) (cit on p 29)

[40] Netscape Communications Netscape and Sun announce Java-Script the open cross-platform object scripting language for en-terprise networks and the Internet Dec 1995 url httpwpnetscapecomnewsrefprnewsrelease67html (visited on02132008) (cit on p 29)

[41] Dave Raggett et al Reformulating html in xml w3c Recom-mendation w3c Dec 1998 url httpwwww3orgTR1998WD-html-in-xml-19981205 (visited on 08202015)(cit on p 31)

[42] Steven Pemberton et al xhtmltrade 10 The Extensible HyperTextMarkup Language w3c Recommendation w3c Jan 2000url httpwwww3orgTR2000REC-xhtml1-20000126(visited on 08202015) (cit on p 31)

[43] T Berners-Lee Linked Data Tech rep 2006 url httpswwww3orgDesignIssuesLinkedDatahtml (visited on09172016) (cit on p 31)


[44] Ora Lassila and Ralph R Swick Resource Description Frame-work (rdf) Model and Syntax Specification w3c Recommen-dation w3c Feb 1999 url httpwwww3orgTR1999REC-rdf-syntax-19990222 (visited on 08182015) (cit onpp 31 32)

[45] Dan Brickley and R V Guha rdf Vocabulary DescriptionLanguage 10 rdf Schema w3c Recommendation w3c Feb2004 url httpwwww3orgTR2004REC-rdf-schema-20040210 (visited on 08182015) (cit on p 32)

[46] Deborah L McGuinness and Frank van Harmelen owl WebOntology Language w3c Recommendation w3c Feb 2004url httpwwww3orgTR2004REC-owl-features-20040210 (visited on 08182015) (cit on p 32)

[47] Dan Brickley and R V Guha json-ld 10 A JSON-basedSerialization for Linked Data w3c Recommendation w3cJan 2014 url httpwwww3orgTR2014REC-json-ld-20140116 (visited on 08192015) (cit on p 32)

[48] David Beckett et al rdf 11 Turtle w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-turtle-20140225 (visited on 08292015) (cit on p 32)

[49] David Beckett rdf 11 N-Triples w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-n-triples-20140225 (visited on 08192015) (cit on p 32)

[50] Ben Adida et al rdfa in xhtml Syntax and Processing w3cRecommendation w3c Oct 2008 url httpwwww3org TR 2008 REC - rdfa - syntax - 20081014 (visited on08192015) (cit on p 32)

[51] Peter Schaffter What exactly is mom 2015 url httpwwwschafftercamommom-01html (visited on 09162016)(cit on p 37)

[52] Donald Ervin Knuth Digital Typography The Center for theStudy of Language and Information Publications 1998 i sbn978-0-387-98269-4 (cit on p 36)

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: httpwwwsazbacztypoglosytypo101pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standard Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
EMACS: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MATHML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
PICO: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFA: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
VI: The Visual Interactive editor
VIM: VI IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language



12 VERSION CONTROL 17

The authoritative resource on SVN is Version Control with Subversion [22], affectionately known as the Subversion book.

language, albeit a more advanced one than that of sed. Originally developed for Research Unix during 1973–1977, grep, sed, and awk are available in various flavors for most operating systems.

1.2 Version Control

When writing a text document, it is often useful to have a backup of the previous versions of files, so that undesirable changes can be reverted whenever necessary. If more than one person contributes to the document, the ability to track the authorship of these changes also becomes an asset. At their most rudimentary, Version Control Systems (VCS) record changes along with their descriptions and authorship information. These changes can then be viewed and reverted. With a single contributor, VCS are a convenient alternative to manual version archival. With several contributors, VCS become an essential tool.
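The rudimentary behavior described above (recording changes with a description and an author, viewing them, and reverting them) can be sketched in a few lines. The following is a toy model written for illustration only; the class and method names are hypothetical, and real VCS such as SVN or Git store changes far more efficiently than as full copies of the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Change:
    author: str
    description: str
    content: str  # full text of the document after the change


@dataclass
class ToyRepository:
    history: List[Change] = field(default_factory=list)

    def commit(self, author, description, content):
        """Record a new version and return its revision number."""
        self.history.append(Change(author, description, content))
        return len(self.history) - 1

    def log(self):
        """List all recorded changes with their descriptions and authors."""
        return [f"r{number}: {change.description} ({change.author})"
                for number, change in enumerate(self.history)]

    def head(self):
        """Return the text of the latest version."""
        return self.history[-1].content

    def revert(self, revision, author):
        """Restore an older version by recording it as a new change."""
        old = self.history[revision]
        return self.commit(author, f"Revert to r{revision}", old.content)


repo = ToyRepository()
repo.commit("alice", "Initial draft", "Hello")
repo.commit("bob", "Extend the greeting", "Hello world")
repo.revert(0, "alice")
print("\n".join(repo.log()))
print(repo.head())
```

Reverting is modeled here as a new change rather than as the deletion of history, so that the record of who changed what stays complete.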

VCS can be divided by architecture into centralized and decentralized systems. Centralized VCS store all versions in a repository located on a remote server. Users send new versions to the server and retrieve existing versions using client software. The client software is thin in the sense that it does not store more than one version locally, and its operation is fully dependent on the availability of the server. An example of a centralized VCS is SubVersioN (SVN).

By comparison, there is no designated server in decentralized VCS, and the users can upload and download new versions directly to and from one another. The client software is thick in the sense that all users have a local repository with every existing version, which they can view and manipulate at any time. The disadvantages include the more complex workflow, greater storage requirements, and the increased opportunity for users not to share their local changes frequently enough, leading to an increased chance of collisions. Examples of decentralized VCS include Git, Mercurial, and Bazaar.

Although VCS can be used to keep track of any kind of files, they are especially geared towards text files, which they can easily display along with changes. However, most interactive DPSes do not produce text files, which can make version control challenging. As a solution, some DPSes include internal version control functionality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

An example would be the graphical SVN client Tortoise SVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

Figure 1.8: The basic SVN workflow (svnadmin create, svn checkout, svn update, svn commit). After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.

Figure 1.9: The diagram above depicts the basic Git workflow (git init, git clone, git pull, git push, git reset, git commit). The diagram below depicts the use of the Git program with an SVN repository (svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, git commit); this bears all the advantages and disadvantages associated with decentralized VCS.

After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.


Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.

Chapter 2

Markup

A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.

Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The Generalized Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

Note: More information about the project can be found in The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

Note: The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard with extensive annotations.

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.
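The node kinds listed above can be made concrete with a small sketch. It is only an illustration under stated assumptions: Python's standard library has no SGML parser, so the sketch parses an XML fragment with `xml.dom.minidom` instead, and the sample document, the `render` instruction, and the `node_kinds` helper are hypothetical names invented for this example.

```python
from xml.dom import Node, minidom

# A hypothetical one-line document containing a processing instruction,
# an element with an attribute, a comment, and text.
SOURCE = '<?render draft?><recipe serves="8"><!-- a comment -->Palatschinken</recipe>'

# Human-readable names for the node types this sketch cares about.
KIND = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "instruction",
}


def node_kinds(node):
    """Return the kinds of all descendant nodes in document order."""
    kinds = []
    for child in node.childNodes:
        kinds.append(KIND.get(child.nodeType, "other"))
        kinds.extend(node_kinds(child))
    return kinds


document = minidom.parseString(SOURCE)
kinds = node_kinds(document)
serves = document.documentElement.getAttribute("serves")
print(kinds, serves)
```

Attributes are read from their element rather than visited as children, which mirrors the description above: they carry additional information about an element instead of forming part of its content.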

Although the described structure is shared by all SGML documents, the actual syntax as well as the restrictions regarding the contents and the attributes of individual elements are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).

Note: A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
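The first of the two levels of correctness, well-formedness, can be checked with nothing but a parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` and the two sample strings are assumptions made for illustration. Note that this module does not validate against a DTD or a schema, so the second level (validity) is outside the scope of the sketch.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True when the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser refuses the document, e.g. because of mismatched tags.
        return False


# Properly nested elements.
good = "<recipe><name>Palatschinken</name></recipe>"
# The closing tags are swapped, so the elements overlap.
bad = "<recipe><name>Palatschinken</recipe></name>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

A document that passes this check may still be invalid against a particular DTD, for instance when a required attribute is missing.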

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
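As a rough illustration of data retrieval with XPath, the sketch below queries a shortened version of the recipe document. It assumes Python's standard `xml.etree.ElementTree` module, which implements only a limited subset of XPath; the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# A shortened, hypothetical version of the recipe document.
RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# A path expression selecting every ingredient element.
names = [e.text for e in root.findall("./ingredientList/ingredient")]

# A predicate selecting the ingredient with a given attribute value.
milk = root.find(".//ingredient[@amount='300ml']")

print(names, milk.text)
```

Full XPath and XQuery processors support far richer expressions (functions, axes, arithmetic) than the subset shown here.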

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE recipe SYSTEM "recipe.dtd">
    <recipe>
      <name>Palatschinken</name>
      <description>A Slavic crêpe-like dish</description>
      <ingredientList serves="8">
        <ingredient amount="120g">Plain flour</ingredient>
        <ingredient amount="2">Egg</ingredient>
        <ingredient amount="300ml">Milk</ingredient>
        <ingredient amount="1 tblspn">Oil</ingredient>
        <ingredient amount="1 pinch">Salt</ingredient>
      </ingredientList>
      <stepList>
        <step>Combine the ingredients and whisk until
          you have a smooth batter.</step>
        <step>Heat oil on a pan, pour in a tablespoonful
          of the batter, fry until golden brown.</step>
        <step>Repeat until there is no batter left.</step>
        <step>Serve rolled and filled with jam.</step>
      </stepList>
    </recipe>

Figure 2.1: An example XML document (recipe.xml).

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd">

    <!DOCTYPE recipe SYSTEM "recipe.dtd">

    <!DOCTYPE recipe [
      <!ELEMENT recipe (name, description, ingredientList,
        stepList)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
      <!ELEMENT ingredientList (ingredient+)>
      <!ATTLIST ingredientList serves CDATA #REQUIRED>
      <!ELEMENT ingredient (#PCDATA)>
      <!ATTLIST ingredient amount CDATA #REQUIRED>
      <!ELEMENT stepList (step+)>
      <!ELEMENT step (#PCDATA)> ]>

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd" [
      <!-- Omitted for brevity --> ]>

    <!DOCTYPE recipe SYSTEM "recipe.dtd" [
      <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD. DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

    element recipe {
      element name { text },
      element description { text },
      element ingredientList {
        attribute serves { xsd:positiveInteger },
        element ingredient {
          attribute amount { text }, text
        }+
      },
      element stepList {
        element step { text }+
      }
    }

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.

    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
      <element name="recipe"><complexType><all>
        <element name="name" type="string" minOccurs="1"/>
        <element name="description" type="string"
          minOccurs="1"/>
        <element
          name="ingredientList"><complexType><sequence>
          <element name="ingredient" minOccurs="1"
            maxOccurs="unbounded">
            <complexType><simpleContent>
              <extension base="string">
                <attribute name="amount" type="string"/>
              </extension>
            </simpleContent></complexType>
          </element></sequence>
          <attribute name="serves" type="positiveInteger"
            use="required"/>
        </complexType></element>
        <element name="stepList"><complexType><sequence>
          <element name="step" type="string" minOccurs="1"
            maxOccurs="unbounded"/>
        </sequence></complexType></element>
      </all></complexType></element>
    </schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).

    xmllint --noout --dtdvalid recipe.dtd recipe.xml
    xmllint --noout --schema recipe.xsd recipe.xml
    trang recipe.rnc recipe.rng  # Compact -> Full RELAX NG
    xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.

Speech

    AASE: Peer, you're lying!
    PEER: No, I'm not!
    AASE: Well then, swear to me it's true!
    PEER: Swear? Why should I?
    AASE: See, you dare not! Every word of it's a lie!

Verse

    Peer, you're lying! No, I'm not!
    Well then, swear to me it's true!
    Swear? Why should I? See, you dare not!
    Every word of it's a lie!

    <(V)line>
    <(S)speech who="Aase">Peer, you're lying!</(S)speech>
    <(S)speech who="Peer">No, I'm not!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Aase">Well then,
    swear to me it's true!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Peer">Swear? Why should I?</(S)speech>
    <(S)speech who="Aase">See, you dare not!
    </(V)line><(V)line>
    Every word of it's a lie!</(S)speech>
    </(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

Note: The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook, and its source code is publicly available at http://docbook.org.

Note: Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

A notable feature of XML that is unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor to a more expressive SGML feature called CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
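To see how namespaces keep elements from different XML applications apart within one document, consider the following sketch. It assumes Python's standard `xml.etree.ElementTree` module and a hypothetical XHTML fragment with an embedded SVG drawing; the two namespace IRIs are the ones published by W3C for XHTML and SVG, while the document itself and the `s` prefix used in the query are invented for illustration.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

# A hypothetical document mixing two XML applications.
DOC = f"""<html xmlns="{XHTML}" xmlns:svg="{SVG}">
  <body>
    <p>A circle:</p>
    <svg:svg><svg:circle r="10"/></svg:svg>
  </body>
</html>"""

root = ET.fromstring(DOC)

# The parser expands every prefix, so each tag carries its namespace IRI.
tags = [element.tag for element in root.iter()]

# Queries name the namespace explicitly; the prefix used here is arbitrary.
circles = root.findall(".//s:circle", {"s": SVG})

print(tags)
print(circles[0].get("r"))
```

The parser only records which IRI each element belongs to. As noted above, nothing in the namespace mechanism tells it where to find a schema for that IRI.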

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first widely used graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for the presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

Note: JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the number of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications (such as MathML and SVG) directly into web documents through XML namespaces.

    <b>Bold <i>bold and italic</b> italic</i>
    <b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

    <font face="Verdana" size="4">
    <font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
    <br><br>There is a continuing need to show the power of
    <i>CSS</i>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The <i>HTML
    </i> remains the same, the only thing that has changed
    is the external <i>CSS</i> file. Yes, really.
    </font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

    <style>
      body {
        font: large Verdana;
        font-size: large;
      }
      h1 {
        font-size: x-large;
        text-transform: uppercase;
      }
      abbr {
        font-style: italic;
      }
    </style>
    <h1>So what is this about?</h1>
    <p>There is a continuing need to show the power of
    <abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The
    <abbr>HTML</abbr> remains the same, the only thing that
    has changed is the external <abbr>CSS</abbr> file. Yes,
    really.</p>

Note: The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
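The difference between the tolerant processing of HTML and the strict processing of XML discussed in this section can be observed with a short experiment. The sketch below is an illustration under stated assumptions: it feeds a line with overlapping elements to Python's standard `html.parser.HTMLParser`, which is only a tokenizer and does not implement the error recovery of the HTML5 specification, and to the XML parser of `xml.etree.ElementTree`. The class name `TagLogger` is hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# A line with overlapping <b> and <i> elements.
OVERLAPPING = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Record start and end tags in the order they are encountered."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.events.append("</" + tag + ">")


# The HTML tokenizer reports the tags without complaint.
logger = TagLogger()
logger.feed(OVERLAPPING)
logger.close()

# The XML parser refuses the same input as not well-formed.
try:
    ET.fromstring(OVERLAPPING)
    xml_accepted = True
except ET.ParseError:
    xml_accepted = False

print(logger.events, xml_accepted)
```

A browser goes one step further than this tokenizer and rebuilds a consistent element tree from the overlapping tags, which is part of what makes HTML parsers complex.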

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

Note: A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
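The two readings of a triplet described in this section (a relation between two resources versus a property with a literal value) can be sketched in a few lines. This is a toy model under stated assumptions, not an RDF library: the `IRI` and `Triple` types and the `describe` function are hypothetical names, and the example resources are taken from the John Smith example used in this chapter's figures.

```python
from typing import NamedTuple, Union


class IRI(str):
    """A resource identifier; plain str values stand for literals."""


class Triple(NamedTuple):
    predicate: IRI
    subject: IRI
    obj: Union[IRI, str]


def describe(triple: Triple) -> str:
    """Interpret a triple (p, s, o) as a relation or as a property."""
    if isinstance(triple.obj, IRI):
        return f"{triple.subject} is in relation {triple.predicate} with {triple.obj}"
    return f"{triple.subject} has property {triple.predicate} with value {triple.obj!r}"


john = IRI("http://example.org/john-smith")
page = IRI("http://example.org/document.html")

relation = Triple(IRI("http://purl.org/dc/terms/creator"), page, john)
attribute = Triple(IRI("http://xmlns.com/foaf/0.1/name"), john, "John Smith")

print(describe(relation))
print(describe(attribute))
```

Distinguishing resources from literals by type, as done here, is a design choice of the sketch; real RDF serializations make the same distinction syntactically.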

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.

    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dc="http://purl.org/dc/terms/"
        xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description
          rdf:about="http://example.org/document.html">
        <dc:title xml:lang="en">John's Web page</dc:title>
        <dc:creator
            rdf:resource="http://example.org/john-smith"/>
      </rdf:Description>
      <rdf:Description
          rdf:about="http://example.org/john-smith">
        <rdf:type
            rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
        <foaf:name>John Smith</foaf:name>
      </rdf:Description>
    </rdf:RDF>

    <http://example.org/document.html>
      <http://purl.org/dc/terms/title> "John's Web page"@en .
    <http://example.org/document.html>
      <http://purl.org/dc/terms/creator>
      <http://example.org/john-smith> .
    <http://example.org/john-smith>
      <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
      <http://xmlns.com/foaf/0.1/Person> .
    <http://example.org/john-smith>
      <http://xmlns.com/foaf/0.1/name> "John Smith" .

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dc: <http://purl.org/dc/terms/> .
    <http://example.org/document.html>
      dc:title "John's Web page"@en ;
      dc:creator <http://example.org/john-smith> .
    <http://example.org/john-smith>
      a foaf:Person ;
      foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).

    <!DOCTYPE html>
    <html lang="en">
      <head>
        <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
        <link rel="meta" type="text/turtle" href="john.ttl">
        <link rel="meta" type="application/n-triples"
          href="john.nt">
        <title>John's Web page</title>
      </head>
      <body>
        Hi, I'm John Smith.
      </body>
    </html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

    <!DOCTYPE html>
    <html lang="en">
      <head vocab="http://purl.org/dc/terms/"
          about="http://example.org/document.html">
        <title property="title" lang="en">John's Web
          page</title>
        <meta property="creator"
          href="http://example.org/john-smith">
      </head>
      <body vocab="http://xmlns.com/foaf/0.1/"
          about="http://example.org/john-smith"
          typeof="Person">
        Hi, I'm <span property="name">John Smith</span>.
      </body>
    </html>

[Graph: the node http://example.org/document.html is linked to the literal "John's Web page"@en by a dc:title edge and to the node http://example.org/john-smith by a creator edge; the latter node is linked to foaf:Person by an rdf:type edge and to the literal "John Smith" by a foaf:name edge.]

Figure 2.11: A graph of the RDF document in Figure 2.9.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

Note: The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.

[Typeset output: the title "The Cask of Amontillado", the byline "by Edgar Allan Poe", the paragraph marked up below set with a three-line drop cap "T", and the page number -1-.]

    .TITLE "The Cask of Amontillado"
    .AUTHOR "Edgar Allan Poe"
    .PRINTSTYLE TYPESET
    .PAGE 6i 9i .75i .75i .75i .75i
    .START
    .PP
    .DROPCAP T 3
    he thousand injuries of Fortunato I had borne as I best
    could, but when he ventured upon insult, I vowed revenge.
    You, who so well know the nature of my soul, will not
    suppose, however, that gave utterance to a threat.
    \*[IT]At length\*[PREV] I would be avenged; this was a
    point definitely settled\[em]but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].

38 CHAPTER 2 MARKUP

Page geometry

pdfpagewidth=6in pdfpageheight=9in

Page dimensions

hsize=dimexprpdfpagewidth-15in

vsize=dimexprpdfpageheight-15in

baselineskip=168pt

hoffset=-25in voffset=-25in

Fonts

fontrm=ptmr8t at 125ptrm fontbigbf=ptmb8t at 16pt

fontdropcap=ptmr8t at 62pt fontit=ptmri8r at 125pt

Logical markup definition

deftitle1bigbfcenterline1

defauthor1itcenterlinebycenterline1

vskip 39em

defchapter1noindentsmashhskip01exlower58ex

hboxllapdropcap1hskip-03ex

parshape=4 3emdimexprhsize-3em 328em

dimexprhsize-328em 328em

dimexprhsize-328em 0emhsize

The document

titleThe Cask of Amontillado

authorEdgar Allen Poe

chapter The thousand injuries of Fortunato I had borne

as I best could but when he ventured upon insult I vowed

revenge You who so well know the nature of my soul

will not suppose however that gave utterance to a

threat it At length I would be avenged this was a

point definitely settled---but the very definitiveness

with which it was resolved precluded the idea of risk I

must not only punish but punish with impunity A wrong is

unredressed when retribution overtakes its redresserbye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that reflect their use cases.
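To illustrate how a lightweight markup language maps punctuation onto a more general markup language, here is a minimal Python sketch that converts a tiny, Markdown-like subset (headings introduced by `#`, `*emphasis*`, and paragraphs separated by blank lines) to HTML. The function name `to_html` and the chosen subset are assumptions made for this illustration; real converters support far more syntax and handle many more edge cases.

```python
import html
import re


def to_html(source: str) -> str:
    """Convert a tiny Markdown-like subset to HTML (illustrative only)."""
    # Blocks are separated by one or more blank lines.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", source) if b.strip()]
    out = []
    for block in blocks:
        # Join wrapped lines and escape characters that are special in HTML.
        text = html.escape(" ".join(block.split()), quote=False)
        # *word* becomes <em>word</em>.
        text = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)
        heading = re.match(r"(#{1,6}) (.*)", text)
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            out.append(f"<p>{text}</p>")
    return "\n".join(out)


print(to_html("# Design\n\nLegibility is of *foremost* concern."))
```

The source text stays readable on a plain text terminal, while the conversion step supplies the heavier markup only when it is needed.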

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers or elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
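One way to check whether a typeface covers a manuscript is to compare the characters of the text with the set of code points that the font supports. The Python sketch below assumes that this set has already been obtained from the font file by some other means; the function name `missing_characters` and the sample coverage set are hypothetical.

```python
import unicodedata


def missing_characters(text: str, supported: set[int]) -> dict[str, str]:
    """Map each unsupported, non-whitespace character to its Unicode name."""
    missing = {}
    for ch in sorted(set(text)):
        if ch.isspace() or ord(ch) in supported:
            continue
        missing[ch] = unicodedata.name(ch, "UNKNOWN")
    return missing


# Hypothetical coverage: Basic Latin and Latin-1 Supplement only.
latin_only = set(range(0x20, 0x7F)) | set(range(0xA0, 0x100))

report = missing_characters("Aristotle says: \u03c8\u03c5\u03c7\u03ae", latin_only)
for ch, name in report.items():
    print(f"U+{ord(ch):04X} {name}")
```

The characters reported as missing are the ones that would have to be borrowed from a second typeface.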

Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
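The ranges above can be turned into a simple check. The following Python sketch classifies an average line length against the figures quoted in the paragraph; the function name `check_measure` and the wording of the verdicts are assumptions made for illustration.

```python
def check_measure(chars_per_line: float, columns: int = 1) -> str:
    """Classify an average line length against the recommended ranges."""
    # 45-75 characters on a single-column page, 40-50 on a multi-column page.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "too narrow to justify: set ragged"
    if chars_per_line > 80:
        return "too wide: strains the eye"
    if low <= chars_per_line <= high:
        return "comfortable"
    return "acceptable, but outside the recommended range"


for measure, columns in [(66, 1), (66, 2), (35, 2), (90, 1)]:
    print(measure, columns, check_measure(measure, columns))
```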

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}

newunicodecharraisebox8ex

\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


<style>
@font-face {
  font-family: "Alegreya Sans";
  src: url("AlegreyaSans-Regular.ttf")
       format("truetype");
  unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                 U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
}
@font-face {
  font-family: "GFS Neohellenic";
  src: url("GFSNeohellenic.otf") format("opentype");
  unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                 U+102E0-102FF;
}
p {
  font-family: "Alegreya Sans", "GFS Neohellenic",
               sans-serif;
  line-height: 14pt;
}
[lang=en] {
  font-size: 11pt;
}
[lang=gr] {
  font-size: 13.2pt;
}
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
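The `unicode-range` descriptors in Figure 3.2 restrict each font to certain code points, so that a character is rendered with the first font in the `font-family` list whose range contains it. The following Python sketch mimics that selection for the ranges listed in the figure; `pick_font` is a hypothetical helper written for this illustration, and actual browsers also take into account whether the font file really contains the glyph.

```python
# Code point ranges copied from the unicode-range descriptors in Figure 3.2.
RANGES = {
    "Alegreya Sans": [(0x00, 0x24F), (0x1E00, 0x1EFF), (0x2000, 0x206F),
                      (0x2C60, 0x2C7F), (0xA720, 0xA7FF), (0xFB00, 0xFB4F)],
    "GFS Neohellenic": [(0x2C80, 0x2CFF), (0x370, 0x3FF), (0x1F00, 0x1FFF),
                        (0x102E0, 0x102FF)],
}
FALLBACK = "sans-serif"


def pick_font(ch: str) -> str:
    """Return the first font family whose range covers the character."""
    code = ord(ch)
    for family, ranges in RANGES.items():
        if any(low <= code <= high for low, high in ranges):
            return family
    return FALLBACK


for ch in "S\u03c8\u201c\u1f74\u4e2d":
    print(f"U+{ord(ch):04X} -> {pick_font(ch)}")
```

Latin letters and general punctuation fall to Alegreya Sans, Greek and Greek Extended characters to GFS Neohellenic, and anything else to the generic fallback.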


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
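The arithmetic behind these figures is short enough to sketch in Python. The function names `leading_range` and `lead_per_side` are assumptions made for this illustration; the percentages are the ones quoted in the text.

```python
def leading_range(type_size_pt: float) -> tuple[float, float]:
    """Line height range: the type size plus 20 to 45 percent."""
    return (round(type_size_pt * 1.20, 2), round(type_size_pt * 1.45, 2))


def lead_per_side(type_size_pt: float, line_height_pt: float) -> float:
    """Lead added above and below each line for a given line height."""
    return (line_height_pt - type_size_pt) / 2


low, high = leading_range(10)
print(low, high)
print(lead_per_side(10, low), lead_per_side(10, high))
```

For 10 pt type this gives a line height of 12 to 14.5 pt, that is, 1 to 2.25 pt of lead on either side of the line.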

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2        1.41421
B5        6.93 × 9.84        1 : √2        0.707
Letter    8 1/2 × 11         1 : 1.294     1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
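The page proportions of the paper sizes in Table 3.1 can be recomputed from the dimensions in inches. The Python sketch below does so and prints √2 for comparison; the dictionary `PAPER_SIZES_IN` and the function `proportion` are names chosen for this illustration, and the proportion is expressed here as height divided by width.

```python
import math

# Width and height in inches, as listed in Table 3.1.
PAPER_SIZES_IN = {
    "A4": (8.27, 11.7),
    "B5": (6.93, 9.84),
    "Letter": (8.5, 11.0),
}


def proportion(width: float, height: float) -> float:
    """Height of the page relative to a width of 1."""
    return height / width


for name, (width, height) in PAPER_SIZES_IN.items():
    print(f"{name}: 1 : {proportion(width, height):.4f}")
print(f"sqrt(2) = {math.sqrt(2):.4f}")
```

The A4 and B5 formats come out close to 1 : √2, whereas the Letter format is noticeably squatter.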

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
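One way to satisfy the whole-multiple rule is to pad the heading with vertical space until the heading block fills a whole number of body text lines. The Python sketch below computes that padding; the function `heading_block` and its parameters are hypothetical, and the choice to put all of the padding above the heading is just one possible design.

```python
import math


def heading_block(heading_height_pt: float, line_height_pt: float,
                  space_below_pt: float = 0.0) -> tuple[int, float]:
    """Return (body lines occupied, space above the heading in pt)."""
    content = heading_height_pt + space_below_pt
    # Round up to the next whole number of body text lines.
    lines = math.ceil(content / line_height_pt)
    space_above = lines * line_height_pt - content
    return lines, space_above


# A 16 pt heading with 4 pt of space below, over a 12 pt line height.
print(heading_block(16, 12, space_below_pt=4))
```

With these values the heading occupies two body text lines, so the text that follows lands back on the common baseline grid of the facing pages.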

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface as the surrounding text, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and, optionally, also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

– William Shakespeare, King Lear

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
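The multiple-of-line-height rule amounts to rounding the available height down to a whole number of lines. A minimal Python sketch follows; the function name `text_height` and the sample values are assumptions made for illustration.

```python
def text_height(available_pt: float, line_height_pt: float) -> tuple[int, float]:
    """Return (number of lines, text height in pt) that fit the space."""
    lines = int(available_pt // line_height_pt)
    return lines, lines * line_height_pt


# 550 pt of vertical space with a 12 pt line height.
print(text_height(550, 12))
```

The remaining space, here 10 pt, would be distributed between the top and bottom margins rather than left as a partial line.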

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to use colored type only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
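The text does not define what counts as sufficient contrast. One commonly used measure in web design, adopted here purely as an assumption, is the contrast ratio from the Web Content Accessibility Guidelines (WCAG) 2.0, which compares the relative luminance of two sRGB colors. The Python sketch below implements that formula; the function names are chosen for this illustration.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.0 formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """Contrast ratio between 1 (no contrast) and 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


black, white, red = (0, 0, 0), (255, 255, 255), (255, 0, 0)
print(round(contrast_ratio(black, white), 2))
print(round(contrast_ratio(red, white), 2))
```

WCAG 2.0 asks for a ratio of at least 4.5 for normal-sized text at its AA level, which pure red on white narrowly misses; this is consistent with reserving color for emphasis rather than for the body text.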

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006 (cit. on p. 13).


[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986 (cit. on p. 22).


[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).


[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).


[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typo/glosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).


[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System


DTD  Document Type Declaration
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character


NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML, New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language


XML  The eXtensible Markup Language

Index

ACK, 6
Adobe FrameMaker, 14
Adobe InDesign, 14, 39
alignment
  justified, 42
  ragged, 42
Anton Koberger, 49
Apache OpenOffice, 13, 20, 39
API, 55
ASA, 51
ASCII, 5–9, 11, 12, 14, 51
AsciiDoc, 39
AT&T, 35
Atom, 13
awk, 16, 17

Bazaar, 17
BEL, 6
BMP, 8, 9, 14
Bob Berner, 5
body text, 41
BRE
  alternation operator, 15
  backreference, 15
  escape character, 15
  matching list expression, 15
  non-matching list expression, 15
  repetition operator, 15
  subexpression, 15
BRE, 14–16
BS, 6
BSD, 13

CA, 52
CAN, 6
CERN, 28
character code, 5
character encoding, 5
Chomsky hierarchy, 14
Christian Morgenstern, 4
CLDR, 52
CLI, 13, 16
code page, 7
code point, 8
Compose key, 11
CONCUR, 27
control code, 5
CR, 6
Creole, 39
CSS, 23, 29–32, 44

DC, 32, 33
DC1, 6
DC2, 6
DC3, 6
DC4, 6
DEL, 6
DLE, 6
Donald Knuth, 36
DPS
  batch-oriented, 35
  interactive
    desktop publishing, 36
    word processing, 36
  interactive, 13, 35
DPS, 13, 17, 18, 32, 35, 36, 39
DTD, 23, 25–27
DTP, 36

EBCDIC, 5
ECMA, 55
Edgar Allan Poe, 37


Elements of Style, 3
EM, 6
Emacs, 13
endianity, 10
endnote, 47
ENQ, 6
EOT, 6
ERE
  alternation operator, 15
  backreference, 15
  escape character, 15
  matching list expression, 15
  non-matching list expression, 15
  repetition operator, 15
  subexpression, 15
ERE, 14–16
ESC, 6
ETB, 6
ε-TeX, 38
ETX, 6
EUC, 5

F. M. Cornford, 43
FF, 6
FOAF, 32, 33
footnote, 47
formal grammar, 14
FORTRAN, 4
From Religion to Philosophy: A Study in the Origins of Western Speculation, 43
FS, 6
FSM, 35

Git, 17
GML, 22
GNU
  Linux, 13
  nano, 13
GNU, 13, 14, 35
Google Documents, 18
Google Pinyin, 11
grep, 16, 17
groff, see troff
GS, 6
GUI, 13, 35

Han Unification, 9
heading, 45
Henrik Ibsen, 27
HT, 6
HTML, 28–32, 34, 39, 44, 55

IBM, 5, 12, 22
iconv, 10
IEC, 7, 10, 51–54
IME, 12
IRI, 27, 28, 31, 32, 54
ISO, 7, 10, 51–54

JavaScript, 29
Jeffrey E. F. Friedl, 14
JIS, 5
JOE, 13
JScript, 29
JSON, 32
JSON-LD, 32, 56
JTC, 51–54
justification, see alignment

King Lear, 48

LaTeX, 36, 43
Latin Vulgate Bible, 49
LD, 31, 32, 55
leading, see line spacing
Leafpad, 13
LF, 6
lightweight markup language, 39
line height, 45
list, 46

MA, 51
MakeDoc, 39
Markdown, 39
markup
  logical, 21, 29, 30, 35, 36
  presentation, 21, 29, 30, 35, 36
MathML, 28, 31
Mercurial, 17
microformatting, 32
Microsoft Word, 14, 20, 39

N-Triples, 32, 33
NAK, 6
Noam Chomsky
  hierarchy, 14
Noam Chomsky, 14
note, 46
Notepad++, 13
Notepad, 13

INDEX 65

nroff see troffnul 6ny 51

sectocr 12odf 13ooxml 13owl 32 56

sectparagraphblock 47indented 45outdented 45

paragraph 42paragraphsblock 45

pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8

sectQuarkXPress 14quotationblock 47run-in 47

sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31

rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6

sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22

sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-

lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6

secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff

man 36

66 INDEX

me 36mom 36

troff 35tron 9Turtle 32 33typeface 41

sectucsblock 8ucs-4 8

ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10

us 6usa 51 52utf

utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8

utf 6 8ndash10 52sect

VBScript 29vcscentralized 17decentralized 17

vcs 17ndash20version control 13vi 13vim 13

vt 6sect

w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4

wysiwyg 35sect

XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23

xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23


[Figure: a workflow diagram with the steps svnadmin create, svn checkout, svn update, and svn commit]

Figure 1.8: The basic SVN workflow

After a remote repository has been established, users download the latest version of the document and then keep downloading the latest changes by other users and uploading changes of their own.

An example would be the graphical SVN client TortoiseSVN, which is able to display the changes between two versions of Microsoft Word documents using the interface provided by Microsoft Office.

ality that can record changes directly into output files. Other DPSes provide an interface for external VCS to display changes between two versions of output documents produced by the DPSes. Web services that enable real-time interactive collaboration, such as Word Online or Google Documents, form a category of their own.

After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.

[Figure, top: a workflow diagram with the steps git init, git clone, git pull, git push, git reset, and git commit]

Figure 1.9: The diagram above depicts the basic Git workflow. The diagram below depicts the use of the Git program with an SVN repository; this bears all the advantages and disadvantages associated with decentralized VCS.

[Figure, bottom: a workflow diagram with the steps svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, and git commit]

Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom)

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file

Chapter 2

Markup

A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.

Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The Generalized Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.

The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.

Although the described structure is shared by all SGML documents, the actual syntax as well as the restrictions regarding the contents and the attributes of individual elements are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).

A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
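The first of the two levels of correctness described above, well-formedness, can be checked with nothing more than an XML parser. The following is a minimal sketch in Python, assuming the standard library module xml.etree.ElementTree; the helper name is_well_formed and the two sample strings are illustrative and do not come from the book. Note that this parser does not validate against a DTD or a schema, so it only tests the first level.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True when `text` parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Properly nested elements under a single root element: well-formed.
good = "<recipe><name>Palatschinken</name></recipe>"
# The <name> element is never closed: not well-formed.
bad = "<recipe><name>Palatschinken</recipe>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

A document that passes this check may still be invalid with respect to a particular DTD or schema; validity requires a validating tool in addition to the parser.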

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish.</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>

Figure 2.1: An example XML document (recipe.xml)
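To illustrate how a program sees the elements and attributes of such a document, the following minimal sketch reads a shortened copy of the recipe from the figure above. It assumes Python and its standard library module xml.etree.ElementTree; the constant RECIPE and the helper list_ingredients are illustrative names, not part of the book.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the recipe document, embedded so the sketch is self-contained.
RECIPE = """\
<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>
"""


def list_ingredients(xml_text):
    """Return (amount, name) pairs for every <ingredient> element."""
    root = ET.fromstring(xml_text)
    return [(item.get("amount"), item.text) for item in root.iter("ingredient")]


root = ET.fromstring(RECIPE)
print(root.find("name").text)                      # content of the <name> element
print(root.find("ingredientList").get("serves"))   # value of the serves attribute
for amount, name in list_ingredients(RECIPE):
    print(amount, name)
```

Element content is read through the text property and attributes through get, which mirrors the distinction between elements and attributes made in the text.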

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD

DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
        name="ingredientList"><complexType><sequence>
      <element name="ingredient" minOccurs="1"
               maxOccurs="unbounded">
        <complexType><simpleContent>
          <extension base="string">
            <attribute name="amount" type="string"/>
          </extension>
        </simpleContent></complexType>
      </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)

xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng  # Compact -> Full Relax NG
xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint
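To make concrete what the schemata in the preceding figures express, the sketch below spells out a few of their constraints as hand-written checks in Python. It is not a validator and is not how xmllint works internally; the function name check_recipe and the sample documents are illustrative assumptions, and the positive-integer test is a simplification of the xsd:positiveInteger data type.

```python
import xml.etree.ElementTree as ET


def check_recipe(xml_text):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "recipe":
        problems.append("root element must be <recipe>")
    ingredient_list = root.find("ingredientList")
    if ingredient_list is None:
        problems.append("missing <ingredientList>")
        return problems
    # Simplified stand-in for xsd:positiveInteger: ASCII digits, value above zero.
    serves = ingredient_list.get("serves", "")
    if not (serves.isascii() and serves.isdigit() and int(serves) > 0):
        problems.append("'serves' must be a positive integer")
    # The content model (ingredient+) requires at least one ingredient.
    ingredients = ingredient_list.findall("ingredient")
    if not ingredients:
        problems.append("at least one <ingredient> is required")
    # The amount attribute is declared as required.
    for item in ingredients:
        if item.get("amount") is None:
            problems.append("<ingredient> is missing 'amount'")
    return problems


valid = ('<recipe><ingredientList serves="8">'
         '<ingredient amount="2">Egg</ingredient>'
         '</ingredientList></recipe>')
invalid = ('<recipe><ingredientList serves="many">'
           '<ingredient>Egg</ingredient>'
           '</ingredientList></recipe>')

print(check_recipe(valid))
print(check_recipe(invalid))
```

A schema language states the same constraints declaratively, which is why a general-purpose validator can replace code of this kind for any XML application.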

Namespaces, a notable feature of XML that is unavailable in SGML, were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature, CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.

Speech:
AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie.

Verse:
Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie.

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie.</(S)speech>
</(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which in turn enabled the separation of document markup and design, effectively eliminating the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

<style>
body {
  font: large Verdana;
  font-size: large; }
h1 {
  font-size: x-large;
  text-transform: uppercase; }
abbr {
  font-style: italic; }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly in web documents through XML namespaces.

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
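The contrast between lenient HTML processing and strict XML processing, and the use of namespaces to embed one XML application in another, can be sketched in a few lines of Python. The sketch assumes the standard library modules html.parser and xml.etree.ElementTree; the class TagLogger and the helper functions are illustrative names. The HTMLParser class is only a tokenizer and does not implement the error recovery defined by HTML5; it is used here merely to show that overlapping tags are reported rather than rejected.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Overlapping <b> and <i> elements, as in the malformed HTML example.
OVERLAPPING = "<p><b>Bold <i>bold and italic</b> italic</i></p>"


class TagLogger(HTMLParser):
    """Record start and end tags without checking that they nest properly."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.events.append("</" + tag + ">")


def html_events(markup):
    """Tokenize markup leniently and return the tags in document order."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


def xml_accepts(markup):
    """Return True only when markup is well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# An XHTML document that embeds an SVG fragment through a second namespace.
XHTML_WITH_SVG = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<body><svg xmlns="http://www.w3.org/2000/svg"><circle r="5"/></svg></body>'
    '</html>'
)


def namespaced_tags(markup):
    """Return every element name qualified with its namespace IRI."""
    return [element.tag for element in ET.fromstring(markup).iter()]


print(html_events(OVERLAPPING))
print(xml_accepts(OVERLAPPING))
for tag in namespaced_tags(XHTML_WITH_SVG):
    print(tag)
```

ElementTree writes a qualified element name as the namespace IRI in braces followed by the local name, so the html and svg elements remain distinguishable even though they share one document.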

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, apart from the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata in the documents on the Web, a technique known as microformatting.
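The triplet model and one of its textual representations can be sketched without any RDF library. The following Python example is a minimal illustration under several assumptions: triplets are stored as plain tuples in the (subject, predicate, object) order used by N-Triples rather than the (p, s, o) order used in the text, the class Literal and the functions to_ntriples and objects are hypothetical helpers, and the serializer escapes only backslashes and double quotes, which is less than the N-Triples specification requires.

```python
DC = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class Literal(str):
    """Marks an object that is a literal value rather than a resource."""


# Each entry is one (subject, predicate, object) triplet.
triples = [
    ("http://example.org/document.html", DC + "title", Literal("John's Web page")),
    ("http://example.org/document.html", DC + "creator", "http://example.org/john-smith"),
    ("http://example.org/john-smith", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/john-smith", FOAF + "name", Literal("John Smith")),
]


def to_ntriples(triples):
    """Serialize triplets, one per line, in a simplified N-Triples form."""
    lines = []
    for subject, predicate, obj in triples:
        if isinstance(obj, Literal):
            escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
            rendered = '"' + escaped + '"'
        else:
            rendered = "<" + obj + ">"
        lines.append("<%s> <%s> %s ." % (subject, predicate, rendered))
    return "\n".join(lines)


def objects(triples, subject, predicate):
    """Return every object that `subject` is related to through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(to_ntriples(triples))
print(objects(triples, "http://example.org/john-smith", FOAF + "name"))
```

Resources are written between angle brackets and literals between quotes, which is the same distinction the text draws between an object that is a resource and an object that is a literal value.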

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom)

<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>

[Figure: a graph with the nodes http://example.org/document.html, "John's Web page"@en, http://example.org/john-smith, foaf:Person, and "John Smith", connected by edges labeled dc:title, foaf:creator, rdf:type, and foaf:name]

Figure 2.11: A graph of the RDF document in Figure 2.9

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is twofold: more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

[Sidenote: The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].]

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by the American professor of computer science Donald Knuth after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of the mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
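The decoupling described above can be illustrated with a minimal Python sketch. It is not how any particular DPS stores its styles; the names STYLES, DOCUMENT, render, and restyle, as well as the fonts and sizes, are assumptions made up for the illustration.

```python
# Hypothetical style sheet: each logical class maps to presentation settings.
STYLES = {
    "title": {"font": "Palatino Bold", "size_pt": 16},
    "body": {"font": "Palatino", "size_pt": 10},
}

# The document records only the logical class of each section of text.
DOCUMENT = [
    ("title", "The Cask of Amontillado"),
    ("body", "The thousand injuries of Fortunato ..."),
]


def render(document, styles):
    """Pair every section of text with the design of its class."""
    return [(text, styles[cls]) for cls, text in document]


def restyle(styles, cls, **changes):
    """Return a new style sheet in which one class has been redesigned."""
    updated = dict(styles)
    updated[cls] = {**styles[cls], **changes}
    return updated
```

Calling render(DOCUMENT, restyle(STYLES, "body", size_pt=11)) changes the design of every body paragraph at once, while the document itself stays untouched.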


The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled – but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's The Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].


% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that reflect their use cases.
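To give a feel for how such a conversion tool might work, here is a minimal Python sketch that turns a tiny Markdown-like subset into HTML. It is an illustration only, not an implementation of Markdown or of any real converter: the function name to_html is made up, and the supported subset (paragraphs separated by blank lines, a single heading level marked with "# ", and emphasis marked with asterisks) is an assumption chosen for brevity.

```python
import html
import re


def to_html(source: str) -> str:
    """Convert a tiny Markdown-like subset to HTML (illustrative only)."""
    # Blocks are separated by one or more blank lines.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", source) if b.strip()]
    out = []
    for block in blocks:
        # Join the lines of a block and escape HTML special characters.
        text = html.escape(" ".join(block.split()), quote=False)
        # *emphasis* becomes <em>emphasis</em>.
        text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
        if text.startswith("# "):
            out.append(f"<h1>{text[2:]}</h1>")
        else:
            out.append(f"<p>{text}</p>")
    return "\n".join(out)
```

The input stays readable as plain text, and all decisions about the output markup live in the converter, which is the trade-off that lightweight markup languages make.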

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab serifs or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
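Checking whether the chosen typefaces cover a manuscript can be automated. The following Python sketch assumes that the coverage of each typeface is already known as a set of Unicode code points; the sets LATIN_FACE and GREEK_FACE are hypothetical stand-ins, not the coverage of any real font, and reading the coverage out of a font file is left out.

```python
# Hypothetical coverage sets (code points); a real font covers far more.
LATIN_FACE = set(range(0x20, 0x7F))          # Basic Latin only
GREEK_FACE = set(range(0x370, 0x400)) | set(range(0x1F00, 0x2000))


def pick_typeface(ch, typefaces):
    """Return the name of the first typeface that contains ch, or None."""
    for name, coverage in typefaces:
        if ord(ch) in coverage:
            return name
    return None


def uncovered(text, typefaces):
    """Return the sorted characters of text that no typeface contains."""
    return sorted(
        {ch for ch in text
         if not ch.isspace() and pick_typeface(ch, typefaces) is None}
    )
```

The order of the list passed as typefaces expresses the preference: the body text typeface comes first, and the other typefaces are consulted only for the characters it lacks.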

Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
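The rules of thumb above are easy to encode. The Python sketch below is one possible reading of them; the function name measure_advice and the four labels it returns are assumptions made for the illustration, and the label "acceptable" for lengths between the ideal range and the outer limits is an interpretation rather than something the cited sources state.

```python
def measure_advice(chars_per_line: int, columns: int = 1) -> str:
    """Classify a line length (in characters) by the rules of thumb above."""
    # Ideal range: 45-75 characters on one column, 40-50 on several.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "set ragged"   # too narrow to justify without loose spacing
    if chars_per_line > 80:
        return "too wide"     # strains the eye over extended passages
    if low <= chars_per_line <= high:
        return "ideal"
    return "acceptable"
```

For example, measure_advice(66) falls into the ideal range for a single column, while the same length on a two-column page does not.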

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

1

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}

newunicodecharraisebox8ex

\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


<style>
@font-face {
  font-family: "Alegreya Sans";
  src: url("AlegreyaSans-Regular.ttf")
       format("truetype");
  unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                 U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
}
@font-face {
  font-family: "GFS Neohellenic";
  src: url("GFSNeohellenic.otf") format("opentype");
  unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                 U+102E0-102FF;
}
p {
  font-family: "Alegreya Sans", "GFS Neohellenic",
               sans-serif;
  line-height: 14pt;
}
[lang=en] {
  font-size: 11pt;
}
[lang=grc] {
  font-size: 13.2pt;
}
</style>
<p><span lang="en">The second function of Soul &ndash; knowing
&ndash; was not at first distinguished from motion. Aristotle
says </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">&ldquo;The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.&rdquo;</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with a leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
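The arithmetic behind these numbers can be spelled out in a few lines of Python. The function names leading_range and lead_per_side are made up for this sketch; the default factors of 20 and 45 percent are the ones quoted in the text.

```python
def leading_range(type_size_pt, low=0.20, high=0.45):
    """Return the (minimum, maximum) line height in points."""
    return (type_size_pt + type_size_pt * low,
            type_size_pt + type_size_pt * high)


def lead_per_side(type_size_pt, line_height_pt):
    """Return the lead added above and below each line, in points."""
    return (line_height_pt - type_size_pt) / 2
```

For a 10 pt typeface, leading_range(10) gives the range of 12 to 14.5 pt, and lead_per_side gives 1 pt and 2.25 pt at the two ends of that range.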

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well.

Paragraphs can also be separated by graphical symbols such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

          Size in inches    Page proportions
A4        8.27 × 11.7       2 : √2       1.41421
B5        6.93 × 9.84       1 : √2       0.707
Letter    8½ × 11           1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

[Sidenote: This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.]

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
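One way to satisfy the whole-multiple rule is to pad the heading with vertical space until its total height reaches the next multiple of the body line height. The Python sketch below shows the calculation; the function name heading_block and the optional min_space_pt parameter are assumptions made for the illustration.

```python
import math


def heading_block(heading_leading_pt, body_leading_pt, min_space_pt=0.0):
    """Return (body lines occupied, extra vertical space in points)."""
    needed = heading_leading_pt + min_space_pt
    # Round up to a whole number of body lines; the small epsilon guards
    # against floating-point noise when needed is already a multiple.
    lines = math.ceil(needed / body_leading_pt - 1e-9)
    total = lines * body_leading_pt
    return lines, total - heading_leading_pt
```

For a heading set with a leading of 16 pt in a body text with a leading of 12 pt, the heading occupies two body lines, and 8 pt of space has to be distributed above and below it.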

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.


2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or of the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

– William Shakespeare, King Lear

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.
A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
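As an illustration of the multiple-of-the-line-height rule, the following Python sketch rounds an available height down to a whole number of body text lines. The function name text_block is made up for the sketch, and the example figures assume a 9 in page with 0.75 in vertical margins and 72 pt to the inch, which are assumptions rather than values prescribed by the cited sources.

```python
def text_block(available_height_pt, leading_pt):
    """Return (number of lines, text height in points).

    The text height is the largest whole multiple of the body text line
    height that fits into the available height.
    """
    lines = int(available_height_pt // leading_pt)
    return lines, lines * leading_pt
```

With 7.5 in (540 pt) of available height and a leading of 12 pt, the text block holds 45 lines and fills the available height exactly; with a leading that does not divide 540 pt, the remainder has to be added to the vertical margins.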

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

The general guidelines are to use colored type only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the type color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
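The text does not say how to measure "sufficient contrast". One common yardstick for screens, which is an assumption brought in here and not something the cited sources prescribe, is the contrast ratio defined by the W3C Web Content Accessibility Guidelines (WCAG) 2.x, which recommend a ratio of at least 4.5:1 for normal-sized text. A minimal Python sketch of that formula for 8-bit sRGB colors follows; the function names are made up for the illustration.

```python
def _linear(channel_8bit):
    """Convert an 8-bit sRGB channel to its linear-light value."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color as defined by WCAG 2.x."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 (none) to 21 (maximum)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)
```

Black type on a white background reaches the maximum ratio of 21, whereas pure red type (255, 0, 0) on white falls below the 4.5:1 threshold, which agrees with the advice to reserve color for emphasis rather than for the body text.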

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standards Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standards Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006 (cit. on p. 13).


[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986 (cit. on p. 22).


[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).


[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Working Draft. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).


[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Manu Sporny, Gregg Kellogg, and Markus Lanthaler. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).


[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standards Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System

60 ACRONYMS

dtd Document Type Declarationdtp DeskTop Publishingebcdic The Extended Binary Coded Decimal Interchange Codeecma The European Computer Manufacturers Associationem The End of Mediumemacs The Eventually Munches All Computer Storage editorenq The ENQuiry charactereot The End Of Transmissionere The Extended Regular Expressionsesc The ESCape characteretb The End of Transmission Blocketx The End of TeXteuc The Extended Unix Codeff The Form Feed characterfoaf Friend Or A Foefortran The FORmula TRANslatorfs The File Separatorfsm The Free Software Movementgml The General Markup Languagegnu gnu is Not Unixgs The Group Separatorgui Graphical User Interfaceht The Horizontal Tabhtml The HyperText Markup Languageibm The International Business Machines Corporationiec The International Electrotechnical Commissionime Input Method Editoriri The Internationalized Resource Identifieriso The International Organization for Standardizationj is The Japanese Industrial Standards encodingjoe The Joersquos Own Editorjson The JavaScript Object Notationjson-ld json for ldjtc A Joint tcld Linked Datalf The Line Feedma Massachusettsmathml The Mathematical Markup Languagenak The Negative-AcKnowledgement characternul The NULl character

ACRONYMS 61

ny New Yorkocr Optical Character Recognitionodf The Open Document Format for office applicationsooxml The Office Open XML formatowl The Web Ontology Languagepc The ibm Personal Computerpdf The Portable Document Formatpico The PIne COmposerposix The Portable Operating System Interfacerdf The Resource Description Frameworkrdfa rdf in attributesrelax ng The REgular LAnguage for xml New Generationrfc A Request For Commentsrs The Record Separatorsc A SubCommitteesgml The Standard General Markup Languagesi The Shift In characterso The Shift Out charactersoh The Start of Headingsr Sound Recognitionstx The Start of Textsub The SUBstitute charactersvg The Scalable Vector Graphics languagesvn SubVersioNsyn The SYNchronous Idle charactertc A Technical Committeetei The Text Encoding Initiativetron The Real-time Operating system Nucleusucs The Universal multiple-octet coded Character Setus The Unit Separatorusa The United States of Americautf The ucs Transformation Formatvcs Version Control Systemsvi The Visual Interactive editorvim vi IMprovedvt The Vertical Tabw3c The World Wide Web Consortiumwg AWorking Groupwysiwyg What You See Is What You Getxhtml The eXtensible HyperText Markup Language

62 ACRONYMS

xml The eXtensible Markup Language



Margin note: After a remote repository has been established, users make local copies of the entire repository and then store changes in their local repositories or revert changes from their local repositories. Users periodically download the latest changes by other users and upload changes of their own.

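[Figure: a diagram of the Git workflow labelled with the commands git init, git clone, git pull, git push, git reset, and git commit.]

Figure 1.9: The diagram above depicts the basic Git workflow. The diagram below depicts the use of the Git program with an SVN repository; this bears all the advantages and disadvantages associated with decentralized VCS.

[Figure: a diagram of the git-svn workflow labelled with the commands svnadmin create, git svn clone, git svn rebase, git svn dcommit, git reset, and git commit.]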


Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.

Chapter 2

Markup

A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.

Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The Generalized Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

Margin note: More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

Margin note: The authoritative resource on SGML is the SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.

Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).

Margin note: A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.
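The lower of the two levels of correctness, well-formedness, can be illustrated with a minimal Python sketch. It assumes the standard-library xml.etree.ElementTree parser, which checks well-formedness only and does not validate against a DTD; the helper name is_well_formed and both sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True when `text` parses as XML, False otherwise.

    This checks well-formedness only; ElementTree does not
    validate the document against a DTD or a schema.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Properly nested elements.
GOOD = "<recipe><name>Palatschinken</name></recipe>"
# The closing tags are swapped, so the elements overlap.
BAD = "<recipe><name>Palatschinken</recipe></name>"

print(is_well_formed(GOOD))
print(is_well_formed(BAD))
```

A document that passes this check may still be invalid against its DTD; validity requires a validating tool.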

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
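As a rough illustration of data retrieval from an XML document, the following Python sketch queries a shortened copy of the recipe document. It assumes the standard-library xml.etree.ElementTree module, which supports only a limited subset of XPath rather than the full language; the variable names are chosen for this example.

```python
import xml.etree.ElementTree as ET

# A shortened version of the recipe document used in this chapter.
RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# Select every ingredient element by its path from the root.
names = [e.text for e in root.findall("./ingredientList/ingredient")]

# Select a single element by the value of one of its attributes.
milk = root.find("./ingredientList/ingredient[@amount='300ml']")

# Read an attribute of the ingredient list itself.
serves = root.find("ingredientList").get("serves")

print(names, milk.text, serves)
```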

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE recipe SYSTEM "recipe.dtd">
    <recipe>
      <name>Palatschinken</name>
      <description>A Slavic crêpe-like dish.</description>
      <ingredientList serves="8">
        <ingredient amount="120g">Plain flour</ingredient>
        <ingredient amount="2">Egg</ingredient>
        <ingredient amount="300ml">Milk</ingredient>
        <ingredient amount="1 tblspn">Oil</ingredient>
        <ingredient amount="1 pinch">Salt</ingredient>
      </ingredientList>
      <stepList>
        <step>Combine the ingredients and whisk until
          you have a smooth batter.</step>
        <step>Heat oil on a pan, pour in a tablespoonful
          of the batter, fry until golden brown.</step>
        <step>Repeat until there is no batter left.</step>
        <step>Serve rolled and filled with jam.</step>
      </stepList>
    </recipe>

Figure 2.1: An example XML document (recipe.xml).

Margin note: DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd">

    <!DOCTYPE recipe SYSTEM "recipe.dtd">

    <!DOCTYPE recipe [
      <!ELEMENT recipe (name, description, ingredientList,
                        stepList)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
      <!ELEMENT ingredientList (ingredient+)>
      <!ATTLIST ingredientList serves CDATA #REQUIRED>
      <!ELEMENT ingredient (#PCDATA)>
      <!ATTLIST ingredient amount CDATA #REQUIRED>
      <!ELEMENT stepList (step+)>
      <!ELEMENT step (#PCDATA)> ]>

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd" [
      <!-- Omitted for brevity --> ]>

    <!DOCTYPE recipe SYSTEM "recipe.dtd" [
      <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD.

    element recipe {
      element name { text },
      element description { text },
      element ingredientList {
        attribute serves { xsd:positiveInteger },
        element ingredient {
          attribute amount { text }, text
        }+
      },
      element stepList {
        element step { text }+
      }
    }

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.


    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
      <element name="recipe"><complexType><all>
        <element name="name" type="string" minOccurs="1"/>
        <element name="description" type="string"
                 minOccurs="1"/>
        <element
          name="ingredientList"><complexType><sequence>
          <element name="ingredient" minOccurs="1"
                   maxOccurs="unbounded">
            <complexType><simpleContent>
              <extension base="string">
                <attribute name="amount" type="string"/>
              </extension>
            </simpleContent></complexType>
          </element></sequence>
          <attribute name="serves" type="positiveInteger"
                     use="required"/>
        </complexType></element>
        <element name="stepList"><complexType><sequence>
          <element name="step" type="string" minOccurs="1"
                   maxOccurs="unbounded"/>
        </sequence></complexType></element>
      </all></complexType></element>
    </schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).

    xmllint --noout --dtdvalid recipe.dtd recipe.xml
    xmllint --noout --schema recipe.xsd recipe.xml
    trang recipe.rnc recipe.rng  # Compact -> Full Relax NG
    xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.

A notable feature of XML unavailable in SGML are namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature called CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.

Speech

    AASE: Peer, you're lying!
    PEER: No, I'm not!
    AASE: Well then, swear to me it's true!
    PEER: Swear? Why should I?
    AASE: See, you dare not! Every word of it's a lie!

Verse

    Peer, you're lying! No, I'm not!
    Well then, swear to me it's true!
    Swear? Why should I? See, you dare not!
    Every word of it's a lie!

    <(V)line>
    <(S)speech who="Aase">Peer, you're lying!</(S)speech>
    <(S)speech who="Peer">No, I'm not!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Aase">Well then,
    swear to me it's true!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Peer">Swear? Why should I?</(S)speech>
    <(S)speech who="Aase">See, you dare not!
    </(V)line><(V)line>
    Every word of it's a lie!</(S)speech>
    </(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

Margin note: The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org/.

Margin note: Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
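The way XML namespaces tie element names to IRIs can be sketched in Python. The example assumes the standard-library xml.etree.ElementTree module; the recipe namespace IRI http://example.com/ns/recipe is hypothetical and exists only for this illustration, while the Dublin Core terms IRI is the one used elsewhere in this chapter.

```python
import xml.etree.ElementTree as ET

# One document mixing two XML applications. The default namespace
# IRI is a made-up example; the dc prefix points to Dublin Core terms.
DOC = """<recipe xmlns="http://example.com/ns/recipe"
        xmlns:dc="http://purl.org/dc/terms/">
  <dc:title>Palatschinken</dc:title>
  <name>Palatschinken</name>
</recipe>"""

root = ET.fromstring(DOC)

# The parser expands every prefix into its IRI, so each element
# name becomes "{IRI}local-name" regardless of the prefix used.
tags = [child.tag for child in root]

# Queries must therefore map prefixes to IRIs explicitly.
title = root.find("dc:title", {"dc": "http://purl.org/dc/terms/"})

print(root.tag)
print(tags)
print(title.text)
```

Note that the parser only records the IRIs; nothing in the document tells it where a schema for either namespace could be found, which matches the limitation described above.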

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include: DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first graphical Web browser to gain wide popularity, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Margin note: JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

Initially, HTML only comprised a mixture of logical and presentation markup with a fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

    <b>Bold <i>bold and italic</b> italic</i>
    <b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

    <font face="Verdana" size="4">
    <font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
    <br><br>There is a continuing need to show the power of
    <i>CSS</i>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The <i>HTML
    </i> remains the same, the only thing that has changed
    is the external <i>CSS</i> file. Yes, really.
    </font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com/. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

    <style>
    body {
      font: large Verdana;
      font-size: large;
    }
    h1 {
      font-size: x-large;
      text-transform: uppercase;
    }
    abbr {
      font-style: italic;
    }
    </style>
    <h1>So what is this about?</h1>
    <p>There is a continuing need to show the power of
    <abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The
    <abbr>HTML</abbr> remains the same, the only thing that
    has changed is the external <abbr>CSS</abbr> file. Yes,
    really.</p>

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications (such as MathML and SVG) directly in web documents through XML namespaces.

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

Margin note: The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].
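The contrast between strict XML parsing and lenient HTML parsing discussed in this section can be sketched in Python using only the standard library. The sketch feeds the same markup with overlapping elements to both kinds of parser. It assumes that html.parser.HTMLParser is an acceptable stand-in for a lenient HTML parser: it only reports tags as it meets them and does not implement the tree-building rules of the HTML5 specification, so it shows the tolerance, not the canonical interpretation. The class name TagLogger is made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Markup with overlapping <b> and <i> elements.
MALFORMED = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Record start and end tags in the order they are seen."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


# The lenient parser walks through the markup without complaint.
logger = TagLogger()
logger.feed(MALFORMED)
logger.close()

# The XML parser must refuse the same input outright.
try:
    ET.fromstring(MALFORMED)
    xml_accepts = True
except ET.ParseError:
    xml_accepts = False

print(logger.events)
print(xml_accepts)
```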

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as the subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as the subject s having a property p with the value o.

Margin note: A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
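The two readings of a triplet (p, s, o) can be made concrete with a small Python sketch. The types IRI, Literal, and Triple and the function describe are hypothetical names introduced for this illustration; they do not come from any RDF library, and the sample IRIs follow the example.org resources used in the figures of this section.

```python
from typing import NamedTuple, Union


class IRI(str):
    """A resource identified by an IRI."""


class Literal(NamedTuple):
    """A literal value with an optional language tag."""
    value: str
    lang: str = ""


class Triple(NamedTuple):
    """A triplet (p, s, o) in the order used by the text."""
    predicate: IRI
    subject: IRI
    obj: Union[IRI, Literal]


def describe(t: Triple) -> str:
    """Spell out the interpretation of a triplet."""
    if isinstance(t.obj, IRI):
        # Resource object: s is in a relation p with o.
        return f"{t.subject} is in relation {t.predicate} with {t.obj}"
    # Literal object: s has a property p with the value o.
    return f"{t.subject} has property {t.predicate} with value {t.obj.value!r}"


relation = Triple(
    IRI("http://purl.org/dc/terms/creator"),
    IRI("http://example.org/document.html"),
    IRI("http://example.org/john-smith"),
)
prop = Triple(
    IRI("http://xmlns.com/foaf/0.1/name"),
    IRI("http://example.org/john-smith"),
    Literal("John Smith"),
)

print(describe(relation))
print(describe(prop))
```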

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include: the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend Of A Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for LD (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.

23 Document Preparation SystemsSome of the existing markup languages are tied directly to spe-cific Document Preparation Systems (dpses) These dpses can be

23 DOCUMENT PREPARATION SYSTEMS 33

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith" />
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <!-- rdf:resource takes a full IRI, not a prefixed name -->
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .

<http://example.org/document.html>
    dc:title "John's Web page"@en ;
    dc:creator <http://example.org/john-smith> .

<http://example.org/john-smith>
    a foaf:Person ;
    foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).
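The N-Triples serialization in Figure 2.9 is simple enough that the underlying data model, a set of subject, predicate, and object triples, can be recovered with a few lines of code. The following Python sketch is illustrative only: parse_ntriples is a hypothetical helper, not part of any RDF library, and it assumes input restricted to IRIs and plain or language-tagged literals (no blank nodes, datatypes, or escape sequences).

```python
import re

# Simplified term pattern: an IRI in angle brackets, or a quoted literal
# with an optional language tag. This is an assumption made for the
# sketch and covers only a small subset of the N-Triples grammar.
TERM = r'(<[^>]*>|"[^"]*"(?:@[A-Za-z-]+)?)'
LINE = re.compile(rf'^\s*{TERM}\s+{TERM}\s+{TERM}\s*\.\s*$')


def parse_ntriples(text):
    """Return a list of (subject, predicate, object) string triples."""
    triples = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"unsupported N-Triples line: {line!r}")
        triples.append(match.groups())
    return triples


DOCUMENT = '''
<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .
'''

triples = parse_ntriples(DOCUMENT)
subjects = {subject for subject, _, _ in triples}
```

Each line of the document yields exactly one triple, which is why N-Triples is convenient for line-oriented tools; a real application would rely on a complete RDF parser instead.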


<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>


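[Figure: a directed graph with four edges. http://example.org/document.html → "John's Web page"@en (dc:title); http://example.org/document.html → http://example.org/john-smith (dc:creator); http://example.org/john-smith → foaf:Person (rdf:type); http://example.org/john-smith → "John Smith" (foaf:name).]

Figure 2.11: A graph of the RDF document in Figure 2.9.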

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is twofold: more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by Desktop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text, with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
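The decoupling described above can be sketched in a few lines of Python. The sketch is purely illustrative: the class names, the fonts, and the render function are hypothetical and do not correspond to the data model of any particular DPS.

```python
# Hypothetical style sheet: maps logical classes to presentation attributes.
STYLES = {
    "title": {"font": "Times Bold", "size_pt": 16},
    "author": {"font": "Times Italic", "size_pt": 12.5},
    "body": {"font": "Times Roman", "size_pt": 12.5},
}

# The document stores only the logical class of each span of text.
DOCUMENT = [
    ("title", "The Cask of Amontillado"),
    ("author", "Edgar Allan Poe"),
    ("body", "The thousand injuries of Fortunato I had borne as I best could."),
]


def render(document, styles):
    """Resolve each logical class to its presentation attributes."""
    return [
        (text, styles[cls]["font"], styles[cls]["size_pt"])
        for cls, text in document
    ]


# Redesigning the body text means editing the style sheet once;
# the document itself stays untouched.
redesigned = dict(STYLES, body={"font": "Palatino", "size_pt": 10})
before = render(DOCUMENT, STYLES)
after = render(DOCUMENT, redesigned)
```

With presentation markup, the same redesign would require visiting every body paragraph individually.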


The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled – but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].


% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em \dimexpr\hsize-3em 3.28em
    \dimexpr\hsize-3.28em 3.28em
    \dimexpr\hsize-3.28em 0em \hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
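To give a flavor of such conversion tools, the following Python sketch translates a tiny, Markdown-like subset into HTML. It is a deliberately minimal illustration under stated assumptions: to_html is a hypothetical function that only understands headings introduced by number signs, emphasis between asterisks, and paragraphs separated by blank lines; it does not implement Markdown or any other real lightweight markup language.

```python
import html
import re


def to_html(source):
    """Convert a tiny Markdown-like subset to HTML.

    Supported: '#'-prefixed headings, *emphasis*, and paragraphs
    separated by blank lines. Everything else is passed through
    as escaped text.
    """
    blocks = []
    for block in re.split(r"\n\s*\n", source.strip()):
        # Join the lines of a block and escape HTML special characters.
        text = html.escape(" ".join(block.split()), quote=False)
        text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
        heading = re.match(r"(#{1,6}) (.*)", text)
        if heading:
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            blocks.append(f"<p>{text}</p>")
    return "\n".join(blocks)


print(to_html("# Design\n\nLegibility comes *first*,\nalways."))
```

The input remains readable as plain text, while the output carries the explicit element structure that a more general markup language requires.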

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other, the design and the positioning of the structural elements of the document (such as headings, tables, figures, and lists), and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
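Whether a set of typefaces covers a manuscript can be checked mechanically. The Python sketch below is only an illustration: the code point ranges are assumptions that mirror a subset of the unicode-range declarations in Figure 3.2 rather than the actual coverage of the fonts, and assign_fonts is a hypothetical helper.

```python
# Code point ranges assumed to be covered by each font (illustrative values,
# a subset of the unicode-range declarations shown in Figure 3.2).
FONT_RANGES = {
    "Alegreya Sans": [(0x0000, 0x024F), (0x1E00, 0x1EFF), (0x2000, 0x206F)],
    "GFS Neohellenic": [(0x0370, 0x03FF), (0x1F00, 0x1FFF)],
}


def covers(font, char):
    """Return True if the font is assumed to contain the character."""
    return any(low <= ord(char) <= high for low, high in FONT_RANGES[font])


def assign_fonts(text, preference):
    """Map each non-space character to the first preferred font covering it.

    Characters that no listed font covers are mapped to None, which signals
    that another typeface has to be brought in.
    """
    result = {}
    for char in text:
        if char.isspace():
            continue
        result[char] = next(
            (font for font in preference if covers(font, char)), None
        )
    return result


fonts = assign_fonts("soul ψυχὴν 魂", ["Alegreya Sans", "GFS Neohellenic"])
missing = sorted(char for char, font in fonts.items() if font is None)
```

Here the Latin letters fall to the first typeface, the Greek letters to the second, and the Chinese character is reported as missing.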

Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
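The character counts above translate directly into a simple check. The following Python sketch is a rough illustration: within_guideline and longest_line are hypothetical helpers, and counting characters in monospaced fashion with the standard textwrap module is only an approximation of what a typesetter does with proportional type.

```python
import textwrap


def within_guideline(chars_per_line, columns=1):
    """Check a line length against the 45-75 character guideline for
    single-column pages or the 40-50 guideline for multi-column pages."""
    low, high = (45, 75) if columns == 1 else (40, 50)
    return low <= chars_per_line <= high


def longest_line(paragraph, width):
    """Wrap a paragraph to the given width and return the longest line length."""
    return max(len(line) for line in textwrap.wrap(paragraph, width=width))


sample = "word " * 40
single_ok = within_guideline(66)             # a common single-column measure
double_ok = within_guideline(66, columns=2)  # too wide for a column
widest = longest_line(sample, 66)
```

A measure of 66 characters passes the single-column check but fails the multi-column one, which suggests narrowing the columns or reducing their number.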

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

1

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin{%
  \fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek{%
  \fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}

newunicodecharraisebox8ex

\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν

γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι

δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα

δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν

αὐτὴν κινεῖσθαι

``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=gr] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
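The arithmetic behind these figures can be reproduced with a short Python sketch. The function names are hypothetical, and the 20 % and 45 % bounds are simply the guideline quoted above, not universal constants.

```python
def leading_range(size_pt, low=0.20, high=0.45):
    """Return the (minimum, maximum) line height in points for a type size,
    adding between 20 % and 45 % of the size as lead."""
    return (round(size_pt * (1 + low), 2), round(size_pt * (1 + high), 2))


def lead_per_side(size_pt, leading_pt):
    """Return the lead in points added above and below each line."""
    return (leading_pt - size_pt) / 2


minimum, maximum = leading_range(10)   # 10 pt body text
tightest = lead_per_side(10, minimum)  # lead per side at the minimum
loosest = lead_per_side(10, maximum)   # lead per side at the maximum
```

For 10 pt type, this yields a leading between 12 and 14.5 pt, that is, 1 to 2.25 pt of lead on each side of a line, in agreement with the numbers given in the text.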

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger


          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
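The requirement that a heading occupy a whole number of body text lines reduces to rounding up and distributing the remainder as vertical space. The Python sketch below shows one way to compute it; heading_spacing is a hypothetical helper, and how the extra space is split above and below the heading is left to the designer.

```python
import math


def heading_spacing(heading_size_pt, leading_pt, min_space_pt=0):
    """Return (lines, extra_space_pt) such that the heading together with
    the extra vertical space occupies a whole number of body text lines."""
    needed = heading_size_pt + min_space_pt
    lines = max(1, math.ceil(needed / leading_pt))
    return lines, lines * leading_pt - heading_size_pt


# A 16 pt heading over body text with a 12 pt leading:
lines, extra = heading_spacing(16, 12)
# The same heading with at least 10 pt of space reserved around it:
roomy = heading_spacing(16, 12, min_space_pt=10)
```

A 16 pt heading thus takes two lines of 12 pt leading with 8 pt of space to distribute, which keeps the baselines on the facing pages aligned.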

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

This is the excellent foppery of the world, that, when we are sick in fortune – often the surfeit of our own behavior – we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

– William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
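Rounding the text height down to a whole number of lines is a one-line computation. The following Python sketch assumes, for illustration only, a 6 × 9 in page with 0.75 in vertical margins and 72 points to the inch (the PostScript point; TeX's own pt is slightly smaller); text_block is a hypothetical helper.

```python
POINTS_PER_INCH = 72  # assumption: PostScript points


def text_block(page_height_pt, margin_top_pt, margin_bottom_pt, leading_pt):
    """Return (lines, text_height_pt), where the text height is the largest
    whole multiple of the leading that fits between the vertical margins."""
    available = page_height_pt - margin_top_pt - margin_bottom_pt
    lines = int(available // leading_pt)
    return lines, lines * leading_pt


page_height = 9 * POINTS_PER_INCH  # 648 pt
margin = 0.75 * POINTS_PER_INCH    # 54 pt
lines, height = text_block(page_height, margin, margin, 12)
```

With a 12 pt leading, the 540 pt between the margins holds exactly 45 lines; with a leading that does not divide the available height, the remainder is added to the margins.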

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to use colored typefaces only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard 10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016) (cit. on p. 10).

[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).


[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard 18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).


[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).


[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 – ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).


[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typo/glosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).


[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts and Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standards Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
Emacs: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend of a Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The Generalized Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
Pico: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
vi: The Visual Interactive editor
Vim: vi IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language




Figure 1.10: The built-in VCS of Microsoft Word (top) and Apache OpenOffice (bottom).

Figure 1.11: TortoiseSVN is a graphical frontend for SVN with the ability to display the difference between two versions of a Microsoft Word document, even though it is not a text file.

Chapter 2

Markup

A manuscript can be a seamless current of words and still make perfect sense to its author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.

Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The Generalized Markup Language

The situation surrounding digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset by a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the Generalized Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

Note: More information about the project can be found within The Roots of SGML - A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.

Note: The authoritative resource on SGML is The SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.

Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

Note: A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
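The lower of the two levels of correctness can be tried out with very little code. The following is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module; the function name is_well_formed and the three sample strings are illustrative only. The module does not validate against a DTD or a schema, so only well-formedness is checked.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True when the parser accepts the document as well-formed XML."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Properly nested elements: well-formed.
nested = "<recipe><name>Palatschinken</name></recipe>"
# Overlapping elements: not well-formed, so the parser refuses the document.
overlapping = "<recipe><name>Palatschinken</recipe></name>"
# Well-formed, yet a DTD for recipes that declares no <cheese> element
# would reject it as invalid. ElementTree cannot detect that.
unexpected = "<recipe><cheese/></recipe>"

for label, document in [("nested", nested),
                        ("overlapping", overlapping),
                        ("unexpected", unexpected)]:
    print(label, is_well_formed(document))
```

The third sample is the interesting one: it passes the well-formedness check, and only a validating tool equipped with a DTD or a schema could tell that it is not a valid recipe.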

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
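To give a flavor of data retrieval with XPath, here is a small sketch assuming Python's standard xml.etree.ElementTree module, which implements only a limited subset of XPath rather than the full language. The shortened recipe and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# A shortened recipe document used only for this illustration.
RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# Child steps: every ingredient directly under ingredientList.
ingredients = [e.text for e in root.findall("./ingredientList/ingredient")]
# Descendant step with an attribute-value predicate.
two_of = [e.text for e in root.findall(".//ingredient[@amount='2']")]
# Attribute-presence predicate: the element that carries "serves".
serves = root.find(".//*[@serves]").get("serves")

print(ingredients)
print(two_of)
print(serves)
```

Queries that go beyond this subset, such as XPath functions or XQuery expressions, require a dedicated processor.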


    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE recipe SYSTEM "recipe.dtd">
    <recipe>
      <name>Palatschinken</name>
      <description>A Slavic crêpe-like dish.</description>
      <ingredientList serves="8">
        <ingredient amount="120g">Plain flour</ingredient>
        <ingredient amount="2">Egg</ingredient>
        <ingredient amount="300ml">Milk</ingredient>
        <ingredient amount="1 tblspn">Oil</ingredient>
        <ingredient amount="1 pinch">Salt</ingredient>
      </ingredientList>
      <stepList>
        <step>Combine the ingredients and whisk until
          you have a smooth batter.</step>
        <step>Heat oil on a pan, pour in a tablespoonful
          of the batter, fry until golden brown.</step>
        <step>Repeat until there is no batter left.</step>
        <step>Serve rolled and filled with jam.</step>
      </stepList>
    </recipe>

Figure 2.1: An example XML document (recipe.xml)
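The elements, attributes, and text that make up such a document can be inspected programmatically. The sketch below assumes Python's standard xml.etree.ElementTree module and a shortened copy of the recipe without the DOCTYPE declaration; the helper name describe is illustrative.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the recipe: no DOCTYPE, two ingredients, no steps.
RECIPE = """<?xml version="1.0" encoding="UTF-8"?>
<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
  </ingredientList>
</recipe>"""


def describe(element, depth=0):
    """Return one line per element: its name, attributes, and direct text."""
    text = (element.text or "").strip()
    lines = ["  " * depth + f"{element.tag} {element.attrib} {text!r}"]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


# The document is passed as bytes so that the parser honors the encoding
# named in the XML declaration.
root = ET.fromstring(RECIPE.encode("utf-8"))
outline = describe(root)
print("\n".join(outline))
```

Each printed line corresponds to one element node; comments and processing instructions are not reported by this parser in its default configuration.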

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd">
    <!DOCTYPE recipe SYSTEM "recipe.dtd">

    <!DOCTYPE recipe [
      <!ELEMENT recipe (name, description, ingredientList,
                        stepList)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
      <!ELEMENT ingredientList (ingredient+)>
      <!ATTLIST ingredientList serves CDATA #REQUIRED>
      <!ELEMENT ingredient (#PCDATA)>
      <!ATTLIST ingredient amount CDATA #REQUIRED>
      <!ELEMENT stepList (step+)>
      <!ELEMENT step (#PCDATA)> ]>

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd" [
      <!-- Omitted for brevity --> ]>
    <!DOCTYPE recipe SYSTEM "recipe.dtd" [
      <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD

Note: DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

    element recipe {
      element name { text },
      element description { text },
      element ingredientList {
        attribute serves { xsd:positiveInteger },
        element ingredient {
          attribute amount { text }, text
        }+
      },
      element stepList {
        element step { text }+
      }
    }

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.
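To make concrete what a validator checks on our behalf, the constraints of the recipe application can also be written out by hand. The sketch below assumes Python's standard xml.etree.ElementTree module; the function recipe_errors is hypothetical and is not a DTD or RELAX NG validator. It hard-codes the same rules, including a simplified stand-in for the positive-integer data type of the serves attribute.

```python
import xml.etree.ElementTree as ET

EXPECTED_CHILDREN = ["name", "description", "ingredientList", "stepList"]


def recipe_errors(document: str) -> list[str]:
    """Return the constraint violations found; an empty list means none."""
    root = ET.fromstring(document)  # raises ParseError if not well-formed
    if root.tag != "recipe":
        return [f"root element is <{root.tag}>, expected <recipe>"]
    errors = []
    children = [child.tag for child in root]
    if children != EXPECTED_CHILDREN:
        errors.append(f"children of <recipe> are {children}")
    ingredient_list = root.find("ingredientList")
    if ingredient_list is not None:
        serves = ingredient_list.get("serves", "")
        # Simplified positive-integer check: ASCII digits with a value > 0.
        if not (serves.isascii() and serves.isdigit() and int(serves) > 0):
            errors.append(f"serves={serves!r} is not a positive integer")
        ingredients = ingredient_list.findall("ingredient")
        if not ingredients:
            errors.append("<ingredientList> contains no <ingredient>")
        for ingredient in ingredients:
            if "amount" not in ingredient.attrib:
                errors.append(f"<ingredient> {ingredient.text!r} lacks amount")
    step_list = root.find("stepList")
    if step_list is not None and not step_list.findall("step"):
        errors.append("<stepList> contains no <step>")
    return errors


VALID = (
    '<recipe><name>Palatschinken</name><description>A dish</description>'
    '<ingredientList serves="8"><ingredient amount="2">Egg</ingredient>'
    '</ingredientList><stepList><step>Whisk.</step></stepList></recipe>'
)
INVALID = VALID.replace('serves="8"', 'serves="many"')

print(recipe_errors(VALID))
print(recipe_errors(INVALID))
```

Writing such checks by hand quickly becomes tedious and error-prone, which is the practical argument for declaring them once in a schema language and leaving the checking to a general-purpose validator.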

    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
      <element name="recipe"><complexType><all>
        <element name="name" type="string" minOccurs="1"/>
        <element name="description" type="string"
                 minOccurs="1"/>
        <element
          name="ingredientList"><complexType><sequence>
          <element name="ingredient" minOccurs="1"
                   maxOccurs="unbounded">
            <complexType><simpleContent>
              <extension base="string">
                <attribute name="amount" type="string"/>
              </extension>
            </simpleContent></complexType>
          </element></sequence>
          <attribute name="serves" type="positiveInteger"
                     use="required"/>
        </complexType></element>
        <element name="stepList"><complexType><sequence>
          <element name="step" type="string" minOccurs="1"
                   maxOccurs="unbounded"/>
        </sequence></complexType></element>
      </all></complexType></element>
    </schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)

    xmllint --noout --dtdvalid recipe.dtd recipe.xml
    xmllint --noout --schema recipe.xsd recipe.xml
    trang recipe.rnc recipe.rng  # Compact -> Full Relax NG
    xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.

A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature called CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.

    Speech
    AASE: Peer, you're lying!
    PEER: No, I'm not!
    AASE: Well then, swear to me it's true!
    PEER: Swear? Why should I?
    AASE: See, you dare not! Every word of it's a lie!

    Verse
    Peer, you're lying! No, I'm not!
    Well then, swear to me it's true!
    Swear? Why should I? See, you dare not!
    Every word of it's a lie!

    <(V)line>
    <(S)speech who="Aase">Peer, you're lying!</(S)speech>
    <(S)speech who="Peer">No, I'm not!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Aase">Well then,
    swear to me it's true!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Peer">Swear? why should I?</(S)speech>
    <(S)speech who="Aase">See, you dare not!
    </(V)line><(V)line>
    Every word of it's a lie!</(S)speech>
    </(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

Note: The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.

Note: Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include: DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.
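The namespace mechanism described earlier in this section is what lets instances of several such applications share one document. The sketch below assumes Python's standard xml.etree.ElementTree module and uses the namespace names published by W3C for XHTML and SVG; the sample document and the helper split_name are illustrative. The parser only expands the names; it does not fetch or check any schema.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

# One document mixing two XML applications through namespaces.
DOCUMENT = f"""<html xmlns="{XHTML}" xmlns:svg="{SVG}">
  <body>
    <p>A circle:</p>
    <svg:svg width="20" height="20">
      <svg:circle cx="10" cy="10" r="8"/>
    </svg:svg>
  </body>
</html>"""


def split_name(tag):
    """Split an expanded name of the form {namespace}local into its parts."""
    namespace, _, local = tag[1:].partition("}")
    return namespace, local


root = ET.fromstring(DOCUMENT)

# Group the local element names by the namespace they belong to.
by_namespace = {}
for element in root.iter():
    namespace, local = split_name(element.tag)
    by_namespace.setdefault(namespace, []).append(local)

# Queries name the namespace explicitly through a prefix mapping.
circles = root.findall(".//svg:circle", {"svg": SVG})

print(by_namespace)
print([circle.get("r") for circle in circles])
```

Note that the prefix svg is local to the document and to the query; only the namespace name behind it identifies the application.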

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely popular graphical Web browser, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Note: JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

    <b>Bold <i>bold and italic</b> italic</i>
    <b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

    <font face="Verdana" size="4">
    <font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
    <br><br>There is a continuing need to show the power of
    <i>CSS</i>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The <i>HTML
    </i> remains the same, the only thing that has changed
    is the external <i>CSS</i> file. Yes, really.
    </font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

    <style>
    body {
      font: large Verdana;
      font-size: large;
    }
    h1 {
      font-size: x-large;
      text-transform: uppercase;
    }
    abbr {
      font-style: italic;
    }
    </style>
    <h1>So what is this about?</h1>
    <p>There is a continuing need to show the power of
    <abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The
    <abbr>HTML</abbr> remains the same, the only thing that
    has changed is the external <abbr>CSS</abbr> file. Yes,
    really.</p>

Note: The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly into web documents through XML namespaces.
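The difference in strictness between the two families of parsers is easy to observe. The sketch below assumes Python's standard html.parser and xml.etree.ElementTree modules and feeds both the same line with overlapping elements; the class name TagLogger is illustrative. Note that html.parser is only a lenient tokenizer: it reports the tags as they come and does not implement the error recovery prescribed by the HTML5 specification.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Overlapping <b> and <i> elements: acceptable to a lenient HTML tokenizer,
# but not well-formed XML.
OVERLAPPING = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Record start and end tags in the order the tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


logger = TagLogger()
logger.feed(OVERLAPPING)
logger.close()

try:
    ET.fromstring(OVERLAPPING)
    xml_verdict = "accepted"
except ET.ParseError as error:
    xml_verdict = f"refused: {error}"

print(logger.events)
print(xml_verdict)
```

The HTML tokenizer reports all four tags without complaint and leaves the repair to later stages, whereas the XML parser stops at the first mismatched end tag.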

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.


A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies

If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
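The two readings of a triplet can be sketched in a few lines of Python. The `IRI`, `Triple`, and `interpret` names are hypothetical helpers invented for this illustration and do not belong to any RDF library; the sample data reuses the example.org resources and the Dublin Core terms that appear in this section's figures.

```python
from typing import NamedTuple, Union


class IRI(str):
    """A resource identified by an IRI (hypothetical helper type)."""


class Triple(NamedTuple):
    predicate: IRI
    subject: IRI
    object: Union[IRI, str]  # either a resource or a plain literal value


def interpret(triple: Triple) -> str:
    """Spell out the reading of a triplet (p, s, o) described in the text."""
    p, s, o = triple
    if isinstance(o, IRI):
        # The object is a resource: s is in the relation p with o.
        return f"<{s}> is in the relation <{p}> with <{o}>"
    # The object is a literal: s has the property p with the value o.
    return f'<{s}> has the property <{p}> with the value "{o}"'


DC = "http://purl.org/dc/terms/"
document = IRI("http://example.org/document.html")
john = IRI("http://example.org/john-smith")

triples = [
    Triple(IRI(DC + "creator"), document, john),
    Triple(IRI(DC + "title"), document, "John's Web page"),
]
```

Calling `interpret` on the first triple yields the relation reading, and on the second the property reading, because only the first has a resource as its object.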

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
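Of the representations listed above, N-Triples is simple enough that a serializer fits in a short sketch: one triple per line, written in subject, predicate, object order and terminated by a period. The `Literal` class and the function names below are assumptions made for this illustration, not part of any RDF library, and only the most common string escapes are handled.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Literal:
    """A literal value with an optional language tag (hypothetical helper)."""
    value: str
    lang: Optional[str] = None


def escape(text: str) -> str:
    """Escape backslashes, double quotes, and line breaks in a literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples_line(subject: str, predicate: str,
                     obj: Union[str, Literal]) -> str:
    """Serialize one triple; plain strings are treated as IRIs."""
    if isinstance(obj, Literal):
        rendered = f'"{escape(obj.value)}"'
        if obj.lang:
            rendered += f"@{obj.lang}"
    else:
        rendered = f"<{obj}>"
    return f"<{subject}> <{predicate}> {rendered} ."
```

For instance, `to_ntriples_line("http://example.org/john-smith", "http://xmlns.com/foaf/0.1/name", Literal("John Smith"))` produces a line that states the name of the example resource used in this section.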

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be


    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/terms/"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description
          rdf:about="http://example.org/document.html">
        <dc:title xml:lang="en">John's Web page</dc:title>
        <dc:creator
            rdf:resource="http://example.org/john-smith"/>
      </rdf:Description>
      <rdf:Description
          rdf:about="http://example.org/john-smith">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
        <foaf:name>John Smith</foaf:name>
      </rdf:Description>
    </rdf:RDF>

    <http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
    <http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
    <http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
    <http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dc: <http://purl.org/dc/terms/> .

    <http://example.org/document.html>
      dc:title "John's Web page"@en ;
      dc:creator <http://example.org/john-smith> .

    <http://example.org/john-smith>
      a foaf:Person ;
      foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).


    <!DOCTYPE html>
    <html lang="en">
      <head>
        <link rel="meta" type="application/rdf+xml"
              href="john.rdf">
        <link rel="meta" type="text/turtle" href="john.ttl">
        <link rel="meta" type="application/n-triples"
              href="john.nt">
        <title>John's Web page</title>
      </head>
      <body>
        Hi, I'm John Smith.
      </body>
    </html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

    <!DOCTYPE html>
    <html lang="en">
      <head vocab="http://purl.org/dc/terms/"
            about="http://example.org/document.html">
        <title property="title" lang="en">John's Web page</title>
        <meta property="creator"
              href="http://example.org/john-smith">
      </head>
      <body vocab="http://xmlns.com/foaf/0.1/"
            about="http://example.org/john-smith"
            typeof="Person">
        Hi, I'm <span property="name">John Smith</span>.
      </body>
    </html>


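[Figure: a directed graph. The node http://example.org/document.html points to the literal "John's Web page"@en through dc:title and to the node http://example.org/john-smith through dc:creator. The node http://example.org/john-smith points to the node foaf:Person through rdf:type and to the literal "John Smith" through foaf:name.]

Figure 2.11: A graph of the RDF document in Figure 2.9.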

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is twofold: the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The


The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph, The Art of Computer Programming, and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the de facto standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by Desktop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.


    The Cask of Amontillado
    by
    Edgar Allen Poe

    The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that gave utterance to a threat. At length I would be avenged; this was a point definitely settled--but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

    -1-

    .TITLE "The Cask of Amontillado"
    .AUTHOR "Edgar Allen Poe"
    .PRINTSTYLE TYPESET
    .PAGE 6i 9i .75i .75i .75i .75i
    .START
    .PP
    .DROPCAP T 3
    he thousand injuries of Fortunato I had borne as I best
    could, but when he ventured upon insult, I vowed revenge.
    You, who so well know the nature of my soul, will not
    suppose, however, that gave utterance to a threat.
    \*[IT]At length\*[PREV] I would be avenged; this was a
    point definitely settled\[em]but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].


    % Page geometry
    \pdfpagewidth=6in \pdfpageheight=9in
    % Page dimensions
    \hsize=\dimexpr\pdfpagewidth-1.5in
    \vsize=\dimexpr\pdfpageheight-1.5in
    \baselineskip=16.8pt
    \hoffset=-.25in \voffset=-.25in
    % Fonts
    \font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
    \font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
    % Logical markup definition
    \def\title#1{{\bigbf\centerline{#1}}}
    \def\author#1{{\it\centerline{by}\centerline{#1}}
      \vskip 3.9em}
    \def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
      \hbox{\llap{\dropcap#1\hskip-0.3ex}}}
      \parshape=4 3em\dimexpr\hsize-3em 3.28em
      \dimexpr\hsize-3.28em 3.28em
      \dimexpr\hsize-3.28em 0em\hsize}
    % The document
    \title{The Cask of Amontillado}
    \author{Edgar Allen Poe}
    \chapter The thousand injuries of Fortunato I had borne
    as I best could, but when he ventured upon insult, I vowed
    revenge. You, who so well know the nature of my soul,
    will not suppose, however, that gave utterance to a
    threat. {\it At length} I would be avenged; this was a
    point definitely settled---but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
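To give a feel for what such conversion tools do, the following sketch converts a tiny, Markdown-like subset to HTML. It is a toy under stated assumptions, not an implementation of any real Markdown flavor: only `#` headings, blank-line-separated paragraphs, `*emphasis*`, and `**strong emphasis**` are recognized, and the function names are invented for this example.

```python
import html
import re


def inline(text: str) -> str:
    """Escape HTML metacharacters, then convert inline emphasis markers."""
    escaped = html.escape(text, quote=False)
    # Strong emphasis must be handled before plain emphasis.
    escaped = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", escaped)
    return re.sub(r"\*(.+?)\*", r"<em>\1</em>", escaped)


def to_html(source: str) -> str:
    """Convert blank-line-separated blocks to headings and paragraphs."""
    blocks = []
    for block in re.split(r"\n\s*\n", source.strip()):
        # Line breaks inside a block are insignificant, as in most
        # lightweight markup languages.
        text = " ".join(line.strip() for line in block.splitlines())
        heading = re.match(r"(#{1,6})\s+(.*)", text)
        if heading:
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{inline(heading.group(2))}</h{level}>")
        else:
            blocks.append(f"<p>{inline(text)}</p>")
    return "\n".join(blocks)
```

The punctuation in the source doubles as the markup, so the input remains readable on a plain text terminal while still carrying enough structure for the converter.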

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-form reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
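Combining typefaces across writing systems boils down to splitting the text into runs of a single script and assigning a typeface to each run. The sketch below does this with the Python standard library. It rests on loud assumptions: the script of a character is guessed from the prefix of its Unicode name, which is a rough heuristic rather than a proper script lookup; the function names are invented for this example; and the font mapping simply pairs the two typefaces of Figure 3.1 with their scripts.

```python
import unicodedata
from typing import List, Tuple

# Assumed mapping from script to typeface (typeface names from Figure 3.1).
FONTS = {"Latin": "Alegreya Sans", "Greek": "GFS Neohellenic"}


def script_of(char: str) -> str:
    """Guess the script of a character from its Unicode name."""
    name = unicodedata.name(char, "")
    for script in FONTS:
        if name.startswith(script.upper()):
            return script
    return "Common"  # spaces, digits, punctuation, combining marks, ...


def font_runs(text: str, default: str = "Latin") -> List[Tuple[str, str]]:
    """Split text into (typeface, substring) runs of a single script."""
    runs: List[List[str]] = []
    current = default
    for char in text:
        script = script_of(char)
        if script == "Common":
            script = current  # neutral characters join the running script
        if runs and runs[-1][0] == script:
            runs[-1][1] += char
        else:
            runs.append([script, char])
        current = script
    return [(FONTS[script], chars) for script, chars in runs]
```

Letting neutral characters inherit the script of the preceding character keeps spaces and punctuation in the same typeface as the word they follow, which avoids needless font switches in the middle of a passage.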

Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
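The guidelines above translate directly into a small check. The function below is a sketch that merely encodes the thresholds quoted in this section; its name and the wording of its verdicts are assumptions made for this illustration.

```python
def assess_measure(chars_per_line: int, columns: int = 1) -> str:
    """Compare a line length against the guidelines quoted in the text."""
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "too narrow to justify: set ragged"
    if chars_per_line > 80:
        return "too wide: strains the eye"
    if low <= chars_per_line <= high:
        return "within the recommended range"
    return "tolerable, but outside the recommended range"
```

A 66-character line on a single-column page falls within the recommended range, whereas a 35-character sidenote column is better set ragged.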

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


    The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

    1

    \documentclass[11pt]{article}
    \usepackage{fontspec, leading, newunicodechar}
    \usepackage[Latin, Greek]{ucharclasses}
    \setTransitionsForLatin
      {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
    \setTransitionsForGreek
      {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
       Ligatures=TeX]}{}

newunicodecharraisebox8ex

    \frenchspacing
    \leading{14pt}
    \begin{document}
    The second function of Soul -- knowing -- was not at
    first distinguished from motion. Aristotle says φαμὲν
    γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
    δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
    δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
    αὐτὴν κινεῖσθαι
    ``The soul is said to feel pain and joy, confidence and
    fear, and again to be angry, to perceive, and to think;
    and all these states are held to be movements, which
    might lead one to suppose that soul itself is moved.''
    \end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


    <style>
      @font-face {
        font-family: "Alegreya Sans";
        src: url("AlegreyaSans-Regular.ttf")
             format("truetype");
        unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                       U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
      }
      @font-face {
        font-family: "GFS Neohellenic";
        src: url("GFSNeohellenic.otf") format("opentype");
        unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                       U+102E0-102FF;
      }
      p {
        font-family: "Alegreya Sans", "GFS Neohellenic",
                     sans-serif;
        line-height: 14pt;
      }
      /* "grc" is the language tag for Ancient Greek */
      [lang=en] {
        font-size: 11pt;
      }
      [lang=grc] {
        font-size: 13.2pt;
      }
    </style>
    <p><span lang="en">The second function of Soul – knowing
    – was not at first distinguished from motion. Aristotle
    says </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
    λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
    τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
    κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
    κινεῖσθαι </span><span lang="en">“The soul is said to
    feel pain and joy, confidence and fear, and again to be
    angry, to perceive, and to think; and all these states
    are held to be movements, which might lead one to suppose
    that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
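The arithmetic behind these figures can be sketched as follows. The function and parameter names are assumptions made for this illustration; the default percentages are the twenty to forty-five percent quoted above.

```python
from typing import Tuple


def leading_range(type_size_pt: float,
                  low: float = 0.20, high: float = 0.45) -> Tuple[float, float]:
    """Return the (minimum, maximum) line height in points."""
    return type_size_pt * (1 + low), type_size_pt * (1 + high)


def lead_per_side(type_size_pt: float, leading_pt: float) -> float:
    """Return the lead added above and below each line in points."""
    return (leading_pt - type_size_pt) / 2
```

For a 10 pt typeface, `leading_range(10)` gives the 12 to 14.5 pt range stated in the text, and `lead_per_side` applied to those two bounds gives the 1 to 2.25 pt of lead on either side of a line.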

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger


              Sizes in inches    Page proportions
    A4        8.27 × 11.7        2 : √2       1.41421
    B5        6.93 × 9.84        1 : √2       0.707
    Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
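One way to satisfy the whole-multiple rule is to pad each heading with just enough vertical space to fill a whole number of body text lines. The sketch below shows the arithmetic; the function name, and the choice to round up to the nearest whole line, are assumptions made for this illustration.

```python
import math
from typing import Tuple


def heading_block(heading_height_pt: float,
                  line_height_pt: float) -> Tuple[int, float]:
    """Return (lines occupied, padding in points) for a heading."""
    # Round up so that the heading never overlaps the following line.
    lines = math.ceil(heading_height_pt / line_height_pt)
    padding = lines * line_height_pt - heading_height_pt
    return lines, padding
```

With a body text leading of 12 pt, a heading that is 16 pt tall would occupy two lines and need 8 pt of padding, which the designer is free to split between the space above and below the heading.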

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
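The page proportions listed in Table 3.1 can be recomputed from the page dimensions. This sketch assumes that the proportion of a page is the ratio of its longer side to its shorter side; the dictionary and function names are invented for this example, and the dimensions are the rounded inch values from the table.

```python
# Page sizes in inches as (width, height), rounded as in Table 3.1.
PAPER_SIZES_IN = {
    "A4": (8.27, 11.7),
    "B5": (6.93, 9.84),
    "Letter": (8.5, 11.0),
}


def proportion(width: float, height: float) -> float:
    """Return the ratio of the longer side of a page to its shorter side."""
    short, long = sorted((width, height))
    return long / short
```

Because the inch values are rounded, the ratios computed for A4 and B5 only approximate the square root of two, while the Letter ratio comes out close to 1.294.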

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.


2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].


obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

– William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
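The requirement that the text height be a multiple of the line height can be sketched as a rounding step: given the vertical space left after the margins, keep only as many whole lines as fit. The function name and the sample figures in the note below are assumptions made for this illustration.

```python
from typing import Tuple


def text_block_height(available_pt: float,
                      line_height_pt: float) -> Tuple[int, float]:
    """Return (number of lines, text height in points) that fit."""
    # Discard the remainder so that the block holds whole lines only.
    lines = int(available_pt // line_height_pt)
    return lines, lines * line_height_pt
```

If, hypothetically, 545 pt of vertical space were available and the body text were set with a leading of 12 pt, the text block would hold 45 lines and be 540 pt tall, with the remaining 5 pt returned to the margins.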

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are


Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

(ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).


[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).

54 BIBLIOGRAPHY

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC 1/SC 18/WG 8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Working Draft. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Manu Sporny, Gregg Kellogg, and Markus Lanthaler. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standards Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
Emacs: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
Pico: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
vi: The Visual Interactive editor
Vim: vi IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language



Chapter 2

Markup

A manuscript can be a seamless current of words and still make perfect sense to an author. To truly capture its meaning in a clear and unambiguous manner, however, the author will often need to supplement the manuscript with a set of annotations. At a more fundamental level, this refers to the compliance with the orthographic rules (such as the correct spelling, capitalization, word breaks, and punctuation) that are specific to the language of the document. It is not at all unreasonable to expect that this basic compliance should already be met by the manuscript. At a higher level, this consists of discovering and marking up the inner order and logic of the text, so that the resulting document can later be typeset in a way that visually reflects its structure.

It is not unusual for an author to write and mark up their manuscript at the same time. Nevertheless, each of the two activities represents a distinct concept. Writing is the process of breaking ideas down into raw sequences of words. To mark up these words, then, is to take and reassemble them back into meaningful units of linguistic thought.

Markup can be created using a variety of markup languages. Aside from logical markup, which captures the logical structure of a document, markup languages may also provide presentation markup, which directly impacts the visual properties of the document but carries no semantic information. The usage of presentation markup makes it impossible to separate the markup from the design and to capture the structure of the document. As a result, the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The General Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the General Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

The authoritative resource on SGML is The SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.

Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).

A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE recipe SYSTEM "recipe.dtd">
    <recipe>
      <name>Palatschinken</name>
      <description>A Slavic crêpe-like dish.</description>
      <ingredientList serves="8">
        <ingredient amount="120g">Plain flour</ingredient>
        <ingredient amount="2">Egg</ingredient>
        <ingredient amount="300ml">Milk</ingredient>
        <ingredient amount="1 tblspn">Oil</ingredient>
        <ingredient amount="1 pinch">Salt</ingredient>
      </ingredientList>
      <stepList>
        <step>Combine the ingredients and whisk until
          you have a smooth batter.</step>
        <step>Heat oil on a pan, pour in a tablespoonful
          of the batter, fry until golden brown.</step>
        <step>Repeat until there is no batter left.</step>
        <step>Serve rolled and filled with jam.</step>
      </stepList>
    </recipe>

Figure 2.1: An example XML document (recipe.xml)
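The difference between a document that is well-formed and one that is not, as well as the retrieval of data from a parsed document, can be tried out in a few lines of code. The following is a minimal sketch in Python that assumes only the standard-library module xml.etree.ElementTree. The helper names is_well_formed and ingredients are made up for this illustration, and the embedded document is a shortened variant of the recipe shown above without its DOCTYPE line. Note that ElementTree checks well-formedness only; it does not validate against a DTD or a schema, and it supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# A shortened, hypothetical variant of the recipe document.
RECIPE = """<?xml version="1.0" encoding="UTF-8"?>
<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>"""


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        # Bytes are passed so that the encoding declaration is honored.
        ET.fromstring(text.encode("utf-8"))
    except ET.ParseError:
        return False
    return True


def ingredients(text):
    """Return (amount, name) pairs using a simple path expression."""
    root = ET.fromstring(text.encode("utf-8"))
    path = "./ingredientList/ingredient"  # within ElementTree's XPath subset
    return [(item.get("amount"), item.text) for item in root.findall(path)]


if __name__ == "__main__":
    print(is_well_formed(RECIPE))
    # The closing tag of <name> is missing, so the parser must refuse this.
    print(is_well_formed("<recipe><name>Palatschinken</recipe>"))
    print(ingredients(RECIPE))
```

A document that passes this check is merely well-formed; deciding whether it is also a valid recipe requires a DTD or a schema and a validating tool.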

DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd">
    <!DOCTYPE recipe SYSTEM "recipe.dtd">

    <!DOCTYPE recipe [
      <!ELEMENT recipe (name, description, ingredientList,
                        stepList)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
      <!ELEMENT ingredientList (ingredient+)>
      <!ATTLIST ingredientList serves CDATA #REQUIRED>
      <!ELEMENT ingredient (#PCDATA)>
      <!ATTLIST ingredient amount CDATA #REQUIRED>
      <!ELEMENT stepList (step+)>
      <!ELEMENT step (#PCDATA)> ]>

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd" [
      <!-- Omitted for brevity --> ]>
    <!DOCTYPE recipe SYSTEM "recipe.dtd" [
      <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD

    element recipe {
      element name { text },
      element description { text },
      element ingredientList {
        attribute serves { xsd:positiveInteger },
        element ingredient {
          attribute amount { text }, text
        }+
      },
      element stepList {
        element step { text }+
      }
    }

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.

    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
      <element name="recipe"><complexType><all>
        <element name="name" type="string" minOccurs="1"/>
        <element name="description" type="string"
                 minOccurs="1"/>
        <element
          name="ingredientList"><complexType><sequence>
          <element name="ingredient" minOccurs="1"
                   maxOccurs="unbounded">
            <complexType><simpleContent>
              <extension base="string">
                <attribute name="amount" type="string"/>
              </extension>
            </simpleContent></complexType>
          </element></sequence>
          <attribute name="serves" type="positiveInteger"
                     use="required"/>
        </complexType></element>
        <element name="stepList"><complexType><sequence>
          <element name="step" type="string" minOccurs="1"
                   maxOccurs="unbounded"/>
        </sequence></complexType></element>
      </all></complexType></element>
    </schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)

    xmllint -noout --dtdvalid recipe.dtd recipe.xml
    xmllint -noout --schema recipe.xsd recipe.xml
    trang recipe.rnc recipe.rng  # Compact -> Full Relax NG
    xmllint -noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.

A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature, CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.

Speech

    AASE: Peer, you're lying!
    PEER: No, I'm not!
    AASE: Well then, swear to me it's true!
    PEER: Swear? Why should I?
    AASE: See, you dare not! Every word of it's a lie!

Verse

    Peer, you're lying! No, I'm not!
    Well then, swear to me it's true!
    Swear? Why should I? See, you dare not!
    Every word of it's a lie!

    <(V)line>
    <(S)speech who="Aase">Peer, you're lying!</(S)speech>
    <(S)speech who="Peer">No, I'm not!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Aase">Well then,
    swear to me it's true!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Peer">Swear? Why should I?</(S)speech>
    <(S)speech who="Aase">See, you dare not!
    </(V)line><(V)line>
    Every word of it's a lie!</(S)speech>
    </(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook, and its source code is publicly available at http://docbook.org.

Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
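To see how namespaces keep the elements of different XML applications apart, consider the following minimal Python sketch. It assumes only the standard-library module xml.etree.ElementTree; the two namespace IRIs, the nutrition vocabulary, and the helper names qualified_names and energy_values are hypothetical and serve purely as an illustration. ElementTree reports each element name together with the IRI of its namespace in the form {IRI}local-name, so equally named elements from different applications cannot be confused.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace IRIs, chosen for illustration only.
RECIPE_NS = "http://www.example.com/ns/recipe"
NUTRITION_NS = "http://www.example.com/ns/nutrition"

DOC = f"""<r:recipe xmlns:r="{RECIPE_NS}" xmlns:n="{NUTRITION_NS}">
  <r:name>Palatschinken</r:name>
  <n:energy unit="kJ">850</n:energy>
</r:recipe>"""


def qualified_names(text):
    """List the namespace-qualified element names in document order."""
    root = ET.fromstring(text)
    return [element.tag for element in root.iter()]


def energy_values(text):
    """Select child elements by namespace IRI, not by the prefix used."""
    root = ET.fromstring(text)
    prefixes = {"nut": NUTRITION_NS}  # the local prefix is arbitrary
    return [element.text for element in root.findall("nut:energy", prefixes)]


if __name__ == "__main__":
    for name in qualified_names(DOC):
        print(name)
    print(energy_values(DOC))
```

The query uses a prefix that differs from the one in the document, which shows that only the IRI identifies the application. The parser does not dereference the IRIs or look up any schema, in line with the limitation described above.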

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, one of the first graphical Web browsers, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the contemporary leading web browsers and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the programming language of Java into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
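The tolerance of HTML processors towards malformed documents, discussed earlier in this section, can be contrasted with the strictness of an XML parser on a line with overlapping elements. The sketch below assumes only the Python standard library; the class TagLogger and the helper names html_events and xml_accepts are made up for this illustration. Note that html.parser is merely a tokenizer: it reports the tags in the order it meets them without rejecting the input, and it does not implement the error recovery of the HTML5 specification that produces the canonical interpretation.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# A line with overlapping <b> and <i> elements.
OVERLAPPING = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Record start and end tags without checking their nesting."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def html_events(text):
    """Tokenize the text leniently and return the recorded tag events."""
    parser = TagLogger()
    parser.feed(text)
    parser.close()
    return parser.events


def xml_accepts(text):
    """Return True only if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(html_events(OVERLAPPING))  # the lenient tokenizer raises no error
    print(xml_accepts(OVERLAPPING))  # the XML parser refuses the input
```

The lenient tokenizer leaves the decision of what the overlapping tags mean to the code that consumes the events, which is where the complexity of real HTML parsers comes from.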

222 The Extensible Hypertext Markup LanguageEver since the release of xml in 1998 w3c entertained the idea ofturning html into an application of xml rather than of sgml as

ltbgtBold ltigtbold and italicltbgt italicltigt

ltbgtBold ltbgtltigtltbgtbold and italicltbgt italicltigt

Figure 27 The first line contains overlapping elements and assuch canrsquot be a part of a valid html document Neverthelessbrowsers should handle it identically to the second line

30 CHAPTER 2 MARKUP

<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

<style>
  body {
    font: large Verdana;
    font-size: large;
  }
  h1 {
    font-size: x-large;
    text-transform: uppercase;
  }
  abbr {
    font-style: italic;
  }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>


The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly into web documents through XML namespaces.
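The contrast between strict XML parsing and lenient HTML processing can be illustrated with a short sketch. It assumes only the Python standard library; the names is_well_formed_xml, TagLogger, and html_events are hypothetical helpers introduced for this illustration. Note that html.parser is a tolerant tokenizer rather than a complete browser-grade HTML parser, so the sketch shows acceptance of the input, not the recovery rules a browser would apply.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The overlapping markup from the first line of Figure 2.7.
OVERLAPPING = "<b>Bold <i>bold and italic</b> italic</i>"
NESTED = "<b>Bold </b><i><b>bold and italic</b> italic</i>"


def is_well_formed_xml(markup: str) -> bool:
    """Return True if the fragment parses as XML inside a wrapper element."""
    try:
        ET.fromstring(f"<root>{markup}</root>")
        return True
    except ET.ParseError:
        # An XML parser must refuse input that is not well-formed.
        return False


class TagLogger(HTMLParser):
    """Record the tags seen by the lenient HTML tokenizer."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))


def html_events(markup: str):
    """Feed the markup to the tokenizer and return the recorded tag events."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


print(is_well_formed_xml(OVERLAPPING))  # the XML parser rejects the overlap
print(is_well_formed_xml(NESTED))       # the properly nested line is accepted
print(html_events(OVERLAPPING))         # the HTML tokenizer raises no error
```

The XML route fails fast on the overlapping elements, whereas the HTML route accepts them and leaves any repair to later processing, which is where the additional complexity of HTML parsers comes from.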

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.


A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies

If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
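The two readings of a triplet can be sketched in a few lines of code. This is only an illustration under stated assumptions: the classes IRI, Literal, and Triple and the function describe are hypothetical names introduced here, not part of any RDF library, and the example IRIs are taken from the example document used in this section.

```python
from typing import NamedTuple, Union


class IRI(str):
    """A resource identified by an IRI."""


class Literal(str):
    """A plain literal value."""


class Triple(NamedTuple):
    # Field order follows the (p, s, o) notation used in the text.
    predicate: IRI
    subject: IRI
    obj: Union[IRI, Literal]


def describe(triple: Triple) -> str:
    """Spell out the interpretation of a triplet in plain words."""
    if isinstance(triple.obj, IRI):
        # The object is a resource: a relation between two resources.
        return (f"{triple.subject} is in relation {triple.predicate} "
                f"with {triple.obj}")
    # The object is a literal: a property of the subject.
    return (f"{triple.subject} has property {triple.predicate} "
            f"with value {triple.obj!r}")


creator = Triple(IRI("http://purl.org/dc/terms/creator"),
                 IRI("http://example.org/document.html"),
                 IRI("http://example.org/john-smith"))
name = Triple(IRI("http://xmlns.com/foaf/0.1/name"),
              IRI("http://example.org/john-smith"),
              Literal("John Smith"))

print(describe(creator))
print(describe(name))
```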

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in, or linked to, HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in Attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be


<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith" />
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html>
    <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html>
    <http://purl.org/dc/terms/creator>
    <http://example.org/john-smith> .
<http://example.org/john-smith>
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
    <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith>
    <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/document.html>
    dc:title "John's Web page"@en ;
    dc:creator <http://example.org/john-smith> .
<http://example.org/john-smith>
    a foaf:Person ;
    foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).
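To make the structure of the N-Triples representation more tangible, the following sketch serializes the four triplets of the example document. It is a simplified illustration, not a complete N-Triples writer: the tuple convention for terms and the helper names nt_term and to_ntriples are assumptions made for this example, only backslashes and double quotes are escaped, and each triplet is written on a single line in the subject, predicate, object order used by N-Triples.

```python
# Terms are modelled as tuples (an illustrative convention, not an RDF API):
#   ("iri", value), ("literal", value), or ("literal", value, language)


def nt_term(term):
    """Render a single term in N-Triples syntax."""
    kind, value, *rest = term
    if kind == "iri":
        return f"<{value}>"
    # Only backslashes and double quotes are escaped in this sketch.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    language = f"@{rest[0]}" if rest else ""
    return f'"{escaped}"{language}'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one per line."""
    return "\n".join(
        f"{nt_term(s)} {nt_term(p)} {nt_term(o)} ." for s, p, o in triples
    )


DC = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DOCUMENT = ("iri", "http://example.org/document.html")
JOHN = ("iri", "http://example.org/john-smith")

TRIPLES = [
    (DOCUMENT, ("iri", DC + "title"), ("literal", "John's Web page", "en")),
    (DOCUMENT, ("iri", DC + "creator"), JOHN),
    (JOHN, ("iri", RDF + "type"), ("iri", FOAF + "Person")),
    (JOHN, ("iri", FOAF + "name"), ("literal", "John Smith")),
]

print(to_ntriples(TRIPLES))
```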


<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>


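[Graph: the node http://example.org/document.html is connected to the literal "John's Web page"@en by an edge labeled dc:title and to the node http://example.org/john-smith by an edge labeled dc:creator. The node http://example.org/john-smith is connected to the node foaf:Person by an edge labeled rdf:type and to the literal "John Smith" by an edge labeled foaf:name.]

Figure 2.11: A graph of the RDF document in Figure 2.9.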

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is twofold: the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

The archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The


The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.


The Cask of Amontillado
by
Edgar Allen Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that gave utterance to a threat. At length I would be avenged; this was a point definitely settled -- but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allen Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allen Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].


% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip.01ex\lower5.8ex
  \hbox{\llap{\dropcap#1\hskip-.03ex}}}
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
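The idea of converting punctuation-based markup into a more general markup language can be sketched with a toy converter. The syntax below is a small Markdown-like subset chosen for this illustration (number signs for headings, asterisks for emphasis, blank lines between blocks); it is not an implementation of Markdown or of any other language named above, and the function name to_html is hypothetical.

```python
import html
import re


def to_html(source: str) -> str:
    """Convert a tiny Markdown-like subset into HTML."""
    blocks = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", source.strip()):
        # Join wrapped lines and escape characters special to HTML.
        text = html.escape(" ".join(block.split()), quote=False)
        # *text* marks emphasis.
        text = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)
        # One to six leading number signs mark a heading of that level.
        match = re.match(r"(#{1,6}) (.*)", text)
        if match:
            level = len(match.group(1))
            blocks.append(f"<h{level}>{match.group(2)}</h{level}>")
        else:
            blocks.append(f"<p>{text}</p>")
    return "\n".join(blocks)


SOURCE = """# Design

Legibility should be of *foremost* concern."""

print(to_html(SOURCE))
```

Real converters handle many more constructs and edge cases; the sketch only shows why such markup is easy to read in its source form while remaining mechanically convertible.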

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-form reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
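The line-length guidelines above can be turned into a rough check. The sketch below is only one possible reading of the rule of thumb: the function name assess_measure and the labels it returns are assumptions made for this illustration, and the thresholds are the character counts quoted in the text.

```python
def assess_measure(chars_per_line: int, columns: int = 1) -> str:
    """Classify the length of a body text line by its character count."""
    # Recommended range: 45-75 characters on a single-column page,
    # 40-50 characters on a multi-column page.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line > 80:
        return "too wide"    # long lines strain the eye of the reader
    if chars_per_line < 40:
        return "set ragged"  # justification would loosen the word spacing
    if low <= chars_per_line <= high:
        return "ideal"
    return "acceptable"


for width in (35, 45, 66, 78, 85):
    print(width, assess_measure(width), assess_measure(width, columns=2))
```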

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

1

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}

newunicodecharraisebox8ex

\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=el] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="el">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
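The arithmetic behind these figures is simple enough to sketch. The function names leading_range and lead_per_side are hypothetical, and the default percentages are the twenty to forty-five percent quoted above.

```python
def leading_range(type_size_pt, min_pct=20, max_pct=45):
    """Return the smallest and the largest suggested line height in points."""
    low = type_size_pt * (100 + min_pct) / 100
    high = type_size_pt * (100 + max_pct) / 100
    return low, high


def lead_per_side(type_size_pt, leading_pt):
    """Return the lead added above and below each line, in points."""
    return (leading_pt - type_size_pt) / 2


low, high = leading_range(10)
print(low, high)                 # line height for a 10 pt typeface
print(lead_per_side(10, low))    # lead per side at the lower bound
print(lead_per_side(10, high))   # lead per side at the upper bound
```

For a 10 pt typeface, this reproduces the 12 to 14.5 pt line height and the 1 to 2.25 pt of lead on each side stated in the text.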

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger


          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2 (1.41421)
B5        6.93 × 9.84        1 : √2 (0.707)
Letter    8 1/2 × 11         1 : 1.294 (1.2941)

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
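One way to satisfy the whole-multiple rule is to pad each heading with vertical space until it occupies an integer number of body text lines. The sketch below assumes that the natural height of the heading is known; the function name heading_block is hypothetical, and how the extra space is split above and below the heading is left to the designer.

```python
import math


def heading_block(heading_height_pt, body_leading_pt):
    """Return the number of body text lines a heading should occupy
    and the extra vertical space needed to fill them, in points."""
    lines = math.ceil(heading_height_pt / body_leading_pt)
    padding = lines * body_leading_pt - heading_height_pt
    return lines, padding


# A heading that is 16 pt tall in a document with 12 pt leading.
print(heading_block(16, 12))
# A heading that already matches two lines needs no padding.
print(heading_block(24, 12))
```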

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.


2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

-- William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we need to also decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
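The requirement that the text height be a multiple of the line height can be sketched as a small calculation. The example assumes a point of 1/72 inch and takes the page height and the vertical margins as given; the function name text_block and the sample dimensions (a 9 in page with 0.75 in margins) are illustrative assumptions.

```python
PT_PER_INCH = 72  # assumed point size (1/72 inch)


def text_block(page_height_in, top_margin_in, bottom_margin_in, leading_pt):
    """Return the number of lines that fit between the vertical margins
    and the resulting text height in points."""
    available_pt = (page_height_in - top_margin_in - bottom_margin_in) * PT_PER_INCH
    lines = int(available_pt // leading_pt)
    # The text height is trimmed to a whole multiple of the line height.
    return lines, lines * leading_pt


print(text_block(9, 0.75, 0.75, 12))  # 12 pt leading fills the space exactly
print(text_block(9, 0.75, 0.75, 14))  # 14 pt leading leaves a remainder
```

When the available space is not a whole multiple of the line height, the remainder can be returned to the vertical margins.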

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are


Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

(ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Bibliography

[1] Mary Brandel. "1963: The debut of ASCII". In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard 10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006 (cit. on p. 13).


[17] Noam Chomsky. "Three models for the description of language". In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard 18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. "The Roots of SGML – A Personal Recollection". In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. "SGML: The Reason Why and the First Published Hint". In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. "Introduction to Generalized Markup". In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986 (cit. on p. 22).


[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. "GODDAG: A Data Structure for Overlapping Hierarchies". In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).


[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).


[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).


[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts and Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standard Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System


DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
Emacs: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
joe: Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character


NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
pico: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML, New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
vi: The Visual Interactive editor
Vim: vi IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language


XML: The eXtensible Markup Language

Index

ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42

Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17

sect

Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

bre 14ndash16bs 6bsd 13

sect

ca 52can 6cern 28

character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44

sect

dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35

dps 13 17 18 32 35 36 39dtd 23 25ndash27dtp 36

sect

ebcdic 5ecma 55Edgar Allen Poe 37


Elements of Style 3em 6Emacs 13endianity 10endnote 47enq 6eot 6erealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

ere 14ndash16esc 6etb 6120576-TEX 38etx 6euc 5

sectF M Cornford 43ff 6foaf 32 33footnote 47formal grammar 14fortran 4From Religion to Philosophy A Study in

the Origins of Western Speculation 43fs 6fsm 35

sectGit 17gml 22gnuLinux 13nano 13

gnu 13 14 35Google Documents 18Google Pinyin 11grep 16 17groff see troffgs 6gui 13 35

sectHan Unification 9heading 45Henrik Ibsen 27ht 6

html 28ndash32 34 39 44 55sect

ibm 5 12 22iconv 10iec 7 10 51ndash54ime 12ir i 27 28 31 32 54iso 7 10 51ndash54

sectJavaScript 29Jeffrey E F Friedl 14j is 5joe 13JScript 29json 32json-ld 32 56jtc 51ndash54justification see alignment

sectKing Lear 48

sectLATEX 36 43Latin Vulgate Bible 49ld 31 32 55leading see line spacingLeafpad 13lf 6lightweight markup language 39line height 45list 46

sectma 51MakeDoc 39Markdown 39markuplogical 21 29 30 35 36presentation 21 29 30 35 36

mathml 28 31Mercurial 17microformatting 32Microsoft Word 14 20 39

sectN-Triples 32 33nak 6Noam Chomskyhierarchy 14

Noam Chomsky 14note 46Notepad++ 13Notepad 13


nroff see troffnul 6ny 51

sectocr 12odf 13ooxml 13owl 32 56

sectparagraphblock 47indented 45outdented 45

paragraph 42paragraphsblock 45

pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8

sectQuarkXPress 14quotationblock 47run-in 47

sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31

rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6

sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22

sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-

lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6

secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff

man 36


me 36mom 36

troff 35tron 9Turtle 32 33typeface 41

sectucsblock 8ucs-4 8

ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10

us 6usa 51 52utf

utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8

utf 6 8ndash10 52sect

VBScript 29vcscentralized 17decentralized 17

vcs 17ndash20version control 13vi 13vim 13

vt 6sect

w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4

wysiwyg 35sect

XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23

xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23


22 CHAPTER 2 MARKUP

More information about the project can be found within The Roots of SGML – A Personal Recollection [23] and SGML: The Reason Why and the First Published Hint [24].

The authoritative resource on SGML is The SGML Handbook [27], which includes the full text of the standard bearing extensive annotations.

the consistency in the design of each logical part of the document needs to be ensured manually, and future changes of design become error-prone and tedious. In this regard, logical markup is to design what style guides are to writing: a means of ensuring internal consistency that should be used whenever possible.

2.1 Meta Markup Languages

2.1.1 The General Markup Language

The situation engulfing digital typesetting was growing increasingly frustrating for publishers in the 1960s. The markup languages used by different typesetting systems varied wildly, and once a publisher had a large collection of documents typeset via a given company, switching to another one could be a costly venture. This power imbalance artificially increased the price of digital typesetting, leading to a demand for a universal markup language.

This demand was met by a project developed at the Cambridge Scientific Center of the International Business Machines Corporation (IBM) in the early 1970s. The project aimed at imbuing a text editor with the ability to query, edit, and display documents from a central repository to allow the usage of computers in legal practice. Very early on in the development, it became apparent that the main problem was going to be the markup languages in which the documents were written. These languages varied wildly, and many of them comprised largely presentation markup, which made information retrieval impossible without heavy use of heuristics. To resolve these issues, a unifying markup language called the General Markup Language (GML) was drafted. The language was released [25] to the public in 1981 and finally standardized in 1986 as the Standard Generalized Markup Language (SGML) [26].

SGML documents consist of text mixed with tags, which delimit meaningful sections of the document called elements. Elements may carry additional information in attributes. Additionally, SGML documents may contain miscellaneous instructions for the programs that are processing them, as well as human-readable comments. An umbrella term for the various parts of an SGML document is nodes. Repeated strings of text can be declared as entities that can be used throughout the document in place of the original strings.
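The vocabulary above (elements, attributes, comments, entities) can be made concrete with a short Python sketch. Since the Python standard library contains no SGML parser, the sketch uses XML, which shares these concepts, together with the standard `xml.etree.ElementTree` module. The `memo` document and its entity `org` are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical document: one entity declaration, one attribute,
# one comment, and two child elements that reuse the entity.
DOCUMENT = """<!DOCTYPE memo [
  <!ENTITY org "Cambridge Scientific Center">
]>
<memo priority="high">
  <!-- A human-readable comment; the default parser discards it. -->
  <to>&org;</to>
  <body>Greetings from the &org;.</body>
</memo>"""

root = ET.fromstring(DOCUMENT)

print(root.tag)                # the name of the outermost element
print(root.attrib)             # its attributes as a dictionary
print(root.find("to").text)    # the entity reference has been expanded
print(root.find("body").text)
```

Note that the parser replaces each `&org;` reference with the declared string, which is exactly the role of entities described above.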


A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.

Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML (such as W3C XML Schema, RELAX NG, or Schematron) that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
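The difference between the two levels of correctness can be sketched in Python. The standard `xml.etree.ElementTree` parser is non-validating: it checks well-formedness only and never consults a DTD or a schema. The helper name `is_well_formed` and the three sample strings are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True when the document parses as XML, False otherwise.

    Only well-formedness is checked; validity against a DTD or a
    schema requires a validating tool and is out of scope here.
    """
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Every start tag has a matching end tag.
GOOD = "<recipe><name>Palatschinken</name></recipe>"
# The <name> element is never closed.
BAD = "<recipe><name>Palatschinken</recipe>"
# Well-formed, although a recipe schema would likely reject <unknown/>.
UNEXPECTED = "<recipe><unknown/></recipe>"

for sample in (GOOD, BAD, UNEXPECTED):
    print(is_well_formed(sample), sample)
```

The third sample shows why the two levels are distinct: a document can pass the well-formedness check and still fail validation against a particular DTD or schema.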

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
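As a rough illustration of data retrieval with XPath, the Python sketch below queries a shortened version of the recipe document. It relies on the limited XPath subset implemented by the standard `xml.etree.ElementTree` module, not on a complete XPath processor; the shortened document and the variable names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# A shortened, hypothetical variant of the recipe document.
RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>"""

root = ET.fromstring(RECIPE)

# A path expression that selects every ingredient element.
ingredients = [e.text for e in root.findall("./ingredientList/ingredient")]

# A predicate that selects an element by the value of its attribute.
counted = root.find(".//ingredient[@amount='2']").text

# Attribute values are read from the selected element.
serves = root.find("ingredientList").get("serves")

print(ingredients, counted, serves)
```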


    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE recipe SYSTEM "recipe.dtd">
    <recipe>
      <name>Palatschinken</name>
      <description>A Slavic crêpe-like dish.</description>
      <ingredientList serves="8">
        <ingredient amount="120g">Plain flour</ingredient>
        <ingredient amount="2">Egg</ingredient>
        <ingredient amount="300ml">Milk</ingredient>
        <ingredient amount="1 tblspn">Oil</ingredient>
        <ingredient amount="1 pinch">Salt</ingredient>
      </ingredientList>
      <stepList>
        <step>Combine the ingredients and whisk until
          you have a smooth batter.</step>
        <step>Heat oil on a pan, pour in a tablespoonful
          of the batter, fry until golden brown.</step>
        <step>Repeat until there is no batter left.</step>
        <step>Serve rolled and filled with jam.</step>
      </stepList>
    </recipe>

Figure 2.1: An example XML document (recipe.xml)

DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd">
    <!DOCTYPE recipe SYSTEM "recipe.dtd">

    <!DOCTYPE recipe [
      <!ELEMENT recipe (name, description, ingredientList,
                        stepList)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
      <!ELEMENT ingredientList (ingredient+)>
      <!ATTLIST ingredientList serves CDATA #REQUIRED>
      <!ELEMENT ingredient (#PCDATA)>
      <!ATTLIST ingredient amount CDATA #REQUIRED>
      <!ELEMENT stepList (step+)>
      <!ELEMENT step (#PCDATA)> ]>

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd" [
      <!-- Omitted for brevity --> ]>
    <!DOCTYPE recipe SYSTEM "recipe.dtd" [
      <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD

    element recipe {
      element name { text },
      element description { text },
      element ingredientList {
        attribute serves { xsd:positiveInteger },
        element ingredient {
          attribute amount { text }, text
        }+
      },
      element stepList {
        element step { text }+
      }
    }

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.


    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
      <element name="recipe"><complexType><all>
        <element name="name" type="string" minOccurs="1"/>
        <element name="description" type="string"
                 minOccurs="1"/>
        <element
          name="ingredientList"><complexType><sequence>
          <element name="ingredient" minOccurs="1"
                   maxOccurs="unbounded">
            <complexType><simpleContent>
              <extension base="string">
                <attribute name="amount" type="string"/>
              </extension>
            </simpleContent></complexType>
          </element></sequence>
          <attribute name="serves" type="positiveInteger"
                     use="required"/>
        </complexType></element>
        <element name="stepList"><complexType><sequence>
          <element name="step" type="string" minOccurs="1"
                   maxOccurs="unbounded"/>
        </sequence></complexType></element>
      </all></complexType></element>
    </schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)

    xmllint --noout --dtdvalid recipe.dtd recipe.xml
    xmllint --noout --schema recipe.xsd recipe.xml
    trang recipe.rnc recipe.rng  # Compact -> Full Relax NG
    xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.


Speech

    AASE: Peer, you're lying!
    PEER: No, I'm not!
    AASE: Well then, swear to me it's true!
    PEER: Swear? Why should I?
    AASE: See, you dare not! Every word of it's a lie!

Verse

    Peer, you're lying! No, I'm not!
    Well then, swear to me it's true!
    Swear? Why should I? See, you dare not!
    Every word of it's a lie!

    <(V)line>
    <(S)speech who="Aase">Peer, you're lying!</(S)speech>
    <(S)speech who="Peer">No, I'm not!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Aase">Well then,
    swear to me it's true!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Peer">Swear? why should I?</(S)speech>
    <(S)speech who="Aase">See, you dare not!
    </(V)line><(V)line>
    Every word of it's a lie!</(S)speech>
    </(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.

Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature, CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
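To illustrate how XML namespaces keep elements of different XML applications apart within one document, the following Python sketch parses a small document that mixes XHTML and MathML. The document itself is a made-up example; the two namespace names are the ones published by W3C for XHTML and MathML. The standard `xml.etree.ElementTree` module reports every element name together with the name of its namespace.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"

# A hypothetical document mixing two XML applications.
DOCUMENT = f"""<html xmlns="{XHTML}" xmlns:m="{MATHML}">
  <body>
    <p>The square root of two:</p>
    <m:math><m:msqrt><m:mn>2</m:mn></m:msqrt></m:math>
  </body>
</html>"""

root = ET.fromstring(DOCUMENT)

# ElementTree expands every name to the form {namespace}local-name.
for element in root.iter():
    print(element.tag)

# Queries use their own prefix mapping; the prefixes chosen here need
# not match the prefixes used inside the document.
prefixes = {"h": XHTML, "m": MATHML}
number = root.find(".//m:mn", prefixes).text
mathml_elements = [e for e in root.iter() if e.tag.startswith("{" + MATHML + "}")]
```

The parser resolves the names but never fetches or checks a schema for either namespace, which mirrors the limitation described above.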

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of the first widely used graphical Web browser, Mosaic, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties for any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as

<b>Bold <i>bold and italic</b> italic</i>

<b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and, as such, can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.


<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

<style>
  body {
    font: large Verdana;
    font-size: large;
  }
  h1 {
    font-size: x-large;
    text-transform: uppercase;
  }
  abbr {
    font-style: italic;
  }
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>

[Sidenote: The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].]

exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the number of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly in web documents through XML namespaces.
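The strictness of XML parsers can be observed with a few lines of Python. The following is a minimal sketch, assuming the standard-library xml.etree.ElementTree parser as a stand-in for an XML processor; the two strings reuse the markup from Figure 2.7, wrapped in a <p> element so that each has a single root, and the helper name is_well_formed is made up for the illustration.

```python
import xml.etree.ElementTree as ET

# The two lines of Figure 2.7, wrapped in <p> so that each has one root element.
OVERLAPPING = "<p><b>Bold <i>bold and italic</b> italic</i></p>"
NESTED = "<p><b>Bold </b><i><b>bold and italic</b> italic</i></p>"


def is_well_formed(markup: str) -> bool:
    """Return True if the XML parser accepts the markup, False if it refuses it."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        # An XML parser must not guess the author's intent; it reports an error.
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(OVERLAPPING))  # overlapping elements are refused
    print(is_well_formed(NESTED))       # properly nested elements are accepted
```

An HTML5 parser would accept both strings; the XML parser refuses the first one outright, which is the behavior the paragraph above describes.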

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also considerably reduced the usefulness of XML namespaces in XHTML. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.

[Sidenote: A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.]
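The two interpretations of a triplet can be sketched in Python. This is an illustration only: the class names Resource, Literal, and Triplet and the function interpret are hypothetical and do not come from any RDF library, and the field order follows the (p, s, o) convention used in the text.

```python
from typing import NamedTuple, Union


class Resource(NamedTuple):
    """A resource identified by an IRI."""
    iri: str


class Literal(NamedTuple):
    """A literal value, such as a name or a title."""
    value: str


class Triplet(NamedTuple):
    """A triplet in the (predicate, subject, object) order used in the text."""
    predicate: Resource
    subject: Resource
    obj: Union[Resource, Literal]


def interpret(t: Triplet) -> str:
    """Describe a triplet as a relation or as a property, depending on its object."""
    if isinstance(t.obj, Resource):
        return (f"<{t.subject.iri}> is in relation <{t.predicate.iri}> "
                f"with <{t.obj.iri}>")
    return (f"<{t.subject.iri}> has property <{t.predicate.iri}> "
            f'with value "{t.obj.value}"')


if __name__ == "__main__":
    john = Resource("http://example.org/john-smith")
    page = Resource("http://example.org/document.html")
    print(interpret(Triplet(Resource("http://purl.org/dc/terms/creator"), page, john)))
    print(interpret(Triplet(Resource("http://xmlns.com/foaf/0.1/name"), john,
                            Literal("John Smith"))))
```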

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be


<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <!-- rdf:resource takes a full IRI, not a prefixed name -->
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .

<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .

<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).


<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web page</title>
    <link property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>


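[Graph with four labeled edges:]
http://example.org/document.html --dc:title--> "John's Web page"@en
http://example.org/document.html --dc:creator--> http://example.org/john-smith
http://example.org/john-smith --rdf:type--> foaf:Person
http://example.org/john-smith --foaf:name--> "John Smith"

Figure 2.11: A graph of the RDF document in Figure 2.9.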

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the gentle learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

[Sidenote: The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].]

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by Desktop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.


The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could; but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled, but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could; but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].


% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}%
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
    \dimexpr\hsize-3.28em 3.28em
    \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could; but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.

Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
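To give an idea of what such a conversion tool does, the following Python sketch translates a tiny Markdown-like dialect into HTML. The dialect is an assumption made for the illustration and covers only three conventions: a block starting with "# " is a heading, text between asterisks is emphasized, and blank lines separate paragraphs. Real converters handle far more syntax and many edge cases that this sketch ignores.

```python
import html
import re


def to_html(text: str) -> str:
    """Convert a tiny Markdown-like subset (headings, emphasis, paragraphs) to HTML."""
    blocks = []
    # Blank lines separate the blocks of the document.
    for block in re.split(r"\n\s*\n", text.strip()):
        # Join wrapped lines and escape characters that are special in HTML.
        content = html.escape(" ".join(block.split()), quote=False)
        # *emphasis* becomes <em>emphasis</em>.
        content = re.sub(r"\*(.+?)\*", r"<em>\1</em>", content)
        if content.startswith("# "):
            blocks.append(f"<h1>{content[2:]}</h1>")
        else:
            blocks.append(f"<p>{content}</p>")
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_html("# Design\n\nPunctuation is *all* the markup\nthere is."))
```

The input stays legible as plain text, while the output is ordinary HTML that can be styled with CSS.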

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
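Before choosing typefaces, it can help to find out which writing systems a manuscript actually uses. The following Python sketch counts the letters of a text by writing system. It relies on a heuristic that is an assumption of this example rather than an official script database: the first word of a character's Unicode name (LATIN, GREEK, CYRILLIC, and so on) is taken as its writing system. The function name scripts_used is likewise made up.

```python
import unicodedata
from collections import Counter


def scripts_used(text: str) -> Counter:
    """Count the letters of a text by the first word of their Unicode names."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, spaces, and combining marks
        name = unicodedata.name(ch, "")
        counts[name.split(" ")[0] if name else "UNKNOWN"] += 1
    return counts


if __name__ == "__main__":
    sample = "Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι"
    for script, count in scripts_used(sample).most_common():
        print(script, count)
```

A manuscript that reports more than one writing system is a candidate for the combination of typefaces described above.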

Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
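The line-length guideline can be checked mechanically for monospaced or plain-text output. The following Python sketch is an illustration under stated assumptions: it counts characters rather than measuring glyph widths, uses the 45–75 range quoted above as default bounds, and ignores the last line of a paragraph, which is allowed to be short. The function name out_of_range is made up for the example.

```python
import textwrap


def out_of_range(lines, low=45, high=75):
    """Return (line number, length) pairs for lines outside the low..high range."""
    body = lines[:-1]  # the last line of a paragraph may be short
    return [(number, len(line))
            for number, line in enumerate(body, start=1)
            if not low <= len(line) <= high]


if __name__ == "__main__":
    paragraph = ("As the base units of linguistic thought in prose, paragraphs "
                 "split the text into coherent portions ready for consumption. ") * 3
    # Break the paragraph into lines of at most 66 characters and check them.
    lines = textwrap.wrap(paragraph, width=66)
    print(out_of_range(lines))
```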

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says, φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

1

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
   Ligatures=TeX]}{}
\newunicodechar{·}{\raisebox{.8ex}{.}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says, φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below), and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  /* "grc" is the language tag for Ancient Greek; "gr" is not a valid tag */
  [lang=en] {
    font-size: 11pt;
  }
  [lang=grc] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says, </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
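The arithmetic behind these figures is simple enough to sketch in Python. The function names below are hypothetical; the default bounds of 20 % and 45 % are the ones quoted in the text, and the lead is assumed to be split evenly above and below each line.

```python
def leading_range(type_size_pt, low=0.20, high=0.45):
    """Return the recommended (minimum, maximum) line height in points."""
    return (type_size_pt + type_size_pt * low,
            type_size_pt + type_size_pt * high)


def lead_per_side(type_size_pt, leading_pt):
    """Return the lead added above and below a line, split evenly."""
    return (leading_pt - type_size_pt) / 2


if __name__ == "__main__":
    shortest, tallest = leading_range(10)
    print(shortest, tallest)              # line heights for 10 pt type
    print(lead_per_side(10, shortest),    # lead per side at the minimum
          lead_per_side(10, tallest))     # lead per side at the maximum
```

For 10 pt type, the sketch reproduces the 12 to 14.5 pt line height and the 1 to 2.25 pt of lead mentioned above.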

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. ¶ Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes.    A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

          Sizes in inches    Page proportions
A4        8.27 × 11.7        1 : √2       1.41421
B5        6.93 × 9.84        1 : √2       1.41421
Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

[Sidenote: This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.]

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
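The whole-multiple rule translates into a small calculation: round the height of the heading up to the next multiple of the body text line height and distribute the difference as space around the heading. The following Python sketch shows the arithmetic; the function name and the sample figures (a heading with a 16 pt line height over body text with a 12 pt leading) are assumptions made for the illustration.

```python
import math


def heading_space(heading_height_pt, body_leading_pt):
    """Return (body lines occupied, extra space in points) for a heading."""
    lines = math.ceil(heading_height_pt / body_leading_pt)
    extra = lines * body_leading_pt - heading_height_pt
    return lines, extra


if __name__ == "__main__":
    lines, extra = heading_space(16, 12)
    print(f"The heading occupies {lines} body lines; "
          f"add {extra} pt of space around it.")
```

How the extra space is split above and below the heading is a design decision that the sketch leaves open.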

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

    This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
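Deriving the text height from the text width can be sketched in a few lines of Python. The function name is hypothetical, and the sample values are assumptions chosen for the illustration: a text block that is 324 pt (4.5 in) wide, a target proportion of 1 : √2 as in the ISO paper sizes of Table 3.1, and a body text line height of 12 pt.

```python
import math


def text_block_height(text_width_pt, proportion, leading_pt):
    """Return (number of lines, text height in points) for a text block.

    The ideal height, text_width_pt * proportion, is rounded to the nearest
    whole number of lines so that the block can be filled completely.
    """
    lines = round(text_width_pt * proportion / leading_pt)
    return lines, lines * leading_pt


if __name__ == "__main__":
    lines, height = text_block_height(324, math.sqrt(2), 12)
    print(f"{lines} lines, {height} pt tall")
```

The rounding changes the proportion of the text block slightly; the deviation is at most half a line.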

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
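The text does not say how much contrast is sufficient. One way to quantify it in web design, offered here as an assumption of this example rather than a recommendation from the sources cited above, is the contrast ratio defined in the W3C Web Content Accessibility Guidelines (WCAG) 2, which compares the relative luminance of two sRGB colors and ranges from 1 (no contrast) to 21 (black on white). A minimal Python sketch with made-up function names:

```python
def relative_luminance(rgb):
    """Return the WCAG 2 relative luminance of an sRGB color given as 0-255 integers."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(channel) for channel in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """Return the WCAG 2 contrast ratio of two colors (between 1 and 21)."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    print(contrast_ratio((0, 0, 0), (255, 255, 255)))    # black type on white
    print(contrast_ratio((200, 0, 0), (255, 255, 255)))  # red type on white
```

WCAG 2 asks for a ratio of at least 4.5 for regular text; whether that threshold suits a given design remains a judgment call.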

Bibliography

[1] Mary Brandel. "1963: The debut of ASCII". In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015) (cit. on p. 5).

[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006 (cit. on p. 13).


[17] Noam Chomsky. "Three models for the description of language". In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. "The Roots of SGML – A Personal Recollection". In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. "SGML: The Reason Why and the First Published Hint". In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. "Introduction to Generalized Markup". In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986 (cit. on p. 22).


[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. "GODDAG: A Data Structure for Overlapping Hierarchies". In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).


[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).


[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).


[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language

Index

ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42

Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17

sect

Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

bre 14ndash16bs 6bsd 13

sect

ca 52can 6cern 28

character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44

sect

dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35

dps 13 17 18 32 35 36 39dtd 23 25ndash27dtp 36

sect

ebcdic 5ecma 55Edgar Allen Poe 37


Elements of Style 3em 6Emacs 13endianity 10endnote 47enq 6eot 6erealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

ere 14ndash16esc 6etb 6120576-TEX 38etx 6euc 5

sectF M Cornford 43ff 6foaf 32 33footnote 47formal grammar 14fortran 4From Religion to Philosophy A Study in

the Origins of Western Speculation 43fs 6fsm 35

sectGit 17gml 22gnuLinux 13nano 13

gnu 13 14 35Google Documents 18Google Pinyin 11grep 16 17groff see troffgs 6gui 13 35

sectHan Unification 9heading 45Henrik Ibsen 27ht 6

html 28ndash32 34 39 44 55sect

ibm 5 12 22iconv 10iec 7 10 51ndash54ime 12ir i 27 28 31 32 54iso 7 10 51ndash54

sectJavaScript 29Jeffrey E F Friedl 14j is 5joe 13JScript 29json 32json-ld 32 56jtc 51ndash54justification see alignment

sectKing Lear 48

sectLATEX 36 43Latin Vulgate Bible 49ld 31 32 55leading see line spacingLeafpad 13lf 6lightweight markup language 39line height 45list 46

sectma 51MakeDoc 39Markdown 39markuplogical 21 29 30 35 36presentation 21 29 30 35 36

mathml 28 31Mercurial 17microformatting 32Microsoft Word 14 20 39

sectN-Triples 32 33nak 6Noam Chomskyhierarchy 14

Noam Chomsky 14note 46Notepad++ 13Notepad 13


nroff see troffnul 6ny 51

sectocr 12odf 13ooxml 13owl 32 56

sectparagraphblock 47indented 45outdented 45

paragraph 42paragraphsblock 45

pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8

sectQuarkXPress 14quotationblock 47run-in 47

sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31

rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6

sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22

sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-

lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6

secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff

man 36


me 36mom 36

troff 35tron 9Turtle 32 33typeface 41

sectucsblock 8ucs-4 8

ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10

us 6usa 51 52utf

utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8

utf 6 8ndash10 52sect

VBScript 29vcscentralized 17decentralized 17

vcs 17ndash20version control 13vi 13vim 13

vt 6sect

w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4

wysiwyg 35sect

XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23

xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23


A list of tools for the manipulation of files in XML schema languages is maintained on the Web site of W3C at http://www.w3.org/XML/Schema.

Although the described structure is shared by all SGML documents, the actual syntax, as well as the restrictions regarding the contents and the attributes of individual elements, are declared within a Document Type Definition (DTD), which can be different for each document. It is worth noting that a DTD only declares the syntax of an SGML document; the semantics of the individual elements and their attributes are left to the interpretation of the program processing the document. The syntax and the constraints imposed by a DTD define an application of SGML. An SGML document is considered to be a valid instance of an SGML application when it conforms to the corresponding DTD.

2.1.2 The Extensible Markup Language

Although SGML was designed to be the general format for data exchange, the complexity of the specification and the lack of support for Unicode (see Section 1.1.1) proved to be a major hindrance, preventing its wider adoption and the development of SGML tools. In response, the World Wide Web Consortium (W3C) published a specification of the eXtensible Markup Language (XML) [28] in 1998. Along with the introduction of XML, the SGML specification received a technical corrigendum [29], which turned XML into an SGML application defined through a DTD.

This DTD completely fixes the syntax of XML documents, which makes it possible to differentiate between two levels of correctness. An XML document is considered to be well-formed when it conforms to the DTD that specifies the syntax of XML and to the XML specification. An XML document is considered to be valid against a DTD when it is well-formed and conforms to the said DTD. Along with DTDs, there exists a wealth of schema languages for XML, such as W3C XML Schema, RELAX NG, or Schematron, that can be used to check the validity of an XML document instead of a DTD. The constraints imposed by either a DTD or a schema define an application of XML (also language or format).
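The first level of correctness can be checked with nothing more than an XML parser. The sketch below uses the `xml.etree.ElementTree` module from the Python standard library; the helper name `is_well_formed` and the two sample strings are assumptions made for this illustration. Note that this parser is non-validating: it reports well-formedness errors but never checks a document against a DTD or a schema, so the second level of correctness requires a validating tool.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Properly nested elements: well-formed.
GOOD = "<recipe><name>Palatschinken</name></recipe>"
# The closing tags are swapped, so the elements overlap: not well-formed.
BAD = "<recipe><name>Palatschinken</recipe></name>"

print(is_well_formed(GOOD))  # True
print(is_well_formed(BAD))   # False
```

A document such as GOOD is well-formed regardless of whether any DTD or schema permits a recipe element that contains nothing but a name.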

Along with schema languages, other supplementary languages exist, such as XPointer, XPath, and XQuery for the retrieval of data from XML documents, the Cascading Style Sheets language (CSS) [30] for the specification of XML document design, and the various languages for the description of Web resources that we will discuss in Section 2.2.3.
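To give a rough idea of what the retrieval of data from an XML document looks like, the following Python sketch queries a shortened recipe with path expressions. It relies on the `xml.etree.ElementTree` module of the Python standard library, which supports only a small subset of XPath; the sample document and the function names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# A shortened, hypothetical recipe document used only for this sketch.
RECIPE = """<recipe>
  <name>Palatschinken</name>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
  </ingredientList>
</recipe>"""


def ingredient_names(root):
    """Select every ingredient element by its path and return its text."""
    return [el.text for el in root.findall("./ingredientList/ingredient")]


def ingredient_with_amount(root, amount):
    """Select the first ingredient whose amount attribute equals the value."""
    return root.find(f".//ingredient[@amount='{amount}']")


root = ET.fromstring(RECIPE)
print(ingredient_names(root))                      # ['Plain flour', 'Egg', 'Milk']
print(ingredient_with_amount(root, "300ml").text)  # Milk
```

A full XPath or XQuery processor accepts far richer expressions (axes, functions, arithmetic) than the subset shown here.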


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish.</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
    <ingredient amount="300ml">Milk</ingredient>
    <ingredient amount="1 tblspn">Oil</ingredient>
    <ingredient amount="1 pinch">Salt</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until
      you have a smooth batter.</step>
    <step>Heat oil on a pan, pour in a tablespoonful
      of the batter, fry until golden brown.</step>
    <step>Repeat until there is no batter left.</step>
    <step>Serve rolled and filled with jam.</step>
  </stepList>
</recipe>

Figure 2.1: An example XML document (recipe.xml)

DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd">
<!DOCTYPE recipe SYSTEM "recipe.dtd">

<!DOCTYPE recipe [
  <!ELEMENT recipe (name, description, ingredientList,
                    stepList)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ATTLIST ingredientList serves CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT stepList (step+)>
  <!ELEMENT step (#PCDATA)> ]>

<!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
  "http://www.example.com/DTD/recipe.dtd" [
  <!-- Omitted for brevity --> ]>
<!DOCTYPE recipe SYSTEM "recipe.dtd" [
  <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD

element recipe {
  element name { text },
  element description { text },
  element ingredientList {
    attribute serves { xsd:positiveInteger },
    element ingredient {
      attribute amount { text }, text
    }+
  },
  element stepList {
    element step { text }+
  }
}

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.


<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="recipe"><complexType><all>
    <element name="name" type="string" minOccurs="1"/>
    <element name="description" type="string"
             minOccurs="1"/>
    <element
      name="ingredientList"><complexType><sequence>
        <element name="ingredient" minOccurs="1"
                 maxOccurs="unbounded">
          <complexType><simpleContent>
            <extension base="string">
              <attribute name="amount" type="string"/>
            </extension>
          </simpleContent></complexType>
        </element></sequence>
      <attribute name="serves" type="positiveInteger"
                 use="required"/>
    </complexType></element>
    <element name="stepList"><complexType><sequence>
      <element name="step" type="string" minOccurs="1"
               maxOccurs="unbounded"/>
    </sequence></complexType></element>
  </all></complexType></element>
</schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)

xmllint --noout --dtdvalid recipe.dtd recipe.xml
xmllint --noout --schema recipe.xsd recipe.xml
trang recipe.rnc recipe.rng  # Compact -> Full Relax NG
xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.

Speech

AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie!

Verse

Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie!

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie!</(S)speech>
</(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.

Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

Namespaces, which were added to the XML specification [32] in 1999, are a notable feature of XML that is unavailable in SGML. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature, CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
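To make the namespace mechanism more tangible, the following Python sketch parses a small document that mixes elements from two XML applications and reports the IRI each element belongs to. It uses the `xml.etree.ElementTree` module of the Python standard library; the sample document, the `m` prefix, and the helper `namespace_of` are assumptions made for this illustration. The parser merely records the IRIs; it makes no attempt to locate a schema for either of them.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"

# A hypothetical document mixing two XML applications via namespaces.
DOC = f"""<html xmlns="{XHTML}" xmlns:m="{MATHML}">
  <body><p>Radius: <m:math><m:mi>r</m:mi></m:math></p></body>
</html>"""


def namespace_of(element):
    """Return the namespace IRI of an element, or '' if it has none.

    ElementTree stores qualified names as '{IRI}localname'."""
    tag = element.tag
    return tag[1:].split("}")[0] if tag.startswith("{") else ""


root = ET.fromstring(DOC)
for element in root.iter():
    print(namespace_of(element), element.tag.split("}")[-1])

# Queries have to name the IRI as well; the prefix is only a local alias.
print(root.find(".//m:mi", {"m": MATHML}).text)  # r
```

The prefix chosen in the query does not need to match the one used in the document, since only the IRI identifies the application.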

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely used graphical Web browser, which paved the way for the Web browsers of today. In 1994, Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language made it possible to specify the visual properties of any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for the presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

222 The Extensible Hypertext Markup LanguageEver since the release of xml in 1998 w3c entertained the idea ofturning html into an application of xml rather than of sgml as

ltbgtBold ltigtbold and italicltbgt italicltigt

ltbgtBold ltbgtltigtltbgtbold and italicltbgt italicltigt

Figure 27 The first line contains overlapping elements and assuch canrsquot be a part of a valid html document Neverthelessbrowsers should handle it identically to the second line

30 CHAPTER 2 MARKUP

    <font face="Verdana" size="4">
    <font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
    <br><br>There is a continuing need to show the power of
    <i>CSS</i>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The <i>HTML
    </i> remains the same, the only thing that has changed
    is the external <i>CSS</i> file. Yes, really.
    </font>

Figure 2.8: An excerpt from the web site of the CSS Zen Garden, located at http://csszengarden.com. The document above was created using HTML presentation markup. The document below achieves the same appearance by combining logical markup with CSS.

    <style>
      body {
        font: large Verdana;
        font-size: large; }
      h1 {
        font-size: x-large;
        text-transform: uppercase; }
      abbr {
        font-style: italic; }
    </style>
    <h1>So what is this about?</h1>
    <p>There is a continuing need to show the power of
    <abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The
    <abbr>HTML</abbr> remains the same, the only thing that
    has changed is the external <abbr>CSS</abbr> file. Yes,
    really.</p>


The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2 Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly in web documents through XML namespaces.
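The strictness of XML parsers is easy to observe in practice. The following Python sketch, which assumes nothing beyond the standard xml.etree.ElementTree module, wraps the parser in a hypothetical helper called is_well_formed_xml and feeds it three short fragments. The second and third fragments are markup that HTML browsers routinely tolerate, yet an XML parser must refuse them.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(document: str) -> bool:
    """Returns True only if the XML parser accepts the whole document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        # A conforming XML parser stops at the first well-formedness error.
        return False
    return True


print(is_well_formed_xml("<p>Hello, <b>world</b>!</p>"))  # True
print(is_well_formed_xml("<p>Hello, <b>world</p></b>"))   # False
print(is_well_formed_xml("<p>Line one<br>Line two</p>"))  # False
```

In the third fragment, the br element is never closed; in XHTML it would have to be written as an empty element.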

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and advances in mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that do not support the XML applications instantiated in XHTML documents also considerably reduced the usefulness of XML namespaces in XHTML. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44], a language for the description of resources on the Web, in 1999.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.


A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.

If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
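The two readings of a triplet can be sketched in a few lines of Python. The classes Resource, Literal, and Triple and the function interpret are hypothetical names introduced for this illustration; they do not belong to any RDF library. The IRIs are those used in the example RDF document of this section.

```python
from typing import NamedTuple, Union


class Resource(NamedTuple):
    iri: str


class Literal(NamedTuple):
    value: str


class Triple(NamedTuple):
    predicate: Resource
    subject: Resource
    obj: Union[Resource, Literal]


def interpret(triple: Triple) -> str:
    """Reads a triplet (p, s, o) as either a relation or a property."""
    p, s, o = triple
    if isinstance(o, Resource):
        return f"<{s.iri}> is in the relation <{p.iri}> with <{o.iri}>"
    return f"<{s.iri}> has the property <{p.iri}> with the value {o.value!r}"


creator = Triple(
    Resource("http://purl.org/dc/terms/creator"),
    Resource("http://example.org/document.html"),
    Resource("http://example.org/john-smith"),
)
name = Triple(
    Resource("http://xmlns.com/foaf/0.1/name"),
    Resource("http://example.org/john-smith"),
    Literal("John Smith"),
)

print(interpret(creator))
print(interpret(name))
```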

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in Attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete microformatting, the loosely defined technique of using HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in documents on the Web.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be


    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/terms/"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description
          rdf:about="http://example.org/document.html">
        <dc:title xml:lang="en">John's Web page</dc:title>
        <dc:creator
            rdf:resource="http://example.org/john-smith"/>
      </rdf:Description>
      <rdf:Description
          rdf:about="http://example.org/john-smith">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
        <foaf:name>John Smith</foaf:name>
      </rdf:Description>
    </rdf:RDF>

    <http://example.org/document.html>
      <http://purl.org/dc/terms/title> "John's Web page"@en .
    <http://example.org/document.html>
      <http://purl.org/dc/terms/creator>
      <http://example.org/john-smith> .
    <http://example.org/john-smith>
      <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
      <http://xmlns.com/foaf/0.1/Person> .
    <http://example.org/john-smith>
      <http://xmlns.com/foaf/0.1/name> "John Smith" .

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dc: <http://purl.org/dc/terms/> .
    <http://example.org/document.html>
      dc:title "John's Web page"@en ;
      dc:creator <http://example.org/john-smith> .
    <http://example.org/john-smith>
      a foaf:Person ;
      foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).


    <!DOCTYPE html>
    <html lang="en">
      <head>
        <link rel="meta" type="application/rdf+xml"
              href="john.rdf">
        <link rel="meta" type="text/turtle" href="john.ttl">
        <link rel="meta" type="application/n-triples"
              href="john.nt">
        <title>John's Web page</title>
      </head>
      <body>
        Hi, I'm John Smith.
      </body>
    </html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

    <!DOCTYPE html>
    <html lang="en">
      <head vocab="http://purl.org/dc/terms/"
            about="http://example.org/document.html">
        <title property="title" lang="en">John's Web page</title>
        <meta property="creator"
              href="http://example.org/john-smith">
      </head>
      <body vocab="http://xmlns.com/foaf/0.1/"
            about="http://example.org/john-smith"
            typeof="Person">
        Hi, I'm <span property="name">John Smith</span>.
      </body>
    </html>


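[Graph: the node http://example.org/document.html is linked to the literal "John's Web page"@en by dc:title and to the node http://example.org/john-smith by dc:creator; the node http://example.org/john-smith is linked to the node foaf:Person by rdf:type and to the literal "John Smith" by foaf:name.]

Figure 2.11: A graph of the RDF document in Figure 2.9.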

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also known as What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price of the gentle learning curve of interactive DPSes is paid in more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and in the reduced flexibility stemming from the use of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The


The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by Donald Knuth, an American professor of computer science, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of the mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by Desktop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text, with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
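The decoupling can be illustrated outside any particular DPS. In the following Python sketch, the text is classified once, and two alternative designs are then applied to it without touching the text itself. The class names, the style values, and the render function are assumptions made for this example; the output format (HTML paragraphs with inline styles) was chosen only because it is easy to inspect.

```python
from html import escape

# Logical markup: every span of text carries a class name, not a design.
spans = [
    ("title", "The Cask of Amontillado"),
    ("paragraph", "The thousand injuries of Fortunato I had borne as I best could."),
    ("paragraph", "At length I would be avenged."),
]

# The design of each class is set up separately and can be swapped later.
plain_design = {
    "title": {"font-weight": "bold", "font-size": "16pt"},
    "paragraph": {"font-size": "12.5pt"},
}
large_design = {**plain_design, "paragraph": {"font-size": "14pt"}}


def render(spans, design):
    """Renders logically marked-up spans as HTML paragraphs with inline styles."""
    lines = []
    for cls, text in spans:
        style = "; ".join(f"{prop}: {value}" for prop, value in design[cls].items())
        lines.append(f'<p class="{cls}" style="{style}">{escape(text)}</p>')
    return "\n".join(lines)


print(render(spans, plain_design))
print(render(spans, large_design))
```

Changing a single entry of the design changes every paragraph of that class, which is the consistency the logical markup is meant to provide.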


    The Cask of Amontillado
    by
    Edgar Allen Poe

    The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that gave utterance to a threat. At length I would be avenged; this was a point definitely settled -- but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

    -1-

    .TITLE "The Cask of Amontillado"
    .AUTHOR "Edgar Allen Poe"
    .PRINTSTYLE TYPESET
    .PAGE 6i 9i .75i .75i .75i .75i
    .START
    .PP
    .DROPCAP T 3
    he thousand injuries of Fortunato I had borne as I best
    could, but when he ventured upon insult, I vowed revenge.
    You, who so well know the nature of my soul, will not
    suppose, however, that gave utterance to a threat.
    \*[IT]At length\*[PREV] I would be avenged; this was a
    point definitely settled\[em]but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's The Cask of Amontillado as text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].


    % Page geometry
    \pdfpagewidth=6in \pdfpageheight=9in
    % Page dimensions
    \hsize=\dimexpr\pdfpagewidth-1.5in
    \vsize=\dimexpr\pdfpageheight-1.5in
    \baselineskip=16.8pt
    \hoffset=-.25in \voffset=-.25in
    % Fonts
    \font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
    \font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
    % Logical markup definition
    \def\title#1{{\bigbf\centerline{#1}}}
    \def\author#1{{\it\centerline{by}\centerline{#1}}
      \vskip 3.9em}
    \def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
      \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
      \parshape=4 3em\dimexpr\hsize-3em 3.28em
      \dimexpr\hsize-3.28em 3.28em
      \dimexpr\hsize-3.28em 0em\hsize}
    % The document
    \title{The Cask of Amontillado}
    \author{Edgar Allen Poe}
    \chapter The thousand injuries of Fortunato I had borne
    as I best could, but when he ventured upon insult, I vowed
    revenge. You, who so well know the nature of my soul,
    will not suppose, however, that gave utterance to a
    threat. {\it At length} I would be avenged; this was a
    point definitely settled---but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce markup that is comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful, and that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that reflect their use cases.
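To give a feeling for how such a conversion tool works, the following Python sketch translates a tiny Markdown-like subset into HTML. It is an illustration only: the function to_html is a hypothetical name, the subset (number-sign headings, paragraphs separated by blank lines, and asterisks for emphasis) is an assumption, and real Markdown processors handle many more constructs and edge cases.

```python
import re
from html import escape


def to_html(text: str) -> str:
    """Converts a tiny Markdown-like subset to HTML."""
    blocks = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        # Join wrapped lines and escape characters that are special in HTML.
        block = escape(" ".join(block.split()), quote=False)
        # Double asterisks mark strong emphasis, single asterisks emphasis.
        block = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", block)
        block = re.sub(r"\*(.+?)\*", r"<em>\1</em>", block)
        heading = re.match(r"(#{1,6}) (.+)", block)
        if heading:
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            blocks.append(f"<p>{block}</p>")
    return "\n".join(blocks)


sample = (
    "# Design\n"
    "\n"
    "Legibility is of *foremost* concern.\n"
    "Use **one** font family."
)
print(to_html(sample))
```

Note how the source text stays readable without any conversion, which is the main selling point of lightweight markup.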

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab serifs or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
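The guidelines on line length can be condensed into a small decision rule. The following Python sketch encodes the thresholds quoted above; the function name assess_measure and the wording of the returned verdicts are assumptions of this example, and the rule is a rough aid, not a substitute for judging the typeset page.

```python
def assess_measure(chars_per_line: int, columns: int = 1) -> str:
    """Classifies a line length against the recommended character counts."""
    # 45-75 characters on a single-column page, 40-50 on a multi-column page.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "too narrow to justify, set ragged"
    if chars_per_line > 80:
        return "too wide, strains the eye"
    if low <= chars_per_line <= high:
        return "comfortable"
    return "outside the recommended range"


print(assess_measure(66))             # comfortable
print(assess_measure(35))             # too narrow to justify, set ragged
print(assess_measure(60, columns=2))  # outside the recommended range
```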

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


    The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

    1

    \documentclass[11pt]{article}
    \usepackage{fontspec, leading, newunicodechar}
    \usepackage[Latin, Greek]{ucharclasses}
    \setTransitionsForLatin{%
      \fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
    \setTransitionsForGreek{%
      \fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
        Ligatures=TeX]}{}

    \newunicodechar{·}{\raisebox{.8ex}{.}}

    \frenchspacing
    \leading{14pt}
    \begin{document}
    The second function of Soul -- knowing -- was not at
    first distinguished from motion. Aristotle says: φαμὲν
    γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
    δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
    δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
    αὐτὴν κινεῖσθαι
    ``The soul is said to feel pain and joy, confidence and
    fear, and again to be angry, to perceive, and to think;
    and all these states are held to be movements, which
    might lead one to suppose that soul itself is moved.''
    \end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


    <style>
      @font-face {
        font-family: "Alegreya Sans";
        src: url("AlegreyaSans-Regular.ttf")
          format("truetype");
        unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
          U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F; }
      @font-face {
        font-family: "GFS Neohellenic";
        src: url("GFSNeohellenic.otf") format("opentype");
        unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
          U+102E0-102FF; }
      p {
        font-family: "Alegreya Sans", "GFS Neohellenic",
          sans-serif;
        line-height: 14pt; }
      [lang=en] {
        font-size: 11pt; }
      [lang=grc] {
        font-size: 13.2pt; }
    </style>
    <p><span lang="en">The second function of Soul &ndash; knowing
    &ndash; was not at first distinguished from motion. Aristotle
    says: </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
    λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
    τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
    κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
    κινεῖσθαι </span><span lang="en">&ldquo;The soul is said to
    feel pain and joy, confidence and fear, and again to be
    angry, to perceive, and to think; and all these states
    are held to be movements, which might lead one to suppose
    that soul itself is moved.&rdquo;</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with a leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches that of the lower case).
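The arithmetic behind the leading is simple enough to automate. The following Python sketch reproduces the 10 pt example from the text; the function names leading_range and lead_per_side are assumptions of this example, and the default percentages are the twenty and forty-five percent quoted above.

```python
def leading_range(type_size_pt, min_pct=20, max_pct=45):
    """Returns the smallest and the largest recommended line height in points."""
    return (
        type_size_pt * (100 + min_pct) / 100,
        type_size_pt * (100 + max_pct) / 100,
    )


def lead_per_side(type_size_pt, leading_pt):
    """Returns the lead added above and below each line in points."""
    return (leading_pt - type_size_pt) / 2


low, high = leading_range(10)
print(low, high)                 # 12.0 14.5
print(lead_per_side(10, low))    # 1.0
print(lead_per_side(10, high))   # 2.25
```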

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and set ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger


              Sizes in inches    Page proportions
    A4        8.27 × 11.7        2 : √2       1.41421
    B5        6.93 × 9.84        1 : √2       0.707
    Letter    8½ × 11            1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

variant of the body text typeface or by including the text of the most recent heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
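The requirement on the height of headings translates into a short calculation. In the following Python sketch, the function heading_block is a hypothetical helper and the 16 pt heading on a 12 pt body leading is an assumed example; the function rounds the heading up to the nearest whole number of body lines and returns the extra space that has to be distributed above and below the heading.

```python
import math


def heading_block(heading_size_pt, body_leading_pt):
    """Fits a heading into a whole number of body text lines.

    Returns the number of body lines occupied, the total height of the
    heading block, and the extra vertical space to distribute around
    the heading, all but the first in points.
    """
    lines = math.ceil(heading_size_pt / body_leading_pt)
    height = lines * body_leading_pt
    return lines, height, height - heading_size_pt


print(heading_block(16, 12))  # (2, 24, 8)
```

With these assumed sizes, the heading occupies two body lines and leaves 8 pt to be split between the space above and below it, so the body text after the heading falls back onto the same grid as the facing page.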

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface as the surrounding text, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.


2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or of the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks, in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding vertical space above and below the block paragraphs and, optionally, also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].


obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

-- William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
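The largest text height that fits a given page can be computed directly from the line height. The following Python sketch is a minimal illustration under stated assumptions: the function text_block is a hypothetical helper, lengths are given in points with 72 points to the inch, and the page in the example is assumed to be 9 in tall with vertical margins of 0.75 in and a body text leading of 12 pt.

```python
def text_block(page_height_pt, top_margin_pt, bottom_margin_pt, leading_pt):
    """Returns the number of lines per page and the resulting text height."""
    available = page_height_pt - top_margin_pt - bottom_margin_pt
    # Only whole lines count, so the text height is a multiple of the leading.
    lines = int(available // leading_pt)
    return lines, lines * leading_pt


POINTS_PER_INCH = 72  # assumed unit conversion for this example

page_height = 9 * POINTS_PER_INCH
margin = 0.75 * POINTS_PER_INCH
print(text_block(page_height, margin, margin, 12))  # (45, 540)
```

If the available height is not a whole multiple of the leading, the remainder is best added to the vertical margins rather than left as a gap at the bottom of the text block.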

3.4 Color

In both print and web design, it is perfectly reasonable to use just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to use colored typefaces only for emphasis, not for the body text, and on backgrounds that are


Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

(ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Bibliography

[1] Mary Brandel. "1963: The debut of ASCII". In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standards Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standards Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).

[17] Noam Chomsky. "Three models for the description of language". In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. "The Roots of SGML – A Personal Recollection". In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. "SGML: The Reason Why and the First Published Hint". In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. "Introduction to Generalized Markup". In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. "GODDAG: A Data Structure for Overlapping Hierarchies". In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028/ (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typo/glosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standards Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
Emacs: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend of a Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
Pico: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML, New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
vi: The Visual Interactive editor
Vim: vi IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language

Index

ACK 6
Adobe FrameMaker 14
Adobe InDesign 14, 39
alignment
    justified 42
    ragged 42
Anton Koberger 49
Apache OpenOffice 13, 20, 39
API 55
ASA 51
ASCII 5–9, 11, 12, 14, 51
AsciiDoc 39
AT&T 35
Atom 13
awk 16, 17

Bazaar 17
BEL 6
BMP 8, 9, 14
Bob Bemer 5
body text 41
BRE 14–16
    alternation operator 15
    backreference 15
    escape character 15
    matching list expression 15
    non-matching list expression 15
    repetition operator 15
    subexpression 15
BS 6
BSD 13

CA 52
CAN 6
CERN 28
character code 5
character encoding 5
Chomsky hierarchy 14
Christian Morgenstern 4
CLDR 52
CLI 13, 16
code page 7
code point 8
Compose key 11
CONCUR 27
control code 5
CR 6
Creole 39
CSS 23, 29–32, 44

DC 32, 33
DC1 6
DC2 6
DC3 6
DC4 6
DEL 6
DLE 6
Donald Knuth 36
DPS 13, 17, 18, 32, 35, 36, 39
    batch-oriented 35
    interactive 13, 35
        desktop publishing 36
        word processing 36
DTD 23, 25–27
DTP 36

EBCDIC 5
ECMA 55
Edgar Allan Poe 37

Elements of Style 3
EM 6
Emacs 13
endianity 10
endnote 47
ENQ 6
EOT 6
ERE 14–16
    alternation operator 15
    backreference 15
    escape character 15
    matching list expression 15
    non-matching list expression 15
    repetition operator 15
    subexpression 15
ESC 6
ETB 6
ε-TeX 38
ETX 6
EUC 5

F. M. Cornford 43
FF 6
FOAF 32, 33
footnote 47
formal grammar 14
FORTRAN 4
From Religion to Philosophy: A Study in the Origins of Western Speculation 43
FS 6
FSM 35

Git 17
GML 22
GNU 13, 14, 35
    Linux 13
    nano 13
Google Documents 18
Google Pinyin 11
grep 16, 17
groff, see troff
GS 6
GUI 13, 35

Han Unification 9
heading 45
Henrik Ibsen 27
HT 6
HTML 28–32, 34, 39, 44, 55

IBM 5, 12, 22
iconv 10
IEC 7, 10, 51–54
IME 12
IRI 27, 28, 31, 32, 54
ISO 7, 10, 51–54

JavaScript 29
Jeffrey E. F. Friedl 14
JIS 5
JOE 13
JScript 29
JSON 32
JSON-LD 32, 56
JTC 51–54
justification, see alignment

King Lear 48

LaTeX 36, 43
Latin Vulgate Bible 49
LD 31, 32, 55
leading, see line spacing
Leafpad 13
LF 6
lightweight markup language 39
line height 45
list 46

MA 51
MakeDoc 39
Markdown 39
markup
    logical 21, 29, 30, 35, 36
    presentation 21, 29, 30, 35, 36
MathML 28, 31
Mercurial 17
microformatting 32
Microsoft Word 14, 20, 39

N-Triples 32, 33
NAK 6
Noam Chomsky 14
    hierarchy 14
note 46
Notepad++ 13
Notepad 13

nroff, see troff
NUL 6
NY 51

OCR 12
ODF 13
OOXML 13
OWL 32, 56

paragraph 42
    block 47
    indented 45
    outdented 45
paragraphs
    block 45
PC 5, 11
PDF 13
pdfTeX 38
Peer Gynt 27
Perl 14
Pico 13
pinyin 11
plain TeX 38
POSIX 53
printable character 5
Punycode 8

QuarkXPress 14
quotation
    block 47
    run-in 47

rag, see alignment
RDF 28, 31–35, 56
    literal 32
    object 31
    ontology 32
    predicate 31
    resource 31
    subject 31
    triplet 31
RDFa 32, 34, 56
regex, see regular expression
regular expression 13, 14
regular grammar 14
RELAX NG 23, 25
RFC 54, 55
RS 6

sans-serif 41
SC 51–54
Scribus 13, 14, 39
sed 16, 17
serif 41
Setext 39
SGML 22, 23, 25, 27–29, 39, 53, 54
    application 23
    attribute 22
    element 22
    entity 22
    node 22
    tag 22
SGML: The Reason Why and the First Published Hint 22
SI 6
sidenote 46
small capitals 45
SO 6
SOH 6
SR 12
STX 6
style guide 3
SUB 6
Sublime Text 13
surrogate pair 8
SVG 28, 31
SVN 17–20
SYN 6

table 46
TC 51, 52
TEI 28
text editor 13
text file 4
text processing 4
TextEdit 13, 14
The Art of Computer Programming 36
The Cask of Amontillado 37
The Chicago Manual of Style 3
The Oxford Style Manual 3
the Subversion book 17
Tim Berners-Lee 31
Timothy John Berners-Lee 28
Tortoise SVN 18, 20
Trichter 4
troff 35
    man 36
    me 36
    mom 36
TRON 9
Turtle 32, 33
typeface 41

UCS 6, 8–12, 14, 16, 51, 52
    block 8
    UCS-4 8
Unicode
    case conversion 10
    normalization 10
US 6
USA 51, 52
UTF 6, 8–10, 52
    UTF-16 8, 52
    UTF-32 8
    UTF-7 8
    UTF-8 8, 52

VBScript 29
VCS 17–20
    centralized 17
    decentralized 17
version control 13
vi 13
Vim 13
VT 6

W3C 23, 28, 29, 31, 32, 54–56
WG 54
Wikicode 39
William Shakespeare 48
William Strunk 3
Word Online 18
writing rules
    grammar 3
    orthography 3
    typography 4
WYSIWYG 35

X Window System 11
XeTeX 43
XHTML 28, 31, 32, 55, 56
XML 23–29, 31–33, 39, 54, 55
    application 23
    DocBook 28
    format 23
    language 23
    namespace 27
    schema language 23
    Schema 23, 26
    validity 23
    well-formedness 23
xmllint 26
XPath 23
XPointer 23
XQuery 23


    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE recipe SYSTEM "recipe.dtd">
    <recipe>
      <name>Palatschinken</name>
      <description>A Slavic crêpe-like dish.</description>
      <ingredientList serves="8">
        <ingredient amount="120g">Plain flour</ingredient>
        <ingredient amount="2">Egg</ingredient>
        <ingredient amount="300ml">Milk</ingredient>
        <ingredient amount="1 tblspn">Oil</ingredient>
        <ingredient amount="1 pinch">Salt</ingredient>
      </ingredientList>
      <stepList>
        <step>Combine the ingredients and whisk until
          you have a smooth batter.</step>
        <step>Heat oil on a pan, pour in a tablespoonful
          of the batter, fry until golden brown.</step>
        <step>Repeat until there is no batter left.</step>
        <step>Serve rolled and filled with jam.</step>
      </stepList>
    </recipe>

Figure 2.1: An example XML document (recipe.xml)

DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd">
    <!DOCTYPE recipe SYSTEM "recipe.dtd">

    <!DOCTYPE recipe [
      <!ELEMENT recipe (name, description, ingredientList,
                        stepList)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
      <!ELEMENT ingredientList (ingredient+)>
      <!ATTLIST ingredientList serves CDATA #REQUIRED>
      <!ELEMENT ingredient (#PCDATA)>
      <!ATTLIST ingredient amount CDATA #REQUIRED>
      <!ELEMENT stepList (step+)>
      <!ELEMENT step (#PCDATA)> ]>

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd" [
      <!-- Omitted for brevity --> ]>
    <!DOCTYPE recipe SYSTEM "recipe.dtd" [
      <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD

    element recipe {
      element name { text },
      element description { text },
      element ingredientList {
        attribute serves { xsd:positiveInteger },
        element ingredient {
          attribute amount { text }, text
        }+
      },
      element stepList {
        element step { text }+
      }
    }

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.

    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
      <element name="recipe"><complexType><all>
        <element name="name" type="string" minOccurs="1"/>
        <element name="description" type="string"
                 minOccurs="1"/>
        <element
          name="ingredientList"><complexType><sequence>
          <element name="ingredient" minOccurs="1"
                   maxOccurs="unbounded">
            <complexType><simpleContent>
              <extension base="string">
                <attribute name="amount" type="string"/>
              </extension>
            </simpleContent></complexType>
          </element></sequence>
          <attribute name="serves" type="positiveInteger"
                     use="required"/>
        </complexType></element>
        <element name="stepList"><complexType><sequence>
          <element name="step" type="string" minOccurs="1"
                   maxOccurs="unbounded"/>
        </sequence></complexType></element>
      </all></complexType></element>
    </schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd)

    xmllint --noout --dtdvalid recipe.dtd recipe.xml
    xmllint --noout --schema recipe.xsd recipe.xml
    trang recipe.rnc recipe.rng  # Compact -> Full Relax NG
    xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
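The difference between a well-formed and a valid document can also be illustrated in a few lines of Python. The sketch below is not a replacement for a schema validator: the standard library parser only checks well-formedness, so the structural rules are hand-written here and merely mirror the constraints of the recipe schemata. The function name and the shortened sample document are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the recipe document used in this chapter.
RECIPE = """<recipe>
  <name>Palatschinken</name>
  <description>A Slavic crêpe-like dish</description>
  <ingredientList serves="8">
    <ingredient amount="120g">Plain flour</ingredient>
    <ingredient amount="2">Egg</ingredient>
  </ingredientList>
  <stepList>
    <step>Combine the ingredients and whisk until smooth</step>
  </stepList>
</recipe>"""


def check_recipe(xml_text):
    """Return a list of problems; an empty list means all checks passed."""
    try:
        root = ET.fromstring(xml_text)  # fails if not well-formed
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]

    problems = []
    if root.tag != "recipe":
        problems.append("root element must be <recipe>")
    for child in ("name", "description", "ingredientList", "stepList"):
        if root.find(child) is None:
            problems.append(f"missing <{child}>")

    ingredient_list = root.find("ingredientList")
    if ingredient_list is not None:
        serves = ingredient_list.get("serves", "")
        if not (serves.isdecimal() and int(serves) > 0):
            problems.append("serves must be a positive integer")
        ingredients = ingredient_list.findall("ingredient")
        if not ingredients:
            problems.append("at least one <ingredient> is required")
        for ingredient in ingredients:
            if ingredient.get("amount") is None:
                problems.append("every <ingredient> needs an amount")

    step_list = root.find("stepList")
    if step_list is not None and not step_list.findall("step"):
        problems.append("at least one <step> is required")
    return problems


for line in check_recipe(RECIPE) or ["no problems found"]:
    print(line)
```

Hand-written checks like these grow quickly and are easy to get wrong, which is the practical argument for declaring the rules once in a schema and leaving the checking to a validator.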

Speech

    AASE: Peer, you're lying!
    PEER: No, I'm not!
    AASE: Well then, swear to me it's true!
    PEER: Swear? Why should I?
    AASE: See, you dare not! Every word of it's a lie!

Verse

    Peer, you're lying! No, I'm not!
    Well then, swear to me it's true!
    Swear? Why should I? See, you dare not!
    Every word of it's a lie!

    <(V)line>
    <(S)speech who="Aase">Peer, you're lying!</(S)speech>
    <(S)speech who="Peer">No, I'm not!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Aase">Well then,
    swear to me it's true!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Peer">Swear? why should I?</(S)speech>
    <(S)speech who="Aase">See, you dare not!
    </(V)line><(V)line>
    Every word of it's a lie!</(S)speech>
    </(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org/.

Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

A notable feature of XML that is unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of the more expressive SGML feature CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
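As a minimal sketch of how XML namespaces keep the elements of different XML applications apart within one document, the following Python example parses a small document that mixes XHTML and MathML and counts the elements per namespace. The document content and the helper name are made up for illustration; the two namespace IRIs are the ones published by W3C.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"

# XHTML is the default namespace; MathML elements use the "m" prefix.
DOCUMENT = f"""<html xmlns="{XHTML}" xmlns:m="{MATHML}">
  <body>
    <p>The radius is denoted by
      <m:math><m:mi>r</m:mi></m:math>
    </p>
  </body>
</html>"""


def namespace_of(element):
    """Return the namespace IRI of an element, or '' if it has none."""
    tag = element.tag  # ElementTree stores names as '{iri}local-name'
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


root = ET.fromstring(DOCUMENT)
counts = {}
for element in root.iter():
    iri = namespace_of(element)
    counts[iri] = counts.get(iri, 0) + 1

for iri, count in counts.items():
    print(iri, count)
```

Note that the parser resolves the prefixes but does nothing with the IRIs beyond comparing them as strings, which matches the observation above that an IRI by itself does not lead the parser to a schema.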

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely popular graphical Web browser, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across the Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language made it possible to specify the visual properties of any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for the presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered a good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.
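The lenient treatment of malformed HTML described earlier in this section can be contrasted with the strictness of XML parsers in a short Python sketch. It feeds markup with overlapping b and i elements to the standard library's html.parser, which merely tokenizes the input and does not implement the HTML5 tree-building rules, and to the standard library's XML parser. The class and function names are assumptions made for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Markup with overlapping elements: <b> is closed before the inner <i>.
MARKUP = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Record start and end tags in the order they are encountered."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def html_events(markup):
    """Tokenize HTML leniently; overlapping tags do not raise an error."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


def is_well_formed_xml(markup):
    """Return True if the markup parses as a well-formed XML document."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(html_events(MARKUP))
print(is_well_formed_xml(f"<p>{MARKUP}</p>"))
```

The HTML tokenizer reports the tags in the order in which they appear and leaves the recovery to its caller, whereas the XML parser rejects the whole document; properly nested markup is accepted by both.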

    <b>Bold <i>bold and italic</b> italic</i>
    <b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

    <font face="Verdana" size="4">
    <font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
    <br><br>There is a continuing need to show the power of
    <i>CSS</i>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The <i>HTML
    </i> remains the same, the only thing that has changed
    is the external <i>CSS</i> file. Yes, really.
    </font>

    <style>
      body {
        font: large Verdana;
        font-size: large;
      }
      h1 {
        font-size: x-large;
        text-transform: uppercase;
      }
      abbr {
        font-style: italic;
      }
    </style>
    <h1>So what is this about?</h1>
    <p>There is a continuing need to show the power of
    <abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The
    <abbr>HTML</abbr> remains the same, the only thing that
    has changed is the external <abbr>CSS</abbr> file. Yes,
    really.</p>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com/. The first document was created using the HTML presentation markup. The second document achieves the same appearance by the combination of logical markup and CSS.

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly into web documents through XML namespaces.

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.

A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
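The two readings of a triplet (p, s, o) can be sketched in a few lines of Python. This is an illustrative data model under stated assumptions, not the API of any RDF library: the classes `IRI`, `Literal`, and `Triplet` and the function `interpret` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    """A resource identified by an IRI."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A literal value, such as a name or a title."""
    value: str


@dataclass(frozen=True)
class Triplet:
    """A triplet (p, s, o); the predicate and the subject are always resources."""
    predicate: IRI
    subject: IRI
    obj: Union[IRI, Literal]


def interpret(t: Triplet) -> str:
    """Describe a triplet as a relation or as a property, depending on its object."""
    if isinstance(t.obj, IRI):
        # The object is a resource: s is in a relation p with o.
        return (f"<{t.subject.value}> is in relation "
                f"<{t.predicate.value}> with <{t.obj.value}>")
    # The object is a literal: s has a property p with the value o.
    return (f"<{t.subject.value}> has property "
            f'<{t.predicate.value}> with value "{t.obj.value}"')
```

For example, a triplet whose object is `Literal("John Smith")` is read as a property, whereas a triplet whose object is another `IRI` is read as a relation.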

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be

    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/terms/"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description
          rdf:about="http://example.org/document.html">
        <dc:title xml:lang="en">John's Web page</dc:title>
        <dc:creator
            rdf:resource="http://example.org/john-smith" />
      </rdf:Description>
      <rdf:Description
          rdf:about="http://example.org/john-smith">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
        <foaf:name>John Smith</foaf:name>
      </rdf:Description>
    </rdf:RDF>

    <http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
    <http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
    <http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
    <http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dc: <http://purl.org/dc/terms/> .

    <http://example.org/document.html>
        dc:title "John's Web page"@en ;
        dc:creator <http://example.org/john-smith> .

    <http://example.org/john-smith>
        a foaf:Person ;
        foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom)

    <!DOCTYPE html>
    <html lang="en">
      <head>
        <link rel="meta" type="application/rdf+xml"
              href="john.rdf">
        <link rel="meta" type="text/turtle" href="john.ttl">
        <link rel="meta" type="application/n-triples"
              href="john.nt">
        <title>John's Web page</title>
      </head>
      <body>
        Hi, I'm John Smith.
      </body>
    </html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

    <!DOCTYPE html>
    <html lang="en">
      <head vocab="http://purl.org/dc/terms/"
            about="http://example.org/document.html">
        <title property="title" lang="en">John's Web page</title>
        <meta property="creator"
              href="http://example.org/john-smith">
      </head>
      <body vocab="http://xmlns.com/foaf/0.1/"
            about="http://example.org/john-smith"
            typeof="Person">
        Hi, I'm <span property="name">John Smith</span>.
      </body>
    </html>

Figure 2.11: A graph of the RDF document in Figure 2.9

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is paid in more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and in the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.

    The Cask of Amontillado
    by
    Edgar Allan Poe

    The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled – but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

    .TITLE "The Cask of Amontillado"
    .AUTHOR "Edgar Allan Poe"
    .PRINTSTYLE TYPESET
    .PAGE 6i 9i .75i .75i .75i .75i
    .START
    .PP
    .DROPCAP T 3
    he thousand injuries of Fortunato I had borne as I best
    could, but when he ventured upon insult, I vowed revenge.
    You, who so well know the nature of my soul, will not
    suppose, however, that I gave utterance to a threat.
    \*[IT]At length\*[PREV] I would be avenged; this was a
    point definitely settled\[em]but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].

    % Page geometry
    \pdfpagewidth=6in \pdfpageheight=9in
    % Page dimensions
    \hsize=\dimexpr\pdfpagewidth-1.5in
    \vsize=\dimexpr\pdfpageheight-1.5in
    \baselineskip=16.8pt
    \hoffset=-.25in \voffset=-.25in
    % Fonts
    \font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
    \font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
    % Logical markup definition
    \def\title#1{{\bigbf\centerline{#1}}}
    \def\author#1{{\it\centerline{by}\centerline{#1}}
      \vskip 3.9em}
    \def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
      \hbox{\llap{\dropcap#1\hskip-0.3ex}}}%
      \parshape=4 3em\dimexpr\hsize-3em 3.28em
      \dimexpr\hsize-3.28em 3.28em
      \dimexpr\hsize-3.28em 0em\hsize}
    % The document
    \title{The Cask of Amontillado}
    \author{Edgar Allan Poe}
    \chapter The thousand injuries of Fortunato I had borne
    as I best could, but when he ventured upon insult, I vowed
    revenge. You, who so well know the nature of my soul,
    will not suppose, however, that I gave utterance to a
    threat. {\it At length} I would be avenged; this was a
    point definitely settled---but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX

Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right)

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
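The conversion to a more general markup language can be illustrated with a toy converter. The sketch below handles only a tiny, Markdown-like subset (headings introduced by number signs, paragraphs separated by blank lines, and asterisks for emphasis); it is an assumption-laden illustration rather than an implementation of Markdown or of any other real lightweight markup language, and `to_html` is a hypothetical name.

```python
import html
import re


def to_html(text):
    """Convert a tiny Markdown-like subset of markup to HTML."""
    blocks = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        line = " ".join(block.split())  # Join wrapped lines into one.
        if not line:
            continue
        heading = re.match(r"(#{1,6}) (.*)", line)
        if heading:
            tag, body = f"h{len(heading.group(1))}", heading.group(2)
        else:
            tag, body = "p", line
        # Escape the characters that are special in HTML before adding tags.
        body = html.escape(body, quote=False)
        # Strong emphasis is replaced first, so that ** is not read as two *.
        body = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", body)
        body = re.sub(r"\*(.+?)\*", r"<em>\1</em>", body)
        blocks.append(f"<{tag}>{body}</{tag}>")
    return "\n".join(blocks)
```

The input stays legible as plain text, whereas the output carries explicit element names, which is the trade-off between unobtrusiveness and expressive power described above.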

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
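The guidelines on the line length can be condensed into a small decision function. The sketch below merely encodes the thresholds quoted above; the function name `assess_measure` and the wording of the returned verdicts are hypothetical, and the character count is assumed to be an average over the lines of a paragraph.

```python
def assess_measure(chars_per_line, columns=1):
    """Classify the average line length of the body text."""
    # Recommended range: 45-75 characters on a single-column page,
    # 40-50 characters on a multi-column page.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        # Justification would make the word spacing too loose.
        return "too narrow to justify: set ragged"
    if chars_per_line > 80:
        return "too wide: strains the eye"
    if low <= chars_per_line <= high:
        return "comfortable: justify"
    return "acceptable: outside the recommended range"
```

A call such as `assess_measure(66)` falls into the recommended range for a single-column page, while `assess_measure(35)` suggests ragged setting.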

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text

    The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὀργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

    \documentclass[11pt]{article}
    \usepackage{fontspec, leading, newunicodechar}
    \usepackage[Latin, Greek]{ucharclasses}
    \setTransitionsForLatin{%
      \fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
    \setTransitionsForGreek{%
      \fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
        Ligatures=TeX]}{}
    \newunicodechar{·}{\raisebox{.8ex}{.}}
    \frenchspacing
    \leading{14pt}
    \begin{document}
    The second function of Soul -- knowing -- was not at
    first distinguished from motion. Aristotle says: φαμὲν
    γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι
    δὲ ὀργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα
    δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν
    αὐτὴν κινεῖσθαι.
    ``The soul is said to feel pain and joy, confidence and
    fear, and again to be angry, to perceive, and to think;
    and all these states are held to be movements, which
    might lead one to suppose that soul itself is moved.''
    \end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.

    <style>
      @font-face {
        font-family: "Alegreya Sans";
        src: url("AlegreyaSans-Regular.ttf")
             format("truetype");
        unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                       U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
      }
      @font-face {
        font-family: "GFS Neohellenic";
        src: url("GFSNeohellenic.otf") format("opentype");
        unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                       U+102E0-102FF;
      }
      p {
        font-family: "Alegreya Sans", "GFS Neohellenic",
                     sans-serif;
        line-height: 14pt;
      }
      [lang=en] {
        font-size: 11pt;
      }
      [lang=gr] {
        font-size: 13.2pt;
      }
    </style>
    <p><span lang="en">The second function of Soul – knowing
    – was not at first distinguished from motion. Aristotle
    says: </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
    λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὀργίζεσθαί
    τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα
    κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν
    κινεῖσθαι. </span><span lang="en">“The soul is said to
    feel pain and joy, confidence and fear, and again to be
    angry, to perceive, and to think; and all these states
    are held to be movements, which might lead one to suppose
    that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3

line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
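The arithmetic behind these figures is easy to reproduce. The sketch below assumes that the twenty to forty-five percent of the typeface size is added on top of the typeface size itself and that the resulting lead is split evenly above and below the line; the function names `leading_range` and `lead_per_side` are hypothetical.

```python
def leading_range(type_size_pt, min_extra=0.20, max_extra=0.45):
    """Return the recommended range of line heights for a typeface size."""
    low = type_size_pt + type_size_pt * min_extra
    high = type_size_pt + type_size_pt * max_extra
    return low, high


def lead_per_side(type_size_pt, line_height_pt):
    """Return the lead added above and below a line, split evenly."""
    return (line_height_pt - type_size_pt) / 2
```

For a 10 pt typeface, `leading_range(10)` spans the line heights from 12 pt to 14.5 pt, and `lead_per_side(10, 12)` gives the 1 pt of lead on each side that is used for the body text of this book.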

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well.

Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

            Sizes in inches    Page proportions
    A4      8.27 × 11.7        2 : √2       1.41421
    B5      6.93 × 9.84        1 : √2       0.707
    Letter  8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

    This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    – William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we need to also decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
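The requirement that the text height be a multiple of the line height can be checked with a short calculation. The sketch below assumes that the page height and the vertical margins are already known and rounds the available height down to a whole number of lines; the function name `text_block` is hypothetical, and all lengths are expressed in points with 72 points to the inch.

```python
def text_block(page_height_pt, top_margin_pt, bottom_margin_pt, line_height_pt):
    """Return the number of lines and the text height that fills them exactly."""
    available = page_height_pt - top_margin_pt - bottom_margin_pt
    # Only whole lines fit; the remainder has to be added to the margins.
    lines = int(available // line_height_pt)
    return lines, lines * line_height_pt
```

For a page that is 9 in (648 pt) high with vertical margins of 0.75 in (54 pt) each and a line height of 12 pt, `text_block(648, 54, 54, 12)` yields 45 lines and a text height of 540 pt, so the text block can be filled completely.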

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).

[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. "GODDAG: A Data Structure for Overlapping Hierarchies". In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028/ (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts and Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK The ACKnowledgement character
API Application Programming Interface
ASA The American Standard Association
ASCII The American Standard Code for Information Interchange
AT&T The American Telephone and Telegraph corporation
BEL The BELl character
BMP The Basic Multilingual Plane
BRE The Basic Regular Expressions
BS The BackSpace character
BSD The Berkeley Software Distribution. Also known as the Berkeley Unix
CA California
CAN The CANcel character
CERN The European Organization for Nuclear Research (la Conseil Européen pour la Recherche Nucléaire)
CLDR The Common Locale Data Repository
CLI Command Line Interface
COBOL The COmmon Business-Oriented Language
CR The Carriage Return character
CSS The Cascading Style Sheets language
DC The Dublin Core
DC1 The Device Control character No. 1
DC2 The Device Control character No. 2
DC3 The Device Control character No. 3
DC4 The Device Control character No. 4
DEL The DELete character
DLE The Data Link Escape character
DPS Document Preparation System

DTD Document Type Definition
DTP DeskTop Publishing
EBCDIC The Extended Binary Coded Decimal Interchange Code
ECMA The European Computer Manufacturers Association
EM The End of Medium
Emacs The Eventually Munches All Computer Storage editor
ENQ The ENQuiry character
EOT The End Of Transmission
ERE The Extended Regular Expressions
ESC The ESCape character
ETB The End of Transmission Block
ETX The End of TeXt
EUC The Extended Unix Code
FF The Form Feed character
FOAF Friend Of A Friend
FORTRAN The FORmula TRANslator
FS The File Separator
FSM The Free Software Movement
GML The General Markup Language
GNU GNU is Not Unix
GS The Group Separator
GUI Graphical User Interface
HT The Horizontal Tab
HTML The HyperText Markup Language
IBM The International Business Machines Corporation
IEC The International Electrotechnical Commission
IME Input Method Editor
IRI The Internationalized Resource Identifier
ISO The International Organization for Standardization
JIS The Japanese Industrial Standards encoding
JOE The Joe's Own Editor
JSON The JavaScript Object Notation
JSON-LD JSON for LD
JTC A Joint TC
LD Linked Data
LF The Line Feed
MA Massachusetts
MathML The Mathematical Markup Language
NAK The Negative-AcKnowledgement character
NUL The NULl character

NY New York
OCR Optical Character Recognition
ODF The Open Document Format for office applications
OOXML The Office Open XML format
OWL The Web Ontology Language
PC The IBM Personal Computer
PDF The Portable Document Format
Pico The PIne COmposer
POSIX The Portable Operating System Interface
RDF The Resource Description Framework
RDFa RDF in attributes
RELAX NG The REgular LAnguage for XML New Generation
RFC A Request For Comments
RS The Record Separator
SC A SubCommittee
SGML The Standard Generalized Markup Language
SI The Shift In character
SO The Shift Out character
SOH The Start of Heading
SR Sound Recognition
STX The Start of Text
SUB The SUBstitute character
SVG The Scalable Vector Graphics language
SVN SubVersioN
SYN The SYNchronous Idle character
TC A Technical Committee
TEI The Text Encoding Initiative
TRON The Real-time Operating system Nucleus
UCS The Universal multiple-octet coded Character Set
US The Unit Separator
USA The United States of America
UTF The UCS Transformation Format
VCS Version Control Systems
vi The Visual Interactive editor
Vim vi IMproved
VT The Vertical Tab
W3C The World Wide Web Consortium
WG A Working Group
WYSIWYG What You See Is What You Get
XHTML The eXtensible HyperText Markup Language
XML The eXtensible Markup Language

Index

ACK 6
Adobe FrameMaker 14
Adobe InDesign 14, 39
alignment
    justified 42
    ragged 42
Anton Koberger 49
Apache OpenOffice 13, 20, 39
API 55
ASA 51
ASCII 5–9, 11, 12, 14, 51
AsciiDoc 39
AT&T 35
Atom 13
awk 16, 17

Bazaar 17
BEL 6
BMP 8, 9, 14
Bob Bemer 5
body text 41
BRE 14–16
    alternation operator 15
    backreference 15
    escape character 15
    matching list expression 15
    non-matching list expression 15
    repetition operator 15
    subexpression 15
BS 6
BSD 13

CA 52
CAN 6
CERN 28
character code 5
character encoding 5
Chomsky hierarchy 14
Christian Morgenstern 4
CLDR 52
CLI 13, 16
code page 7
code point 8
Compose key 11
CONCUR 27
control code 5
CR 6
Creole 39
CSS 23, 29–32, 44

DC 32, 33
DC1 6
DC2 6
DC3 6
DC4 6
DEL 6
DLE 6
Donald Knuth 36
DPS 13, 17, 18, 32, 35, 36, 39
    batch-oriented 35
    interactive 13, 35
        desktop publishing 36
        word processing 36
DTD 23, 25–27
DTP 36

EBCDIC 5
ECMA 55
Edgar Allan Poe 37

Elements of Style 3
EM 6
Emacs 13
endianity 10
endnote 47
ENQ 6
EOT 6
ERE 14–16
    alternation operator 15
    backreference 15
    escape character 15
    matching list expression 15
    non-matching list expression 15
    repetition operator 15
    subexpression 15
ESC 6
ETB 6
ε-TeX 38
ETX 6
EUC 5

F. M. Cornford 43
FF 6
FOAF 32, 33
footnote 47
formal grammar 14
FORTRAN 4
From Religion to Philosophy: A Study in the Origins of Western Speculation 43
FS 6
FSM 35

Git 17
GML 22
GNU 13, 14, 35
    Linux 13
    nano 13
Google Documents 18
Google Pinyin 11
grep 16, 17
groff, see troff
GS 6
GUI 13, 35

Han Unification 9
heading 45
Henrik Ibsen 27
HT 6
HTML 28–32, 34, 39, 44, 55

IBM 5, 12, 22
iconv 10
IEC 7, 10, 51–54
IME 12
IRI 27, 28, 31, 32, 54
ISO 7, 10, 51–54

JavaScript 29
Jeffrey E. F. Friedl 14
JIS 5
JOE 13
JScript 29
JSON 32
JSON-LD 32, 56
JTC 51–54
justification, see alignment

King Lear 48

LaTeX 36, 43
Latin Vulgate Bible 49
LD 31, 32, 55
leading, see line spacing
Leafpad 13
LF 6
lightweight markup language 39
line height 45
list 46

MA 51
MakeDoc 39
Markdown 39
markup
    logical 21, 29, 30, 35, 36
    presentation 21, 29, 30, 35, 36
MathML 28, 31
Mercurial 17
microformatting 32
Microsoft Word 14, 20, 39

N-Triples 32, 33
NAK 6
Noam Chomsky 14
    hierarchy 14
note 46
Notepad++ 13
Notepad 13

nroff, see troff
NUL 6
NY 51

OCR 12
ODF 13
OOXML 13
OWL 32, 56

paragraph 42
    block 47
    indented 45
    outdented 45
paragraphs
    block 45
PC 5, 11
PDF 13
pdfTeX 38
Peer Gynt 27
Perl 14
Pico 13
pinyin 11
plain TeX 38
POSIX 53
printable character 5
Punycode 8

QuarkXPress 14
quotation
    block 47
    run-in 47

rag, see alignment
RDF 28, 31–35, 56
    literal 32
    object 31
    ontology 32
    predicate 31
    resource 31
    subject 31
    triplet 31
RDFa 32, 34, 56
regex, see regular expression
regular expression 13, 14
regular grammar 14
RELAX NG 23, 25
RFC 54, 55
RS 6

sans-serif 41
SC 51–54
Scribus 13, 14, 39
sed 16, 17
serif 41
Setext 39
SGML 22, 23, 25, 27–29, 39, 53, 54
    application 23
    attribute 22
    element 22
    entity 22
    node 22
    tag 22
SGML: The Reason Why and the First Published Hint 22
SI 6
sidenote 46
small capitals 45
SO 6
SOH 6
SR 12
STX 6
style guide 3
SUB 6
Sublime Text 13
surrogate pair 8
SVG 28, 31
SVN 17–20
SYN 6

table 46
TC 51, 52
TEI 28
text editor 13
text file 4
text processing 4
TextEdit 13, 14
The Art of Computer Programming 36
The Cask of Amontillado 37
The Chicago Manual of Style 3
The Oxford Style Manual 3
The Subversion book 17
Tim Berners-Lee 31
Timothy John Berners-Lee 28
TortoiseSVN 18, 20
Trichter 4
troff 35
    man 36
    me 36
    mom 36
TRON 9
Turtle 32, 33
typeface 41

UCS 6, 8–12, 14, 16, 51, 52
    block 8
    UCS-4 8
Unicode
    case conversion 10
    normalization 10
US 6
USA 51, 52
UTF 6, 8–10, 52
    UTF-16 8, 52
    UTF-32 8
    UTF-7 8
    UTF-8 8, 52

VBScript 29
VCS 17–20
    centralized 17
    decentralized 17
version control 13
vi 13
Vim 13
VT 6

W3C 23, 28, 29, 31, 32, 54–56
WG 54
Wikicode 39
William Shakespeare 48
William Strunk 3
Word Online 18
writing rules
    grammar 3
    orthography 3
    typography 4
WYSIWYG 35

X Window System 11
XeTeX 43
XHTML 28, 31, 32, 55, 56
XML 23–29, 31–33, 39, 54, 55
    application 23
    DocBook 28
    format 23
    language 23
    namespace 27
    schema language 23
    Schema 23, 26
    validity 23
    well-formedness 23
xmllint 26
XPath 23
XPointer 23
XQuery 23


DTDs in SGML and XML documents can be either linked to the document through PUBLIC and SYSTEM identifiers (top), directly embedded in the document (middle), linked to the document and then extended by an embedded specification (bottom), or omitted.

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd">
    <!DOCTYPE recipe SYSTEM "recipe.dtd">

    <!DOCTYPE recipe [
      <!ELEMENT recipe (name, description, ingredientList, stepList)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
      <!ELEMENT ingredientList (ingredient+)>
      <!ATTLIST ingredientList serves CDATA #REQUIRED>
      <!ELEMENT ingredient (#PCDATA)>
      <!ATTLIST ingredient amount CDATA #REQUIRED>
      <!ELEMENT stepList (step+)>
      <!ELEMENT step (#PCDATA)> ]>

    <!DOCTYPE recipe PUBLIC "-//EXAMPLE//DTD FOR RECIPES"
      "http://www.example.com/DTD/recipe.dtd" [
      <!-- Omitted for brevity --> ]>
    <!DOCTYPE recipe SYSTEM "recipe.dtd" [
      <!-- Omitted for brevity --> ]>

Figure 2.2: An example DTD.

    element recipe {
      element name { text },
      element description { text },
      element ingredientList {
        attribute serves { xsd:positiveInteger },
        element ingredient {
          attribute amount { text },
          text
        }+
      },
      element stepList {
        element step { text }+
      }
    }

Figure 2.3: A reformulation of the DTD from Figure 2.2 in the compact syntax of the RELAX NG schema language (recipe.rnc). Note how RELAX NG allows us to constrain the attribute data types.

    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
      <element name="recipe"><complexType><all>
        <element name="name" type="string" minOccurs="1"/>
        <element name="description" type="string" minOccurs="1"/>
        <element name="ingredientList"><complexType><sequence>
          <element name="ingredient" minOccurs="1" maxOccurs="unbounded">
            <complexType><simpleContent>
              <extension base="string">
                <attribute name="amount" type="string"/>
              </extension>
            </simpleContent></complexType>
          </element></sequence>
          <attribute name="serves" type="positiveInteger" use="required"/>
        </complexType></element>
        <element name="stepList"><complexType><sequence>
          <element name="step" type="string" minOccurs="1" maxOccurs="unbounded"/>
        </sequence></complexType></element>
      </all></complexType></element>
    </schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).

    xmllint --noout --dtdvalid recipe.dtd recipe.xml
    xmllint --noout --schema recipe.xsd recipe.xml
    trang recipe.rnc recipe.rng   # Compact -> Full RELAX NG
    xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.
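To make concrete what a validator checks, the constraints of the recipe schema can be sketched by hand in Python using only the standard library. This is a minimal illustration, not a replacement for xmllint: the function name check_recipe and the sample document are assumptions made for this example, and only the rules visible in Figures 2.2 and 2.3 are covered.

```python
import xml.etree.ElementTree as ET

# A hypothetical instance of the recipe format from Figure 2.2.
RECIPE = """<recipe>
  <name>Pancakes</name>
  <description>A simple breakfast.</description>
  <ingredientList serves="2">
    <ingredient amount="200 g">flour</ingredient>
    <ingredient amount="2">eggs</ingredient>
  </ingredientList>
  <stepList>
    <step>Mix the ingredients.</step>
    <step>Fry the batter.</step>
  </stepList>
</recipe>"""


def check_recipe(xml_text):
    """Return a list of violations of the recipe content model."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    errors = []
    if root.tag != "recipe":
        errors.append("root element must be <recipe>")
    # The DTD content model fixes both the children and their order.
    expected = ["name", "description", "ingredientList", "stepList"]
    if [child.tag for child in root] != expected:
        errors.append("children of <recipe> must be " + ", ".join(expected))
    ingredients = root.find("ingredientList")
    if ingredients is not None:
        # RELAX NG and XML Schema type this attribute as a positive integer.
        serves = ingredients.get("serves", "")
        if not (serves.isdecimal() and int(serves) > 0):
            errors.append("serves must be a positive integer")
        items = ingredients.findall("ingredient")
        if not items:
            errors.append("<ingredientList> needs at least one <ingredient>")
        if any(item.get("amount") is None for item in items):
            errors.append("every <ingredient> needs an amount attribute")
    steps = root.find("stepList")
    if steps is not None and not steps.findall("step"):
        errors.append("<stepList> needs at least one <step>")
    return errors


print(check_recipe(RECIPE))
```

A schema language expresses the same rules declaratively, which is why a generic tool can enforce them for any XML application without custom code of this kind.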

A notable feature of XML that is unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of CONCUR, a more expressive SGML feature that makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML

Speech

    AASE: Peer, you're lying!
    PEER: No, I'm not!
    AASE: Well then, swear to me it's true!
    PEER: Swear? Why should I?
    AASE: See, you dare not! Every word of it's a lie!

Verse

    Peer, you're lying! No, I'm not!
    Well then, swear to me it's true!
    Swear? Why should I? See, you dare not!
    Every word of it's a lie!

    <(V)line>
    <(S)speech who="Aase">Peer, you're lying!</(S)speech>
    <(S)speech who="Peer">No, I'm not!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Aase">Well then,
    swear to me it's true!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Peer">Swear? why should I?</(S)speech>
    <(S)speech who="Aase">See, you dare not!
    </(V)line><(V)line>
    Every word of it's a lie!</(S)speech>
    </(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org/.

Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
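The way namespaces tie each element to an XML application can be observed with Python's standard xml.etree.ElementTree module. The document below is a hypothetical example that mixes the XHTML and MathML namespaces; the prefix m is an arbitrary choice, since only the IRI identifies the application.

```python
import xml.etree.ElementTree as ET

# A hypothetical document combining two XML applications.
DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:m="http://www.w3.org/1998/Math/MathML">
  <body>
    <p>The area is <m:math><m:mi>a</m:mi></m:math>.</p>
  </body>
</html>"""

root = ET.fromstring(DOCUMENT)
# ElementTree expands every prefix into "{IRI}local-name", so the tag
# alone tells which XML application an element belongs to.
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)
```

The parser resolves the prefixes but does nothing further with the IRIs: it has no way of locating a schema for either application, which is the limitation described above.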

Owing to its reduced complexity compared to SGML, XML was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (la Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely used graphical Web browser, which paved the way for the Web browsers of today. In 1994, Berners-Lee formed the W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers, when faced with a malformed document, would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across Web browsers, the W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

Initially, HTML only comprised a mixture of logical and presentation markup with a fixed visual interpretation. This changed with the specification of CSS, which was introduced by the W3C in 1996. The language made it possible to specify the visual properties of any HTML element, which separated document markup from design and effectively eliminated the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, the W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as

    <b>Bold <i>bold and italic</b> italic</i>
    <b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

    <font face="Verdana" size="4">
    <font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
    <br><br>There is a continuing need to show the power of
    <i>CSS</i>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The <i>HTML
    </i> remains the same, the only thing that has changed
    is the external <i>CSS</i> file. Yes, really.
    </font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com/. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

    <style>
      body {
        font: large Verdana;
        font-size: large; }
      h1 {
        font-size: x-large;
        text-transform: uppercase; }
      abbr {
        font-style: italic; }
    </style>
    <h1>So what is this about?</h1>
    <p>There is a continuing need to show the power of
    <abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The
    <abbr>HTML</abbr> remains the same, the only thing that
    has changed is the external <abbr>CSS</abbr> file. Yes,
    really.</p>

The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2 Terminology], which leads to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the number of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly in web documents through XML namespaces.
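The difference in strictness can be sketched with two parsers from the Python standard library, applied to the overlapping markup from Figure 2.7. Note the assumption: html.parser is only a lenient tokenizer and does not implement the HTML5 error recovery, so the sketch shows tolerance versus refusal, not the canonical HTML5 interpretation. The names TagLogger and is_well_formed are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

MARKUP = "<b>Bold <i>bold and italic</b> italic</i>"


class TagLogger(HTMLParser):
    """Record every start and end tag without checking the nesting."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.events.append("</" + tag + ">")


def is_well_formed(xml_text):
    """Return True if an XML parser accepts the text, False if it refuses."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


logger = TagLogger()
logger.feed(MARKUP)
logger.close()
print(logger.events)           # the lenient parser reports all four tags
print(is_well_formed(MARKUP))  # the XML parser refuses the overlap
print(is_well_formed("<b>Bold <i>bold and italic</i></b>"))
```

The XML parser can stop at the first mismatched end tag, whereas a real HTML5 parser must carry additional state to repair the overlap, which is the complexity the paragraph above refers to.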

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, the W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.

A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.

If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
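The two readings of a triplet can be sketched as a small Python data model. The class names IRI and Triple and the function interpret are assumptions made for this illustration and do not come from any RDF library; the example IRIs are the ones used in Figure 2.9.

```python
from typing import NamedTuple


class IRI(str):
    """A resource identified by an IRI."""


class Triple(NamedTuple):
    predicate: IRI
    subject: IRI
    obj: object  # either an IRI (a resource) or a plain str (a literal)


def interpret(triple):
    """Paraphrase a triplet (p, s, o) according to the kind of its object."""
    p, s, o = triple
    if isinstance(o, IRI):
        return f"{s} is in the relation {p} with {o}"
    return f"{s} has the property {p} with the value {o!r}"


DC = "http://purl.org/dc/terms/"
doc = IRI("http://example.org/document.html")
john = IRI("http://example.org/john-smith")
triples = [
    Triple(IRI(DC + "creator"), doc, john),
    Triple(IRI(DC + "title"), doc, "John's Web page"),
]
for triple in triples:
    print(interpret(triple))
```

Distinguishing resources from literals by type, rather than by inspecting the string, mirrors the fact that an RDF literal may itself look like an IRI without being one.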

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, besides the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for LD (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually supersede the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in documents on the Web, a technique known as microformatting.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be

    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/terms/"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description
          rdf:about="http://example.org/document.html">
        <dc:title xml:lang="en">John's Web page</dc:title>
        <dc:creator
            rdf:resource="http://example.org/john-smith"/>
      </rdf:Description>
      <rdf:Description
          rdf:about="http://example.org/john-smith">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
        <foaf:name>John Smith</foaf:name>
      </rdf:Description>
    </rdf:RDF>

    <http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
    <http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
    <http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
    <http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dc: <http://purl.org/dc/terms/> .

    <http://example.org/document.html>
      dc:title "John's Web page"@en ;
      dc:creator <http://example.org/john-smith> .

    <http://example.org/john-smith>
      a foaf:Person ;
      foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).
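Of the three representations, N-Triples is the simplest to produce, since every triple occupies one line. The following Python sketch serializes a few of the statements from Figure 2.9. It is an illustration under stated assumptions: the function names and the tuple layout are invented here, the tuples are ordered subject, predicate, object as in the N-Triples syntax, and only the basic literal escapes are handled (IRIs are written out unchecked).

```python
def escape_literal(text):
    """Escape the characters that must not appear raw in a quoted literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def format_object(obj):
    """Format an object given as ("iri", value) or ("literal", value, language)."""
    if obj[0] == "iri":
        return "<" + obj[1] + ">"
    literal = '"' + escape_literal(obj[1]) + '"'
    language = obj[2] if len(obj) > 2 else None
    return literal + ("@" + language if language else "")


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one triple per line."""
    return "\n".join(
        "<{}> <{}> {} .".format(s, p, format_object(o)) for s, p, o in triples
    )


DOC = "http://example.org/document.html"
JOHN = "http://example.org/john-smith"
TRIPLES = [
    (DOC, "http://purl.org/dc/terms/title", ("literal", "John's Web page", "en")),
    (DOC, "http://purl.org/dc/terms/creator", ("iri", JOHN)),
    (JOHN, "http://xmlns.com/foaf/0.1/name", ("literal", "John Smith")),
]
print(to_ntriples(TRIPLES))
```

Turtle and RDF/XML encode the same statements but add abbreviation mechanisms (prefixes, nesting), which is why they need a real parser rather than line-by-line processing.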

34 CHAPTER 2 MARKUP

<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>


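[Figure: a directed graph with the following nodes and edge labels]
http://example.org/document.html  --dc:title-->  "John's Web page"@en
http://example.org/document.html  --foaf:creator-->  http://example.org/john-smith
http://example.org/john-smith  --rdf:type-->  foaf:Person
http://example.org/john-smith  --foaf:name-->  "John Smith"

Figure 2.11: A graph of the RDF document in Figure 2.9.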

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph, The Art of Computer Programming, and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by Desktop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.


The Cask of Amontillado
by
Edgar Allen Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that gave utterance to a threat. At length I would be avenged; this was a point definitely settled – but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allen Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].


% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
    \dimexpr\hsize-3.28em 3.28em
    \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
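To illustrate how punctuation can stand in for markup, the following Python sketch converts a tiny, made-up Markdown-like subset into HTML. The function names and the three supported constructs (a `# ` heading, `* ` list items, and `*emphasis*`) are assumptions chosen for illustration; real converters for the languages named above handle far more cases.

```python
import html
import re


def inline(text):
    """Escape HTML special characters, then turn *word* into <em>word</em>."""
    escaped = html.escape(text, quote=False)
    return re.sub(r"\*(.+?)\*", r"<em>\1</em>", escaped)


def toy_markup_to_html(text):
    """Convert a hypothetical lightweight markup subset to HTML.

    Blocks are separated by blank lines. A block starting with "# " is a
    heading, a block whose lines all start with "* " is a list, and any
    other block is a paragraph.
    """
    if not text.strip():
        return ""
    blocks = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if lines[0].startswith("# "):
            blocks.append(f"<h1>{inline(lines[0][2:])}</h1>")
        elif all(line.startswith("* ") for line in lines):
            items = "".join(f"<li>{inline(line[2:])}</li>" for line in lines)
            blocks.append(f"<ul>{items}</ul>")
        else:
            blocks.append(f"<p>{inline(' '.join(lines))}</p>")
    return "\n".join(blocks)


source = "# Title\n\nSome *emphasized* text.\n\n* one\n* two"
print(toy_markup_to_html(source))
```

The source text stays readable on a plain text terminal, while the converter recovers the logical structure from punctuation alone.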

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other, the design and the positioning of the structural elements of the document (such as headings, tables, figures, and lists), and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
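The thresholds above can be turned into a quick check. The following Python sketch is only an illustration; the function names and the wording of the verdicts are our own assumptions, while the numeric limits are the ones given in the preceding paragraph.

```python
def average_line_length(text):
    """Average number of characters per non-empty line of a text."""
    lines = [line for line in text.splitlines() if line.strip()]
    return sum(len(line) for line in lines) / len(lines) if lines else 0.0


def assess_measure(chars_per_line, columns=1):
    """Classify a line length against the guidelines for body text."""
    # 45-75 characters on a single-column page, 40-50 on a multi-column page.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "too narrow to justify; set ragged"
    if chars_per_line > 80:
        return "too wide; strains the eye"
    if low <= chars_per_line <= high:
        return "within the recommended range"
    return "outside the recommended range, but tolerable"


print(assess_measure(60))             # single-column page
print(assess_measure(60, columns=2))  # multi-column page
print(assess_measure(35, columns=2))  # narrow column
```

Counting characters is a rough proxy for the width of a line, since the actual width depends on the typeface and on the letters used.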

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

1

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}
\newunicodechar{…}{\raisebox{.8ex}{…}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  /* "grc" is the language tag for Ancient Greek */
  [lang=en] {
    font-size: 11pt;
  }
  [lang=grc] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
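The arithmetic behind these figures can be sketched in a few lines of Python. The function names are hypothetical; the percentages are the twenty to forty-five percent guideline quoted above.

```python
def leading_range(type_size_pt, min_extra_percent=20, max_extra_percent=45):
    """Return the (minimum, maximum) line height in points for a type size."""
    minimum = type_size_pt * (100 + min_extra_percent) / 100
    maximum = type_size_pt * (100 + max_extra_percent) / 100
    return minimum, maximum


def lead_per_side(type_size_pt, line_height_pt):
    """Lead added above and below each line for a given line height."""
    return (line_height_pt - type_size_pt) / 2


low, high = leading_range(10)
print(low, high)                 # line height range for 10 pt type
print(lead_per_side(10, low))    # lead on each side at the minimum
print(lead_per_side(10, high))   # lead on each side at the maximum
```

For 10 pt type, the sketch reproduces the range of 12 to 14.5 pt and the 1 to 2.25 pt of lead on each side of the line.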

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger


          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a side-note. Sidenotes enliven the page and are easy for the reader to find.

variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
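One way to satisfy the whole-multiple rule is to round the natural height of a heading up to the next multiple of the body text line height and to distribute the difference as extra space around the heading. The following Python sketch illustrates the calculation; the function names and the example sizes are assumptions made for illustration.

```python
import math


def heading_block_height(heading_height_pt, body_line_height_pt):
    """Smallest whole multiple of the body line height that fits the heading."""
    lines = math.ceil(heading_height_pt / body_line_height_pt)
    return lines * body_line_height_pt


def extra_space(heading_height_pt, body_line_height_pt):
    """Vertical space to add around the heading to keep lines aligned."""
    block = heading_block_height(heading_height_pt, body_line_height_pt)
    return block - heading_height_pt


# A hypothetical heading 18 pt high in a body text with 12 pt leading.
print(heading_block_height(18, 12))  # total height reserved for the heading
print(extra_space(18, 12))           # space to split above and below it
```

How the extra space is split above and below the heading is a design decision; placing more of it above the heading ties the heading to the text that follows.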

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.


2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that when we are sick in fortune (often the surfeit of our own behavior) we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].


obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

– William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
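The requirement that the text height be a multiple of the line height can be met by counting how many whole lines fit between the vertical margins. The Python sketch below shows the calculation; the function name, the example page, and the conversion of 72 pt per inch are assumptions made for illustration.

```python
def text_block(page_height_pt, top_margin_pt, bottom_margin_pt, line_height_pt):
    """Return (number of lines, text height in points) that fit on a page."""
    available = page_height_pt - top_margin_pt - bottom_margin_pt
    lines = int(available // line_height_pt)
    return lines, lines * line_height_pt


# A hypothetical 9 in high page (648 pt at 72 pt per inch)
# with 0.75 in (54 pt) margins at the top and at the bottom.
print(text_block(648, 54, 54, 12))  # body text with 12 pt leading
print(text_block(648, 54, 54, 14))  # body text with 14 pt leading
```

When the available height is not a whole multiple of the line height, the remainder can be added to one of the vertical margins.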

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are


Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

(ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
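The text does not define what contrast is sufficient. One possible yardstick for screens, which is our assumption rather than a recommendation from the cited literature, is the contrast ratio defined by the Web Content Accessibility Guidelines (WCAG) 2, which ranges from 1:1 for identical colors to 21:1 for black on white. The following Python sketch computes it for sRGB colors; the function names are hypothetical.

```python
def relative_luminance(rgb):
    """Relative luminance of an sRGB color given as three 0-255 integers."""
    def linearize(channel):
        c = channel / 255
        # Undo the sRGB transfer curve (constants from the WCAG 2 definition).
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG 2 contrast ratio between two sRGB colors."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


black, white, red = (0, 0, 0), (255, 255, 255), (255, 0, 0)
print(round(contrast_ratio(black, white), 2))  # black text on white
print(round(contrast_ratio(red, white), 2))    # red text on white
```

WCAG 2 asks for a ratio of at least 4.5:1 for body-sized text at its AA level. The ratio only measures differences in lightness, so it does not by itself tell us whether two hues stay distinct for a color-blind reader.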

Bibliography

[1] Mary Brandel. "1963: The debut of ASCII". In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972.

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986.

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1.

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6.

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993.

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996.

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996.

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008.

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012.

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006.

[17] Noam Chomsky. "Three models for the description of language". In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124.

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993.

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6.

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015).

[23] Charles F. Goldfarb. "The Roots of SGML – A Personal Recollection". 1996. URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015).

[24] Charles F. Goldfarb. "SGML: The Reason Why and the First Published Hint". In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015).

[25] Charles F. Goldfarb. "Introduction to Generalized Markup". 1981. URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986.

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3.

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. "GODDAG: A Data Structure for Overlapping Hierarchies". In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12.

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028/ (visited on 07/31/2015).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016).


[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).


[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45-48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standards Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
EMACS: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend of a Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MATHML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
PICO: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard General Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
VI: The Visual Interactive editor
VIM: VI IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language




    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
      <element name="recipe"><complexType><all>
        <element name="name" type="string" minOccurs="1"/>
        <element name="description" type="string" minOccurs="1"/>
        <element name="ingredientList"><complexType><sequence>
          <element name="ingredient" minOccurs="1" maxOccurs="unbounded">
            <complexType><simpleContent>
              <extension base="string">
                <attribute name="amount" type="string"/>
              </extension>
            </simpleContent></complexType>
          </element></sequence>
          <attribute name="serves" type="positiveInteger" use="required"/>
        </complexType></element>
        <element name="stepList"><complexType><sequence>
          <element name="step" type="string" minOccurs="1" maxOccurs="unbounded"/>
        </sequence></complexType></element>
      </all></complexType></element>
    </schema>

Figure 2.4: A reformulation of the DTD from Figure 2.2 in the XML Schema language (recipe.xsd).

    xmllint --noout --dtdvalid recipe.dtd recipe.xml
    xmllint --noout --schema recipe.xsd recipe.xml
    trang recipe.rnc recipe.rng   # Compact -> Full Relax NG
    xmllint --noout --relaxng recipe.rng recipe.xml

Figure 2.5: XML documents can be easily validated against XML schemata using the free command-line program xmllint.


A notable feature of XML unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of a more expressive SGML feature called CONCUR, which makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of IRIs to XML schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.

    Speech
      AASE: Peer, you're lying!
      PEER: No, I'm not!
      AASE: Well then, swear to me it's true!
      PEER: Swear? Why should I?
      AASE: See, you dare not! Every word of it's a lie!

    Verse
      Peer, you're lying! No, I'm not!
      Well then, swear to me it's true!
      Swear? Why should I? See, you dare not!
      Every word of it's a lie!

    <(V)line>
    <(S)speech who="Aase">Peer, you're lying!</(S)speech>
    <(S)speech who="Peer">No, I'm not!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Aase">Well then,
    swear to me it's true!</(S)speech>
    </(V)line><(V)line>
    <(S)speech who="Peer">Swear? why should I?</(S)speech>
    <(S)speech who="Aase">See, you dare not!
    </(V)line><(V)line>
    Every word of it's a lie!</(S)speech>
    </(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].

Margin note: The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.

Margin note: Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.
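As a rough illustration of how namespaces identify the XML application an element belongs to, the following sketch parses a small hypothetical document that mixes XHTML and SVG elements and groups the element names by their namespace IRI. The sample document and the helper names (`split_qualified_name`, `elements_by_namespace`) are assumptions made for this example; only the standard library's `xml.etree.ElementTree` module is used, which reports element names in the `{IRI}local-name` notation.

```python
import xml.etree.ElementTree as ET

# A hypothetical document mixing two XML applications via namespaces.
DOCUMENT = """\
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <body>
    <p>A circle:</p>
    <svg:svg width="20" height="20">
      <svg:circle cx="10" cy="10" r="8"/>
    </svg:svg>
  </body>
</html>
"""


def split_qualified_name(tag):
    """Split ElementTree's '{iri}local' notation into (iri, local)."""
    if tag.startswith("{"):
        iri, local = tag[1:].split("}", 1)
        return iri, local
    return None, tag  # the element is in no namespace


def elements_by_namespace(xml_text):
    """Map each namespace IRI to the local names of its elements."""
    root = ET.fromstring(xml_text)
    groups = {}
    for element in root.iter():  # document order, root included
        iri, local = split_qualified_name(element.tag)
        groups.setdefault(iri, []).append(local)
    return groups


if __name__ == "__main__":
    for iri, names in elements_by_namespace(DOCUMENT).items():
        print(iri, names)
```

Note that the parser resolves the prefixes to IRIs but never dereferences them: nothing in the document tells it where a schema for either application could be found, which is the limitation described above.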

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications such as XHTML and RDF/XML will be discussed in Section 2.2.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely used graphical Web browser, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed the W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across Web browsers, the W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Margin note: JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

Initially, HTML only comprised a mixture of logical and presentation markup with fixed visual interpretation. This changed with the specification of CSS, which was introduced by the W3C in 1996. The language enabled the specification of the visual properties of any HTML element, which enabled the separation of document markup and design, effectively eliminating the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

    <b>Bold <i>bold and italic</b> italic</i>
    <b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

    <font face="Verdana" size="4">
    <font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
    <br><br>There is a continuing need to show the power of
    <i>CSS</i>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The <i>HTML
    </i> remains the same, the only thing that has changed
    is the external <i>CSS</i> file. Yes, really.
    </font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

    <style>
      body {
        font: large Verdana;
        font-size: large;
      }
      h1 {
        font-size: x-large;
        text-transform: uppercase;
      }
      abbr {
        font-style: italic;
      }
    </style>
    <h1>So what is this about?</h1>
    <p>There is a continuing need to show the power of
    <abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The
    <abbr>HTML</abbr> remains the same, the only thing that
    has changed is the external <abbr>CSS</abbr> file. Yes,
    really.</p>

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, the W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources and to reduce the number of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly in web documents through XML namespaces.

Margin note: The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.
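The difference in strictness between XML and HTML processing discussed in this section can be observed with a short sketch that uses only the Python standard library. The overlapping markup is the one from Figure 2.7, wrapped in a `p` element so that each sample has a single root; the names `is_well_formed_xml`, `TagLogger`, and `html_tag_events` are made up for this example. One caveat: `html.parser` is merely a tolerant tokenizer and does not implement the error recovery prescribed by HTML5, so the sketch shows that the malformed input is accepted, not how a browser would repair it.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Markup from Figure 2.7, wrapped in <p> to give it a single root element.
OVERLAPPING = "<p><b>Bold <i>bold and italic</b> italic</i></p>"
NESTED = "<p><b>Bold </b><i><b>bold and italic</b> italic</i></p>"


def is_well_formed_xml(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False  # XML parsers must refuse malformed input
    return True


class TagLogger(HTMLParser):
    """Records start and end tags without checking their nesting."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def html_tag_events(markup):
    """Tokenize the markup leniently and return the observed tag events."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


if __name__ == "__main__":
    print(is_well_formed_xml(OVERLAPPING))  # refused by the XML parser
    print(is_well_formed_xml(NESTED))       # accepted by the XML parser
    print(html_tag_events(OVERLAPPING))     # tokenized without complaint
```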

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, the W3C created the Resource Description Framework (RDF) [44], a language for the description of resources on the Web, in 1999.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.


Margin note: A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.

If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
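The two readings of a triplet can be made concrete with a minimal in-memory model. The class names (`Resource`, `Literal`, `Triplet`) and the `interpret` function below are illustrative assumptions rather than part of any RDF library; the sample IRIs are taken from the example document in Figure 2.9.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    """A resource identified by an IRI."""
    iri: str


@dataclass(frozen=True)
class Literal:
    """A plain literal value."""
    value: str


@dataclass(frozen=True)
class Triplet:
    """A triplet (p, s, o), in the order used in the text."""
    predicate: Resource
    subject: Resource
    obj: Union[Resource, Literal]


def interpret(t):
    """Render the reading of a triplet described in the text."""
    if isinstance(t.obj, Resource):
        # Object is a resource: s is in a relation p with o.
        return (f"<{t.subject.iri}> is in relation "
                f"<{t.predicate.iri}> with <{t.obj.iri}>")
    # Object is a literal: s has a property p with the value o.
    return (f"<{t.subject.iri}> has property "
            f'<{t.predicate.iri}> with value "{t.obj.value}"')


if __name__ == "__main__":
    document = Resource("http://example.org/document.html")
    john = Resource("http://example.org/john-smith")
    triplets = [
        Triplet(Resource("http://purl.org/dc/terms/creator"), document, john),
        Triplet(Resource("http://xmlns.com/foaf/0.1/name"), john,
                Literal("John Smith")),
    ]
    for t in triplets:
        print(interpret(t))
```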

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, besides the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for LD (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in the documents on the Web, a technique known as microformatting.
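Of the representations listed above, N-Triples is the simplest: every statement occupies one line consisting of the subject, the predicate, and the object, followed by a full stop. The sketch below serializes a few statements in that form. The names `Iri`, `Literal`, `term`, and `to_ntriples` are assumptions made for this example, the escaping covers only backslashes, quotes, and line breaks, and IRIs are assumed to contain no characters that would need escaping, so this is an approximation rather than a conforming serializer.

```python
from typing import NamedTuple, Optional


class Iri(NamedTuple):
    value: str


class Literal(NamedTuple):
    value: str
    lang: Optional[str] = None  # optional language tag such as "en"


def term(t):
    """Render a single term in N-Triples syntax (simplified escaping)."""
    if isinstance(t, Iri):
        return f"<{t.value}>"
    escaped = (t.value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    suffix = f"@{t.lang}" if t.lang else ""
    return f'"{escaped}"{suffix}'


def to_ntriples(statements):
    """Serialize (subject, predicate, object) statements, one per line."""
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n"
                   for s, p, o in statements)


if __name__ == "__main__":
    document = Iri("http://example.org/document.html")
    john = Iri("http://example.org/john-smith")
    print(to_ntriples([
        (document, Iri("http://purl.org/dc/terms/title"),
         Literal("John's Web page", "en")),
        (document, Iri("http://purl.org/dc/terms/creator"), john),
        (john, Iri("http://xmlns.com/foaf/0.1/name"), Literal("John Smith")),
    ]), end="")
```

Note that the statements here are passed in the subject, predicate, object order that N-Triples uses on each line, not in the (p, s, o) order used earlier in the text.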

    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/terms/"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description
          rdf:about="http://example.org/document.html">
        <dc:title xml:lang="en">John's Web page</dc:title>
        <dc:creator
            rdf:resource="http://example.org/john-smith"/>
      </rdf:Description>
      <rdf:Description
          rdf:about="http://example.org/john-smith">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
        <foaf:name>John Smith</foaf:name>
      </rdf:Description>
    </rdf:RDF>

    <http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
    <http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
    <http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
    <http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dc: <http://purl.org/dc/terms/> .

    <http://example.org/document.html>
      dc:title "John's Web page"@en ;
      dc:creator <http://example.org/john-smith> .

    <http://example.org/john-smith>
      a foaf:Person ;
      foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).

    <!DOCTYPE html>
    <html lang="en">
      <head>
        <link rel="meta" type="application/rdf+xml"
              href="john.rdf">
        <link rel="meta" type="text/turtle" href="john.ttl">
        <link rel="meta" type="application/n-triples"
              href="john.nt">
        <title>John's Web page</title>
      </head>
      <body>
        Hi, I'm John Smith.
      </body>
    </html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

    <!DOCTYPE html>
    <html lang="en">
      <head vocab="http://purl.org/dc/terms/"
            about="http://example.org/document.html">
        <title property="title" lang="en">John's Web page</title>
        <meta property="creator"
              href="http://example.org/john-smith">
      </head>
      <body vocab="http://xmlns.com/foaf/0.1/"
            about="http://example.org/john-smith"
            typeof="Person">
        Hi, I'm <span property="name">John Smith</span>.
      </body>
    </html>

Figure 2.11: A graph of the RDF document in Figure 2.9. The node http://example.org/document.html is connected to the literal "John's Web page"@en by dc:title and to the node http://example.org/john-smith by dc:creator. The node http://example.org/john-smith is connected to foaf:Person by rdf:type and to the literal "John Smith" by foaf:name.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also known as What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Archetypal batch-oriented DPSes include troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU is Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Margin note: The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text, with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.


The Cask of Amontilladoby

Edgar Allen Poe

T he thousand injuries of Fortunato I had borne as I bestcould but when he ventured upon insult I vowedrevenge You who so well know the nature of my soul

will not suppose however that gave utterance to a threat Atlength I would be avenged this was a point definitely settledmdashbut the very definitiveness with which it was resolved precludedthe idea of risk I must not only punish but punish withimpunity A wrong is unredressed when retribution overtakes itsredresser

-1-

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allen Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allen Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].


% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
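To give a rough idea of what such a conversion tool does, the following Python sketch converts a tiny Markdown-like subset to HTML. It is a toy written for this illustration, not the algorithm of any real converter: it only understands headings introduced by number signs, paragraphs separated by blank lines, and emphasis marked by asterisks, and the function name to_html is an assumption.

```python
import html
import re


def to_html(source: str) -> str:
    """Convert a tiny Markdown-like subset (headings, *emphasis*) to HTML."""
    blocks = []
    for block in source.strip().split("\n\n"):  # blank lines separate blocks
        # Join the lines of a block and escape characters special to HTML.
        text = html.escape(" ".join(block.split()), quote=False)
        # Punctuation carries the markup: *word* becomes emphasized text.
        text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
        heading = re.match(r"(#{1,6}) (.*)", text)
        if heading:
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            blocks.append(f"<p>{text}</p>")
    return "\n".join(blocks)


print(to_html("# Design\n\nLegibility should be of *foremost*\nconcern."))
```

The input remains readable as plain text, which is the point of the lightweight approach; the expressive power of the subset is correspondingly small.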

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
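Before choosing typefaces, it may help to take stock of the writing systems a manuscript actually uses. The following Python sketch groups the letters of a text by the first word of their Unicode character name. This is a heuristic assumed for the illustration, which works for Latin and Greek letters but is not a proper script lookup; the function name scripts_used is likewise hypothetical.

```python
import unicodedata
from collections import defaultdict


def scripts_used(text: str) -> dict:
    """Group the letters of a text by the script named in Unicode."""
    groups = defaultdict(set)
    for char in text:
        if char.isalpha():
            # The first word of a character name identifies the script of
            # Latin and Greek letters, e.g. "GREEK SMALL LETTER PSI".
            script = unicodedata.name(char).split()[0]
            groups[script].add(char)
    return dict(groups)


sample = "Aristotle says φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι"
for script, letters in sorted(scripts_used(sample).items()):
    print(script, "".join(sorted(letters)))
```

Each reported script needs at least one typeface that covers it, and the listed letters can be checked against the character set of a candidate typeface.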

Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
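The guidelines on line length can be condensed into a small Python sketch. The thresholds are the ones stated above; the function name assess_measure and the wording of the verdicts are assumptions made for this illustration.

```python
def assess_measure(characters_per_line: int, columns: int = 1) -> str:
    """Judge the length of a body text line against the guidelines above."""
    # 45-75 characters on a single-column page, 40-50 on a multi-column page.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if characters_per_line > 80:
        return "too wide: extended passages strain the eye"
    if characters_per_line < 40:
        return "too narrow to justify: set the text ragged"
    if low <= characters_per_line <= high:
        return "comfortable: justify the text"
    return "acceptable, but outside the recommended range"


print(assess_measure(66))
print(assess_measure(35, columns=2))
```

An average character count per line can be estimated by counting the characters, including spaces, in a few typical lines of the typeset text.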

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy confidence and fear and again to be angry to perceive and to think and all these states are held to be movements which might lead one to suppose that soul itself is moved.”

1

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin{%
  \fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek{%
  \fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}

newunicodecharraisebox8ex

\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy confidence and
fear and again to be angry to perceive and to think
and all these states are held to be movements which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


<style>
@font-face {
  font-family: "Alegreya Sans";
  src: url("AlegreyaSans-Regular.ttf")
       format("truetype");
  unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                 U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
}
@font-face {
  font-family: "GFS Neohellenic";
  src: url("GFSNeohellenic.otf") format("opentype");
  unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                 U+102E0-102FF;
}
p {
  font-family: "Alegreya Sans", "GFS Neohellenic",
               sans-serif;
  line-height: 14pt;
}
[lang=en] {
  font-size: 11pt;
}
[lang=gr] {
  font-size: 13.2pt;
}
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy confidence and fear and again to be
angry to perceive and to think and all these states
are held to be movements which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
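The arithmetic behind these figures is simple enough to sketch in Python. The function names below are assumptions made for this illustration; the factors of 1.20 and 1.45 correspond to the twenty to forty-five percent of the typeface size mentioned above.

```python
def line_height_range(type_size_pt: float) -> tuple:
    """Return the smallest and the largest recommended line height."""
    return type_size_pt * 1.20, type_size_pt * 1.45


def lead_per_side(type_size_pt: float, line_height_pt: float) -> float:
    """Return the lead added above and below each line of text."""
    return (line_height_pt - type_size_pt) / 2


smallest, largest = line_height_range(10)
print(smallest, largest)
print(lead_per_side(10, smallest), lead_per_side(10, largest))
```

For an 11 pt typeface, the same computation suggests a line height between roughly 13.2 and 16 pt, which is consistent with the 14 pt leading used in Figure 3.1.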

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

          Sizes in inches   Page proportions
A4        8.27 × 11.7       2 : √2       1.41421
B5        6.93 × 9.84       1 : √2       0.707
Letter    8 1/2 × 11        1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
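One way to satisfy the last guideline is to pad every heading with vertical space until its total height reaches the next whole multiple of the body text line height. The following Python sketch computes that height; the function name and the example sizes are assumptions made for this illustration.

```python
import math


def heading_block_height(heading_height_pt: float, line_height_pt: float) -> float:
    """Round the height of a heading up to a whole multiple of the line height."""
    # The small tolerance keeps an exact multiple from being rounded up
    # to the next line because of floating-point error.
    lines = math.ceil(heading_height_pt / line_height_pt - 1e-9)
    return lines * line_height_pt


# A heading that is 16 pt tall in a text with the leading of 12 pt.
total = heading_block_height(16, 12)
print(total, total - 16)
```

The difference between the padded height and the natural height of the heading is the amount of space left to distribute above and below the heading.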

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs, and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that when we are sick in fortune--often the surfeit of our own behavior--we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star.

-- William Shakespeare, King Lear

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size (as described in Section 3.2.1), as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
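The requirement that the text height be a multiple of the line height can be met by rounding a desired height down to a whole number of lines, as the following Python sketch shows. The function name and the example dimensions are assumptions made for this illustration.

```python
def text_block_height(available_height_pt: float, line_height_pt: float) -> tuple:
    """Fit a whole number of body text lines into the available height."""
    lines = int(available_height_pt // line_height_pt)
    return lines, lines * line_height_pt


# 545 pt of available height filled with text set at the leading of 12 pt.
lines, height = text_block_height(545, 12)
print(lines, height)
```

The space that remains after the rounding goes to the vertical margins.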

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
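This book does not define what amounts to sufficient contrast. One commonly used measure on the Web, which is swapped in here as an assumption, is the contrast ratio of the Web Content Accessibility Guidelines (WCAG) 2.0, which ranges from 1 to 21 and should be at least 4.5 for text of ordinary size. A minimal Python sketch for colors given as sRGB triples of 0 to 255 follows; the function names are hypothetical.

```python
def relative_luminance(rgb: tuple) -> float:
    """Return the WCAG 2.0 relative luminance of an sRGB color."""
    def linearize(channel: int) -> float:
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(channel) for channel in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: tuple, background: tuple) -> float:
    """Return the WCAG 2.0 contrast ratio between two sRGB colors."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))
print(round(contrast_ratio((255, 0, 0), (255, 255, 255)), 2))
```

By this measure, pure red on a white background falls slightly short of the threshold, which agrees with the advice to reserve colored typefaces for emphasis. The measure says nothing about whether two colors stay distinct for the color-blind reader.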

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006 (cit. on p. 13).


[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986 (cit. on p. 22).


[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000. Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).


[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).


[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typo/glosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).


[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (la Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System


DTD  Document Type Declaration
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MATHML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character


NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFA  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language

62 ACRONYMS

xml The eXtensible Markup Language

Index

ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42

Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17

sect

Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

bre 14ndash16bs 6bsd 13

sect

ca 52can 6cern 28

character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44

sect

dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35

dps 13 17 18 32 35 36 39dtd 23 25ndash27dtp 36

sect

ebcdic 5ecma 55Edgar Allen Poe 37


Elements of Style 3em 6Emacs 13endianity 10endnote 47enq 6eot 6erealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

ere 14ndash16esc 6etb 6120576-TEX 38etx 6euc 5

sectF M Cornford 43ff 6foaf 32 33footnote 47formal grammar 14fortran 4From Religion to Philosophy A Study in

the Origins of Western Speculation 43fs 6fsm 35

sectGit 17gml 22gnuLinux 13nano 13

gnu 13 14 35Google Documents 18Google Pinyin 11grep 16 17groff see troffgs 6gui 13 35

sectHan Unification 9heading 45Henrik Ibsen 27ht 6

html 28ndash32 34 39 44 55sect

ibm 5 12 22iconv 10iec 7 10 51ndash54ime 12ir i 27 28 31 32 54iso 7 10 51ndash54

sectJavaScript 29Jeffrey E F Friedl 14j is 5joe 13JScript 29json 32json-ld 32 56jtc 51ndash54justification see alignment

sectKing Lear 48

sectLATEX 36 43Latin Vulgate Bible 49ld 31 32 55leading see line spacingLeafpad 13lf 6lightweight markup language 39line height 45list 46

sectma 51MakeDoc 39Markdown 39markuplogical 21 29 30 35 36presentation 21 29 30 35 36

mathml 28 31Mercurial 17microformatting 32Microsoft Word 14 20 39

sectN-Triples 32 33nak 6Noam Chomskyhierarchy 14

Noam Chomsky 14note 46Notepad++ 13Notepad 13


nroff see troffnul 6ny 51

sectocr 12odf 13ooxml 13owl 32 56

sectparagraphblock 47indented 45outdented 45

paragraph 42paragraphsblock 45

pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8

sectQuarkXPress 14quotationblock 47run-in 47

sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31

rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6

sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22

sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-

lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6

secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff

man 36


me 36mom 36

troff 35tron 9Turtle 32 33typeface 41

sectucsblock 8ucs-4 8

ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10

us 6usa 51 52utf

utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8

utf 6 8ndash10 52sect

VBScript 29vcscentralized 17decentralized 17

vcs 17ndash20version control 13vi 13vim 13

vt 6sect

w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4

wysiwyg 35sect

XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23

xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23


A notable feature of XML that is unavailable in SGML is namespaces, which were added to the XML specification [32] in 1999. Namespaces enable the inclusion of elements and attributes from different XML applications within a single XML document; each application is uniquely identified through an Internationalized Resource Identifier (IRI) [33]. Namespaces in XML are a spiritual successor of CONCUR, a more expressive SGML feature that makes it possible to mark up several structural views of a single document. Unlike with CONCUR, which ties each view to an SGML DTD, there exists no general mechanism for the translation of the IRIs to XML

Speech

AASE: Peer, you're lying!
PEER: No, I'm not!
AASE: Well then, swear to me it's true!
PEER: Swear? Why should I?
AASE: See, you dare not! Every word of it's a lie.

Verse

Peer, you're lying! No, I'm not!
Well then, swear to me it's true!
Swear? Why should I? See, you dare not!
Every word of it's a lie.

<(V)line>
<(S)speech who="Aase">Peer, you're lying!</(S)speech>
<(S)speech who="Peer">No, I'm not!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Aase">Well then,
swear to me it's true!</(S)speech>
</(V)line><(V)line>
<(S)speech who="Peer">Swear? why should I?</(S)speech>
<(S)speech who="Aase">See, you dare not!
</(V)line><(V)line>
Every word of it's a lie.</(S)speech>
</(V)line>

Figure 2.6: The markup of the dramatic and metrical views of Henrik Ibsen's Peer Gynt using the CONCUR feature of SGML. This figure was inspired by the figures found in the article GODDAG: A Data Structure for Overlapping Hierarchies [31].


The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org/.

Postel's law states that one should be conservative in what they send but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

schemata. This makes it impossible to validate namespaced XML documents unless all the IRIs and their schemata are known to the parser.
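How a parser sees namespaces can be sketched with the Python standard library. The document below is a hypothetical example that mixes elements from two XML applications, XHTML and SVG; the helper name `namespace_of` is an assumption made for illustration only.

```python
import xml.etree.ElementTree as ET

# A hypothetical document that mixes two XML applications: XHTML and SVG.
DOCUMENT = """\
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <body>
    <p>A circle:</p>
    <svg:svg width="20" height="20">
      <svg:circle cx="10" cy="10" r="8"/>
    </svg:svg>
  </body>
</html>
"""


def namespace_of(element):
    """Return the namespace IRI of an element, or '' if it has none."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


root = ET.fromstring(DOCUMENT)
# ElementTree replaces every prefix with the IRI the prefix was bound to,
# so elements are identified by the IRI rather than by the prefix.
namespaces = sorted({namespace_of(element) for element in root.iter()})
circles = root.findall(".//{http://www.w3.org/2000/svg}circle")
```

The parser only checks well-formedness here. It has no way to derive a schema from either IRI, which is the limitation described above.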

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; and the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely used graphical Web browser, which paved the way for the Web browsers of today. In 1994, Timothy John Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In


JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

an attempt to unify the way malformed HTML documents were rendered across Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Initially, HTML only comprised a mixture of logical and presentation markup with a fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language made it possible to specify the visual properties of any HTML element, which separated document markup from design and effectively eliminated the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as

<b>Bold <i>bold and italic</b> italic</i>
<b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.


<font face="Verdana" size="4">
<font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
<br><br>There is a continuing need to show the power of
<i>CSS</i>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The <i>HTML
</i> remains the same, the only thing that has changed
is the external <i>CSS</i> file. Yes, really.
</font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com/. The document above was created using HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

<style>
body {
  font: large Verdana;
  font-size: large;
}
h1 {
  font-size: x-large;
  text-transform: uppercase;
}
abbr {
  font-style: italic;
}
</style>
<h1>So what is this about?</h1>
<p>There is a continuing need to show the power of
<abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
and encourage participation. To begin, view some of the
existing designs in the list. Clicking on any one will
load the style sheet into this very page. The
<abbr>HTML</abbr> remains the same, the only thing that
has changed is the external <abbr>CSS</abbr> file. Yes,
really.</p>


The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources and to reduce the number of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly in web documents through XML namespaces.
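The difference in strictness can be sketched with two parsers from the Python standard library. Note the assumptions: `html.parser` is only a lenient tokenizer and does not implement the error recovery of the HTML5 specification, and the names `is_well_formed` and `TagLogger` are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Overlapping elements: neither valid HTML nor well-formed XML.
MALFORMED = "<b>Bold <i>bold and italic</b> italic</i>"


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


class TagLogger(HTMLParser):
    """Record the tags reported by the lenient HTML tokenizer."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.events.append("</" + tag + ">")


logger = TagLogger()
logger.feed(MALFORMED)  # no exception: the tags are simply reported in order
logger.close()
```

The XML parser rejects the input outright, whereas the HTML tokenizer passes the overlapping tags on and leaves the recovery to whatever builds the document tree.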

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also considerably reduced the usefulness of XML namespaces in XHTML. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.


A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.

If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
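The two readings of a triplet can be sketched in a few lines of Python. The class names `Resource`, `Literal`, and `Triplet`, the function `interpret`, and the example IRIs are assumptions chosen for illustration; they are not part of any RDF library.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    iri: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Triplet:
    predicate: Resource
    subject: Resource
    object: Union[Resource, Literal]


def interpret(triplet):
    """Describe a triplet (p, s, o) as either a relation or a property."""
    p, s, o = triplet.predicate.iri, triplet.subject.iri, triplet.object
    if isinstance(o, Resource):
        return f"<{s}> is in the relation <{p}> with <{o.iri}>"
    return f'<{s}> has the property <{p}> with the value "{o.value}"'


# Hypothetical example data.
JOHN = Resource("http://example.org/john-smith")
created = Triplet(Resource("http://purl.org/dc/terms/creator"),
                  Resource("http://example.org/document.html"), JOHN)
named = Triplet(Resource("http://xmlns.com/foaf/0.1/name"),
                JOHN, Literal("John Smith"))
```

Calling `interpret(created)` yields the relation reading, since the object is a resource, while `interpret(named)` yields the property reading, since the object is a literal.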

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually make obsolete the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata in the documents on the Web, a technique known as microformatting.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be


<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .

<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .

<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).
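Of the three syntaxes in the figure above, N-Triples is the simplest: one statement per line, written as subject, predicate, object, and a closing period. A minimal serializer can be sketched as follows. The input convention (a string for a resource, a `(text, language)` pair for a literal) and the function names are assumptions made for this example, and only the most common escape sequences are handled.

```python
def escape_literal(text):
    """Escape the characters that may not appear raw in a literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines.

    Resources are given as IRI strings; literals are given as
    (text, language) pairs, where the language may be None.
    """
    lines = []
    for subject, predicate, obj in triples:
        if isinstance(obj, tuple):
            text, language = obj
            term = '"' + escape_literal(text) + '"'
            if language:
                term += "@" + language
        else:
            term = "<" + obj + ">"
        lines.append(f"<{subject}> <{predicate}> {term} .")
    return "\n".join(lines)


document = to_ntriples([
    ("http://example.org/document.html",
     "http://purl.org/dc/terms/title", ("John's Web page", "en")),
    ("http://example.org/document.html",
     "http://purl.org/dc/terms/creator", "http://example.org/john-smith"),
])
```

Because every IRI is written out in full, the output is verbose but trivial to parse line by line, which is the trade-off N-Triples makes compared to Turtle.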


<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>


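[Figure: a directed graph. The node http://example.org/document.html has an edge dc:title to the literal "John's Web page"@en and an edge dc:creator to the node http://example.org/john-smith, which in turn has an edge rdf:type to the node foaf:Person and an edge foaf:name to the literal "John Smith".]

Figure 2.11: A graph of the RDF document in Figure 2.9.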

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is twofold: more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU is Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The


The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by Donald Knuth, an American professor of computer science, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.


The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that gave utterance to a threat. At length I would be avenged; this was a point definitely settled--but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's The Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].


% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1\hskip-0.3ex}}}
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce markup that is comparatively weak and domain-specific, but also humane, highly intuitive, often profoundly beautiful, and easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that reflect their use cases.
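The conversion step can be sketched with a toy converter. The input syntax below is a small Markdown-like subset assumed for this example (headings introduced by `#`, paragraphs separated by blank lines, and `*emphasis*`); it does not implement Markdown or any of the other languages named above, and the function name `to_html` is hypothetical.

```python
import html
import re


def to_html(text):
    """Convert a tiny Markdown-like subset to HTML."""
    blocks = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        # Join wrapped lines and escape the characters special to HTML.
        content = html.escape(" ".join(block.split()), quote=False)
        # *text* marks emphasis.
        content = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", content)
        # One to six leading # characters mark a heading of that level.
        heading = re.match(r"(#{1,6}) (.*)", content)
        if heading:
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            blocks.append(f"<p>{content}</p>")
    return "\n".join(blocks)


page = to_html("# Design\n\nType should be *legible*\nabove all.")
```

The source text stays readable as plain text, while the structure is carried entirely by punctuation and blank lines, which is the trade-off described above.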

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
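The thresholds above can be turned into a simple check. The function name `recommend_setting`, the returned labels, and the treatment of measures that fall between the ideal range and the hard limits are assumptions made for this sketch; the text itself only states the four thresholds.

```python
def recommend_setting(measure, columns=1):
    """Suggest how to set body text with the given measure.

    measure: the average number of characters per line
    columns: the number of text columns on the page
    """
    # Ideal ranges: 45-75 characters (one column), 40-50 (several columns).
    low, high = (45, 75) if columns == 1 else (40, 50)
    if measure < 40:
        # Justification would make the word spacing too loose.
        return "set ragged"
    if measure > 80:
        # Long lines strain the eye of the reader.
        return "too wide"
    if low <= measure <= high:
        return "justify"
    # Assumption: between the ideal range and the hard limits.
    return "outside the ideal range"
```

For example, a single-column page with 66 characters per line falls within the ideal range, whereas a narrow sidenote with 30 characters per line is better set ragged.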

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι "The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved."

documentclass[11pt]article

usepackagefontspec leading newunicodechar

usepackage[Latin Greek]ucharclasses

setTransitionsForLatin

fontspecAlegreyaSans-Regularttf[Ligatures=TeX]

setTransitionsForGreek

fontspecGFSNeohellenicotf[Scale=12 WordSpace=05

Ligatures=TeX]

newunicodecharraisebox8ex

frenchspacing

leading14pt

begindocument

The second function of Soul -- knowing -- was not at

first distinguished from motion Aristotle says φαμὲν

γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι

δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα

δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν

αὐτὴν κινεῖσθαι

``The soul is said to feel pain and joy confidence and

fear and again to be angry to perceive and to think

and all these states are held to be movements which

might lead one to suppose that soul itself is moved

enddocument

Figure 31 An excerpt from F M Cornfordrsquos From Religion to Philos-ophy A Study in the Origins of Western Speculation as a text markedup in TEX using LATEX macros and the primitives of XƎTEX (below)and the output document (above) Note that two typefaces wereused the regular typeface of Alegreya Sans at the size of 11 pt forthe Latin characters and the regular typeface of GFS Neohellenicat the size of 132 pt for the Greek characters


    <style>
      /* Alegreya Sans covers the Latin ranges. */
      @font-face {
        font-family: "Alegreya Sans";
        src: url("AlegreyaSans-Regular.ttf")
             format("truetype");
        unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                       U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
      }
      /* GFS Neohellenic covers the Greek ranges. */
      @font-face {
        font-family: "GFS Neohellenic";
        src: url("GFSNeohellenic.otf") format("opentype");
        unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                       U+102E0-102FF;
      }
      p {
        font-family: "Alegreya Sans", "GFS Neohellenic",
                     sans-serif;
        line-height: 14pt;
      }
      [lang="en"] {
        font-size: 11pt;
      }
      [lang="gr"] {
        font-size: 13.2pt;
      }
    </style>
    <p><span lang="en">The second function of Soul – knowing
    – was not at first distinguished from motion. Aristotle
    says, </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
    λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὀργίζεσθαί
    τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
    κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
    κινεῖσθαι </span><span lang="en">“The soul is said to
    feel pain and joy, confidence and fear, and again to be
    angry, to perceive, and to think; and all these states
    are held to be movements, which might lead one to suppose
    that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
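The arithmetic above can be tried out directly in LaTeX. The following is a minimal sketch, assuming a 10 pt body text and the two ends of the 12–14.5 pt range; the sample sentences are placeholders and not taken from the book.

```latex
\documentclass[10pt]{article}
\begin{document}
% 10 pt type on a 12 pt line height: 2 pt of lead per line,
% i.e. 20 % of the typeface size (the lower end of the range).
\fontsize{10pt}{12pt}\selectfont
A paragraph set with minimal leading.

% 10 pt type on a 14.5 pt line height: 4.5 pt of lead per line,
% i.e. 45 % of the typeface size (the upper end of the range).
\fontsize{10pt}{14.5pt}\selectfont
A paragraph set with generous leading.
\end{document}
```

The second argument of \fontsize is the line height, so the lead is the difference between the two arguments. Each sample paragraph has to end before the next \fontsize call, because TeX applies the line height that is in force when the paragraph ends.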

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
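As a hedged illustration of the indentation rule, the sketch below sets a 1 em indent in LaTeX; the value is an assumption picked from inside the 1 en to 3 em range, and the text is a placeholder.

```latex
\documentclass[10pt]{article}
\begin{document}
% Indent the first line of each paragraph by 1 em of the current font.
\setlength{\parindent}{1em}
\section{A Heading}
The standard classes leave the first paragraph after a heading unindented.

This second paragraph starts with the 1 em indent.
\end{document}
```

The standard LaTeX classes already suppress the indent right after a sectioning command, which matches the advice that an indent is unnecessary where there is no ambiguity.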

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
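Block paragraphs can be sketched in LaTeX by trading the indent for vertical space. This is a minimal example under the assumption that one line of space between paragraphs is wanted; other amounts work the same way.

```latex
\documentclass[10pt]{article}
\begin{document}
% Block paragraphs: no first-line indent, and one line height
% of vertical space between adjacent paragraphs instead.
\setlength{\parindent}{0pt}
\setlength{\parskip}{\baselineskip}
The first block paragraph.

The second block paragraph, separated by vertical space only.
\end{document}
```

For longer documents, the parskip package is a commonly used alternative, since it also adjusts the spacing of lists and similar environments.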

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

            Sizes in inches    Page proportions
    A4      8.27 × 11.7        2 : √2       1.41421
    B5      6.93 × 9.84        1 : √2       0.707
    Letter  8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
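One way to approximate the whole-multiple rule in LaTeX is to give the heading rigid spacing in units of the body line height. The sketch below uses the titlesec package and assumes a 10 pt body text on a 12 pt line height; the specific sizes are illustrative assumptions, and true grid alignment across facing pages usually needs more care than this.

```latex
\documentclass[10pt]{article}
\usepackage{titlesec}
% Assumed body text: 10 pt type on a 12 pt line height.
% Set section headings in 12 pt bold type on a 12 pt line height.
\titleformat*{\section}{\fontsize{12pt}{12pt}\selectfont\bfseries}
% Rigid (non-stretchable) spaces: 24 pt above and 12 pt below the heading,
% both whole multiples of the 12 pt body line height.
\titlespacing*{\section}{0pt}{24pt}{12pt}
\begin{document}
\section{A Heading}
Body text that follows the heading.
\end{document}
```

Nominally, the heading then takes 24 + 12 + 12 = 48 pt, or four body lines. The default spacing of the standard classes is stretchable, which is why it is replaced with fixed lengths here.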

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface as the surrounding text, treat the columns of tables the same way you treat columns in the text, and keep the number of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
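A table in this restrained style can be sketched in LaTeX with the booktabs package, which provides horizontal rules of varying weight and discourages vertical ones. The example reuses the sizes from Table 3.1; the column layout is an assumption, not the book's own source.

```latex
\documentclass[10pt]{article}
\usepackage{booktabs}
\begin{document}
\begin{table}
  \centering
  % Three left-aligned columns, no vertical rules, no boxes.
  \begin{tabular}{lll}
    \toprule
           & Sizes in inches              & Page proportions \\
    \midrule
    A4     & $8.27 \times 11.7$           & $2 : \sqrt{2}$   \\
    B5     & $6.93 \times 9.84$           & $1 : \sqrt{2}$   \\
    Letter & $8\frac{1}{2} \times 11$     & $1 : 1.294$      \\
    \bottomrule
  \end{tabular}
  \caption{An overview of common paper sizes.}
\end{table}
\end{document}
```

The table inherits the typeface of the surrounding text, and the three rules are the only ornaments.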

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
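The three forms of notes map onto LaTeX commands roughly as follows. This is a sketch only: \marginpar and \footnote are standard, while endnotes are assumed here to come from the endnotes package; the note texts are placeholders.

```latex
\documentclass[10pt]{article}
\usepackage{endnotes}
\begin{document}
Sidenotes sit next to the passage they comment on.%
\marginpar{\footnotesize\raggedright This is a sidenote.}
Footnotes are linked through superscript numbers.%
\footnote{This is a footnote.}
Endnotes are collected and printed later.%
\endnote{This is an endnote.}

% Print all endnotes collected so far.
\theendnotes
\end{document}
```

In the 10 pt standard classes, \footnotesize selects 8 pt type, which is the lower end of the range suggested above.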

3.2.5 Quotations

Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks, in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

    This is the excellent foppery of the world, that when we are sick in fortune – often the surfeit of our own behavior – we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    – William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
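Both forms of quotation can be sketched in LaTeX with standard means. The example below assumes the quote environment of the standard classes and a one-step reduction in type size for the block quotation; the quoted passage is abbreviated.

```latex
\documentclass[10pt]{article}
\begin{document}
% Run-in quotation: quotation marks inside the running paragraph.
A run-in quotation stays inside the paragraph:
``Jesters do oft prove prophets.''

% Block quotation: extra vertical space, indentation on both sides,
% and a slightly smaller type size.
\begin{quote}
  \small
  This is the excellent foppery of the world \ldots
\end{quote}
\end{document}
```

The quote environment supplies the vertical space and the indentation; changing the typeface or its size, as done here with \small, is optional.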

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
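In LaTeX, the text block can be specified with the geometry package. The sketch below assumes an A4 page, a 4.5 in wide text block, and 45 lines of body text per page; all three values are illustrative choices rather than recommendations from the book.

```latex
\documentclass[10pt]{article}
% The "lines" option asks geometry to derive the text height
% from a whole number of body text lines, so the text block
% can be filled completely.
\usepackage[a4paper, textwidth=4.5in, lines=45]{geometry}
\begin{document}
Body text.
\end{document}
```

Specifying the height as a line count avoids computing it by hand and keeps it in step with the line height if the body text size changes.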

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
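A single secondary color reserved for emphasis can be sketched in LaTeX with the xcolor package. The color name accent and its dark red value are hypothetical choices made for this example.

```latex
\documentclass[10pt]{article}
\usepackage{xcolor}
% "accent" is a hypothetical name for the one secondary color.
\definecolor{accent}{RGB}{170,0,0}
\begin{document}
The body text stays black, and only
\textcolor{accent}{emphasized words} use the secondary color.
\end{document}
```

Defining the color once under a single name makes it easy to keep the visual system limited to one hue and to change that hue in one place.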

Bibliography

[1] Mary Brandel. "1963: The debut of ASCII". In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard 10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).


[17] Noam Chomsky. "Three models for the description of language". In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard 18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. "The Roots of SGML – A Personal Recollection". In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. "SGML: The Reason Why and the First Published Hint". In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. "Introduction to Generalized Markup". In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).


[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. "GODDAG: A Data Structure for Overlapping Hierarchies". In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).


[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).


[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).


[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standard Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (la Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
EMACS: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
PICO: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML, New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
VI: The Visual Interactive editor
VIM: VI IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language



The authoritative resource on the DocBook XML format is DocBook 5: The Definitive Guide [34]. The book itself is written in DocBook and its source code is publicly available at http://docbook.org.

Postel's law states that one should be conservative in what they send, but liberal in what they accept [37, sec. 2.10]. It is one of the base principles for building robust communication protocols.

schemata. This makes it impossible to validate namespaced XML documents, unless all the IRIs and their schemata are known to the parser.

Due to the reduced complexity of XML compared to SGML, the language was adopted by the industry and has superseded SGML in most applications. Some of the applications of XML for document preparation include DocBook, a technical documentation markup language used for authoring books by publishers such as O'Reilly Media and for documenting software at companies such as Red Hat, SUSE, or Sun Microsystems; the Text Encoding Initiative (TEI), a general text encoding markup language for use in the academic field of digital humanities; the Mathematical Markup Language (MathML), a markup language for the description of mathematical formulae; or the Scalable Vector Graphics language (SVG), a vector graphics format. Other XML applications, such as XHTML and RDF/XML, will be discussed in Section 2.2.

2.2 Markup on the World Wide Web

2.2.1 The Hypertext Markup Language

In 1989, an English computer scientist named Timothy John Berners-Lee proposed a decentralized system for sharing documents within the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [35]. The system laid the foundation for the Web and earned its author a knighthood. The markup language used to write documents for the system was an application of SGML called the HyperText Markup Language (HTML). In 1993, the Web started to gain traction among the general public, owing largely to the release of Mosaic, the first widely adopted graphical Web browser, which paved the way for the Web browsers of today. In 1994, Berners-Lee formed W3C, which has since developed the standards for the Web.

The first standard version of HTML was HTML 2.0 [36], published in 1995. As the Web was becoming ubiquitous, it began accumulating an increasing number of documents that weren't valid instances of HTML, since most Web browsers faced with a malformed document would act in accordance with Postel's law and try to render the document despite its deficiencies. In an attempt to unify the way malformed HTML documents were rendered across Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

Initially, HTML only comprised a mixture of logical and presentation markup with a fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language made it possible to specify the visual properties of any HTML element, which separated document markup from design and effectively eliminated the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since the support of JavaScript by a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as

    <b>Bold <i>bold and italic</b> italic</i>

    <b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and as such can't be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.


    <font face="Verdana" size="4">
    <font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
    <br><br>There is a continuing need to show the power of
    <i>CSS</i>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The <i>HTML
    </i> remains the same, the only thing that has changed
    is the external <i>CSS</i> file. Yes, really.
    </font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

    <style>
    body {
      font: large Verdana;
      font-size: large;
    }
    h1 {
      font-size: x-large;
      text-transform: uppercase;
    }
    abbr {
      font-style: italic;
    }
    </style>
    <h1>So what is this about?</h1>
    <p>There is a continuing need to show the power of
    <abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The
    <abbr>HTML</abbr> remains the same, the only thing that
    has changed is the external <abbr>CSS</abbr> file. Yes,
    really.</p>


The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2 Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the number of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly in web documents through XML namespaces.
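The difference between the strict and the lenient approach can be sketched with the Python standard library. This is only an illustration under stated assumptions: the helper names `is_well_formed` and `TagLogger` are made up for this example, the markup is wrapped in a `<p>` element so that it has a single root, and `html.parser` is merely a forgiving tokenizer, not an implementation of the HTML5 tree construction rules.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def is_well_formed(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        # An XML parser must refuse input that is not well-formed.
        return False
    return True


class TagLogger(HTMLParser):
    """A lenient HTML tokenizer that records every tag it encounters."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.tags.append("</" + tag + ">")


# Overlapping elements: <b> is closed while <i> is still open.
overlapping = "<p><b>Bold <i>bold and italic</b> italic</i></p>"
# Properly nested markup with the same intended appearance.
nested = "<p><b>Bold </b><i><b>bold and italic</b> italic</i></p>"

logger = TagLogger()
logger.feed(overlapping)
logger.close()

print(is_well_formed(overlapping))  # the strict parser refuses this input
print(is_well_formed(nested))       # the strict parser accepts this input
print(logger.tags)                  # the lenient tokenizer reports tags as found
```

The strict parser gives a yes-or-no answer and leaves no room for interpretation, whereas a lenient consumer has to decide for itself what the overlapping tags were supposed to mean.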

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but they are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.

A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
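As a rough illustration of the two readings of a triplet, the following sketch models a triplet as a small Python record in the (predicate, subject, object) order used in the text. The `Triplet` type, the `obj_is_resource` flag, and the `interpret` function are hypothetical names introduced for this example only; real RDF libraries distinguish resources from literals by type rather than by a flag.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    """An RDF triplet in the (predicate, subject, object) order."""
    predicate: str
    subject: str
    obj: str
    obj_is_resource: bool  # True for an IRI, False for a literal value


def interpret(t: Triplet) -> str:
    """Spell out how a triplet is read, depending on the kind of its object."""
    if t.obj_is_resource:
        return f"{t.subject} is in the relation {t.predicate} with {t.obj}"
    return f"{t.subject} has the property {t.predicate} with the value {t.obj!r}"


creator = Triplet(
    predicate="http://purl.org/dc/terms/creator",
    subject="http://example.org/document.html",
    obj="http://example.org/john-smith",
    obj_is_resource=True,
)
name = Triplet(
    predicate="http://xmlns.com/foaf/0.1/name",
    subject="http://example.org/john-smith",
    obj="John Smith",
    obj_is_resource=False,
)

for triplet in (creator, name):
    print(interpret(triplet))
```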

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually make obsolete the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in documents on the Web, a technique known as microformatting.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be


    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dc="http://purl.org/dc/terms/"
        xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description
          rdf:about="http://example.org/document.html">
        <dc:title xml:lang="en">John's Web page</dc:title>
        <dc:creator
            rdf:resource="http://example.org/john-smith"/>
      </rdf:Description>
      <rdf:Description
          rdf:about="http://example.org/john-smith">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
        <foaf:name>John Smith</foaf:name>
      </rdf:Description>
    </rdf:RDF>

    <http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
    <http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
    <http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
    <http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dc: <http://purl.org/dc/terms/> .

    <http://example.org/document.html>
      dc:title "John's Web page"@en ;
      dc:creator <http://example.org/john-smith> .

    <http://example.org/john-smith>
      a foaf:Person ;
      foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).
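To show how little machinery the N-Triples representation needs, here is a minimal sketch that serializes the four statements of the example document from Python. The helper names (`iri`, `literal`, `escape_literal`, `to_ntriples`) are assumptions made for this illustration, the escaping covers only the most common characters, and note that N-Triples writes each statement in subject, predicate, object order.

```python
def escape_literal(text: str) -> str:
    """Escape backslashes, quotes, and line breaks inside a quoted literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets."""
    return "<" + value + ">"


def literal(text: str, lang: str = "") -> str:
    """Build a quoted literal with an optional language tag."""
    quoted = '"' + escape_literal(text) + '"'
    return quoted + "@" + lang if lang else quoted


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


DC = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DOCUMENT = "http://example.org/document.html"
JOHN = "http://example.org/john-smith"

triples = [
    (iri(DOCUMENT), iri(DC + "title"), literal("John's Web page", "en")),
    (iri(DOCUMENT), iri(DC + "creator"), iri(JOHN)),
    (iri(JOHN), iri(RDF_TYPE), iri(FOAF + "Person")),
    (iri(JOHN), iri(FOAF + "name"), literal("John Smith")),
]

ntriples = to_ntriples(triples)
print(ntriples, end="")
```

For anything beyond a toy example, an established RDF library would be a safer choice than hand-written serialization.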


    <!DOCTYPE html>
    <html lang="en">
      <head>
        <link rel="meta" type="application/rdf+xml"
              href="john.rdf">
        <link rel="meta" type="text/turtle" href="john.ttl">
        <link rel="meta" type="application/n-triples"
              href="john.nt">
        <title>John's Web page</title>
      </head>
      <body>
        Hi, I'm John Smith.
      </body>
    </html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

    <!DOCTYPE html>
    <html lang="en">
      <head vocab="http://purl.org/dc/terms/"
            about="http://example.org/document.html">
        <title property="title" lang="en">John's Web page</title>
        <meta property="creator"
              href="http://example.org/john-smith">
      </head>
      <body vocab="http://xmlns.com/foaf/0.1/"
            about="http://example.org/john-smith"
            typeof="Person">
        Hi, I'm <span property="name">John Smith</span>.
      </body>
    </html>


[Graph. Nodes: http://example.org/document.html, "John's Web page"@en, http://example.org/john-smith, foaf:Person, "John Smith". Edge labels: dc:title, rdf:type, foaf:name, foaf:creator.]

Figure 2.11: A graph of the RDF document in Figure 2.9.

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also known as What You See Is What You Get, WYSIWYG), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is paid in more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and in the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph, The Art of Computer Programming, and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by Desktop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text, with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.


[Output document: the title "The Cask of Amontillado", the word "by", and the author line "Edgar Allen Poe", followed by the opening paragraph set with a three-line drop cap "T" and the page number 1.]

    .TITLE "The Cask of Amontillado"
    .AUTHOR "Edgar Allen Poe"
    .PRINTSTYLE TYPESET
    .PAGE 6i 9i .75i .75i .75i .75i
    .START
    .PP
    .DROPCAP T 3
    he thousand injuries of Fortunato I had borne as I best
    could; but when he ventured upon insult, I vowed revenge.
    You, who so well know the nature of my soul, will not
    suppose, however, that gave utterance to a threat.
    \*[IT]At length\*[PREV] I would be avenged; this was a
    point definitely settled\[em]but the very definitiveness
    with which it was resolved, precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allen Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].


    % Page geometry
    \pdfpagewidth=6in \pdfpageheight=9in
    % Page dimensions
    \hsize=\dimexpr\pdfpagewidth-1.5in
    \vsize=\dimexpr\pdfpageheight-1.5in
    \baselineskip=16.8pt
    \hoffset=-.25in \voffset=-.25in
    % Fonts
    \font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
    \font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
    % Logical markup definition
    \def\title#1{{\bigbf\centerline{#1}}}
    \def\author#1{{\it\centerline{by}\centerline{#1}}
      \vskip 3.9em}
    \def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
      \hbox{\llap{\dropcap#1\hskip-0.3ex}}}
      \parshape=4 3em\dimexpr\hsize-3em 3.28em
      \dimexpr\hsize-3.28em 3.28em
      \dimexpr\hsize-3.28em 0em\hsize}
    % The document
    \title{The Cask of Amontillado}
    \author{Edgar Allen Poe}
    \chapter The thousand injuries of Fortunato I had borne
    as I best could; but when he ventured upon insult, I vowed
    revenge. You, who so well know the nature of my soul,
    will not suppose, however, that gave utterance to a
    threat. {\it At length} I would be avenged; this was a
    point definitely settled---but the very definitiveness
    with which it was resolved, precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other, the design and the positioning of the structural elements of the document (such as headings, tables, figures, and lists), and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab serifs or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
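The line-length guidelines above are easy to turn into a quick check. The sketch below is only an illustration: the function names `measure_is_comfortable` and `suggested_setting` are made up for this example, and the thresholds are simply the figures quoted in the paragraph.

```python
def measure_is_comfortable(chars_per_line: int, columns: int = 1) -> bool:
    """Check a line length against the 45-75 (single-column) or
    40-50 (multi-column) character guideline."""
    low, high = (45, 75) if columns == 1 else (40, 50)
    return low <= chars_per_line <= high


def suggested_setting(chars_per_line: int) -> str:
    """Lines too narrow for 40 characters are better set ragged than justified."""
    return "ragged" if chars_per_line < 40 else "justified"


for chars, columns in [(66, 1), (66, 2), (45, 2), (30, 1)]:
    verdict = "comfortable" if measure_is_comfortable(chars, columns) else "uncomfortable"
    print(chars, columns, verdict, suggested_setting(chars))
```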

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


[Output document: the paragraph marked up below, with the Latin text set in Alegreya Sans and the Greek quotation set in GFS Neohellenic, followed by the page number 1.]

    \documentclass[11pt]{article}
    \usepackage{fontspec, leading, newunicodechar}
    \usepackage[Latin, Greek]{ucharclasses}
    \setTransitionsForLatin
      {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
    \setTransitionsForGreek
      {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
      Ligatures=TeX]}{}
    \newunicodechar{·}{\raisebox{.8ex}{.}}
    \frenchspacing
    \leading{14pt}
    \begin{document}
    The second function of Soul -- knowing -- was not at
    first distinguished from motion. Aristotle says: φαμὲν
    γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι
    δὲ ὀργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα
    δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν
    αὐτὴν κινεῖσθαι.
    ``The soul is said to feel pain and joy, confidence and
    fear, and again to be angry, to perceive, and to think;
    and all these states are held to be movements, which
    might lead one to suppose that soul itself is moved.''
    \end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


    <style>
    @font-face {
      font-family: "Alegreya Sans";
      src: url("AlegreyaSans-Regular.ttf")
           format("truetype");
      unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                     U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
    }
    @font-face {
      font-family: "GFS Neohellenic";
      src: url("GFSNeohellenic.otf") format("opentype");
      unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                     U+102E0-102FF;
    }
    p {
      font-family: "Alegreya Sans", "GFS Neohellenic",
                   sans-serif;
      line-height: 14pt;
    }
    [lang=en] {
      font-size: 11pt;
    }
    [lang=grc] {
      font-size: 13.2pt;
    }
    </style>
    <p><span lang="en">The second function of Soul &ndash; knowing
    &ndash; was not at first distinguished from motion. Aristotle
    says: </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
    λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὀργίζεσθαί
    τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα
    κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν
    κινεῖσθαι. </span><span lang="en">&ldquo;The soul is said to
    feel pain and joy, confidence and fear, and again to be
    angry, to perceive, and to think; and all these states
    are held to be movements, which might lead one to suppose
    that soul itself is moved.&rdquo;</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
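The arithmetic behind these figures is simple enough to sketch in a few lines of Python. The function names are hypothetical, and the 20 and 45 percent bounds are the approximate guideline quoted in the text rather than hard limits.

```python
def line_height_range(type_size_pt: float) -> tuple:
    """Return the (lowest, highest) line height for a typeface size,
    adding 20 to 45 percent of the size as lead."""
    lowest = type_size_pt * 120 / 100
    highest = type_size_pt * 145 / 100
    return lowest, highest


def lead_per_side(type_size_pt: float, line_height_pt: float) -> float:
    """Return the lead added above and below each line."""
    return (line_height_pt - type_size_pt) / 2


low, high = line_height_range(10)
print(low, high)                # line heights for a 10 pt body text
print(lead_per_side(10, low))   # lead per side at the lower bound
print(lead_per_side(10, high))  # lead per side at the upper bound
```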

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well.

Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

              Sizes in inches    Page proportions
    A4        8.27 × 11.7        2 : √2       1.41421
    B5        6.93 × 9.84        1 : √2       0.707
    Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
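One way to satisfy the whole-multiple rule is to round the height of the heading up to the next multiple of the body line height and distribute the difference as space around the heading. The following is a minimal sketch with hypothetical function names and example values; how the extra space is split above and below the heading is left to the designer.

```python
import math


def heading_block_height(heading_content_pt: float, line_height_pt: float) -> float:
    """Round the heading height up to a whole multiple of the body line height."""
    lines = max(1, math.ceil(heading_content_pt / line_height_pt))
    return lines * line_height_pt


def extra_space(heading_content_pt: float, line_height_pt: float) -> float:
    """Return the vertical space to distribute around the heading."""
    return heading_block_height(heading_content_pt, line_height_pt) - heading_content_pt


# A 16 pt heading in a text set with a 12 pt line height occupies two lines.
print(heading_block_height(16, 12))
print(extra_space(16, 12))
```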

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.


2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks, in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

    This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    William Shakespeare, King Lear

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

Block quotations are ideal for longer quotations and for quotationsthat should carry more weight that run-in quotations

3.3 Page Layout

The page consists of a text block surrounded by margins. The width of the text block is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
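The rule about the text height can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the 3:2 proportion, and the point values are hypothetical and do not come from the cited sources. The sketch derives an ideal height from the text width and then snaps it to the nearest whole number of lines.

```python
def text_block_height(text_width_pt, height_to_width, line_height_pt):
    """Return (line_count, text_height_pt) for a text block.

    The ideal height is text_width_pt * height_to_width; it is then snapped
    to the nearest whole number of lines, so that the text block can be
    filled completely with lines of body text.
    """
    ideal_height = text_width_pt * height_to_width
    line_count = max(1, round(ideal_height / line_height_pt))
    return line_count, line_count * line_height_pt


# Hypothetical numbers: a 300 pt wide block, a 3:2 proportion, 13 pt lines.
lines, height = text_block_height(300, 1.5, 13)
print(lines, height)  # 35 455
```

The ideal height of 450 pt is not a multiple of 13 pt, so the sketch settles on 35 lines, or 455 pt, which keeps the proportion close to the intended one.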

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to use colored type only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the type color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
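The text does not say how to measure whether a contrast is sufficient. One common yardstick, assumed here and not taken from the cited sources, is the contrast ratio defined in the W3C Web Content Accessibility Guidelines (WCAG) 2, which ranges from 1:1 to 21:1 and for which 4.5:1 is the usual minimum for body-sized text. A minimal Python sketch of that formula, with function names of our own choosing:

```python
def relative_luminance(rgb):
    """Relative luminance of an sRGB color given as (r, g, b) in 0..255."""
    def linearize(channel):
        c = channel / 255
        # Piecewise sRGB transfer function as given in WCAG 2.
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG 2 contrast ratio between two colors, from 1.0 up to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))    # 21.0
print(round(contrast_ratio((255, 0, 0), (255, 255, 255)), 1))  # 4.0
```

Under this measure, pure red type on a white background falls just short of the 4.5:1 minimum, which fits the guideline of reserving color for emphasis rather than for body text.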

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standards Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standards Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).

[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O’Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O’Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Working Draft. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standards Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
Emacs: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe’s Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
Pico: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML, New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
vi: The Visual Interactive editor
Vim: vi IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language


Page 31: Electronic Document Preparation Pocket Primer

22 MARKUP ON THE WORLD WIDE WEB 29

Margin note: JScript and VBScript competed directly with JavaScript, but they never saw implementation outside Microsoft browsers.

an attempt to unify the way malformed HTML documents were rendered across Web browsers, W3C acknowledged and documented this behavior as a part of the HTML5 specification [38, sec. 8.2]. An example of a non-conforming HTML5 document and its canonical interpretation is given in Figure 2.7.

Initially, HTML comprised only a mixture of logical and presentation markup with a fixed visual interpretation. This changed with the specification of CSS, which was introduced by W3C in 1996. The language enabled the specification of the visual properties of any HTML element, which allowed the separation of document markup from design, effectively eliminating the need for presentation markup.

During the same period, an initial version of a scripting language called JavaScript [39] was drafted and incorporated into Netscape Navigator 2.0, one of the leading web browsers of the time and a descendant of the original Mosaic browser. As a part of a joint effort by Sun Microsystems and Netscape Communications to bring the Java programming language into web browsers, JavaScript was supposed to complement Java applets [40], a role it has since outgrown. Standardized in 1997 [39], JavaScript blurred the line between static documents and interactive applications and remains the predominant client-side programming language of the Web. However, since support for JavaScript in a Web browser is fully optional, it is considered good practice not to depend on JavaScript for the rendering of HTML documents. In the case of interactive HTML applications, this recommendation may be relaxed.

2.2.2 The Extensible Hypertext Markup Language

Ever since the release of XML in 1998, W3C entertained the idea of turning HTML into an application of XML rather than of SGML, as

    <b>Bold <i>bold and italic</b> italic</i>

    <b>Bold </b><i><b>bold and italic</b> italic</i>

Figure 2.7: The first line contains overlapping elements and, as such, can’t be a part of a valid HTML document. Nevertheless, browsers should handle it identically to the second line.

    <font face="Verdana" size="4">
    <font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
    <br><br>There is a continuing need to show the power of
    <i>CSS</i>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The <i>HTML
    </i> remains the same, the only thing that has changed
    is the external <i>CSS</i> file. Yes, really.
    </font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

    <style>
      body {
        font: large Verdana;
        font-size: large;
      }
      h1 {
        font-size: x-large;
        text-transform: uppercase;
      }
      abbr {
        font-style: italic;
      }
    </style>
    <h1>So what is this about?</h1>
    <p>There is a continuing need to show the power of
    <abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The
    <abbr>HTML</abbr> remains the same, the only thing that
    has changed is the external <abbr>CSS</abbr> file. Yes,
    really.</p>

Margin note: The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren’t well-formed [28, Section 1.2, Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the number of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly in web documents through XML namespaces.
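The strictness of XML parsers can be observed with the XML parser from the Python standard library. The following is a minimal sketch; the helper name is_well_formed and the wrapping <p> element are our own additions, and the two strings are the overlapping and the properly nested markup from Figure 2.7.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the string parses as a well-formed XML document."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# The overlapping markup from Figure 2.7 and its properly nested equivalent,
# each wrapped in a <p> element so that there is a single root element.
overlapping = "<p><b>Bold <i>bold and italic</b> italic</i></p>"
nested = "<p><b>Bold </b><i><b>bold and italic</b> italic</i></p>"

print(is_well_formed(overlapping))  # False
print(is_well_formed(nested))       # True
```

An HTML parser would silently repair the first string, whereas the XML parser stops at the first mismatched closing tag and reports an error.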

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing on research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.

Margin note: A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies

If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
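The two readings of a triplet can be made concrete with a small data model. The Python sketch below is an illustration only: the class names and the describe function are hypothetical, it follows the (p, s, o) ordering used in the text, and the IRIs are borrowed from the example in Figure 2.9.

```python
from typing import NamedTuple, Union


class Resource(NamedTuple):
    iri: str


class Literal(NamedTuple):
    value: str


class Triplet(NamedTuple):
    predicate: Resource
    subject: Resource
    obj: Union[Resource, Literal]


def describe(triplet):
    """Spell out which of the two interpretations applies to a triplet."""
    p, s, o = triplet
    if isinstance(o, Resource):
        return f"{s.iri} is in relation {p.iri} with {o.iri}"
    return f"{s.iri} has property {p.iri} with value {o.value!r}"


john = Resource("http://example.org/john-smith")
page = Resource("http://example.org/document.html")

print(describe(Triplet(Resource("http://purl.org/dc/terms/creator"), page, john)))
print(describe(Triplet(Resource("http://xmlns.com/foaf/0.1/name"), john,
                       Literal("John Smith"))))
```

The first triplet relates two resources, so it reads as a relation; the second has a literal object, so it reads as a property of the subject.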

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, beside the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for LD (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually supersede the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in documents on the Web, a technique known as microformatting.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be

    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/terms/"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description
          rdf:about="http://example.org/document.html">
        <dc:title xml:lang="en">John's Web page</dc:title>
        <dc:creator
            rdf:resource="http://example.org/john-smith"/>
      </rdf:Description>
      <rdf:Description
          rdf:about="http://example.org/john-smith">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
        <foaf:name>John Smith</foaf:name>
      </rdf:Description>
    </rdf:RDF>

    <http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
    <http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
    <http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
    <http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dc: <http://purl.org/dc/terms/> .

    <http://example.org/document.html>
        dc:title "John's Web page"@en ;
        dc:creator <http://example.org/john-smith> .

    <http://example.org/john-smith>
        a foaf:Person ;
        foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).
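Of the three languages in Figure 2.9, N-Triples is the simplest: one statement per line, subject first, then predicate and object, terminated by a full stop. The Python sketch below writes such lines from plain tuples. It is an illustration under stated assumptions: the function name and the tuple layout are hypothetical, and only backslashes and double quotes in literals are escaped, which is less than the full N-Triples grammar requires.

```python
def to_ntriples(statements):
    """Serialize (subject, predicate, object) tuples as N-Triples lines.

    Subjects and predicates are IRI strings. An object is either an IRI
    string or a (text, language) pair for a literal, where language may
    be None. Only backslashes and double quotes are escaped here.
    """
    lines = []
    for subject, predicate, obj in statements:
        if isinstance(obj, tuple):
            text, language = obj
            escaped = text.replace("\\", "\\\\").replace('"', '\\"')
            term = f'"{escaped}"' + (f"@{language}" if language else "")
        else:
            term = f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {term} .")
    return "\n".join(lines)


print(to_ntriples([
    ("http://example.org/document.html",
     "http://purl.org/dc/terms/title",
     ("John's Web page", "en")),
    ("http://example.org/john-smith",
     "http://xmlns.com/foaf/0.1/name",
     ("John Smith", None)),
]))
```

Note that N-Triples places the subject before the predicate, unlike the (p, s, o) ordering used in the running text.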


<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web
      page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>


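Figure 2.11: A graph of the RDF document in Figure 2.9. The node http://example.org/document.html carries the dc:title "John's Web page"@en and points through dc:creator to the node http://example.org/john-smith, which has the rdf:type foaf:Person and the foaf:name "John Smith".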

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The mild learning curve of interactive DPSes comes at a price: more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph, The Art of Computer Programming, and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by Desktop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
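The decoupling that logical markup provides can be sketched in a few lines of Python. The class names, fonts, and sizes below are hypothetical and do not correspond to any particular DPS; the point is only that the document records classes, while the design of each class lives in one separate table.

```python
# Hypothetical style sheet: each logical class maps to its presentation.
STYLES = {
    "title": {"font": "Times Bold", "size_pt": 16},
    "author": {"font": "Times Italic", "size_pt": 12.5},
    "body": {"font": "Times Roman", "size_pt": 12.5},
}

# The document only records which class each span of text belongs to.
DOCUMENT = [
    ("title", "The Cask of Amontillado"),
    ("author", "Edgar Allan Poe"),
    ("body", "The thousand injuries of Fortunato I had borne as I best could."),
]


def render(document, styles):
    """Pair every span of text with the presentation of its class."""
    return [
        (text, styles[cls]["font"], styles[cls]["size_pt"])
        for cls, text in document
    ]


def restyle(styles, cls, **changes):
    """Return a copy of the style sheet with one class redefined."""
    updated = {name: dict(settings) for name, settings in styles.items()}
    updated[cls].update(changes)
    return updated


# Changing the design of every body paragraph takes a single edit.
for line in render(DOCUMENT, restyle(STYLES, "body", size_pt=11)):
    print(line)
```

With presentation markup, the same change would require visiting every affected span of text individually.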


The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled – but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].


% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}\hskip-0.3ex}
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
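How punctuation can carry markup, and how such markup converts to HTML, can be illustrated with a deliberately tiny converter. The Python sketch below handles an assumed Markdown-like subset only (a '# ' prefix for headings, asterisks for emphasis, blank lines between paragraphs); it is not an implementation of Markdown or of any other language named above.

```python
import html
import re


def to_html(source):
    """Convert a tiny Markdown-like subset to HTML.

    Supported: '# ' headings, *emphasis*, and paragraphs separated
    by blank lines. Everything else is passed through as text.
    """
    blocks = re.split(r"\n\s*\n", source.strip())
    output = []
    for block in blocks:
        # Join the lines of a block and escape HTML-special characters.
        text = html.escape(" ".join(block.split()), quote=False)
        # Punctuation carries the markup: *word* becomes emphasis.
        text = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)
        if text.startswith("# "):
            output.append("<h1>" + text[2:] + "</h1>")
        else:
            output.append("<p>" + text + "</p>")
    return "\n".join(output)


print(to_html("# Design\n\nLegibility is of *foremost* concern."))
```

The input remains readable as plain text, which is the property that distinguishes lightweight markup from the tag-based languages it converts to.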

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
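Whether a typeface covers a manuscript can be checked mechanically once its coverage is known. The Python sketch below assumes that the coverage is given as a list of inclusive code point ranges, in the spirit of the CSS unicode-range descriptor; the single Latin range used here is a hypothetical example, and reading the real coverage out of a font file is outside the scope of the sketch.

```python
import unicodedata


def missing_characters(text, supported_ranges):
    """List the characters of text that fall outside supported_ranges.

    supported_ranges is a list of inclusive (first, last) code point
    pairs describing what the typeface contains.
    """
    missing = set()
    for char in text:
        if char.isspace():
            continue  # whitespace needs no glyph of its own here
        code = ord(char)
        if not any(first <= code <= last for first, last in supported_ranges):
            missing.add(char)
    return sorted(missing)


# Hypothetical coverage: Basic Latin through Latin Extended-B only.
LATIN_ONLY = [(0x0000, 0x024F)]

for char in missing_characters("Aristotle says: \u03c8\u03c5\u03c7\u1f74\u03bd", LATIN_ONLY):
    print("U+%04X %s" % (ord(char), unicodedata.name(char, "UNKNOWN")))
```

Every character reported this way has to be supplied by a second typeface, as in the mixed Latin and Greek passage of Figure 3.1.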

Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
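The line-length guidelines above translate directly into a small check. The following Python sketch uses the thresholds stated in the text; the function names and the wording of the verdicts are illustrative assumptions.

```python
def assess_measure(chars_per_line, columns=1):
    """Classify a line length against the ranges given in the text."""
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "too narrow to justify: set ragged"
    if chars_per_line > 80:
        return "too wide: strains the eye"
    if low <= chars_per_line <= high:
        return "within the recommended range"
    return "acceptable, but outside the recommended range"


def average_line_length(lines):
    """Average number of characters per non-empty line."""
    lengths = [len(line) for line in lines if line.strip()]
    return sum(lengths) / len(lengths) if lengths else 0.0


print(assess_measure(66))
print(assess_measure(60, columns=2))
```

Feeding the result of average_line_length for a typeset sample into assess_measure gives a quick first impression of whether the column width suits the chosen type size.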

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὀργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

1

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}
\newunicodechar{}{\raisebox{.8ex}{}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὀργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=grc] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὀργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
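The arithmetic behind these figures is simple enough to sketch in Python. The twenty and forty-five percent bounds come from the text; the function names are assumptions made for illustration.

```python
def leading_range(size_pt, low=0.20, high=0.45):
    """Return the (minimum, maximum) line height for a typeface size."""
    return (size_pt * (1 + low), size_pt * (1 + high))


def lead_per_side(size_pt, line_height_pt):
    """Lead added above and below a line for a given line height."""
    return (line_height_pt - size_pt) / 2


minimum, maximum = leading_range(10)
print(round(minimum, 2), round(maximum, 2))
print(lead_per_side(10, 12), lead_per_side(10, 14.5))
```

For a 10 pt typeface, the sketch reproduces the range of line heights and the amounts of lead quoted above.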

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well.

Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].

         Sizes in inches    Page proportions
A4       8.27 × 11.7        2 : √2       1.41421
B5       6.93 × 9.84        1 : √2       0.707
Letter   8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.
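The whole-multiple rule for heading heights can be sketched as a short calculation. In the Python example below, the heading height and the body line height are hypothetical values; the function returns how many body-text lines a heading occupies and how much vertical space must be added around it to fill them exactly.

```python
import math


def heading_space(heading_height_pt, line_height_pt):
    """Pad a heading to a whole number of body-text lines.

    Returns the number of lines occupied and the extra vertical
    space (in points) to distribute above and below the heading.
    """
    lines = math.ceil(heading_height_pt / line_height_pt)
    return lines, lines * line_height_pt - heading_height_pt


# A heading set on 19 pt of height in a book with 12 pt leading.
lines, extra = heading_space(19, 12)
print(lines, extra)
```

How the extra space is split between the top and the bottom of the heading is a design decision; placing more of it above the heading ties the heading visually to the text that follows.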

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity; fools by heavenly compulsion; knaves, thieves, and treachers by spherical predominance; drunkards, liars, and adulterers by an enforced obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

– William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
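The requirement that the text height be a multiple of the line height can be sketched as follows. The Python example assumes measurements in points at 72 points per inch (a simplifying assumption, since TeX's own point is slightly smaller) and uses a hypothetical 6 × 9 in page with 0.75 in top and bottom margins.

```python
def text_block_height(page_height_pt, margins_pt, line_height_pt):
    """Largest text height that fits and holds a whole number of lines.

    margins_pt is the combined height of the top and bottom margins.
    Returns the number of lines and the resulting text height.
    """
    available = page_height_pt - margins_pt
    lines = int(available // line_height_pt)
    return lines, lines * line_height_pt


# 9 in page (648 pt), 2 x 0.75 in margins (108 pt), 12 pt leading.
print(text_block_height(648, 108, 12))
```

When the available height is not an exact multiple of the line height, the remainder is returned to the margins rather than left as a partial line at the bottom of the text block.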

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to use colored typefaces only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
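The text does not say how "sufficient contrast" should be measured. One widely used convention, brought in here purely as an assumption, is the contrast ratio defined by the Web Content Accessibility Guidelines (WCAG) 2, which compares the relative luminance of two sRGB colors. A minimal Python sketch of that formula follows.

```python
def relative_luminance(rgb):
    """Relative luminance of an sRGB color given as 0-255 integers."""
    def linearize(channel):
        c = channel / 255
        # Undo the sRGB transfer curve, as specified by WCAG 2.
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(channel) for channel in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 (none) to 21 (maximum)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


# Black type on a white background versus red type on the same background.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))
print(round(contrast_ratio((200, 0, 0), (255, 255, 255)), 2))
```

The ratio says nothing about color blindness; whether two hues stay distinct for a color-blind reader has to be checked separately.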

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016) (cit. on p. 10).

[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006 (cit. on p. 13).


[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986 (cit. on p. 22).


[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).


[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).


[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).


[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standards Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Declaration
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
Emacs: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend of a Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The Generalized Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
Pico: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
vi: The Visual Interactive editor
Vim: vi IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language




    <font face="Verdana" size="4">
    <font size="+2"><b>SO WHAT IS THIS ABOUT?</b></font>
    <br><br>There is a continuing need to show the power of
    <i>CSS</i>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The <i>HTML
    </i> remains the same, the only thing that has changed
    is the external <i>CSS</i> file. Yes, really.
    </font>

Figure 2.8: An excerpt from the Web site of the CSS Zen Garden located at http://csszengarden.com. The document above was created using the HTML presentation markup. The document below achieves the same appearance by the combination of logical markup and CSS.

    <style>
      body {
        font: large Verdana;
        font-size: large;
      }
      h1 {
        font-size: x-large;
        text-transform: uppercase;
      }
      abbr {
        font-style: italic;
      }
    </style>
    <h1>So what is this about?</h1>
    <p>There is a continuing need to show the power of
    <abbr>CSS</abbr>. The Zen Garden aims to excite, inspire,
    and encourage participation. To begin, view some of the
    existing designs in the list. Clicking on any one will
    load the style sheet into this very page. The
    <abbr>HTML</abbr> remains the same, the only thing that
    has changed is the external <abbr>CSS</abbr> file. Yes,
    really.</p>


The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2 Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly into web documents through XML namespaces.
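The contrast between strict XML parsing and lenient HTML parsing can be observed with the parsers in the Python standard library. The following is a minimal sketch, not a description of how any particular browser works; the names `TagCollector`, `is_well_formed`, and the sample fragment are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects the names of the start tags seen in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def is_well_formed(document: str) -> bool:
    """Return True if the document parses as well-formed XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# A fragment with an unclosed <p> and improperly nested <b> and <i>.
malformed = "<div><p>Hello <b><i>world</b></i></div>"

# The lenient HTML parser reports the tags it sees without raising an error.
collector = TagCollector()
collector.feed(malformed)
collector.close()

# The XML parser refuses the same fragment outright.
print(collector.tags, is_well_formed(malformed))
```

The XML parser either accepts the whole document or rejects it, which is what keeps its implementation simple; the HTML parser has to carry on past errors, and a real browser additionally has to decide how to repair them.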

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from the research in the field of knowledge representation, W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs. If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.

A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.
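The triplet model can be sketched as a small data structure. The class names `Resource`, `Literal`, and `Triplet` and the function `interpret` are hypothetical and are not part of any RDF library; the IRIs are the example.org placeholders used in the figures of this section.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    """A resource identified by an IRI."""
    iri: str


@dataclass(frozen=True)
class Literal:
    """A literal value such as a string."""
    value: str


@dataclass(frozen=True)
class Triplet:
    """A triplet (p, s, o): the predicate and subject are always resources."""
    predicate: Resource
    subject: Resource
    obj: Union[Resource, Literal]


def interpret(t: Triplet) -> str:
    """Spell out the two readings of a triplet described in the text."""
    if isinstance(t.obj, Resource):
        return f"{t.subject.iri} is in relation {t.predicate.iri} with {t.obj.iri}"
    return f"{t.subject.iri} has property {t.predicate.iri} with value {t.obj.value!r}"


document = Resource("http://example.org/document.html")
john = Resource("http://example.org/john-smith")

creator = Triplet(Resource("http://purl.org/dc/terms/creator"), document, john)
title = Triplet(Resource("http://purl.org/dc/terms/title"), document,
                Literal("John's Web page"))

print(interpret(creator))
print(interpret(title))
```

Keeping resources and literals as separate types makes the distinction between a relation and a property a matter of a single type check.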

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, besides the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for LD (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely-defined use of HTML and XHTML attributes, the <meta> and <link> elements, and the CSS class names to include additional machine-readable metadata into the documents on the Web, a technique known as microformatting.
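Of the representations listed above, N-Triples is the simplest to produce by hand: one triplet per line, IRIs in angle brackets, literals in double quotes, and a terminating period. The following minimal sketch assumes a hypothetical input convention that is not part of any library: IRIs are plain strings, and literals are `(text, language)` pairs whose language tag may be `None`. Note that N-Triples writes the subject before the predicate.

```python
def escape_literal(text: str) -> str:
    """Escape the characters that may not appear raw in an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples(triplets) -> str:
    """Serialize (subject, predicate, object) tuples, one triplet per line."""
    lines = []
    for subject, predicate, obj in triplets:
        if isinstance(obj, tuple):
            # A literal: (text, language tag or None).
            text, lang = obj
            term = f'"{escape_literal(text)}"' + (f"@{lang}" if lang else "")
        else:
            # A resource identified by its IRI.
            term = f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {term} .")
    return "\n".join(lines) + "\n"


doc = "http://example.org/document.html"
john = "http://example.org/john-smith"
print(to_ntriples([
    (doc, "http://purl.org/dc/terms/title", ("John's Web page", "en")),
    (doc, "http://purl.org/dc/terms/creator", john),
]), end="")
```

The sketch covers only IRIs and plain or language-tagged string literals; datatyped literals and blank nodes, which N-Triples also supports, are left out.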

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be


    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/terms/"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description
          rdf:about="http://example.org/document.html">
        <dc:title xml:lang="en">John's Web page</dc:title>
        <dc:creator
            rdf:resource="http://example.org/john-smith" />
      </rdf:Description>
      <rdf:Description
          rdf:about="http://example.org/john-smith">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
        <foaf:name>John Smith</foaf:name>
      </rdf:Description>
    </rdf:RDF>

    <http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
    <http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
    <http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
    <http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dc: <http://purl.org/dc/terms/> .
    <http://example.org/document.html>
      dc:title "John's Web page"@en ;
      dc:creator <http://example.org/john-smith> .
    <http://example.org/john-smith>
      a foaf:Person ;
      foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).


    <!DOCTYPE html>
    <html lang="en">
      <head>
        <link rel="meta" type="application/rdf+xml"
              href="john.rdf">
        <link rel="meta" type="text/turtle" href="john.ttl">
        <link rel="meta" type="application/n-triples"
              href="john.nt">
        <title>John's Web page</title>
      </head>
      <body>
        Hi, I'm John Smith.
      </body>
    </html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

    <!DOCTYPE html>
    <html lang="en">
      <head vocab="http://purl.org/dc/terms/"
            about="http://example.org/document.html">
        <title property="title" lang="en">John's Web page</title>
        <meta property="creator"
              href="http://example.org/john-smith">
      </head>
      <body vocab="http://xmlns.com/foaf/0.1/"
            about="http://example.org/john-smith"
            typeof="Person">
        Hi, I'm <span property="name">John Smith</span>.
      </body>
    </html>


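[Graph: the node http://example.org/document.html has a dc:title edge to the literal "John's Web page"@en and a creator edge to the node http://example.org/john-smith, which in turn has an rdf:type edge to foaf:Person and a foaf:name edge to the literal "John Smith".]

Figure 2.11: A graph of the RDF document in Figure 2.9.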

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the gentle learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU is Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by Donald Knuth, an American professor of computer science, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.


[Output document: the centered title "The Cask of Amontillado", the byline "by Edgar Allen Poe", the paragraph from the listing below typeset with a three-line drop cap "T", and the page number 1.]

    .TITLE "The Cask of Amontillado"
    .AUTHOR "Edgar Allen Poe"
    .PRINTSTYLE TYPESET
    .PAGE 6i 9i .75i .75i .75i .75i
    .START
    .PP
    .DROPCAP T 3
    he thousand injuries of Fortunato I had borne as I best
    could, but when he ventured upon insult, I vowed revenge.
    You, who so well know the nature of my soul, will not
    suppose, however, that gave utterance to a threat.
    \*[IT]At length\*[PREV] I would be avenged; this was a
    point definitely settled\[em]but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allen Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].


    % Page geometry
    \pdfpagewidth=6in \pdfpageheight=9in
    % Page dimensions
    \hsize=\dimexpr\pdfpagewidth-1.5in
    \vsize=\dimexpr\pdfpageheight-1.5in
    \baselineskip=16.8pt
    \hoffset=-.25in \voffset=-.25in
    % Fonts
    \font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
    \font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
    % Logical markup definition
    \def\title#1{{\bigbf\centerline{#1}}}
    \def\author#1{{\it\centerline{by}\centerline{#1}}
      \vskip 3.9em}
    \def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
      \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
      \parshape=4 3em\dimexpr\hsize-3em 3.28em
      \dimexpr\hsize-3.28em 3.28em
      \dimexpr\hsize-3.28em 0em\hsize}
    % The document
    \title{The Cask of Amontillado}
    \author{Edgar Allen Poe}
    \chapter The thousand injuries of Fortunato I had borne
    as I best could, but when he ventured upon insult, I vowed
    revenge. You, who so well know the nature of my soul,
    will not suppose, however, that gave utterance to a
    threat. {\it At length} I would be avenged; this was a
    point definitely settled---but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
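The conversion from a lightweight markup language to HTML can be illustrated with a toy converter. The sketch below handles only a small Markdown-like subset (headings introduced by `#`, `*emphasis*`, `**strong emphasis**`, and paragraphs separated by blank lines); it is an assumption-laden illustration and does not implement Markdown or any other real language. The function name `convert` is hypothetical.

```python
import html
import re


def convert(source: str) -> str:
    """Convert a tiny Markdown-like subset to HTML."""
    # Blocks are separated by one or more blank lines.
    blocks = [b for b in re.split(r"\n\s*\n", source.strip()) if b.strip()]
    out = []
    for block in blocks:
        # Join the lines of a block and escape characters special to HTML.
        text = html.escape(" ".join(block.split()), quote=False)
        # Strong emphasis must be handled before plain emphasis.
        text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
        text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
        heading = re.match(r"(#{1,6}) (.*)", text)
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            out.append(f"<p>{text}</p>")
    return "\n".join(out)


print(convert("# Design\n\nLegibility is of *foremost* concern."))
```

The source stays readable as plain text, with punctuation alone carrying the structure; the converter only has to recognize a handful of patterns.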

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
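Whether a typeface covers a manuscript can be checked mechanically before the design is finalized. The sketch below assumes that the coverage of each typeface is known as a list of inclusive Unicode code point ranges; in practice this information would have to be read from the font file itself, which is not shown here. The function name and the range constants are hypothetical.

```python
def missing_characters(text, covered_ranges):
    """Return the sorted characters of text that fall outside every range."""
    missing = set()
    for ch in text:
        if ch.isspace():
            continue  # white space needs no glyph of its own here
        code_point = ord(ch)
        if not any(low <= code_point <= high for low, high in covered_ranges):
            missing.add(ch)
    return sorted(missing)


# Hypothetical coverage: a Latin-only typeface and a Greek companion.
LATIN = [(0x0020, 0x024F)]
GREEK = [(0x0370, 0x03FF), (0x1F00, 0x1FFF)]

sample = "Soul \u03c8\u03c5\u03c7\u03ae"  # the word "Soul" and its Greek counterpart
print(missing_characters(sample, LATIN))          # the four Greek letters
print(missing_characters(sample, LATIN + GREEK))  # nothing is missing
```

A non-empty result for the body text typeface signals that a second typeface has to be brought in for the listed characters.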

Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
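The line-length guidelines above can be condensed into a small check. This is a minimal sketch that uses only the thresholds quoted in the paragraph; the function name `assess_measure` and the wording of the returned messages are hypothetical.

```python
def assess_measure(chars_per_line: int, columns: int = 1) -> str:
    """Classify an average line length against the guidelines in the text."""
    # 45-75 characters on a single-column page, 40-50 on a multi-column page.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "too narrow to justify: set ragged"
    if chars_per_line > 80:
        return "too wide: strains the eye"
    if low <= chars_per_line <= high:
        return "within the recommended range"
    return "acceptable, but outside the recommended range"


print(assess_measure(66))
print(assess_measure(35, columns=2))
```

The average number of characters per line is assumed to be measured beforehand, for example by counting the characters of a few typical lines in the chosen typeface and column width.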

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


[Output document: the paragraph from the listing below, with the Latin text set in Alegreya Sans and the Greek text set in GFS Neohellenic, followed by the page number 1.]

    \documentclass[11pt]{article}
    \usepackage{fontspec, leading, newunicodechar}
    \usepackage[Latin, Greek]{ucharclasses}
    \setTransitionsForLatin
      {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
    \setTransitionsForGreek
      {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
      Ligatures=TeX]}{}
    \newunicodechar … \raisebox … 8ex …
    \frenchspacing
    \leading{14pt}
    \begin{document}
    The second function of Soul -- knowing -- was not at
    first distinguished from motion. Aristotle says, φαμὲν
    γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
    δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
    δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
    αὐτὴν κινεῖσθαι.
    ``The soul is said to feel pain and joy, confidence and
    fear, and again to be angry, to perceive, and to think;
    and all these states are held to be movements, which
    might lead one to suppose that soul itself is moved.''
    \end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  /* grc is the language tag for Ancient Greek; "gr" is not a valid tag. */
  [lang=grc] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
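The arithmetic behind these figures is easy to reproduce. The sketch below assumes the twenty to forty-five percent rule stated above; the function names `leading_range` and `lead_per_side` are illustrative and do not come from any typesetting library.

```python
def leading_range(size_pt, min_pct=20, max_pct=45):
    """Return the (lowest, highest) line height for a typeface size in points."""
    lowest = size_pt * (100 + min_pct) / 100
    highest = size_pt * (100 + max_pct) / 100
    return lowest, highest


def lead_per_side(size_pt, leading_pt):
    """Return the extra lead that falls above (and equally below) each line."""
    return (leading_pt - size_pt) / 2


if __name__ == "__main__":
    low, high = leading_range(10)
    print(low, high)                # line heights for a 10 pt typeface
    print(lead_per_side(10, low))   # lead per side at the lower bound
    print(lead_per_side(10, high))  # lead per side at the upper bound
```

For a 10 pt typeface this reproduces the bounds given in the text: a line height of 12 to 14.5 pt, with 1 to 2.25 pt of lead on each side of the line.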

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger


          Sizes in inches   Page proportions
A4        8.27 × 11.7       2 : √2       1.41421
B5        6.93 × 9.84       1 : √2       0.707
Letter    8 1/2 × 11        1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
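One way to satisfy the whole-multiple rule is to pad each heading with just enough vertical space to reach the next multiple of the body line height. The sketch below illustrates the arithmetic only; the function name `heading_padding` and the example measurements are assumptions made for illustration.

```python
import math


def heading_padding(heading_height_pt, body_leading_pt):
    """Return (lines occupied, extra space in pt) for a heading block.

    The heading is padded so that its total height becomes a whole
    multiple of the body text line height.
    """
    lines = math.ceil(heading_height_pt / body_leading_pt)
    padding = lines * body_leading_pt - heading_height_pt
    return lines, padding


if __name__ == "__main__":
    # A heading set on an 18 pt line in a text with 12 pt leading.
    print(heading_padding(18, 12))
    # A heading that already fits the grid needs no padding.
    print(heading_padding(24, 12))
```

How the padding is split above and below the heading is a design decision; placing more of it above the heading ties the heading visually to the text that follows.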

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface as the surrounding text, treat the columns of tables the same way you treat columns in the text, and keep rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.


2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks, in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs, and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing. A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
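The requirement that the text height be a multiple of the line height amounts to rounding the available height down to a whole number of lines. The following sketch shows the calculation; the function name `text_height` and the sample figures (650 pt of available height, 12 pt leading) are illustrative assumptions.

```python
def text_height(available_pt, leading_pt):
    """Return (number of lines, text height in pt) that fits the space.

    The available height is rounded down to a whole number of body
    text lines, so the text block can be filled completely.
    """
    lines = int(available_pt // leading_pt)
    return lines, lines * leading_pt


if __name__ == "__main__":
    lines, height = text_height(650, 12)
    print(lines, height)
```

The space left over (2 pt in this example) is returned to the vertical margins.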

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are


Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

(ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
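The text does not say how “sufficient contrast” might be measured. One widely used yardstick, borrowed here from the W3C Web Content Accessibility Guidelines (WCAG) 2.x rather than from the sources cited in this chapter, is the contrast ratio between the relative luminances of two sRGB colors. The sketch below implements that formula; the function names are illustrative, and the threshold of 4.5 mentioned afterwards is the WCAG level AA figure for body-sized text, not a recommendation made by this book.

```python
def _linearize(channel):
    """Convert an 8-bit sRGB channel value to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color with channels in 0..255."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, ranging from 1 to 21."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    black, white, red = (0, 0, 0), (255, 255, 255), (255, 0, 0)
    print(round(contrast_ratio(black, white), 2))
    print(round(contrast_ratio(red, white), 2))
```

Under this measure, pure red on white falls slightly below 4.5, which is consistent with the guideline above of reserving colored type for emphasis rather than for the body text.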

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard 10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).


[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard 18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).


[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).


[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).


[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typo/glosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).


[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Declaration
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language



The idea of a network of machine-readable data was described by Tim Berners-Lee in 2006 in the article Linked Data [43].

exemplified by the working draft of Reformulating HTML in XML [41]. Unlike HTML parsers, whose acceptance of malformed content makes them complex, XML parsers are required to strictly refuse XML documents that aren't well-formed [28, Section 1.2 Terminology], leading to architectural simplicity and decreased computational requirements. As a result, reformulating HTML in XML was suggested as a way to bring the Web to mobile, embedded, and other devices limited in their computational resources, and to reduce the amount of malformed documents on the Web in general. Other perceived advantages included the ability to use XML tools for web documents and to include instances of other XML applications, such as MathML and SVG, directly into web documents through XML namespaces.
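The difference in strictness can be observed with the parsers in the Python standard library. The sketch below is only an illustration: `xml.etree.ElementTree` stands in for a conforming XML parser, while `html.parser.HTMLParser` is a tolerant tokenizer rather than a full HTML5 parser. The helper names `xml_is_well_formed` and `html_start_tags` and the sample markup are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record the start tags encountered while reading HTML."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def xml_is_well_formed(markup):
    """Return True if the XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def html_start_tags(markup):
    """Return the start tags the lenient HTML tokenizer recovers."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    malformed = "<p>An <em>unclosed element</p>"
    print(xml_is_well_formed(malformed))   # the XML parser refuses the input
    print(html_start_tags(malformed))      # the HTML tokenizer carries on
```

The XML parser rejects the mismatched tags outright, whereas the HTML tokenizer reports both start tags and leaves error recovery to the application, which is where much of the complexity of real HTML parsers lies.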

The idea was brought to fruition in the XML application of the eXtensible HyperText Markup Language (XHTML) [42]. However, the supposed benefits proved to be too marginal to warrant migration from HTML. The speed advantages of the simplified processing were largely offset by the lack of support for incremental rendering, since it is impossible to validate and render partially downloaded XHTML documents, and the advances in the area of mobile devices made HTML processing sufficiently fast. The lack of ways to provide alternative content for browsers that would not support the XML applications instantiated in the XHTML documents also reduced the usefulness of the XML namespaces in XHTML considerably. As a result, XHTML has yet to succeed in replacing HTML and remains a minority markup language on the Web.

2.2.3 The Semantic Web and Linked Data

The Web is based on the idea of a distributed and globally available network of human knowledge. The languages of HTML, XHTML, CSS, and JavaScript form the foundation of the human-readable parts of the Web, but are inadequate for creating a network of machine-readable data that could be navigated by software agents. Drawing from research in the field of knowledge representation, the W3C created the Resource Description Framework (RDF) [44] in 1999, a language for the description of resources on the Web.

An RDF document represents data as a set of triplets. Each triplet comprises a predicate, a subject, and an object, where both the predicate and the subject are specified as resources using IRIs.


A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.

If the object of a triplet (p, s, o) is also a resource, the triplet can be interpreted as a subject s being in a relation p with the object o. If the object is a literal value rather than a resource, the triplet can be interpreted as a subject s having a property p with the value o.
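The two readings of a triplet can be made concrete with a small Python sketch. The class names IRI, Literal, and Triple and the function interpret are hypothetical and exist only for this illustration; they do not belong to any RDF library.

```python
from typing import NamedTuple, Optional, Union


class IRI(str):
    """A resource, identified by its IRI."""


class Literal(NamedTuple):
    """A literal value with an optional language tag."""
    value: str
    lang: Optional[str] = None


class Triple(NamedTuple):
    # Field order follows the (p, s, o) convention used in the text.
    predicate: IRI
    subject: IRI
    obj: Union[IRI, Literal]


def interpret(t: Triple) -> str:
    """Spell out how a triplet is read, depending on the kind of its object."""
    if isinstance(t.obj, IRI):
        return f"{t.subject} is in the relation {t.predicate} with {t.obj}"
    return f"{t.subject} has the property {t.predicate} with the value {t.obj.value!r}"


FOAF = "http://xmlns.com/foaf/0.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
JOHN = IRI("http://example.org/john-smith")

print(interpret(Triple(IRI(RDF + "type"), JOHN, IRI(FOAF + "Person"))))
print(interpret(Triple(IRI(FOAF + "name"), JOHN, Literal("John Smith"))))
```

The first call reports a relation between two resources, the second a property with a literal value.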

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, besides the small set of standard resources specified within the RDF specification, they carry no inherent meaning. In order to describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Some of the well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for Linked Data (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in documents on the Web, a technique known as microformatting.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be


<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .

<http://example.org/document.html>
    dc:title "John's Web page"@en ;
    dc:creator <http://example.org/john-smith> .

<http://example.org/john-smith>
    a foaf:Person ;
    foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).
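Of the three representations, N-Triples is simple enough to be produced by hand. The following Python sketch serializes the triplets of Figure 2.9 under simplifying assumptions: terms are modeled as plain tuples (a convention invented for this example), only the most common characters are escaped, and blank nodes and datatyped literals are left out.

```python
def nt_term(term) -> str:
    """Render ("iri", value) or ("literal", value[, lang]) as an N-Triples term."""
    kind, value, *rest = term
    if kind == "iri":
        return f"<{value}>"
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    lang = rest[0] if rest and rest[0] else None
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def to_ntriples(triples) -> str:
    """One (subject, predicate, object) statement per line, each closed by ' .'"""
    return "".join(
        f"{nt_term(s)} {nt_term(p)} {nt_term(o)} .\n" for s, p, o in triples
    )


DOC = ("iri", "http://example.org/document.html")
JOHN = ("iri", "http://example.org/john-smith")
DC = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

EXAMPLE = [
    (DOC, ("iri", DC + "title"), ("literal", "John's Web page", "en")),
    (DOC, ("iri", DC + "creator"), JOHN),
    (JOHN, ("iri", RDF + "type"), ("iri", FOAF + "Person")),
    (JOHN, ("iri", FOAF + "name"), ("literal", "John Smith")),
]

print(to_ntriples(EXAMPLE), end="")
```

Because every statement occupies exactly one line and uses no prefixes, the format is verbose but easy to generate, compare, and process with line-oriented tools.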


<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>


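[Graph with the following labeled edges:
  http://example.org/document.html --dc:title--> "John's Web page"@en
  http://example.org/document.html --dc:creator--> http://example.org/john-smith
  http://example.org/john-smith --rdf:type--> foaf:Person
  http://example.org/john-smith --foaf:name--> "John Smith"]

Figure 2.11: A graph of the RDF document in Figure 2.9.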

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The


The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.


[Typeset output: the title "The Cask of Amontillado", the byline "by Edgar Allen Poe", and the opening paragraph set with a three-line drop cap "T", followed by the page number -1-.]

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allen Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could; but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].


% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1\hskip-0.3ex}}}
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could; but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that reflect their use cases.

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
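The thresholds above lend themselves to a quick automated check. The following Python sketch encodes them directly; the function name and the wording of the returned verdicts are assumptions made for this illustration, and the handling of lengths between the ideal range and the outer limits is one possible reading of the guideline.

```python
def measure_advice(chars_per_line: int, columns: int = 1) -> str:
    """Classify a line length (the measure) of justified body text."""
    # Ideal ranges: 45-75 characters in one column, 40-50 in several.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "set ragged"      # too narrow to justify without loose spacing
    if chars_per_line > 80:
        return "too wide"        # extended passages strain the eye
    if low <= chars_per_line <= high:
        return "ideal"
    return "acceptable"


for width in (35, 66, 78, 90):
    print(width, measure_advice(width))
```

A check of this kind is most useful early in the design, when the text width and the body text size are still being balanced against each other.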

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


[Typeset output: the paragraph marked up below, with the English text set in Alegreya Sans and the Greek quotation set in GFS Neohellenic, followed by the page number 1.]

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
\newunicodechar{…}{\raisebox{…8ex}{…}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=grc] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
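The way the unicode-range descriptors in Figure 3.2 divide the characters between the two typefaces can be approximated in Python. The sketch below is a simplification and not a description of how any particular browser works: it ignores wildcard ranges such as U+4??, it does not check whether the font file really contains a glyph, and the function names are assumptions made for the example.

```python
def parse_unicode_range(spec: str):
    """Turn 'U+00-24F, U+1E00-1EFF' into [(0x00, 0x24F), (0x1E00, 0x1EFF)]."""
    ranges = []
    for part in spec.split(","):
        body = part.strip()[2:]            # drop the leading "U+"
        low, _, high = body.partition("-")
        ranges.append((int(low, 16), int(high or low, 16)))
    return ranges


# The font stack of Figure 3.2, in the order given by font-family.
FONTS = [
    ("Alegreya Sans", "U+00-24F, U+1E00-1EFF, U+2000-206F, "
                      "U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F"),
    ("GFS Neohellenic", "U+2C80-2CFF, U+370-3FF, U+1F00-1FFF, U+102E0-102FF"),
]


def font_for(char: str, fonts=FONTS, fallback: str = "sans-serif") -> str:
    """Return the first family in the stack whose range covers the character."""
    code_point = ord(char)
    for family, spec in fonts:
        if any(low <= code_point <= high
               for low, high in parse_unicode_range(spec)):
            return family
    return fallback


for ch in ("a", "\u03c6", "\u1f14", "\u201c"):
    print(f"U+{ord(ch):04X}", font_for(ch))
```

Latin letters and general punctuation resolve to Alegreya Sans, whereas the Greek and Greek Extended blocks resolve to GFS Neohellenic, which mirrors the Latin and Greek transitions set up in Figure 3.1.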


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with a leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
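The arithmetic behind these figures is short enough to spell out. The following Python sketch reproduces the 10 pt example; the function names are assumptions made for the illustration, and the results are rounded to two decimal places to sidestep floating-point noise.

```python
def leading_range(size_pt: float):
    """Line height obtained by adding 20 to 45 percent of the typeface size."""
    return round(size_pt * 1.20, 2), round(size_pt * 1.45, 2)


def lead_per_side(size_pt: float, leading_pt: float) -> float:
    """Extra space above and below a line: half of the added lead."""
    return round((leading_pt - size_pt) / 2, 2)


low, high = leading_range(10)
print(low, high)                  # line height limits for 10 pt body text
print(lead_per_side(10, low))     # lead above and below at the lower limit
print(lead_per_side(10, high))    # lead above and below at the upper limit
```

For 10 pt body text, this yields a line height of 12 to 14.5 pt and 1 to 2.25 pt of lead on either side of each line, which matches the figures given in the text.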

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger


          Sizes in inches   Page proportions
A4        8.27 × 11.7       2 : √2       1.41421
B5        6.93 × 9.84       1 : √2       0.707
Letter    8 1/2 × 11        1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
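Keeping a heading on the line grid amounts to padding it up to the next whole multiple of the body text line height. The following Python sketch shows the calculation; the function name and the sample values are assumptions made for the illustration, and how the padding is split above and below the heading remains a design decision.

```python
import math


def heading_padding(heading_height_pt: float, line_height_pt: float):
    """Return the number of body lines a heading occupies and the padding
    needed to make its total height a whole multiple of the line height."""
    lines = math.ceil(round(heading_height_pt / line_height_pt, 9))
    padding = round(lines * line_height_pt - heading_height_pt, 2)
    return lines, padding


print(heading_padding(20, 12))   # a 20 pt heading block on 12 pt leading
print(heading_padding(24, 12))   # already a whole multiple of the leading
```

A 20 pt heading block on 12 pt leading occupies two body lines and needs 4 pt of additional space, whereas a 24 pt block needs none.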

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.


2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

-- William Shakespeare, King Lear

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
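The requirement that the text height be a multiple of the line height can be checked with a few lines of Python. The sketch below assumes 72 points per inch and treats the combined vertical margins as a lower bound; the function name and the sample page are assumptions made for the illustration.

```python
POINTS_PER_INCH = 72  # assumption: PostScript points rather than TeX points


def text_block_height(page_height_pt: float, min_margins_pt: float,
                      line_height_pt: float):
    """Return the number of lines that fit on the page and the resulting
    text height, which is always a whole multiple of the line height."""
    available = page_height_pt - min_margins_pt
    lines = int(available // line_height_pt)
    return lines, lines * line_height_pt


page = 9 * POINTS_PER_INCH        # a page that is 9 in tall
margins = 1.5 * POINTS_PER_INCH   # top and bottom margins combined
print(text_block_height(page, margins, 12))
print(text_block_height(page, margins, 14))
```

With 12 pt leading, the 540 pt of available height holds exactly 45 lines. With 14 pt leading, only 38 lines fit, and the remaining 8 pt have to be added to the vertical margins.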

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are


Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

(ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
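The text does not name a specific measure of contrast. One widely used option, assumed here purely for illustration, is the contrast ratio defined by the Web Content Accessibility Guidelines (WCAG) 2.x, which compares the relative luminance of two sRGB colors. The function names in the following Python sketch are hypothetical.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear-light value."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb) -> float:
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background) -> float:
    """WCAG 2.x contrast ratio, from 1 (identical) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


BLACK, WHITE, RED = (0, 0, 0), (255, 255, 255), (255, 0, 0)
print(round(contrast_ratio(BLACK, WHITE), 2))
print(round(contrast_ratio(RED, WHITE), 2))
```

WCAG asks for a ratio of at least 4.5:1 for regular body text. Pure red on white falls just short of that, which supports reserving it for emphasis and larger sizes rather than for running text.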

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standards Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015) (cit. on p. 5).

[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standards Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).

BIBLIOGRAPHY 53

[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O’Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O’Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028/ (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Manu Sporny, Gregg Kellogg, and Markus Lanthaler. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standards Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System

DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The Generalized Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  Joe’s Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character

NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language

XML  The eXtensible Markup Language

Index

ACK 6
Adobe FrameMaker 14
Adobe InDesign 14, 39
alignment
  justified 42
  ragged 42
Anton Koberger 49
Apache OpenOffice 13, 20, 39
API 55
ASA 51
ASCII 5–9, 11, 12, 14, 51
AsciiDoc 39
AT&T 35
Atom 13
awk 16, 17

Bazaar 17
BEL 6
BMP 8, 9, 14
Bob Berner 5
body text 41
BRE
  alternation operator 15
  backreference 15
  escape character 15
  matching list expression 15
  non-matching list expression 15
  repetition operator 15
  subexpression 15
BRE 14–16
BS 6
BSD 13

CA 52
CAN 6
CERN 28
character code 5
character encoding 5
Chomsky hierarchy 14
Christian Morgenstern 4
CLDR 52
CLI 13, 16
code page 7
code point 8
Compose key 11
CONCUR 27
control code 5
CR 6
Creole 39
CSS 23, 29–32, 44

DC 32, 33
DC1 6
DC2 6
DC3 6
DC4 6
DEL 6
DLE 6
Donald Knuth 36
DPS
  batch-oriented 35
  interactive
    desktop publishing 36
    word processing 36
  interactive 13, 35
DPS 13, 17, 18, 32, 35, 36, 39
DTD 23, 25–27
DTP 36

EBCDIC 5
ECMA 55
Edgar Allan Poe 37

Elements of Style 3
EM 6
Emacs 13
endianity 10
endnote 47
ENQ 6
EOT 6
ERE
  alternation operator 15
  backreference 15
  escape character 15
  matching list expression 15
  non-matching list expression 15
  repetition operator 15
  subexpression 15
ERE 14–16
ESC 6
ETB 6
ε-TeX 38
ETX 6
EUC 5

F. M. Cornford 43
FF 6
FOAF 32, 33
footnote 47
formal grammar 14
FORTRAN 4
From Religion to Philosophy: A Study in the Origins of Western Speculation 43
FS 6
FSM 35

Git 17
GML 22
GNU
  /Linux 13
  nano 13
GNU 13, 14, 35
Google Documents 18
Google Pinyin 11
grep 16, 17
groff see troff
GS 6
GUI 13, 35

Han Unification 9
heading 45
Henrik Ibsen 27
HT 6
HTML 28–32, 34, 39, 44, 55

IBM 5, 12, 22
iconv 10
IEC 7, 10, 51–54
IME 12
IRI 27, 28, 31, 32, 54
ISO 7, 10, 51–54

JavaScript 29
Jeffrey E. F. Friedl 14
JIS 5
joe 13
JScript 29
JSON 32
JSON-LD 32, 56
JTC 51–54
justification see alignment

King Lear 48

LaTeX 36, 43
Latin Vulgate Bible 49
LD 31, 32, 55
leading see line spacing
Leafpad 13
LF 6
lightweight markup language 39
line height 45
list 46

MA 51
MakeDoc 39
Markdown 39
markup
  logical 21, 29, 30, 35, 36
  presentation 21, 29, 30, 35, 36
MathML 28, 31
Mercurial 17
microformatting 32
Microsoft Word 14, 20, 39

N-Triples 32, 33
NAK 6
Noam Chomsky
  hierarchy 14
Noam Chomsky 14
note 46
Notepad++ 13
Notepad 13

nroff see troff
NUL 6
NY 51

OCR 12
ODF 13
OOXML 13
OWL 32, 56

paragraph
  block 47
  indented 45
  outdented 45
paragraph 42
paragraphs
  block 45
PC 5, 11
PDF 13
pdfTeX 38
Peer Gynt 27
Perl 14
pico 13
pinyin 11
plain TeX 38
POSIX 53
printable character 5
Punycode 8

QuarkXPress 14
quotation
  block 47
  run-in 47

rag see alignment
RDF
  literal 32
  object 31
  ontology 32
  predicate 31
  resource 31
  subject 31
  triplet 31
RDF 28, 31–35, 56
RDFa 32, 34, 56
regex see regular expression
regular expression 13, 14
regular grammar 14
RELAX NG 23, 25
RFC 54, 55
RS 6

sans-serif 41
SC 51–54
Scribus 13, 14, 39
sed 16, 17
serif 41
Setext 39
SGML
  application 23
  attribute 22
  element 22
  entity 22
  node 22
  tag 22
SGML 22, 23, 25, 27–29, 39, 53, 54
SGML: The Reason Why and the First Published Hint 22
SI 6
sidenote 46
small capitals 45
SO 6
SOH 6
SR 12
STX 6
style guide 3
SUB 6
Sublime Text 13
surrogate pair 8
SVG 28, 31
SVN 17–20
SYN 6

table 46
TC 51, 52
TEI 28
text editor 13
text file 4
text processing 4
TextEdit 13, 14
the Art of Computer Programming 36
the Cask of Amontillado 37
the Chicago Manual of Style 3
the Oxford Style Manual 3
the Subversion book 17
Tim Berners-Lee 31
Timothy John Berners-Lee 28
Tortoise SVN 18, 20
Trichter 4
troff
  man 36
  me 36
  mom 36
troff 35
TRON 9
Turtle 32, 33
typeface 41

UCS
  block 8
  UCS-4 8
UCS 6, 8–12, 14, 16, 51, 52
Unicode
  case conversion 10
  normalization 10
US 6
USA 51, 52
UTF
  UTF-16 8, 52
  UTF-32 8
  UTF-7 8
  UTF-8 8, 52
UTF 6, 8–10, 52

VBScript 29
VCS
  centralized 17
  decentralized 17
VCS 17–20
version control 13
vi 13
vim 13
VT 6

W3C 23, 28, 29, 31, 32, 54–56
WG 54
Wikicode 39
William Shakespeare 48
William Strunk 3
Word Online 18
writing rules
  grammar 3
  orthography 3
  typography 4
WYSIWYG 35

X Window System 11
XeTeX 43
XHTML 28, 31, 32, 55, 56
XML
  application 23
  DocBook 28
  format 23
  language 23
  namespace 27
  schema language 23
  Schema 23, 26
  validity 23
  well-formedness 23
XML 23–29, 31–33, 39, 54, 55
xmllint 26
XPath 23
XPointer 23
XQuery 23

  • Introduction
  • Writing
    • Text Processing
      • Character Encoding
      • Text Input
      • Text Editors
      • Interactive Document Preparation Systems
      • Regular Expressions
    • Version Control
  • Markup
    • Meta Markup Languages
      • The General Markup Language
      • The Extensible Markup Language
    • Markup on the World Wide Web
      • The Hypertext Markup Language
      • The Extensible Hypertext Markup Language
      • The Semantic Web and Linked Data
    • Document Preparation Systems
      • Batch-oriented Systems
      • Interactive Systems
    • Lightweight Markup Languages
  • Design
    • Fonts
    • Structural Elements
      • Paragraphs and Stanzas
      • Headings
      • Tables and Lists
      • Notes
      • Quotations
    • Page Layout
    • Color
  • Bibliography
  • Acronyms
  • Index
A list of ontologies that are fully documented, honor the current best practices, and are supported by various tools can be found on the W3C wiki at http://www.w3.org/wiki/Good_Ontologies.

If the object of a triplet $(p, s, o)$ is also a resource, the triplet can be interpreted as the subject $s$ being in the relation $p$ with the object $o$. If the object is a literal value rather than a resource, the triplet can be interpreted as the subject $s$ having the property $p$ with the value $o$.
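The two readings of a triplet can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the class names `Resource`, `Literal`, and `Triplet` and the function `interpret` are hypothetical and do not belong to any RDF library.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    iri: str  # hypothetical wrapper around an IRI


@dataclass(frozen=True)
class Literal:
    value: str  # hypothetical wrapper around a literal value


@dataclass(frozen=True)
class Triplet:
    p: Resource                  # predicate
    s: Resource                  # subject
    o: Union[Resource, Literal]  # object


def interpret(t: Triplet) -> str:
    """Describe a triplet (p, s, o) in words, following the two readings above."""
    if isinstance(t.o, Resource):
        return f"{t.s.iri} is in the relation {t.p.iri} with {t.o.iri}"
    return f"{t.s.iri} has the property {t.p.iri} with the value {t.o.value!r}"


doc = Resource("http://example.org/document.html")
john = Resource("http://example.org/john-smith")
creator = Resource("http://purl.org/dc/terms/creator")
title = Resource("http://purl.org/dc/terms/title")

print(interpret(Triplet(creator, doc, john)))
print(interpret(Triplet(title, doc, Literal("John's Web page"))))
```

Keeping resources and literals as distinct types makes the case distinction explicit; a real RDF toolkit would also track datatypes and language tags.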

Resources in RDF are specified via IRIs to prevent naming collisions in RDF documents created independently by distinct authors. These IRIs do not need to point to any existing web page and, apart from the small set of standard resources specified within the RDF specification, they carry no inherent meaning. To describe a set of resources, the relationships between them, and their intended meaning in an RDF document, an extension of the set of standard resources called RDF Schema [45] can be used. The resulting documents are called ontologies and can be used for automated reasoning about RDF documents containing resources described by the ontology. Well-known ontologies include the Dublin Core (DC), an ontology for the generic description of resources, both digital and physical; Friend of a Friend (FOAF), an ontology for the description of people and their social relationships; and the Music Ontology, an ontology for the description of entities related to the music industry, such as albums, artists, tracks, and events. More expressive standards for the creation of ontologies, such as the Web Ontology Language (OWL) [46], also exist.

RDF documents can be represented through many languages, including XML [44], JSON for LD (JSON-LD) [47], Turtle [48], and N-Triples [49]. Although RDF documents in any of these representations can be included in or linked to HTML and XHTML documents, this will often result in the undesirable duplication of data. To prevent this, the language of RDF in attributes (RDFa) [50] makes it possible to mark parts of the HTML or XHTML document as RDF data. The usage of RDF in conjunction with HTML and XHTML is intended to gradually obsolete microformatting, the loosely defined use of HTML and XHTML attributes, the <meta> and <link> elements, and CSS class names to include additional machine-readable metadata in documents on the Web.

2.3 Document Preparation Systems

Some of the existing markup languages are tied directly to specific Document Preparation Systems (DPSes). These DPSes can be

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .

<http://example.org/document.html>
  dc:title "John's Web page"@en ;
  dc:creator <http://example.org/john-smith> .

<http://example.org/john-smith>
  a foaf:Person ;
  foaf:name "John Smith" .

Figure 2.9: An example RDF document using the DC and FOAF ontologies in the languages of RDF/XML (john.rdf, top), N-Triples (john.nt, middle), and Turtle (john.ttl, bottom).
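Because N-Triples writes every triple on a line of its own with full IRIs, it is the easiest of the three representations to produce by hand. The following Python sketch shows one way to do so; the helper names `escape_literal` and `to_ntriples` and the tuple convention for literals are assumptions made for this illustration, and the sketch assumes that all IRIs are plain ASCII and need no escaping.

```python
def escape_literal(text):
    """Escape backslashes, double quotes, and line breaks inside a literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one triple per line.

    An object is either an IRI string or a (text, language) pair, where
    language may be None for a literal without a language tag.
    """
    lines = []
    for subject, predicate, obj in triples:
        if isinstance(obj, tuple):
            text, language = obj
            term = '"' + escape_literal(text) + '"'
            if language:
                term += "@" + language
        else:
            term = "<" + obj + ">"
        lines.append(f"<{subject}> <{predicate}> {term} .")
    return "\n".join(lines) + "\n"


DOC = "http://example.org/document.html"
JOHN = "http://example.org/john-smith"
TRIPLES = [
    (DOC, "http://purl.org/dc/terms/title", ("John's Web page", "en")),
    (DOC, "http://purl.org/dc/terms/creator", JOHN),
    (JOHN, "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://xmlns.com/foaf/0.1/Person"),
    (JOHN, "http://xmlns.com/foaf/0.1/name", ("John Smith", None)),
]

print(to_ntriples(TRIPLES), end="")
```

For anything beyond a toy example, a dedicated RDF library is the safer choice, since it also handles datatypes, blank nodes, and IRI escaping.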

<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="meta" type="application/rdf+xml"
          href="john.rdf">
    <link rel="meta" type="text/turtle" href="john.ttl">
    <link rel="meta" type="application/n-triples"
          href="john.nt">
    <title>John's Web page</title>
  </head>
  <body>
    Hi, I'm John Smith.
  </body>
</html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

<!DOCTYPE html>
<html lang="en">
  <head vocab="http://purl.org/dc/terms/"
        about="http://example.org/document.html">
    <title property="title" lang="en">John's Web page</title>
    <meta property="creator"
          href="http://example.org/john-smith">
  </head>
  <body vocab="http://xmlns.com/foaf/0.1/"
        about="http://example.org/john-smith"
        typeof="Person">
    Hi, I'm <span property="name">John Smith</span>.
  </body>
</html>

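[Graph: the node http://example.org/document.html is connected by the edge dc:title to the literal "John's Web page"@en and by the edge dc:creator to the node http://example.org/john-smith, which is in turn connected by the edge rdf:type to foaf:Person and by the edge foaf:name to the literal "John Smith".]

Figure 2.11: A graph of the RDF document in Figure 2.9.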

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price of the gentle learning curve of interactive DPSes is twofold: more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and reduced flexibility stemming from the use of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU is Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The

The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
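The decoupling of logical markup from design can be sketched in a few lines of Python. The document below carries only class labels; the design lives in a separate mapping, so swapping the mapping restyles every section of a class at once. The names `DOCUMENT`, `PLAIN`, `RAGGED`, and `render`, as well as the choice of HTML with inline styles as the output format, are assumptions made for this illustration.

```python
import html

# Logical markup: each section of text is only classified, not styled.
DOCUMENT = [
    ("title", "The Cask of Amontillado"),
    ("author", "Edgar Allan Poe"),
    ("body", "The thousand injuries of Fortunato I had borne as I best could."),
]

# Design: one style per class, defined independently of the text.
PLAIN = {
    "title": {"font-weight": "bold", "text-align": "center"},
    "author": {"font-style": "italic", "text-align": "center"},
    "body": {"text-align": "justify"},
}

# A second design that changes only how the body text is aligned.
RAGGED = dict(PLAIN, body={"text-align": "left"})


def render(document, stylesheet):
    """Combine the classified text with a design into HTML paragraphs."""
    lines = []
    for cls, text in document:
        style = "; ".join(f"{key}: {value}" for key, value in stylesheet[cls].items())
        lines.append(f'<p style="{style}">{html.escape(text)}</p>')
    return "\n".join(lines)


print(render(DOCUMENT, PLAIN))
print(render(DOCUMENT, RAGGED))
```

With presentation markup, the same change would require editing every body paragraph individually.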

The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled -- but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe’s The Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].

% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}\hskip-0.3ex}}
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.

Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
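To illustrate how such a conversion tool works in principle, here is a minimal Python sketch that converts a tiny, hypothetical Markdown-like subset to HTML. Only blank-line-separated blocks, `#` headings, `*emphasis*`, and `**strong emphasis**` are recognized; the function name `to_html` is an assumption, and the sketch makes no attempt to follow any real Markdown flavor.

```python
import html
import re


def to_html(source):
    """Convert a hypothetical Markdown-like subset to HTML."""
    blocks = []
    for block in re.split(r"\n\s*\n", source.strip()):
        if not block.strip():
            continue  # nothing to convert in an empty block
        # Join the lines of a block and escape characters special to HTML.
        text = html.escape(" ".join(block.split()), quote=False)
        # Strong emphasis must be handled before plain emphasis.
        text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
        text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
        heading = re.match(r"(#{1,6}) (.*)", text)
        if heading:
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            blocks.append(f"<p>{text}</p>")
    return "\n".join(blocks)


print(to_html("# Design\n\nLegibility should be of *foremost* concern."))
```

The punctuation carries the markup, so the source stays readable even without conversion, which is the main appeal of the lightweight approach.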

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
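Whether a manuscript needs a second typeface can be checked mechanically once the character coverage of the body text typeface is known. The Python sketch below lists the characters of a text that fall outside a set of code point ranges. The function name `missing_characters` and the ranges in `LATIN_COVERAGE` are assumptions made for this illustration; the actual coverage of a typeface has to be read from the font file or its documentation.

```python
import unicodedata


def missing_characters(text, covered_ranges):
    """Return the sorted characters of text that no covered range contains."""
    missing = set()
    for character in text:
        if character.isspace():
            continue  # white space does not need a glyph
        code_point = ord(character)
        if not any(low <= code_point <= high for low, high in covered_ranges):
            missing.add(character)
    return sorted(missing)


# Hypothetical coverage of a Latin-only typeface (inclusive ranges).
LATIN_COVERAGE = [(0x0000, 0x024F), (0x2000, 0x206F)]

# "Aristotle says" followed by the first Greek word of the quotation.
sample = "Aristotle says \u03c6\u03b1\u03bc\u1f72\u03bd"
for character in missing_characters(sample, LATIN_COVERAGE):
    name = unicodedata.name(character, "UNKNOWN")
    print(f"U+{ord(character):04X} {name}")
```

Any character reported this way has to be supplied by another font family, as described above.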

Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
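The rules of thumb above translate directly into a small decision function. The Python sketch below encodes only the thresholds stated in the text; the function name `recommend_setting` and the wording of the returned advice are assumptions made for this illustration.

```python
def recommend_setting(chars_per_line, multi_column=False):
    """Suggest how to set body text given the average characters per line."""
    low, high = (40, 50) if multi_column else (45, 75)
    if chars_per_line < 40:
        # Too narrow to justify without overly loose word spacing.
        return "set ragged"
    if chars_per_line > 80:
        # Long lines strain the eye of the reader.
        return "narrow the column"
    if low <= chars_per_line <= high:
        return "set justified"
    return "set justified, but outside the recommended range"


print(recommend_setting(66))
print(recommend_setting(35))
print(recommend_setting(60, multi_column=True))
```

The character count per line can be estimated from the column width and the average character width of the chosen typeface.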

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says, φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
% Switch the font automatically whenever the script changes.
\setTransitionsForLatin{%
  \fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek{%
  \fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
% The character arguments of the following line are not recoverable:
% \newunicodechar{}{\raisebox{.8ex}{}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says, φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford’s From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


<style>
  /* Each font is only used for the code points in its unicode-range. */
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  /* "el" is the language tag for Greek; "gr" is not a language code. */
  [lang=el] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says, </span><span lang="el">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
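To see how the code point ranges in the style sheet decide which typeface renders a given character, the selection can be imitated in a few lines of Python. This is a simplified sketch rather than a description of how a browser is implemented: the table `FONT_RANGES` and the function `font_for` are names invented for this example, and the ranges are copied from the figure.

```python
# Code point ranges per font family, in the order of the font-family list.
FONT_RANGES = {
    "Alegreya Sans": [(0x0000, 0x024F), (0x1E00, 0x1EFF), (0x2000, 0x206F),
                      (0x2C60, 0x2C7F), (0xA720, 0xA7FF), (0xFB00, 0xFB4F)],
    "GFS Neohellenic": [(0x2C80, 0x2CFF), (0x0370, 0x03FF), (0x1F00, 0x1FFF),
                        (0x102E0, 0x102FF)],
}


def font_for(char: str, fallback: str = "sans-serif") -> str:
    """Return the first family whose ranges contain the character."""
    code_point = ord(char)
    for family, ranges in FONT_RANGES.items():
        if any(low <= code_point <= high for low, high in ranges):
            return family
    return fallback


if __name__ == "__main__":
    # Latin letter, Greek letter, Greek Extended letter, curly quote, Han ideograph.
    for char in ["S", "\u03c8", "\u1f74", "\u201c", "\u6f22"]:
        print(f"U+{ord(char):04X}", font_for(char))
```

Characters outside every listed range fall through to the generic family, which mirrors the role of the `sans-serif` entry at the end of the font list.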


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
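The arithmetic behind these figures is short enough to write down. The Python sketch below is an illustration only; the function names `leading_range` and `lead_per_side` are assumptions made for this example, and the default percentages are the twenty to forty-five percent quoted in the text.

```python
def leading_range(type_size_pt: float, low: float = 0.20, high: float = 0.45):
    """Return the (minimum, maximum) line height in points for a type size."""
    return type_size_pt * (1 + low), type_size_pt * (1 + high)


def lead_per_side(type_size_pt: float, line_height_pt: float) -> float:
    """Return the lead added above and below each line, in points."""
    return (line_height_pt - type_size_pt) / 2


if __name__ == "__main__":
    minimum, maximum = leading_range(10)
    print(f"line height: {minimum:.2f} to {maximum:.2f} pt")
    print(f"lead per side: {lead_per_side(10, minimum):.2f} "
          f"to {lead_per_side(10, maximum):.2f} pt")
```

For a 10 pt typeface, the script reproduces the range of 12 to 14.5 pt and the 1 to 2.25 pt of lead on either side of the line.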

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
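One way to satisfy the whole-multiple rule is to pad each heading with just enough vertical space. The Python sketch below assumes that the heading occupies a single line of a known height; the function name `heading_block` and the `min_space_pt` parameter are hypothetical and only serve the illustration.

```python
import math


def heading_block(heading_height_pt, body_line_height_pt, min_space_pt=0):
    """Return (body lines occupied, padding in points) for a heading.

    The heading plus its padding adds up to a whole multiple of the body
    text line height, so that the lines on facing pages stay aligned.
    """
    needed = heading_height_pt + min_space_pt
    # The small tolerance keeps exact multiples from rounding up a line.
    lines = math.ceil(needed / body_line_height_pt - 1e-9)
    padding = lines * body_line_height_pt - heading_height_pt
    return lines, padding


if __name__ == "__main__":
    # An 18 pt heading over body text set with the leading of 12 pt.
    print(heading_block(18, 12))
```

How the padding is split between the space above and the space below the heading is a design decision that the sketch leaves open.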

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.


2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer’s viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

– William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
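Both requirements on the text height can be combined in a short calculation. The Python sketch below makes one particular assumption that the text does not prescribe: the text block simply repeats the proportions of the page, and the result is rounded to the nearest whole number of lines. The function name `text_height` and the sample text width of 300 pt are illustrative only.

```python
def text_height(text_width_pt, page_width, page_height, line_height_pt):
    """Return (lines, text height in points) for a given text width.

    The target height repeats the proportions of the page and is then
    rounded to a whole number of body text lines.
    """
    target = text_width_pt * page_height / page_width
    lines = round(target / line_height_pt)
    return lines, lines * line_height_pt


if __name__ == "__main__":
    # A4 page (8.27 x 11.7 in), 300 pt text width, leading of 12 pt.
    print(text_height(300, 8.27, 11.7, 12))
```

Only the ratio of the page dimensions matters here, so the page size may be given in any unit as long as the width and the height use the same one.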

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible, printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
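The text leaves “sufficient contrast” undefined. One widely used way to quantify it, which is not taken from this book, is the contrast ratio defined in the W3C Web Content Accessibility Guidelines (WCAG) 2.x. The Python sketch below assumes 8-bit sRGB colors; the function names are invented for this example.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear-light value."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb) -> float:
    """Relative luminance of an (R, G, B) color as defined by WCAG 2.x."""
    r, g, b = (_linear(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background) -> float:
    """Contrast ratio between two colors, from 1 (none) to 21 (maximum)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    white, black, red = (255, 255, 255), (0, 0, 0), (255, 0, 0)
    print(f"black on white: {contrast_ratio(black, white):.2f}")
    print(f"red on white:   {contrast_ratio(red, white):.2f}")
```

WCAG asks for a ratio of at least 4.5 for regular body text at its AA level. Pure red on white falls slightly below that threshold, which agrees with the advice to reserve colored type for emphasis.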

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standards Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972.

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standards Association, June 1986.

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1.

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6.

[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993.

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996.

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996.

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008.

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012.

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006.

[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124.

[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993.

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O’Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6.

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O’Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986.

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3.

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12.

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4.

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5.

[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014.

Acronyms

ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standards Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe’s Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML, New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language



<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <!-- The document and its metadata -->
  <rdf:Description
      rdf:about="http://example.org/document.html">
    <dc:title xml:lang="en">John's Web page</dc:title>
    <dc:creator
        rdf:resource="http://example.org/john-smith"/>
  </rdf:Description>
  <!-- The author; rdf:resource takes a full IRI, not a prefixed name -->
  <rdf:Description
      rdf:about="http://example.org/john-smith">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>John Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>

<http://example.org/document.html> <http://purl.org/dc/terms/title> "John's Web page"@en .
<http://example.org/document.html> <http://purl.org/dc/terms/creator> <http://example.org/john-smith> .
<http://example.org/john-smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .

prefix foaf lthttpxmlnscomfoaf01gt

prefix dc lthttppurlorgdcelements11gt

lthttpexampleorgdocumenthtmlgt

dctitle Johns Web pageen

dccreator lthttpexampleorgjohn-smithgt

lthttpexampleorgjohn-smithgt

a foafPerson

foafname John Smith

Figure 29 An example rdf document using the dc and foafontologies in the languages of rdfxml (johnrd top) N-Triples(johnnt middle) and Turtle (johnttl bottom)
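
The three serializations in Figure 2.9 encode the same four statements. As a minimal sketch of that underlying data model, the following Python snippet stores the statements as plain (subject, predicate, object) tuples and follows the creator link to the creator's name. The helper names `objects` and `creator_names` are illustrative assumptions and do not belong to any RDF library; literals are simplified to plain strings.

```python
# Namespace prefixes used in Figure 2.9.
DC = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOCUMENT = "http://example.org/document.html"
JOHN = "http://example.org/john-smith"

# Each statement is a (subject, predicate, object) triple.
# Literals are modelled as plain strings; a real RDF toolkit would
# also keep the language tag "en" of the title.
TRIPLES = [
    (DOCUMENT, DC + "title", "John's Web page"),
    (DOCUMENT, DC + "creator", JOHN),
    (JOHN, RDF + "type", FOAF + "Person"),
    (JOHN, FOAF + "name", "John Smith"),
]


def objects(triples, subject, predicate):
    """Return the objects of all triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def creator_names(triples, document):
    """Follow the creator edge of a document, then the name edge of each creator."""
    names = []
    for creator in objects(triples, document, DC + "creator"):
        names.extend(objects(triples, creator, FOAF + "name"))
    return names
```

Calling `creator_names(TRIPLES, DOCUMENT)` walks two edges of the graph, which is the kind of traversal that linked data is meant to make possible across documents.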


    <!DOCTYPE html>
    <html lang="en">
      <head>
        <link rel="meta" type="application/rdf+xml"
              href="john.rdf">
        <link rel="meta" type="text/turtle" href="john.ttl">
        <link rel="meta" type="application/n-triples"
              href="john.nt">
        <title>John's Web page</title>
      </head>
      <body>
        Hi, I'm John Smith.
      </body>
    </html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

    <!DOCTYPE html>
    <html lang="en">
      <head vocab="http://purl.org/dc/terms/"
            about="http://example.org/document.html">
        <title property="title" lang="en">John's Web
          page</title>
        <meta property="creator"
              href="http://example.org/john-smith">
      </head>
      <body vocab="http://xmlns.com/foaf/0.1/"
            about="http://example.org/john-smith"
            typeof="Person">
        Hi, I'm <span property="name">John Smith</span>.
      </body>
    </html>


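[Graph: the node http://example.org/document.html has a dc:title edge to the literal "John's Web page"@en and a dc:creator edge to the node http://example.org/john-smith, which in turn has an rdf:type edge to foaf:Person and a foaf:name edge to the literal "John Smith".]

Figure 2.11: A graph of the RDF document in Figure 2.9.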

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is paid in the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and in the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Sidenote: The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by Desktop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text, with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.


[Rendered output: the centered title "The Cask of Amontillado", the word "by", and the author "Edgar Allan Poe", followed by the paragraph marked up below, set with a three-line drop cap "T", and the page number -1-.]

    .TITLE "The Cask of Amontillado"
    .AUTHOR "Edgar Allan Poe"
    .PRINTSTYLE TYPESET
    .PAGE 6i 9i .75i .75i .75i .75i
    .START
    .PP
    .DROPCAP T 3
    he thousand injuries of Fortunato I had borne as I best
    could, but when he ventured upon insult I vowed revenge.
    You, who so well know the nature of my soul, will not
    suppose, however, that gave utterance to a threat.
    \*[IT]At length\*[PREV] I would be avenged; this was a
    point definitely settled\[em]but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].


    % Page geometry
    \pdfpagewidth=6in \pdfpageheight=9in
    % Page dimensions
    \hsize=\dimexpr\pdfpagewidth-1.5in
    \vsize=\dimexpr\pdfpageheight-1.5in
    \baselineskip=16.8pt
    \hoffset=-.25in \voffset=-.25in
    % Fonts
    \font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
    \font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
    % Logical markup definition
    \def\title#1{{\bigbf\centerline{#1}}}
    \def\author#1{{\it\centerline{by}\centerline{#1}}
      \vskip 3.9em}
    \def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
      \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
      \parshape=4 3em\dimexpr\hsize-3em 3.28em
      \dimexpr\hsize-3.28em 3.28em
      \dimexpr\hsize-3.28em 0em\hsize}
    % The document
    \title{The Cask of Amontillado}
    \author{Edgar Allan Poe}
    \chapter The thousand injuries of Fortunato I had borne
    as I best could, but when he ventured upon insult I vowed
    revenge. You, who so well know the nature of my soul,
    will not suppose, however, that gave utterance to a
    threat. {\it At length} I would be avenged; this was a
    point definitely settled---but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish, but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
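
To give a feel for how such conversion tools work, here is a minimal Python sketch that converts a tiny Markdown-like subset to HTML. The supported subset (headings introduced by "# ", paragraphs separated by blank lines, and emphasis between asterisks) and the function name `to_html` are assumptions chosen for illustration; real converters handle far more syntax and many more edge cases.

```python
import html
import re


def to_html(text):
    """Convert a tiny Markdown-like subset to HTML.

    Supported (by assumption): '# ' headings, paragraphs separated by
    blank lines, and *emphasis*.
    """
    # Split the input into blocks at blank lines.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    output = []
    for block in blocks:
        # Join wrapped lines and escape HTML special characters
        # before adding any tags of our own.
        body = html.escape(" ".join(block.split()), quote=False)
        body = re.sub(r"\*(.+?)\*", r"<em>\1</em>", body)
        if body.startswith("# "):
            output.append("<h1>" + body[2:] + "</h1>")
        else:
            output.append("<p>" + body + "</p>")
    return "\n".join(output)
```

The source text stays readable on a plain terminal, while the converter supplies the angle brackets that a human writer would rather not type.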

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other, the design and the positioning of the structural elements of the document (such as headings, tables, figures, and lists), and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
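
Whether a typeface covers a manuscript can be checked mechanically once the covered code point ranges are known. The following Python sketch assumes that the coverage is given as a list of ranges written in the "U+370-3FF" notation; the function names and the example ranges are hypothetical, and a real check would read the coverage from the font file itself.

```python
def parse_unicode_range(spec):
    """Parse a range such as 'U+370-3FF' or 'U+A0' into (start, end)."""
    spec = spec.strip().upper()
    if spec.startswith("U+"):
        spec = spec[2:]
    start, _, end = spec.partition("-")
    # A single code point is treated as a range of length one.
    return int(start, 16), int(end or start, 16)


def missing_characters(text, ranges):
    """Return the non-space characters of text that no range covers.

    Each missing character is reported once, in order of first appearance.
    """
    parsed = [parse_unicode_range(r) for r in ranges]
    missing = []
    for char in text:
        if char.isspace() or char in missing:
            continue
        code = ord(char)
        if not any(low <= code <= high for low, high in parsed):
            missing.append(char)
    return missing
```

A non-empty result signals that a second typeface will have to be brought in for the reported characters.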

Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
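
The thresholds above can be turned into a small checker. The following Python sketch is only one possible reading of the guideline: the function name `assess_measure`, the returned labels, and the "acceptable" category for lengths that fall between the recommended range and the hard limits are assumptions made for illustration.

```python
def assess_measure(chars_per_line, columns=1):
    """Classify an average line length given in characters per line."""
    if chars_per_line > 80:
        # Extended passages of such lines strain the eye.
        return "too wide"
    if chars_per_line < 40:
        # Justification would make the word spacing too loose.
        return "set ragged"
    low, high = (45, 75) if columns == 1 else (40, 50)
    if low <= chars_per_line <= high:
        return "ideal"
    # Outside the recommended range, but within the hard limits.
    return "acceptable"
```

For example, `assess_measure(66)` reports an ideal single-column measure, while `assess_measure(35, columns=2)` suggests setting the text ragged.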

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text

[Rendered output: the paragraph marked up below, with the Latin text set in Alegreya Sans and the Greek text set in GFS Neohellenic, followed by the page number 1.]

    \documentclass[11pt]{article}
    \usepackage{fontspec, leading, newunicodechar}
    \usepackage[Latin, Greek]{ucharclasses}
    \setTransitionsForLatin
      {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
    \setTransitionsForGreek
      {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
       Ligatures=TeX]}{}
    % The character arguments of the following command did not
    % survive in the source and are left empty here.
    \newunicodechar{}{\raisebox{.8ex}{}}
    \frenchspacing
    \leading{14pt}
    \begin{document}
    The second function of Soul -- knowing -- was not at
    first distinguished from motion. Aristotle says: φαμὲν
    γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
    δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
    δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
    αὐτὴν κινεῖσθαι
    ``The soul is said to feel pain and joy, confidence and
    fear, and again to be angry, to perceive, and to think;
    and all these states are held to be movements, which
    might lead one to suppose that soul itself is moved.''
    \end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


    <style>
      @font-face {
        font-family: "Alegreya Sans";
        src: url("AlegreyaSans-Regular.ttf")
             format("truetype");
        unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                       U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
      }
      @font-face {
        font-family: "GFS Neohellenic";
        src: url("GFSNeohellenic.otf") format("opentype");
        unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                       U+102E0-102FF;
      }
      p {
        font-family: "Alegreya Sans", "GFS Neohellenic",
                     sans-serif;
        line-height: 14pt;
      }
      [lang=en] {
        font-size: 11pt;
      }
      /* "el" is the ISO 639-1 language code for Greek. */
      [lang=el] {
        font-size: 13.2pt;
      }
    </style>
    <p><span lang="en">The second function of Soul – knowing
    – was not at first distinguished from motion. Aristotle
    says: </span><span lang="el">φαμὲν γὰρ τὴν ψυχὴν
    λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
    τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
    κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
    κινεῖσθαι </span><span lang="en">“The soul is said to
    feel pain and joy, confidence and fear, and again to be
    angry, to perceive, and to think; and all these states
    are held to be movements, which might lead one to suppose
    that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
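
The arithmetic behind these figures is simple enough to sketch in Python. The function names below are illustrative assumptions; the percentages are the twenty to forty-five percent quoted above.

```python
def leading_range(type_size_pt):
    """Return the (minimum, maximum) line height in points.

    The lines are separated by 20 to 45 percent of the type size,
    so the line height is 120 to 145 percent of it.
    """
    return (type_size_pt * 120 / 100, type_size_pt * 145 / 100)


def lead_per_side(type_size_pt, leading_pt):
    """Return the lead added above and below each line, in points."""
    return (leading_pt - type_size_pt) / 2
```

For a 10 pt typeface, `leading_range(10)` gives the 12 to 14.5 pt range from the text, and `lead_per_side(10, 12)` gives the 1 pt of lead on either side of each line.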

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

              Sizes in inches    Page proportions
    A4        8.27 × 11.7        2 : √2       1.41421
    B5        6.93 × 9.84        1 : √2       0.707
    Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

Sidenote: This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
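
One way to satisfy the whole-multiple rule is to pad each heading with just enough vertical space. The Python sketch below computes that padding; the function name and the choice to round up to the next full line are assumptions, and how the padding is split above and below the heading is left to the designer.

```python
import math


def heading_padding(heading_height_pt, line_height_pt):
    """Return the extra vertical space, in points, that makes the
    heading occupy a whole number of body text lines."""
    lines = math.ceil(heading_height_pt / line_height_pt)
    return lines * line_height_pt - heading_height_pt
```

A heading that is 20 pt tall in a text with a 12 pt line height needs `heading_padding(20, 12)`, that is 4 pt, of additional space to occupy exactly two lines.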

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs, and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size (as described in Section 3.2.1), as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
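
The multiple-of-the-line-height requirement can be met by rounding the available height down to a whole number of lines. The Python sketch below assumes that the height left over after subtracting the vertical margins is already known in points; the function name and the decision to round down are illustrative assumptions.

```python
def text_height(available_height_pt, line_height_pt):
    """Return (number of lines, text height in points) such that the
    text height is the largest multiple of the line height that fits."""
    lines = int(available_height_pt // line_height_pt)
    return lines, lines * line_height_pt
```

With 540 pt available and a 12 pt line height, the text block holds exactly 45 lines; with 550 pt available, it still holds 45 lines, and the remaining 10 pt go to the margins.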

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

Bibliography

[1] Mary Brandel. "1963: The debut of ASCII". In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015). (cit. on p. 5)

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015). (cit. on p. 5)

[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972. (cit. on pp. 5, 7)

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986. (cit. on p. 6)

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1. (cit. on p. 8)

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6. (cit. on p. 8)

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993. (cit. on p. 8)

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996. (cit. on p. 8)

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996. (cit. on p. 8)

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015). (cit. on pp. 8–10)

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015). (cit. on p. 9)

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016). (cit. on p. 10)

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016). (cit. on p. 10)

[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008. (cit. on p. 13)

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012. (cit. on p. 13)

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006. (cit. on p. 13)


[17] Noam Chomsky. "Three models for the description of language". In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124. (cit. on p. 14)

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993. (cit. on p. 14)

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6. (cit. on p. 14)

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015). (cit. on p. 16)

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015). (cit. on p. 16)

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015). (cit. on p. 17)

[23] Charles F. Goldfarb. "The Roots of SGML – A Personal Recollection". In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015). (cit. on p. 22)

[24] Charles F. Goldfarb. "SGML: The Reason Why and the First Published Hint". In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015). (cit. on p. 22)

[25] Charles F. Goldfarb. "Introduction to Generalized Markup". In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015). (cit. on p. 22)

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986. (cit. on p. 22)


[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3. (cit. on p. 22)

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015). (cit. on pp. 23, 31)

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015). (cit. on p. 23)

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015). (cit. on pp. 23, 29)

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. "GODDAG: A Data Structure for Overlapping Hierarchies". In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12. (cit. on p. 27)

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015). (cit. on p. 27)

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015). (cit. on p. 27)

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015). (cit. on p. 28)


[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015). (cit. on p. 28)

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015). (cit. on p. 28)

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016). (cit. on p. 28)

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015). (cit. on p. 29)

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015). (cit. on p. 29)

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008). (cit. on p. 29)

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015). (cit. on p. 31)

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015). (cit. on p. 31)

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016). (cit. on p. 31)

56 BIBLIOGRAPHY

[44] Ora Lassila and Ralph R Swick Resource Description Frame-work (rdf) Model and Syntax Specification w3c Recommen-dation w3c Feb 1999 url httpwwww3orgTR1999REC-rdf-syntax-19990222 (visited on 08182015) (cit onpp 31 32)

[45] Dan Brickley and R V Guha rdf Vocabulary DescriptionLanguage 10 rdf Schema w3c Recommendation w3c Feb2004 url httpwwww3orgTR2004REC-rdf-schema-20040210 (visited on 08182015) (cit on p 32)

[46] Deborah L McGuinness and Frank van Harmelen owl WebOntology Language w3c Recommendation w3c Feb 2004url httpwwww3orgTR2004REC-owl-features-20040210 (visited on 08182015) (cit on p 32)

[47] Dan Brickley and R V Guha json-ld 10 A JSON-basedSerialization for Linked Data w3c Recommendation w3cJan 2014 url httpwwww3orgTR2014REC-json-ld-20140116 (visited on 08192015) (cit on p 32)

[48] David Beckett et al rdf 11 Turtle w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-turtle-20140225 (visited on 08292015) (cit on p 32)

[49] David Beckett rdf 11 N-Triples w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-n-triples-20140225 (visited on 08192015) (cit on p 32)

[50] Ben Adida et al rdfa in xhtml Syntax and Processing w3cRecommendation w3c Oct 2008 url httpwwww3org TR 2008 REC - rdfa - syntax - 20081014 (visited on08192015) (cit on p 32)

[51] Peter Schaffter What exactly is mom 2015 url httpwwwschafftercamommom-01html (visited on 09162016)(cit on p 37)

[52] Donald Ervin Knuth Digital Typography The Center for theStudy of Language and Information Publications 1998 i sbn978-0-387-98269-4 (cit on p 36)

[53] Albert Kapr Sto a jedna věta ke knižniacute uacutepravě Trans by An-toniacuten Rambousek Lacerta 1999 url httpwwwsazbacztypoglosytypo101pdf (visited on 10202015) (cit onpp 41 46 47)

BIBLIOGRAPHY 57

[54] Robert Bringhurst the Elements of Typographic Style PointRoberts andWashHartleyampMarks 1992 i sbn 0-88179-110-5(cit on pp 41 42 45ndash48)

[55] Matthew Butterick Butterickrsquos Practical Typography Line spac-ing url httppracticaltypographycomline-spacinghtml (visited on 11022015) (cit on p 42)

[56] Vladimiacuter Beran et al Aktualizovanyacute typografickyacute manuaacutel6th ed Kafka Design 2014 (cit on p 45)

Acronyms

ACK The ACKnowledgement character
API Application Programming Interface
ASA The American Standards Association
ASCII The American Standard Code for Information Interchange
AT&T The American Telephone and Telegraph corporation
BEL The BELl character
BMP The Basic Multilingual Plane
BRE The Basic Regular Expressions
BS The BackSpace character
BSD The Berkeley Software Distribution. Also known as the Berkeley Unix
CA California
CAN The CANcel character
CERN The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR The Common Locale Data Repository
CLI Command Line Interface
COBOL The COmmon Business-Oriented Language
CR The Carriage Return character
CSS The Cascading Style Sheets language
DC The Dublin Core
DC1 The Device Control character No. 1
DC2 The Device Control character No. 2
DC3 The Device Control character No. 3
DC4 The Device Control character No. 4
DEL The DELete character
DLE The Data Link Escape character
DPS Document Preparation System
DTD Document Type Definition
DTP DeskTop Publishing
EBCDIC The Extended Binary Coded Decimal Interchange Code
ECMA The European Computer Manufacturers Association
EM The End of Medium
Emacs The Eventually Munches All Computer Storage editor
ENQ The ENQuiry character
EOT The End Of Transmission
ERE The Extended Regular Expressions
ESC The ESCape character
ETB The End of Transmission Block
ETX The End of TeXt
EUC The Extended Unix Code
FF The Form Feed character
FOAF Friend Of A Friend
FORTRAN The FORmula TRANslator
FS The File Separator
FSM The Free Software Movement
GML The Generalized Markup Language
GNU GNU's Not Unix
GS The Group Separator
GUI Graphical User Interface
HT The Horizontal Tab
HTML The HyperText Markup Language
IBM The International Business Machines Corporation
IEC The International Electrotechnical Commission
IME Input Method Editor
IRI The Internationalized Resource Identifier
ISO The International Organization for Standardization
JIS The Japanese Industrial Standards encoding
JOE Joe's Own Editor
JSON The JavaScript Object Notation
JSON-LD JSON for LD
JTC A Joint TC
LD Linked Data
LF The Line Feed
MA Massachusetts
MathML The Mathematical Markup Language
NAK The Negative-AcKnowledgement character
NUL The NULl character
NY New York
OCR Optical Character Recognition
ODF The Open Document Format for office applications
OOXML The Office Open XML format
OWL The Web Ontology Language
PC The IBM Personal Computer
PDF The Portable Document Format
Pico The PIne COmposer
POSIX The Portable Operating System Interface
RDF The Resource Description Framework
RDFa RDF in attributes
RELAX NG The REgular LAnguage for XML New Generation
RFC A Request For Comments
RS The Record Separator
SC A SubCommittee
SGML The Standard Generalized Markup Language
SI The Shift In character
SO The Shift Out character
SOH The Start of Heading
SR Sound Recognition
STX The Start of Text
SUB The SUBstitute character
SVG The Scalable Vector Graphics language
SVN SubVersioN
SYN The SYNchronous Idle character
TC A Technical Committee
TEI The Text Encoding Initiative
TRON The Real-time Operating system Nucleus
UCS The Universal multiple-octet coded Character Set
US The Unit Separator
USA The United States of America
UTF The UCS Transformation Format
VCS Version Control Systems
vi The Visual Interactive editor
vim vi IMproved
VT The Vertical Tab
W3C The World Wide Web Consortium
WG A Working Group
WYSIWYG What You See Is What You Get
XHTML The eXtensible HyperText Markup Language
XML The eXtensible Markup Language

Index

ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42

Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17

sect

Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

bre 14ndash16bs 6bsd 13

sect

ca 52can 6cern 28

character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44

sect

dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35

dps 13 17 18 32 35 36 39dtd 23 25ndash27dtp 36

sect

ebcdic 5ecma 55Edgar Allen Poe 37


Elements of Style 3em 6Emacs 13endianity 10endnote 47enq 6eot 6erealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

ere 14ndash16esc 6etb 6120576-TEX 38etx 6euc 5

sectF M Cornford 43ff 6foaf 32 33footnote 47formal grammar 14fortran 4From Religion to Philosophy A Study in

the Origins of Western Speculation 43fs 6fsm 35

sectGit 17gml 22gnuLinux 13nano 13

gnu 13 14 35Google Documents 18Google Pinyin 11grep 16 17groff see troffgs 6gui 13 35

sectHan Unification 9heading 45Henrik Ibsen 27ht 6

html 28ndash32 34 39 44 55sect

ibm 5 12 22iconv 10iec 7 10 51ndash54ime 12ir i 27 28 31 32 54iso 7 10 51ndash54

sectJavaScript 29Jeffrey E F Friedl 14j is 5joe 13JScript 29json 32json-ld 32 56jtc 51ndash54justification see alignment

sectKing Lear 48

sectLATEX 36 43Latin Vulgate Bible 49ld 31 32 55leading see line spacingLeafpad 13lf 6lightweight markup language 39line height 45list 46

sectma 51MakeDoc 39Markdown 39markuplogical 21 29 30 35 36presentation 21 29 30 35 36

mathml 28 31Mercurial 17microformatting 32Microsoft Word 14 20 39

sectN-Triples 32 33nak 6Noam Chomskyhierarchy 14

Noam Chomsky 14note 46Notepad++ 13Notepad 13


nroff see troffnul 6ny 51

sectocr 12odf 13ooxml 13owl 32 56

sectparagraphblock 47indented 45outdented 45

paragraph 42paragraphsblock 45

pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8

sectQuarkXPress 14quotationblock 47run-in 47

sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31

rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6

sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22

sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-

lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6

secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff

man 36


me 36mom 36

troff 35tron 9Turtle 32 33typeface 41

sectucsblock 8ucs-4 8

ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10

us 6usa 51 52utf

utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8

utf 6 8ndash10 52sect

VBScript 29vcscentralized 17decentralized 17

vcs 17ndash20version control 13vi 13vim 13

vt 6sect

w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4

wysiwyg 35sect

XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23

xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23



    <!DOCTYPE html>
    <html lang="en">
      <head>
        <link rel="meta" type="application/rdf+xml"
              href="john.rdf">
        <link rel="meta" type="text/turtle" href="john.ttl">
        <link rel="meta" type="application/n-triples"
              href="john.nt">
        <title>John's Web page</title>
      </head>
      <body>
        Hi, I'm John Smith.
      </body>
    </html>

Figure 2.10: Above is an HTML document linked to the RDF document from Figure 2.9. Below is the same HTML document with the RDF data directly embedded using the RDFa language.

    <!DOCTYPE html>
    <html lang="en">
      <head vocab="http://purl.org/dc/terms/"
            about="http://example.org/document.html">
        <title property="title" lang="en">John's Web
          page</title>
        <meta property="creator"
              href="http://example.org/john-smith">
      </head>
      <body vocab="http://xmlns.com/foaf/0.1/"
            about="http://example.org/john-smith"
            typeof="Person">
        Hi, I'm <span property="name">John Smith</span>.
      </body>
    </html>


[Graph. Nodes: http://example.org/document.html, "John's Web page"@en, http://example.org/john-smith, foaf:Person, "John Smith". Edge labels: dc:title, rdf:type, foaf:name, foaf:creator.]

Figure 2.11: A graph of the RDF document in Figure 2.9.

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the gentle learning curve of interactive DPSes is twofold: more primitive typesetting algorithms, which need to be fast enough to enable real-time user interaction, and reduced flexibility stemming from the use of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU's Not Unix (GNU) project in 1990 by members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by Donald Knuth, an American professor of computer science, after he had received galley proofs for the second volume of his monograph The Art of Computer Programming and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text, with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.


[Output document: a page showing the title "The Cask of Amontillado", the byline "by Edgar Allan Poe", the paragraph from the listing below set with a drop cap "T", and the page number 1.]

    .TITLE "The Cask of Amontillado"
    .AUTHOR "Edgar Allan Poe"
    .PRINTSTYLE TYPESET
    .PAGE 6i 9i .75i .75i .75i .75i
    .START
    .PP
    .DROPCAP T 3
    he thousand injuries of Fortunato I had borne as I best
    could, but when he ventured upon insult, I vowed revenge.
    You, who so well know the nature of my soul, will not
    suppose, however, that gave utterance to a threat.
    \*[IT]At length\*[PREV] I would be avenged; this was a
    point definitely settled\[em]but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's The Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].


    % Page geometry
    \pdfpagewidth=6in \pdfpageheight=9in
    % Page dimensions
    \hsize=\dimexpr\pdfpagewidth-1.5in
    \vsize=\dimexpr\pdfpageheight-1.5in
    \baselineskip=16.8pt
    \hoffset=-.25in \voffset=-.25in
    % Fonts
    \font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
    \font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
    % Logical markup definition
    \def\title#1{{\bigbf\centerline{#1}}}
    \def\author#1{{\it\centerline{by}\centerline{#1}}
      \vskip 3.9em}
    \def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
      \hbox{\llap{\dropcap#1\hskip-0.3ex}}}
      \parshape=4 3em\dimexpr\hsize-3em 3.28em
        \dimexpr\hsize-3.28em 3.28em
        \dimexpr\hsize-3.28em 0em\hsize}
    % The document
    \title{The Cask of Amontillado}
    \author{Edgar Allan Poe}
    \chapter The thousand injuries of Fortunato I had borne
    as I best could, but when he ventured upon insult, I vowed
    revenge. You, who so well know the nature of my soul,
    will not suppose, however, that gave utterance to a
    threat. {\it At length} I would be avenged; this was a
    point definitely settled---but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce markup that is comparatively weak and domain-specific, but also humane, highly intuitive, often profoundly beautiful, and easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that convert them to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that reflect their use cases.
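To illustrate the conversion step, the following is a minimal sketch in Python of a converter from a tiny Markdown-like subset to HTML. The function name `to_html` and the supported subset (paragraphs separated by blank lines, `#` headings, and `*emphasis*`) are assumptions made for this illustration only; real converters handle far more syntax and many more edge cases.

```python
import html
import re


def to_html(source: str) -> str:
    """Convert a tiny Markdown-like subset to HTML (hypothetical helper)."""
    blocks = []
    # Blank lines separate blocks, as in most lightweight markup languages.
    for block in re.split(r"\n\s*\n", source.strip()):
        # Join wrapped lines and escape characters that are special in HTML.
        text = html.escape(" ".join(block.split()), quote=False)
        # A pair of asterisks marks emphasis.
        text = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)
        # One to six leading number signs mark a heading of that level.
        heading = re.match(r"(#{1,6}) (.*)", text)
        if heading:
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            blocks.append(f"<p>{text}</p>")
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_html("# Title\n\nSome *plain* text\nwrapped."))
```

The punctuation in the input stays readable as plain text, while the output carries the same structure as explicit HTML elements.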

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other, the design and the positioning of the structural elements of the document (such as headings, tables, figures, and lists), and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab serifs or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing far too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
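The ranges above can be turned into a small checker. The following Python sketch is only an illustration: the function names `measure_verdict` and `alignment` are hypothetical, and the thresholds are simply the figures quoted in the paragraph above.

```python
def measure_verdict(chars_per_line: int, columns: int = 1) -> str:
    """Classify a line length against the recommended ranges."""
    # 45-75 characters on a single-column page, 40-50 on a multi-column page.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < low:
        return "too short"
    if chars_per_line > high:
        return "too long"
    return "comfortable"


def alignment(chars_per_line: int) -> str:
    """Suggest ragged setting when a line cannot hold 40 characters."""
    return "justified" if chars_per_line >= 40 else "ragged"


if __name__ == "__main__":
    for chars in (30, 48, 66, 90):
        print(chars, measure_verdict(chars), alignment(chars))
```

A line of 66 characters is comfortable on a single-column page but too long for a column on a multi-column page, which is why the `columns` parameter changes the verdict.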

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text


[Output document: a page showing the paragraph from the listing below, with the English text and the Greek quotation set in two different typefaces, and the page number 1.]

    \documentclass[11pt]{article}
    \usepackage{fontspec,leading,newunicodechar}
    \usepackage[Latin,Greek]{ucharclasses}
    \setTransitionsForLatin
      {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
    \setTransitionsForGreek
      {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
        Ligatures=TeX]}{}
    \newunicodechar{·}{\raisebox{.8ex}{.}}
    \frenchspacing
    \leading{14pt}
    \begin{document}
    The second function of Soul -- knowing -- was not at
    first distinguished from motion. Aristotle says, φαμὲν
    γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι
    δὲ ὀργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα
    δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν
    αὐτὴν κινεῖσθαι.
    ``The soul is said to feel pain and joy, confidence and
    fear, and again to be angry, to perceive, and to think;
    and all these states are held to be movements, which
    might lead one to suppose that soul itself is moved.''
    \end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


    <style>
      @font-face {
        font-family: "Alegreya Sans";
        src: url("AlegreyaSans-Regular.ttf")
             format("truetype");
        unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                       U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
      }
      @font-face {
        font-family: "GFS Neohellenic";
        src: url("GFSNeohellenic.otf") format("opentype");
        unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                       U+102E0-102FF;
      }
      p {
        font-family: "Alegreya Sans", "GFS Neohellenic",
                     sans-serif;
        line-height: 14pt;
      }
      [lang=en] {
        font-size: 11pt;
      }
      [lang=grc] {
        font-size: 13.2pt;
      }
    </style>
    <p><span lang="en">The second function of Soul &ndash; knowing
    &ndash; was not at first distinguished from motion. Aristotle
    says, </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
    λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὀργίζεσθαί
    τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα
    κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν
    κινεῖσθαι. </span><span lang="en">&ldquo;The soul is said to
    feel pain and joy, confidence and fear, and again to be
    angry, to perceive, and to think; and all these states
    are held to be movements, which might lead one to suppose
    that soul itself is moved.&rdquo;</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with a leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
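The arithmetic behind these figures can be sketched in a few lines of Python. The function names `leading_range` and `lead_per_side` are hypothetical, and the default fractions are the twenty and forty-five percent quoted above.

```python
def leading_range(size_pt: float, low: float = 0.20, high: float = 0.45):
    """Return the (minimum, maximum) line height in points."""
    # The line height is the type size plus 20 to 45 percent of it.
    return size_pt + size_pt * low, size_pt + size_pt * high


def lead_per_side(size_pt: float, line_height_pt: float) -> float:
    """Return the lead added above and below each line, in points."""
    return (line_height_pt - size_pt) / 2


if __name__ == "__main__":
    minimum, maximum = leading_range(10)
    print(f"line height: {minimum:.2f} to {maximum:.2f} pt")
    print(f"lead per side: {lead_per_side(10, minimum):.2f} "
          f"to {lead_per_side(10, maximum):.2f} pt")
```

For a 10 pt typeface, this reproduces the range of 12 to 14.5 pt and the 1 to 2.25 pt of lead on either side of the line.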

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
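As a quick worked example, the indent range can be computed from the type size. The Python sketch below uses a hypothetical helper `indent_range` and assumes, as in the paragraph above, that 1 em equals the type size and 1 en equals half of it.

```python
def indent_range(size_pt: float) -> tuple[float, float]:
    """Return the (minimum, maximum) first-line indent in points."""
    en = size_pt / 2  # 1 en is one half of the type size
    em = size_pt      # 1 em equals the type size
    return en, 3 * em


if __name__ == "__main__":
    minimum, maximum = indent_range(10)
    print(f"indent between {minimum} and {maximum} pt")
```

With a 10 pt body text typeface, the first line would therefore be indented by anything between 5 and 30 pt.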

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

              Sizes in inches    Page proportions
    A4        8.27 × 11.7        2 : √2       1.41421
    B5        6.93 × 9.84        1 : √2       0.707
    Letter    8½ × 11            1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
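One way to satisfy the whole-multiple rule is to round the height of a heading up to the next multiple of the body line height and to distribute the difference as space around the heading. The following Python sketch illustrates this under stated assumptions: the function name `heading_block` is hypothetical, and how the extra space is split above and below the heading is left to the designer.

```python
import math


def heading_block(heading_height_pt: float, line_height_pt: float):
    """Return (body lines occupied, extra space in points) for a heading."""
    # Round up to the next whole number of body text lines.
    lines = math.ceil(heading_height_pt / line_height_pt)
    # The remainder has to be added around the heading as vertical space.
    padding = lines * line_height_pt - heading_height_pt
    return lines, padding


if __name__ == "__main__":
    lines, padding = heading_block(16, 12)
    print(f"{lines} lines, {padding} pt of extra space")
```

With a body line height of 12 pt, a heading that is 16 pt tall would occupy two lines of the grid and need 8 pt of additional space, whereas a 24 pt heading would fit exactly.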

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface as the surrounding text, treat the columns of tables the same way you treat columns in the text, and keep the number of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
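The proportions of the paper sizes listed in Table 3.1 can be recomputed from the sizes in inches. The Python sketch below is only illustrative: the dictionary `PAPER_SIZES_IN` and the helper `proportion` are hypothetical names, and the sizes are the rounded values from the table, so the results only approximate the exact ratios.

```python
# Page sizes in inches (width, height), as listed in Table 3.1.
PAPER_SIZES_IN = {
    "A4": (8.27, 11.7),
    "B5": (6.93, 9.84),
    "Letter": (8.5, 11.0),
}


def proportion(width: float, height: float) -> float:
    """Return the ratio of the longer side to the shorter side."""
    short_side, long_side = sorted((width, height))
    return long_side / short_side


if __name__ == "__main__":
    for name, (width, height) in PAPER_SIZES_IN.items():
        print(f"{name}: 1 : {proportion(width, height):.4f}")
```

Both A4 and B5 come out close to the square root of two, while the Letter format is noticeably squatter.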

324 NotesNotes provide commentary on a specified passage of the main textand can take three different forms

1 Sidenotes are displayed in the horizontal margins next to the rele-vant passage of themain text as seen throughout this book Unlessthe horizontal margins are very wide sidenotes are unsuitablefor the inclusion of bibliographical referencesmdasha common use fornotes in academic writing

32 STRUCTURAL ELEMENTS 47

2 Footnotes are delegated to the bottom of the page and linked to therelevant passage of the main text through symbols or superscriptnumbers1 Compared to side notes they are more difficult for thereader to find Footnotes should align with the bottom of the textblock not stick out into the bottom margin [53 para 48]

3 Endnotes are delegated to the end of a section or the entire doc-ument and are linked to the relevant passage of the body textthrough superscript numbers They are the easiest of the three totypeset but also the hardest for the reader to find

Notes are typically typeset in sizes from 8pt up to the body texttypeface size depending on their frequency importance and aver-age length [54 sec 43] If several categories of notes are presentin the document it may be desirable to give each a different form

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer’s viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3]:

    This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size (as described in Section 3.2.1), as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
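The requirement that the text height be a multiple of the line height can be sketched as follows. This is only an illustration under stated assumptions: the function name `text_height` and the example page and margin sizes are hypothetical, and all lengths are in points.

```python
def text_height(page_height_pt, top_margin_pt, bottom_margin_pt, line_height_pt):
    """Return the number of lines that fit between the vertical margins
    and the resulting text height, which is a whole multiple of the
    body text line height."""
    available = page_height_pt - top_margin_pt - bottom_margin_pt
    lines = int(available // line_height_pt)
    return lines, lines * line_height_pt


# A 9 in (648 pt) page with 0.75 in (54 pt) margins and a 13 pt line
# height holds 41 lines; the text block is 533 pt high and the remaining
# 7 pt can be added to one of the margins.
print(text_height(648, 54, 54, 13))
```

In practice, the margins would then be adjusted so that the text block and the page keep the intended proportions.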

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
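The text does not name a measure of “sufficient contrast”. One measure in common use on the Web is the contrast ratio defined by the W3C Web Content Accessibility Guidelines (WCAG) 2; choosing it here is an assumption, and the function names below are hypothetical. The sketch computes the ratio for two sRGB colors given as 8-bit triples.

```python
def _linearize(channel):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2 formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) triple of 8-bit values."""
    r, g, b = (_linearize(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors: 1 (none) up to 21 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


black, white, red = (0, 0, 0), (255, 255, 255), (255, 0, 0)
print(round(contrast_ratio(black, white), 2))
print(round(contrast_ratio(red, white), 2))
```

WCAG 2 asks for a ratio of at least 4.5 for regular-sized text at its AA level. A ratio of this kind says nothing about how distinct two hues remain for a color-blind reader, which has to be checked separately.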

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standards Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standards Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).

[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O’Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O’Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK The ACKnowledgement character
API Application Programming Interface
ASA The American Standards Association
ASCII The American Standard Code for Information Interchange
AT&T The American Telephone and Telegraph corporation
BEL The BELl character
BMP The Basic Multilingual Plane
BRE The Basic Regular Expressions
BS The BackSpace character
BSD The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA California
CAN The CANcel character
CERN The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR The Common Locale Data Repository
CLI Command Line Interface
COBOL The COmmon Business-Oriented Language
CR The Carriage Return character
CSS The Cascading Style Sheets language
DC The Dublin Core
DC1 The Device Control character No. 1
DC2 The Device Control character No. 2
DC3 The Device Control character No. 3
DC4 The Device Control character No. 4
DEL The DELete character
DLE The Data Link Escape character
DPS Document Preparation System
DTD Document Type Definition
DTP DeskTop Publishing
EBCDIC The Extended Binary Coded Decimal Interchange Code
ECMA The European Computer Manufacturers Association
EM The End of Medium
EMACS The Eventually Munches All Computer Storage editor
ENQ The ENQuiry character
EOT The End Of Transmission
ERE The Extended Regular Expressions
ESC The ESCape character
ETB The End of Transmission Block
ETX The End of TeXt
EUC The Extended Unix Code
FF The Form Feed character
FOAF Friend Of A Friend
FORTRAN The FORmula TRANslator
FS The File Separator
FSM The Free Software Movement
GML The General Markup Language
GNU GNU is Not Unix
GS The Group Separator
GUI Graphical User Interface
HT The Horizontal Tab
HTML The HyperText Markup Language
IBM The International Business Machines Corporation
IEC The International Electrotechnical Commission
IME Input Method Editor
IRI The Internationalized Resource Identifier
ISO The International Organization for Standardization
JIS The Japanese Industrial Standards encoding
JOE The Joe’s Own Editor
JSON The JavaScript Object Notation
JSON-LD JSON for LD
JTC A Joint TC
LD Linked Data
LF The Line Feed
MA Massachusetts
MathML The Mathematical Markup Language
NAK The Negative-AcKnowledgement character
NUL The NULl character
NY New York
OCR Optical Character Recognition
ODF The Open Document Format for office applications
OOXML The Office Open XML format
OWL The Web Ontology Language
PC The IBM Personal Computer
PDF The Portable Document Format
PICO The PIne COmposer
POSIX The Portable Operating System Interface
RDF The Resource Description Framework
RDFa RDF in attributes
RELAX NG The REgular LAnguage for XML New Generation
RFC A Request For Comments
RS The Record Separator
SC A SubCommittee
SGML The Standard Generalized Markup Language
SI The Shift In character
SO The Shift Out character
SOH The Start of Heading
SR Sound Recognition
STX The Start of Text
SUB The SUBstitute character
SVG The Scalable Vector Graphics language
SVN SubVersioN
SYN The SYNchronous Idle character
TC A Technical Committee
TEI The Text Encoding Initiative
TRON The Real-time Operating system Nucleus
UCS The Universal multiple-octet coded Character Set
US The Unit Separator
USA The United States of America
UTF The UCS Transformation Format
VCS Version Control Systems
VI The Visual Interactive editor
VIM VI IMproved
VT The Vertical Tab
W3C The World Wide Web Consortium
WG A Working Group
WYSIWYG What You See Is What You Get
XHTML The eXtensible HyperText Markup Language
XML The eXtensible Markup Language



Figure 2.11: A graph of the RDF document in Figure 2.9. (Node labels: http://example.org/document.html, “John's Web page”@en, http://example.org/john-smith, foaf:Person, “John Smith”. Edge labels: dc:title, rdf:type, foaf:name, foaf:creator.)

categorized into the batch-oriented, which process text files into printable output documents on demand, and the interactive (also What You See Is What You Get (WYSIWYG)), which allow the user to directly edit an approximation of the output document through a visual editor. The price for the mild learning curve of interactive DPSes is twofold: the more primitive typesetting algorithms, which need to be sufficiently fast to enable real-time user interaction, and the reduced flexibility stemming from the usage of a Graphical User Interface (GUI), which, although often intuitive for simple tasks, seldom matches the power of the markup languages used by batch-oriented DPSes.

2.3.1 Batch-oriented Systems

Two of the archetypal batch-oriented DPSes are troff, whose function is to produce output for general printers, and nroff, whose function is to produce output for line printers and text terminals. Both are proprietary software developed for the Unix operating system at the beginning of the 1970s by the American Telephone and Telegraph corporation (AT&T). An alternative to nroff and troff is groff, which was developed as free software for the GNU is Not Unix (GNU) project in 1990 by the members of the Free Software Movement (FSM). Groff combines the capabilities of both systems and is used extensively for the markup of documentation in Unix and Unix-like operating systems. The markup language of groff combines presentation markup with programming constructs and enables the definition of logical markup through user macros. The standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph, the Art of Computer Programming, and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and which contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

Sidenote: The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.
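The decoupling that logical markup provides can be sketched in a few lines. The class names, style properties, and the `render` function below are hypothetical and only model the idea; they do not reproduce the style system of any particular DPS.

```python
# Logical markup: each section of text carries only the name of its class.
document = [
    ("heading", "Design"),
    ("body", "After a manuscript has been written and marked up ..."),
    ("heading", "Fonts"),
    ("body", "When choosing typefaces for a document ..."),
]

# The design of each class is defined once, separately from the text.
styles = {
    "heading": {"size_pt": 16, "bold": True},
    "body": {"size_pt": 10, "bold": False},
}


def render(document, styles):
    """Pair every section of text with the design of its class."""
    return [(text, styles[cls]) for cls, text in document]


# Changing one class definition restyles every heading consistently.
styles["heading"]["size_pt"] = 18
for text, style in render(document, styles):
    print(style["size_pt"], text)
```

With presentation markup, the same change would have to be repeated by hand at every heading.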

The Cask of Amontillado
by
Edgar Allen Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that gave utterance to a threat. At length I would be avenged; this was a point definitely settled, but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

    .TITLE "The Cask of Amontillado"
    .AUTHOR "Edgar Allen Poe"
    .PRINTSTYLE TYPESET
    .PAGE 6i 9i .75i .75i .75i .75i
    .START
    .PP
    .DROPCAP T 3
    he thousand injuries of Fortunato I had borne as I best
    could, but when he ventured upon insult I vowed revenge.
    You, who so well know the nature of my soul, will not
    suppose, however, that gave utterance to a threat.
    \*[IT]At length\*[PREV] I would be avenged; this was a
    point definitely settled\[em]but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe’s Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked up text was borrowed from the web page of mom [51].

    % Page geometry
    \pdfpagewidth=6in \pdfpageheight=9in
    % Page dimensions
    \hsize=\dimexpr\pdfpagewidth-1.5in
    \vsize=\dimexpr\pdfpageheight-1.5in
    \baselineskip=16.8pt
    \hoffset=-.25in \voffset=-.25in
    % Fonts
    \font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
    \font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
    % Logical markup definition
    \def\title#1{{\bigbf\centerline{#1}}}
    \def\author#1{{\it\centerline{by}\centerline{#1}}
      \vskip 3.9em}
    \def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
      \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
      \parshape=4 3em\dimexpr\hsize-3em 3.28em
                  \dimexpr\hsize-3.28em 3.28em
                  \dimexpr\hsize-3.28em 0em\hsize}
    % The document
    \title{The Cask of Amontillado}
    \author{Edgar Allen Poe}
    \chapter The thousand injuries of Fortunato I had borne
    as I best could, but when he ventured upon insult I vowed
    revenge. You, who so well know the nature of my soul,
    will not suppose, however, that gave utterance to a
    threat. {\it At length} I would be avenged; this was a
    point definitely settled---but the very definitiveness
    with which it was resolved precluded the idea of risk. I
    must not only punish but punish with impunity. A wrong is
    unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.

Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
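The kind of conversion tool mentioned above can be sketched for a toy markup. The three rules below (a line starting with `# ` is a heading, `*text*` is emphasis, blank lines separate paragraphs) are assumptions loosely modeled on common lightweight conventions; they do not implement Markdown or any other real language, and the function name `to_html` is hypothetical.

```python
import html
import re


def to_html(text):
    """Convert a toy lightweight markup to HTML."""
    blocks = []
    # Blank lines separate blocks; line breaks inside a block are ignored.
    for block in re.split(r"\n\s*\n", text.strip()):
        line = " ".join(block.split())
        # Escape the characters that are special in HTML first.
        line = html.escape(line, quote=False)
        # *text* marks emphasis.
        line = re.sub(r"\*(.+?)\*", r"<em>\1</em>", line)
        if line.startswith("# "):
            blocks.append("<h1>" + line[2:] + "</h1>")
        else:
            blocks.append("<p>" + line + "</p>")
    return "\n".join(blocks)


source = """# Design

Legibility should be of *foremost*
concern."""
print(to_html(source))
```

The source stays readable as plain text, which is the trade-off the paragraph describes: little expressive power in exchange for unobtrusive markup.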

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other, the design and the positioning of the structural elements of the document (such as headings, tables, figures, and lists), and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
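As a rough sketch of how faces can be borrowed across families, the fontspec package used in Figure 3.1 lets the bold and italic slots of the main font be assigned explicitly. The regular file name is taken from Figure 3.1; the bold and italic file names are hypothetical placeholders, and all three files are assumed to sit next to the document.

```latex
\documentclass{article}
\usepackage{fontspec} % requires XeTeX or LuaTeX
% Assumption: the bold and italic files below are illustrative
% placeholders for faces borrowed from another font family.
\setmainfont{AlegreyaSans-Regular.ttf}[
  BoldFont   = BorrowedFamily-Bold.ttf,
  ItalicFont = BorrowedFamily-Italic.ttf
]
\begin{document}
Body text with \textbf{bold} and \textit{italic} passages.
\end{document}
```

Whether the borrowed faces are consonant with the body text typeface remains a judgment the designer has to make by eye.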

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].

The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says, φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

    \documentclass[11pt]{article}
    \usepackage{fontspec, leading, newunicodechar}
    \usepackage[Latin, Greek]{ucharclasses}
    \setTransitionsForLatin
      {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
    \setTransitionsForGreek
      {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
       Ligatures=TeX]}{}
    \newunicodechar{…}{\raisebox{.8ex}{…}}
    \frenchspacing
    \leading{14pt}
    \begin{document}
    The second function of Soul -- knowing -- was not at
    first distinguished from motion. Aristotle says, φαμὲν
    γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
    δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
    δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
    αὐτὴν κινεῖσθαι.
    ``The soul is said to feel pain and joy, confidence and
    fear, and again to be angry, to perceive, and to think;
    and all these states are held to be movements, which
    might lead one to suppose that soul itself is moved.''
    \end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.

    <style>
      @font-face {
        font-family: "Alegreya Sans";
        src: url("AlegreyaSans-Regular.ttf")
             format("truetype");
        unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                       U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
      }
      @font-face {
        font-family: "GFS Neohellenic";
        src: url("GFSNeohellenic.otf") format("opentype");
        unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                       U+102E0-102FF;
      }
      p {
        font-family: "Alegreya Sans", "GFS Neohellenic",
                     sans-serif;
        line-height: 14pt;
      }
      [lang=en] {
        font-size: 11pt;
      }
      [lang=gr] {
        font-size: 13.2pt;
      }
    </style>
    <p><span lang="en">The second function of Soul – knowing
    – was not at first distinguished from motion. Aristotle
    says, </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
    λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
    τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
    κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
    κινεῖσθαι. </span><span lang="en">“The soul is said to
    feel pain and joy, confidence and fear, and again to be
    angry, to perceive, and to think; and all these states
    are held to be movements, which might lead one to suppose
    that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
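The leading arithmetic can be written out explicitly. The symbols below are introduced here only for illustration: s is the typeface size, r is the relative line separation, and ℓ is the resulting line height.

```latex
\[
  \ell = s\,(1 + r), \qquad 0.20 \le r \le 0.45
\]
\[
  s = 10\,\mathrm{pt}: \qquad
  \ell_{\min} = 10 \times 1.20 = 12\,\mathrm{pt}, \qquad
  \ell_{\max} = 10 \times 1.45 = 14.5\,\mathrm{pt}
\]
\[
  \frac{\ell_{\min} - s}{2} = 1\,\mathrm{pt}, \qquad
  \frac{\ell_{\max} - s}{2} = 2.25\,\mathrm{pt}
\]
```

In LaTeX, a size and line height pair such as 10/12 can be requested with `\fontsize{10pt}{12pt}\selectfont`.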

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
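A minimal sketch of the indented style in LaTeX follows. The 1 em value is an assumption chosen from within the range given above, and the sample text is a placeholder.

```latex
\documentclass{article}
% Assumed value: 1em lies between the 1 en and 3 em bounds.
\setlength{\parindent}{1em}
\begin{document}
\section{A Heading}
The first paragraph after a heading is left unindented by the
standard classes, since there is no ambiguity to resolve.

The second paragraph receives the 1\,em indent.
\end{document}
```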

If the margins are ample, outdented paragraphs are an intriguing option as well.

Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
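Block paragraphs can be approximated in LaTeX by trading the indent for vertical space. The sketch below assumes that one full line of space between paragraphs is wanted; the parskip package offers a more thorough treatment of the same idea.

```latex
\documentclass{article}
\setlength{\parindent}{0pt}         % no first-line indent
\setlength{\parskip}{\baselineskip} % assumed: one line of space
\begin{document}
First block paragraph.

Second block paragraph, set off by vertical space alone.
\end{document}
```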

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
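In LaTeX, the standard verse environment follows this pattern: lines are ended explicitly and set ragged, and stanzas are separated by vertical space. The lines below are placeholders, not a quoted poem.

```latex
\documentclass{article}
\begin{document}
\begin{verse}
First line of the first stanza,\\
second line of the first stanza.

First line of the second stanza,\\
second line of the second stanza.
\end{verse}
\end{document}
```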

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

            Sizes in inches   Page proportions
    A4      8.27 × 11.7       2 : √2      1.41421
    B5      6.93 × 9.84       1 : √2      0.707
    Letter  8½ × 11           1 : 1.294   1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
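As a worked example of the whole-multiple rule, assume a body text line height of 12 pt and a heading set on an 18 pt line with 12 pt of space above and 6 pt below. Only the 12 pt line height comes from this book; the heading values and the symbols a, h, and b are assumptions made for illustration.

```latex
\[
  a + h + b
  = 12\,\mathrm{pt} + 18\,\mathrm{pt} + 6\,\mathrm{pt}
  = 36\,\mathrm{pt}
  = 3 \times 12\,\mathrm{pt}
\]
```

The heading then occupies exactly three body text lines, so the lines below it stay aligned with those on the facing page.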

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the number of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
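One way to follow this advice in LaTeX is the booktabs package, which provides a few horizontal rules and discourages boxes and vertical lines. The sketch below reuses the paper sizes from Table 3.1; the column heading "Format" is an assumption, and this is not necessarily how the table in this book was produced.

```latex
\documentclass{article}
\usepackage{booktabs} % horizontal rules only, no boxes
\begin{document}
\begin{tabular}{ll}
  \toprule
  Format & Size in inches \\
  \midrule
  A4     & $8.27 \times 11.7$ \\
  B5     & $6.93 \times 9.84$ \\
  Letter & $8\frac{1}{2} \times 11$ \\
  \bottomrule
\end{tabular}
\end{document}
```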

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.


2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
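Two of the three forms are available in LaTeX without extra packages. The following sketch uses the standard `\marginpar` and `\footnote` commands with placeholder text; the choice of `\footnotesize` for the sidenote is an assumption.

```latex
\documentclass{article}
\begin{document}
Main text with a sidenote.%
\marginpar{\footnotesize\raggedright A sidenote, set ragged
  in the margin.}
Main text with a footnote.\footnote{A footnote at the bottom
  of the page.}
\end{document}
```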

3.2.5 Quotations

Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

    This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
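The block quotation treatment described in this section can be sketched in LaTeX with the standard quote environment, which adds vertical space and indents the passage. Reducing the type size with `\small` is an assumption about the desired design.

```latex
\documentclass{article}
\begin{document}
Text before the quotation.
\begin{quote}
  \small
  This is the excellent foppery of the world \dots
\end{quote}
Text after the quotation.
\end{document}
```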

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
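In LaTeX, the geometry package can derive the text height from a number of lines, which keeps the text block an exact fit for whole lines of body text. The paper size, text width, and line count below are assumed values for illustration.

```latex
\documentclass[10pt]{article}
% Assumed values: A4 paper, a 4.5in text width, 40 lines per page.
% The lines option sets the text height from the line count.
\usepackage[a4paper, textwidth=4.5in, lines=40]{geometry}
\begin{document}
Body text.
\end{document}
```

Specifying the height in lines rather than in absolute units means the text block stays full when the leading is changed later.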

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

The general guidelines are to use colored type only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
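A single secondary color for emphasis can be sketched in LaTeX with the xcolor package. The color name `rubric` and its RGB values are assumptions chosen for illustration, not a recommendation from the cited sources.

```latex
\documentclass{article}
\usepackage{xcolor}
% Assumed color: a dark red named "rubric" for this example only.
\definecolor{rubric}{RGB}{160,0,0}
\begin{document}
Black body text with a single \textcolor{rubric}{emphasized phrase}.
\end{document}
```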

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standards Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standards Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006 (cit. on p. 13).

[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986 (cit. on p. 22).

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000. Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts and Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standards Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Declaration
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
EMACS: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
PICO: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
VI: The Visual Interactive editor
VIM: VI IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language

Index

ACK 6
Adobe FrameMaker 14
Adobe InDesign 14, 39
alignment
  justified 42
  ragged 42
Anton Koberger 49
Apache OpenOffice 13, 20, 39
API 55
ASA 51
ASCII 5–9, 11, 12, 14, 51
AsciiDoc 39
AT&T 35
Atom 13
awk 16, 17

Bazaar 17
BEL 6
BMP 8, 9, 14
Bob Berner 5
body text 41
BRE 14–16
  alternation operator 15
  backreference 15
  escape character 15
  matching list expression 15
  non-matching list expression 15
  repetition operator 15
  subexpression 15
BS 6
BSD 13

CA 52
CAN 6
CERN 28
character code 5
character encoding 5
Chomsky hierarchy 14
Christian Morgenstern 4
CLDR 52
CLI 13, 16
code page 7
code point 8
Compose key 11
CONCUR 27
control code 5
CR 6
Creole 39
CSS 23, 29–32, 44

DC 32, 33
DC1 6
DC2 6
DC3 6
DC4 6
DEL 6
DLE 6
Donald Knuth 36
DPS 13, 17, 18, 32, 35, 36, 39
  batch-oriented 35
  interactive 13, 35
    desktop publishing 36
    word processing 36
DTD 23, 25–27
DTP 36

EBCDIC 5
ECMA 55
Edgar Allan Poe 37
Elements of Style 3
EM 6
Emacs 13
endianity 10
endnote 47
ENQ 6
EOT 6
ERE 14–16
  alternation operator 15
  backreference 15
  escape character 15
  matching list expression 15
  non-matching list expression 15
  repetition operator 15
  subexpression 15
ESC 6
ETB 6
ε-TeX 38
ETX 6
EUC 5

F. M. Cornford 43
FF 6
FOAF 32, 33
footnote 47
formal grammar 14
FORTRAN 4
From Religion to Philosophy: A Study in the Origins of Western Speculation 43
FS 6
FSM 35

Git 17
GML 22
GNU 13, 14, 35
  Linux 13
  nano 13
Google Documents 18
Google Pinyin 11
grep 16, 17
groff, see troff
GS 6
GUI 13, 35

Han Unification 9
heading 45
Henrik Ibsen 27
HT 6
HTML 28–32, 34, 39, 44, 55

IBM 5, 12, 22
iconv 10
IEC 7, 10, 51–54
IME 12
IRI 27, 28, 31, 32, 54
ISO 7, 10, 51–54

JavaScript 29
Jeffrey E. F. Friedl 14
JIS 5
JOE 13
JScript 29
JSON 32
JSON-LD 32, 56
JTC 51–54
justification, see alignment

King Lear 48

LaTeX 36, 43
Latin Vulgate Bible 49
LD 31, 32, 55
leading, see line spacing
Leafpad 13
LF 6
lightweight markup language 39
line height 45
list 46

MA 51
MakeDoc 39
Markdown 39
markup
  logical 21, 29, 30, 35, 36
  presentation 21, 29, 30, 35, 36
MathML 28, 31
Mercurial 17
microformatting 32
Microsoft Word 14, 20, 39

N-Triples 32, 33
NAK 6
Noam Chomsky 14
  hierarchy 14
note 46
Notepad++ 13
Notepad 13
nroff, see troff
NUL 6
NY 51

OCR 12
ODF 13
OOXML 13
OWL 32, 56

paragraph 42
  block 45, 47
  indented 45
  outdented 45
PC 5, 11
PDF 13
pdfTeX 38
Peer Gynt 27
Perl 14
PICO 13
pinyin 11
plain TeX 38
POSIX 53
printable character 5
Punycode 8

QuarkXPress 14
quotation
  block 47
  run-in 47

rag, see alignment
RDF 28, 31–35, 56
  literal 32
  object 31
  ontology 32
  predicate 31
  resource 31
  subject 31
  triplet 31
RDFa 32, 34, 56
regex, see regular expression
regular expression 13, 14
regular grammar 14
RELAX NG 23, 25
RFC 54, 55
RS 6

sans-serif 41
SC 51–54
Scribus 13, 14, 39
sed 16, 17
serif 41
Setext 39
SGML 22, 23, 25, 27–29, 39, 53, 54
  application 23
  attribute 22
  element 22
  entity 22
  node 22
  tag 22
SGML: The Reason Why and the First Published Hint 22
SI 6
sidenote 46
small capitals 45
SO 6
SOH 6
SR 12
STX 6
style guide 3
SUB 6
Sublime Text 13
surrogate pair 8
SVG 28, 31
SVN 17–20
SYN 6

table 46
TC 51, 52
TEI 28
text editor 13
text file 4
text processing 4
TextEdit 13, 14
The Art of Computer Programming 36
The Cask of Amontillado 37
The Chicago Manual of Style 3
The Oxford Style Manual 3
The Subversion book 17
Tim Berners-Lee 31
Timothy John Berners-Lee 28
Tortoise SVN 18, 20
Trichter 4
troff 35
  man 36
  me 36
  mom 36
TRON 9
Turtle 32, 33
typeface 41

UCS 6, 8–12, 14, 16, 51, 52
  block 8
  UCS-4 8
Unicode
  case conversion 10
  normalization 10
US 6
USA 51, 52
UTF 6, 8–10, 52
  UTF-16 8, 52
  UTF-32 8
  UTF-7 8
  UTF-8 8, 52

VBScript 29
VCS 17–20
  centralized 17
  decentralized 17
version control 13
vi 13
vim 13
VT 6

W3C 23, 28, 29, 31, 32, 54–56
WG 54
Wikicode 39
William Shakespeare 48
William Strunk 3
Word Online 18
writing rules
  grammar 3
  orthography 3
  typography 4
WYSIWYG 35

X Window System 11
XeTeX 43
XHTML 28, 31, 32, 55, 56
XML 23–29, 31–33, 39, 54, 55
  application 23
  DocBook 28
  format 23
  language 23
  namespace 27
  schema language 23
  Schema 23, 26
  validity 23
  well-formedness 23
xmllint 26
XPath 23
XPointer 23
XQuery 23


The circumstances that led to the creation of TeX and the surrounding tools are thoroughly documented in Digital Typography [52].

standard macro packages for groff include man for the formatting of documentation, me for the creation of research papers, and the more recent mom for general typesetting tasks. Special markup invokes preprocessors that can be used for the typesetting of tables, equations, and vector graphics.

Another notable free batch-oriented DPS is TeX, which was developed in the 1970s by an American professor of computer science, Donald Knuth, after he had received galley proofs for the second volume of his monograph, The Art of Computer Programming, and found the appearance of mathematical formulae distasteful. As a result, the typesetting of mathematics is a central theme in TeX rather than an afterthought, which differentiates it from most other DPSes and contributes to the massive popularity TeX has enjoyed among academics. Much like in the case of troff and its derivatives, the language of TeX contains only typographic and programming primitives, but the creation of logical markup is possible through user macros. A popular TeX macro package that enables the creation of various types of documents with just logical markup is LaTeX, the standard markup language for academic and technical documents.

2.3.2 Interactive Systems

Interactive DPSes come in two distinct flavors. Word processors are the digital progeny of the typewriter, whose output documents served as manuscripts to be typeset by a typographer. With the advent of personal computing and the Web, self-publishing became more affordable to the general public, and modern word processors can be used not only to write, but also to design and typeset documents, although the offered functionality is typically limited to ensure ease of use. This concern is not shared by DeskTop Publishing (DTP) software, which provides refined control over the resulting page layout and the typesetting at the expense of a steeper learning curve.

Most interactive DPSes will provide a means to mark up sections of text. Presentation markup enables direct changes to the design, whereas logical markup enables the classification of sections of text with the ability to set up the design of each class later on. This decouples writing and markup from design and makes it easy to consistently change the design of an entire document.

The Cask of Amontillado
by
Edgar Allen Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that gave utterance to a threat. At length I would be avenged; this was a point definitely settled – but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

-1-

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allen Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allen Poe’s Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].

% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allen Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.

Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
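To make the idea of converting punctuation-based markup to HTML concrete, here is a minimal Python sketch. It handles a tiny, hypothetical subset that only resembles Markdown (a `# ` prefix for a top-level heading, `*...*` for emphasis, blank lines between blocks); the function name `toy_markup_to_html` is an assumption made for illustration, and real converters cover far more syntax and edge cases.

```python
import html
import re


def toy_markup_to_html(text):
    """Convert a tiny, hypothetical Markdown-like subset to HTML.

    Supported: blocks separated by blank lines, '# ' headings,
    and *emphasis*. Everything else is emitted as a paragraph.
    """
    blocks = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Collapse line breaks inside a block into single spaces.
        line = " ".join(block.split())
        # Escape characters that are special in HTML before adding tags.
        escaped = html.escape(line, quote=False)
        # *emphasis* becomes <em>emphasis</em>.
        escaped = re.sub(r"\*(.+?)\*", r"<em>\1</em>", escaped)
        if escaped.startswith("# "):
            blocks.append(f"<h1>{escaped[2:]}</h1>")
        else:
            blocks.append(f"<p>{escaped}</p>")
    return "\n".join(blocks)


if __name__ == "__main__":
    print(toy_markup_to_html("# Title\n\nSome *plain* text."))
```

The source text stays readable on a plain terminal, while the converter supplies the heavier markup only when it is needed.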

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page or 40–50 characters long on a multi-column page and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
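The line-length guidance above can be restated as a small Python sketch. The function name `assess_measure` and the wording of the returned messages are assumptions made for illustration; only the numeric thresholds (45–75, 40–50, 40, and 80 characters) come from the text.

```python
def assess_measure(chars_per_line, multi_column=False):
    """Classify a line length (in characters) against the guidance above."""
    # Recommended range depends on the number of columns on the page.
    low, high = (40, 50) if multi_column else (45, 75)
    if chars_per_line < 40:
        # Justifying such narrow lines tends to loosen the word spacing.
        return "too narrow to justify: set ragged"
    if chars_per_line > 80:
        return "too wide: strains the eye"
    if low <= chars_per_line <= high:
        return "within the recommended range"
    return "acceptable, but outside the recommended range"


if __name__ == "__main__":
    for width in (35, 66, 78, 90):
        print(width, assess_measure(width))
```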

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text

The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

1

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}

newunicodecharraisebox8ex

\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford’s From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.

<style>
@font-face {
  font-family: Alegreya Sans;
  src: url(AlegreyaSans-Regular.ttf)
    format("truetype");
  unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
    U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F; }
@font-face {
  font-family: GFS Neohellenic;
  src: url(GFSNeohellenic.otf) format("opentype");
  unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
    U+102E0-102FF; }
p {
  font-family: Alegreya Sans, GFS Neohellenic,
    sans-serif;
  line-height: 14pt; }
[lang=en] {
  font-size: 11pt; }
[lang=gr] {
  font-size: 13.2pt; }
</style>
<p><span lang=en>The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says </span><span lang=gr>φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang=en>“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.

line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
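The arithmetic behind these figures can be checked with a short Python sketch. The function names `leading_range` and `lead_per_side` are assumptions made for illustration; the 20 and 45 percent bounds are the ones quoted from [55], and the lead is assumed to be split evenly above and below the line.

```python
def leading_range(type_size_pt, low_pct=20, high_pct=45):
    """Return the (minimum, maximum) line height in points.

    The line height is the type size plus 20 to 45 percent of it.
    """
    return (type_size_pt * (100 + low_pct) / 100,
            type_size_pt * (100 + high_pct) / 100)


def lead_per_side(type_size_pt, line_height_pt):
    """Lead added above and below each line, assuming an even split."""
    return (line_height_pt - type_size_pt) / 2


if __name__ == "__main__":
    low, high = leading_range(10)
    print(low, high)                                      # line height bounds
    print(lead_per_side(10, low), lead_per_side(10, high))  # lead per side
```

For a 10 pt body text this reproduces the range of 12 to 14.5 pt and the 1 to 2.25 pt of lead on either side of each line.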

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

         Sizes in inches   Page proportions
A4       8.27 × 11.7       2 : √2       1.41421
B5       6.93 × 9.84       1 : √2       0.707
Letter   8½ × 11           1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although alternating the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer’s viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

– William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

1 This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
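The requirement that the text height be a multiple of the line height can be sketched in Python as follows. The function name `snap_text_height` is an assumption made for illustration, and rounding down to the nearest whole line is one possible policy, not something the text prescribes.

```python
def snap_text_height(target_height_pt, line_height_pt):
    """Round a target text height down to a whole number of lines.

    Returns (number_of_lines, text_height_pt).
    """
    if line_height_pt <= 0:
        raise ValueError("line height must be positive")
    lines = int(target_height_pt // line_height_pt)
    return lines, lines * line_height_pt


if __name__ == "__main__":
    # Hypothetical example: a 540 pt tall area with 14 pt leading.
    print(snap_text_height(540, 14))
```

With a 540 pt target and a 14 pt line height, the block holds 38 full lines and the text height becomes 532 pt; the remaining 8 pt go to the vertical margins.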

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to use colored typefaces only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
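The text does not say how “sufficient contrast” should be measured. One widely used option, swapped in here purely as an illustration, is the contrast ratio from the W3C Web Content Accessibility Guidelines (WCAG) 2.x, which compares the relative luminance of two sRGB colors. The function names below are assumptions; the constants are the ones published in WCAG.

```python
def _linearize(channel_8bit):
    """Convert an 8-bit sRGB channel to a linear-light value (WCAG 2.x)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) triple with 8-bit channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG contrast ratio, from 1 (no contrast) to 21 (black on white)."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    print(contrast_ratio((0, 0, 0), (255, 255, 255)))  # black on white
    print(contrast_ratio((200, 0, 0), (255, 255, 255)))  # a red on white
```

WCAG recommends a ratio of at least 4.5 for normal-size text; whether that threshold suits a given print design is a separate judgment.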

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).

[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O’Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O’Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000. Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).

BIBLIOGRAPHY 57

[54] Robert Bringhurst the Elements of Typographic Style PointRoberts andWashHartleyampMarks 1992 i sbn 0-88179-110-5(cit on pp 41 42 45ndash48)

[55] Matthew Butterick Butterickrsquos Practical Typography Line spac-ing url httppracticaltypographycomline-spacinghtml (visited on 11022015) (cit on p 42)

[56] Vladimiacuter Beran et al Aktualizovanyacute typografickyacute manuaacutel6th ed Kafka Design 2014 (cit on p 45)

Acronyms

ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language



The Cask of Amontillado
by
Edgar Allan Poe

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled – but the very definitiveness with which it was resolved precluded the idea of risk. I must not only punish, but punish with impunity. A wrong is unredressed when retribution overtakes its redresser.

.TITLE "The Cask of Amontillado"
.AUTHOR "Edgar Allan Poe"
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.START
.PP
.DROPCAP T 3
he thousand injuries of Fortunato I had borne as I best
could, but when he ventured upon insult, I vowed revenge.
You, who so well know the nature of my soul, will not
suppose, however, that I gave utterance to a threat.
\*[IT]At length\*[PREV] I would be avenged; this was a
point definitely settled\[em]but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.

Figure 2.12: An excerpt from the beginning of Edgar Allan Poe's The Cask of Amontillado as a text marked up using the mom macro package of groff (below) and the output document (above). The marked-up text was borrowed from the web page of mom [51].

% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1\hskip-0.3ex}}}
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could, but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX.

Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that represent their use cases.
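As a rough illustration of such a conversion tool, the following Python sketch turns a tiny, hypothetical markup subset into HTML. The function name toy_markup_to_html and the two supported constructs (a leading # for headings, *word* for emphasis) are assumptions made for this example; it is not an implementation of Markdown or of any other language named above.

```python
import html
import re


def toy_markup_to_html(source: str) -> str:
    """Convert a tiny, hypothetical lightweight markup subset to HTML."""
    blocks = []
    # Blank lines separate blocks (headings and paragraphs).
    for block in re.split(r"\n\s*\n", source.strip()):
        # Join wrapped lines and escape characters special to HTML.
        text = html.escape(" ".join(block.split()))
        # *word* marks emphasis.
        text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
        # One to six leading '#' characters mark a heading level.
        heading = re.match(r"(#{1,6}) (.*)", text)
        if heading:
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            blocks.append(f"<p>{text}</p>")
    return "\n".join(blocks)


if __name__ == "__main__":
    print(toy_markup_to_html("# Title\n\nSome *plain* text\nhere."))
```

Punctuation alone carries the structure here, which is what keeps the input readable even when no converter is at hand.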

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.
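One way to decide which typeface applies to which part of a multilingual manuscript is to split the text into runs by writing system. The Python sketch below does this by inspecting Unicode character names. The helper names script_of and script_runs are hypothetical, and classifying by name prefix is a simplification that only tells Latin and Greek apart; a production tool would rely on proper Unicode script data.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Classify a character as 'Greek', 'Latin', or 'Common' by its Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("GREEK"):
        return "Greek"
    if name.startswith("LATIN"):
        return "Latin"
    return "Common"  # spaces, digits, punctuation, combining marks, ...


def script_runs(text: str) -> list[tuple[str, str]]:
    """Split text into (script, substring) runs; Common characters join the current run."""
    runs: list[tuple[str, str]] = []
    for ch in text:
        script = script_of(ch)
        if runs and script in ("Common", runs[-1][0]):
            runs[-1] = (runs[-1][0], runs[-1][1] + ch)
        else:
            runs.append((script, ch))
    return runs


if __name__ == "__main__":
    for script, run in script_runs("Aristotle says φαμὲν γὰρ"):
        print(script, repr(run))
```

Each run can then be assigned its own typeface, for instance by wrapping it in an element that carries a language attribute.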

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).

The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}
\newunicodechar{·}{\raisebox{.8ex}{.}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.

<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url(AlegreyaSans-Regular.ttf)
      format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
      U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F; }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url(GFSNeohellenic.otf) format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
      U+102E0-102FF; }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
      sans-serif;
    line-height: 14pt; }
  [lang=en] {
    font-size: 11pt; }
  [lang=gr] {
    font-size: 13.2pt; }
</style>
<p><span lang=en>The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang=gr>φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang=en>“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
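The leading arithmetic described in this section can be sketched in Python as follows. The function names leading_range and lead_per_side are illustrative, and the default bounds of 20% and 45% are simply the guideline figures quoted in the text rather than fixed constants.

```python
def leading_range(type_size_pt: float, low: float = 0.20, high: float = 0.45) -> tuple[float, float]:
    """Return the (minimum, maximum) line height for a given typeface size."""
    return type_size_pt * (1 + low), type_size_pt * (1 + high)


def lead_per_side(type_size_pt: float, line_height_pt: float) -> float:
    """Return the lead added above and below each line."""
    return (line_height_pt - type_size_pt) / 2


if __name__ == "__main__":
    minimum, maximum = leading_range(10)
    print(f"line height between {minimum:g} and {maximum:g} pt")
    print(f"lead per side between {lead_per_side(10, minimum):g} "
          f"and {lead_per_side(10, maximum):g} pt")
```

For a 10 pt typeface, this reproduces the range of 12 to 14.5 pt given above.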

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

         Sizes in inches   Page proportions
A4       8.27 × 11.7       2 : √2       1.41421
B5       6.93 × 9.84       1 : √2       0.707
Letter   8 1/2 × 11        1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
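The whole-multiple rule can be turned into a small calculation. The Python sketch below is one possible reading of it: the function name heading_padding and the min_padding_pt parameter are hypothetical, and the function treats the heading as a single line whose height equals its typeface size.

```python
import math


def heading_padding(heading_size_pt: float, line_height_pt: float,
                    min_padding_pt: float = 0.0) -> tuple[int, float]:
    """Return (body lines occupied, vertical padding to distribute around the heading)."""
    # Smallest whole number of body lines that fits the heading plus the minimum padding.
    lines = math.ceil((heading_size_pt + min_padding_pt) / line_height_pt)
    return lines, lines * line_height_pt - heading_size_pt


if __name__ == "__main__":
    lines, padding = heading_padding(16, 12)
    print(f"a 16 pt heading occupies {lines} lines with {padding} pt of padding")
```

How the padding is split above and below the heading remains a design decision; only the total is fixed by the line grid.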

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that, when we are sick in fortune – often the surfeit of our own behavior – we make guilty of our disasters the sun, the moon, and the stars: as if we were villains by necessity; fools by heavenly compulsion; knaves, thieves, and treachers by spherical predominance; drunkards, liars, and adulterers by an enforced obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

– William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
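The requirement that the text height be a multiple of the line height can be checked with a short calculation. In the Python sketch below, the function name text_block_height is hypothetical, and the example values (a 9 in page with 1.5 in of combined vertical margins, at an assumed 72 points per inch) are chosen only for illustration.

```python
POINTS_PER_INCH = 72  # assumption: PostScript points; TeX points would use 72.27


def text_block_height(page_height_pt: float, min_vertical_margins_pt: float,
                      line_height_pt: float) -> tuple[int, float]:
    """Return (number of lines, text height) for the tallest block that fits the page."""
    available = page_height_pt - min_vertical_margins_pt
    lines = int(available // line_height_pt)
    return lines, lines * line_height_pt


if __name__ == "__main__":
    page = 9 * POINTS_PER_INCH
    margins = 1.5 * POINTS_PER_INCH
    for leading in (12, 14):
        lines, height = text_block_height(page, margins, leading)
        print(f"leading {leading} pt: {lines} lines, text height {height:g} pt")
```

Any space left over after rounding down to whole lines goes to the vertical margins.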

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
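The text does not say how to measure sufficient contrast. One widely used measure, brought in here as an assumption rather than taken from the source, is the contrast ratio defined in the W3C Web Content Accessibility Guidelines (WCAG) 2, which recommend at least 4.5:1 for normal-sized text. The Python sketch below computes that ratio for 8-bit sRGB colors; the function names are illustrative.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2 formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Return the WCAG 2 relative luminance of an sRGB color."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: tuple[int, int, int],
                   background: tuple[int, int, int]) -> float:
    """Return the WCAG 2 contrast ratio, from 1 (no contrast) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    white = (255, 255, 255)
    print(f"black on white: {contrast_ratio((0, 0, 0), white):.2f}")
    print(f"pure red on white: {contrast_ratio((255, 0, 0), white):.2f}")
```

Under this measure, pure red on a white background falls slightly short of 4.5:1, which is consistent with the advice to reserve color for emphasis rather than for the body text.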

Bibliography

[1] Mary Brandel lsquolsquo1963 The debut of asci irsquorsquo InComputerworld(July 1999) url httpeditioncnncomTECHcomputing9907061963idg (visited on 09062015) (cit on p 5)

[2] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1963 10 East 40th Street New York 16 nyusa the American Standard Association June 1963 urlhttp worldpowersystems com J codes X3 4 - 1963

(visited on 01282015) (cit on p 5)[3] i so tc97sc2 Information technology ndash iso 7-bit coded character

set for information interchange i so 6461972 Geneva Switzer-land the International Organization for Standardization1972 (cit on pp 5 7)

[4] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1986 10 East 40th Street New York 16 ny usathe American Standard Association June 1986 (cit on p 6)

[5] Unicode Consortium the Unicode Standard Version 10 Vol 1Reading ma usa Addison-Wesley Developers Press Oct1991 isbn 0-201-56788-1 (cit on p 8)

[6] Unicode Consortium the Unicode Standard Version 10 Vol 2Reading ma usa Addison-Wesley Developers Press June1992 isbn 0-201-60845-6 (cit on p 8)

[7] isoiec jtc1sc2 Information technology ndash the Universalmultiple-octet coded Character Set (ucs) ndash Part 1 Architectureand Basic Multilingual Plane isoiec 10646-11993 Geneva

52 BIBLIOGRAPHY

Switzerland the International Organization for Standard-ization May 1993 (cit on p 8)

[8] i soiec jtc1sc2 Transformation Format for 16 planes of group00 (utf-16) isoiec 10646-11993Amd 11996 GenevaSwitzerland the International Organization for Standard-ization Oct 1996 (cit on p 8)

[9] isoiec jtc1sc2 ucs Transformation Format 8 (utf-8)isoiec 10646-11993Amd 21996 Geneva Switzerlandthe International Organization for Standardization Oct1996 (cit on p 8)

[10] Unicode Consortium the Unicode Standard Version 90 ndash CoreSpecification Tech rep Mountain View ca usa July 2016url httpwwwunicodeorgversionsUnicode900UnicodeStandard-90pdf (visited on 09172015) (cit onpp 8ndash10)

[11] Q-Success Usage of character encodings for websites urlhttpw3techscomtechnologiesoverviewcharacter_

encodingall (visited on 09102015) (cit on p 9)[12] Unicode Consortium Unicode Technical Standard 10 Version

900 Unicode Collation Algorithm Tech rep May 2016 urlhttpwwwunicodeorgreportstr10tr10-34html

(visited on 09172016) (cit on p 10)[13] Unicode Consortium Unicode cldr Project Tech rep url

httpcldrunicodeorg (visited on 09172016) (cit onp 10)

[14] iso tc171sc2 Document management ndash Portable documentformat iso 320002008 Geneva Switzerland the Interna-tional Organization for Standardization July 2008 (cit onp 13)

[15] isoiec jtc1sc34 Document description and processing lan-guages ndash Office Open XML File Formats isoiec 295002012Geneva Switzerland the International Organization forStandardization Oct 2012 (cit on p 13)

[16] isoiec jtc1sc34 Information technology ndash Open DocumentFormat for Office Applications (OpenDocument) v10 isoiec263002006 Geneva Switzerland the International Organi-zation for Standardization Dec 2006 (cit on p 13)


[17] Noam Chomsky. "Three models for the description of language". In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. "The Roots of SGML – A Personal Recollection". In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. "SGML: The Reason Why and the First Published Hint". In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. "Introduction to Generalized Markup". In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986 (cit. on p. 22).


[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. "GODDAG: A Data Structure for Overlapping Hierarchies". In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).


[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).


[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).


[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
Emacs  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
joe  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
pico  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML, New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
vi  The Visual Interactive editor
vim  vi IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language

Index

ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42

Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17

sect

Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

bre 14ndash16bs 6bsd 13

sect

ca 52can 6cern 28

character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44

sect

dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35

dps 13 17 18 32 35 36 39dtd 23 25ndash27dtp 36

sect

ebcdic 5ecma 55Edgar Allen Poe 37


Elements of Style 3em 6Emacs 13endianity 10endnote 47enq 6eot 6erealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

ere 14ndash16esc 6etb 6120576-TEX 38etx 6euc 5

sectF M Cornford 43ff 6foaf 32 33footnote 47formal grammar 14fortran 4From Religion to Philosophy A Study in

the Origins of Western Speculation 43fs 6fsm 35

sectGit 17gml 22gnuLinux 13nano 13

gnu 13 14 35Google Documents 18Google Pinyin 11grep 16 17groff see troffgs 6gui 13 35

sectHan Unification 9heading 45Henrik Ibsen 27ht 6

html 28ndash32 34 39 44 55sect

ibm 5 12 22iconv 10iec 7 10 51ndash54ime 12ir i 27 28 31 32 54iso 7 10 51ndash54

sectJavaScript 29Jeffrey E F Friedl 14j is 5joe 13JScript 29json 32json-ld 32 56jtc 51ndash54justification see alignment

sectKing Lear 48

sectLATEX 36 43Latin Vulgate Bible 49ld 31 32 55leading see line spacingLeafpad 13lf 6lightweight markup language 39line height 45list 46

sectma 51MakeDoc 39Markdown 39markuplogical 21 29 30 35 36presentation 21 29 30 35 36

mathml 28 31Mercurial 17microformatting 32Microsoft Word 14 20 39

sectN-Triples 32 33nak 6Noam Chomskyhierarchy 14

Noam Chomsky 14note 46Notepad++ 13Notepad 13


nroff see troffnul 6ny 51

sectocr 12odf 13ooxml 13owl 32 56

sectparagraphblock 47indented 45outdented 45

paragraph 42paragraphsblock 45

pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8

sectQuarkXPress 14quotationblock 47run-in 47

sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31

rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6

sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22

sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-

lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6

secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff

man 36


me 36mom 36

troff 35tron 9Turtle 32 33typeface 41

sectucsblock 8ucs-4 8

ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10

us 6usa 51 52utf

utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8

utf 6 8ndash10 52sect

VBScript 29vcscentralized 17decentralized 17

vcs 17ndash20version control 13vi 13vim 13

vt 6sect

w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4

wysiwyg 35sect

XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23

xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23


% Page geometry
\pdfpagewidth=6in \pdfpageheight=9in
% Page dimensions
\hsize=\dimexpr\pdfpagewidth-1.5in
\vsize=\dimexpr\pdfpageheight-1.5in
\baselineskip=16.8pt
\hoffset=-.25in \voffset=-.25in
% Fonts
\font\rm=ptmr8t at 12.5pt\rm \font\bigbf=ptmb8t at 16pt
\font\dropcap=ptmr8t at 62pt \font\it=ptmri8r at 12.5pt
% Logical markup definition
\def\title#1{{\bigbf\centerline{#1}}}
\def\author#1{{\it\centerline{by}\centerline{#1}}
  \vskip 3.9em}
\def\chapter#1{\noindent\smash{\hskip0.1ex\lower5.8ex
  \hbox{\llap{\dropcap#1}}}\hskip-0.3ex
  \parshape=4 3em\dimexpr\hsize-3em 3.28em
  \dimexpr\hsize-3.28em 3.28em
  \dimexpr\hsize-3.28em 0em\hsize}
% The document
\title{The Cask of Amontillado}
\author{Edgar Allan Poe}
\chapter The thousand injuries of Fortunato I had borne
as I best could; but when he ventured upon insult, I vowed
revenge. You, who so well know the nature of my soul,
will not suppose, however, that I gave utterance to a
threat. {\it At length} I would be avenged; this was a
point definitely settled---but the very definitiveness
with which it was resolved, precluded the idea of risk. I
must not only punish, but punish with impunity. A wrong is
unredressed when retribution overtakes its redresser.\bye

Figure 2.13: The document from Figure 2.12 reformulated in TeX using plain TeX macros and the primitives of ε-TeX and pdfTeX


Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right)

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that reflect their use cases.
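To make the idea of conversion concrete, the following is a minimal sketch of such a tool. It assumes a tiny, hypothetical dialect invented for this illustration (a leading "# " marks a heading, *asterisks* mark emphasis, blank lines separate blocks); it is not an implementation of Markdown or of any other language named above, and the function name is likewise made up.

```python
import html
import re


def toy_markup_to_html(text: str) -> str:
    """Convert a tiny, hypothetical lightweight dialect to HTML.

    Assumed rules (illustrative only): blank lines separate blocks,
    a block starting with "# " is a heading, *word* marks emphasis.
    """
    blocks = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Line breaks inside a block carry no meaning: collapse them.
        line = " ".join(block.split())
        # Escape characters that are special in HTML before adding tags.
        escaped = html.escape(line, quote=False)
        escaped = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", escaped)
        if escaped.startswith("# "):
            blocks.append(f"<h1>{escaped[2:]}</h1>")
        else:
            blocks.append(f"<p>{escaped}</p>")
    return "\n".join(blocks)


print(toy_markup_to_html("# Title\n\nSome *plain* text\nhere."))
```

The source stays readable as plain text, while all structure is recovered from punctuation and blank lines alone, which is also why the expressive power of such a dialect is limited.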

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document (such as headings, tables, figures, and lists); and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers or elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for sustained reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
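The thresholds above can be turned into a quick check. The following is only a sketch: the function name and the wording of the returned advice are made up for illustration, and the numbers are simply those quoted in the paragraph.

```python
def measure_advice(chars_per_line: int, columns: int = 1) -> str:
    """Classify a line length using the thresholds quoted in the text.

    chars_per_line: average number of characters in a line of body text.
    columns: 1 for a single-column page, more for a multi-column page.
    """
    # Recommended range: 45-75 characters (one column), 40-50 (several).
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "too narrow to justify: set ragged"
    if chars_per_line > 80:
        return "too wide: strains the eye"
    if low <= chars_per_line <= high:
        return "within the recommended range"
    return "acceptable, but outside the recommended range"


for length, cols in [(66, 1), (35, 1), (90, 1), (60, 2)]:
    print(length, cols, measure_advice(length, cols))
```

A character count is only a proxy for the measure. The actual column width follows from the typeface and its size, so a check like this is best run on a representative sample of typeset text.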

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text

The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
   Ligatures=TeX]}{}
\newunicodechar{·}{\raisebox{.8ex}{.}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford's From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.


<style>
@font-face {
  font-family: "Alegreya Sans";
  src: url("AlegreyaSans-Regular.ttf")
       format("truetype");
  unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                 U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
}
@font-face {
  font-family: "GFS Neohellenic";
  src: url("GFSNeohellenic.otf") format("opentype");
  unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                 U+102E0-102FF;
}
p {
  font-family: "Alegreya Sans", "GFS Neohellenic",
               sans-serif;
  line-height: 14pt;
}
[lang=en] {
  font-size: 11pt;
}
[lang=gr] {
  font-size: 13.2pt;
}
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang="gr">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3

line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
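The arithmetic behind these figures is easy to reproduce for other type sizes. The sketch below assumes nothing beyond the rule quoted above (extra space of 20 to 45 percent of the type size); the function names are made up for illustration.

```python
from fractions import Fraction


def leading_range(type_size_pt):
    """Return the (minimum, maximum) line height in points.

    The line height is the type size plus 20 to 45 percent of it.
    """
    size = Fraction(str(type_size_pt))
    return float(size * Fraction(120, 100)), float(size * Fraction(145, 100))


def lead_per_side(type_size_pt):
    """Return the (minimum, maximum) lead above and below each line."""
    size = Fraction(str(type_size_pt))
    # The extra space is split evenly above and below the line.
    low = size * Fraction(20, 100) / 2
    high = size * Fraction(45, 100) / 2
    return float(low), float(high)


print(leading_range(10))  # (12.0, 14.5)
print(lead_per_side(10))  # (1.0, 2.25)
```

Exact fractions are used instead of floating-point multiplication so that the results come out as the round numbers a typesetter would expect.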

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

          Sizes in inches   Page proportions
A4        8.27 × 11.7       2 : √2      1.41421
B5        6.93 × 9.84       1 : √2      0.707
Letter    8½ × 11           1 : 1.294   1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing

This is a side-note. Sidenotes enliven the page and are easy for the reader to find.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
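The whole-multiple rule can be sketched as a small calculation. The function below is an illustration under stated assumptions: its name is made up, and it takes the natural height of a heading (its own line height plus any space set around it) as given.

```python
import math


def heading_block_height(natural_height_pt: float, line_height_pt: float) -> float:
    """Round a heading's height up to a whole multiple of the line height.

    natural_height_pt: height the heading would occupy on its own.
    line_height_pt: line height (leading) of the body text.
    """
    lines = math.ceil(natural_height_pt / line_height_pt)
    return lines * line_height_pt


# A heading that needs 20 pt on a 12 pt grid occupies two full lines.
height = heading_block_height(20, 12)
print(height, height - 20)  # total height, extra space to distribute
```

The difference between the rounded and the natural height is extra white space, which the designer is free to distribute above and below the heading.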

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the number of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs, and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars: as if we were villains by necessity; fools by heavenly compulsion; knaves, thieves, and treachers, by spherical predominance; drunkards, liars, and adulterers, by an enforced obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

– William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

1. This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aidssuch as the page numbers and running headers in this book Ifyour feel the horizontal margins are underutilized you may alsouse them for this purpose [54 sec 852]

In print designmdashand wherever else the page height is fixedmdashwe need to also decide on the text height The text height needs tobe a multiple of the body text line height so that it is possible tocompletely fill the text block with text It is typical to derive thetext height from the text width to achieve proportions that workwell with the proportions of the page [54 sec 842]
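The rule that the text height should be a whole multiple of the line height can be sketched in a few lines of Python. This is only an illustration: the function name `snap_text_height` and the sample values (a 500 pt target height and a 12 pt line height) are our assumptions and are not taken from any particular design.

```python
def snap_text_height(target_height_pt, line_height_pt):
    """Round a desired text height to a whole number of body text lines.

    Returns the number of lines and the resulting text height in points.
    """
    # At least one line must fit; round() picks the nearest whole line count.
    lines = max(1, round(target_height_pt / line_height_pt))
    return lines, lines * line_height_pt


lines, height = snap_text_height(500, 12)
print(f"{lines} lines, text height {height} pt")
```

In practice, the target height would first be derived from the text width and the desired proportions, and only then nudged to the nearest multiple in this way.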

3.4 Color

In both print and web design, it is perfectly reasonable to use just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to use colored typefaces only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
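What counts as "sufficient contrast" is not quantified above. One commonly used yardstick, which we bring in here purely as an assumption, is the contrast ratio defined by the Web Content Accessibility Guidelines (WCAG 2.x), where 4.5 : 1 is the usual minimum for running text. The following Python sketch computes that ratio for sRGB colors; the function names are ours.

```python
def relative_luminance(rgb):
    """Relative luminance of an sRGB color given as 0-255 integers (WCAG 2.x)."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 (none) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


# Threshold of 4.5 is the WCAG AA level for normal text (an assumption here).
for name, color in {"black": (0, 0, 0), "red": (255, 0, 0)}.items():
    ratio = contrast_ratio(color, (255, 255, 255))
    verdict = "sufficient" if ratio >= 4.5 else "too low for running text"
    print(f"{name} on white: {ratio:.2f} ({verdict})")
```

Pure red on white falls below the threshold, which agrees with the advice to reserve color for emphasis rather than for the body text.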

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII.” In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard 10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).

[17] Noam Chomsky. “Three models for the description of language.” In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O’Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard 18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O’Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection.” In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint.” In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup.” In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies.” In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standard Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System

DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
EMACS: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe’s Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character

NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
PICO: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
VI: The Visual Interactive editor
VIM: vi IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language



Figure 2.14: Logical markup in the interactive DPSes of Scribus (left), Microsoft Word (top), Adobe InDesign (bottom left), and Apache OpenOffice (bottom right).

2.4 Lightweight Markup Languages

Parallel to the heavy-duty applications of SGML and XML, there runs a vein of markup languages that give priority to unobtrusiveness and legibility over raw expressive power. Rooted in the reality of computer text terminals with limited formatting capabilities, lightweight markup languages leverage punctuation and indentation to produce comparatively weak and domain-specific, but also humane, highly intuitive, and often profoundly beautiful markup that is easy to both read and write. Examples of lightweight markup languages include Markdown, Creole, AsciiDoc, MakeDoc, Setext, and Wikicode. Lightweight markup languages are typically supplemented by tools that enable the conversion to more general markup languages, such as HTML. The more popular lightweight markup languages come in various flavors that reflect their different use cases.

Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for extended reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].
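Before typefaces are chosen, it can help to know which writing systems a manuscript actually uses. The following Python sketch counts letters per script with the standard `unicodedata` module. Treating the first word of a character's Unicode name as its script is a rough heuristic of ours, the function name `scripts_used` is hypothetical, and the sketch does not check whether any particular font contains the glyphs.

```python
import unicodedata
from collections import Counter


def scripts_used(text):
    """Count the letters of a text by the first word of their Unicode name.

    For most letters that first word names the script, e.g. LATIN or GREEK.
    """
    counts = Counter()
    for char in text:
        if char.isalpha():
            name = unicodedata.name(char, "")
            counts[name.split(" ")[0] if name else "UNKNOWN"] += 1
    return counts


sample = "Aristotle says φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι"
for script, count in scripts_used(sample).most_common():
    print(script, count)
```

A manuscript that reports more than one script is a candidate for combining several typefaces, as described above.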

Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all the typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
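The thresholds above can be summarized as a small decision helper. The sketch below is an illustration in Python; the function name `assess_measure` and the wording of its verdicts are ours, and only the numeric limits come from the text.

```python
def assess_measure(chars_per_line, columns=1):
    """Classify a line length (in characters) for body text."""
    # Recommended range: 45-75 for one column, 40-50 for several columns.
    low, high = (45, 75) if columns == 1 else (40, 50)
    if chars_per_line < 40:
        return "set ragged"  # justified word spacing would be too loose
    if chars_per_line > 80:
        return "too wide"  # strains the eye over extended passages
    if low <= chars_per_line <= high:
        return "justify"
    return "outside recommended range"


for length, columns in [(66, 1), (35, 1), (90, 1), (60, 2)]:
    print(length, columns, assess_measure(length, columns))
```

The last verdict covers lengths that are neither clearly harmful nor within the recommended range, for which the text gives no explicit rule.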

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text

The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says, φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin{%
  \fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek{%
  \fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
  Ligatures=TeX]}{}
\newunicodechar{·}{\raisebox{.8ex}{.}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says, φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford’s From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.

<style>
@font-face {
  font-family: "Alegreya Sans";
  src: url("AlegreyaSans-Regular.ttf")
    format("truetype");
  unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
    U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
}
@font-face {
  font-family: "GFS Neohellenic";
  src: url("GFSNeohellenic.otf") format("opentype");
  unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
    U+102E0-102FF;
}
p {
  font-family: "Alegreya Sans", "GFS Neohellenic",
    sans-serif;
  line-height: 14pt;
}
[lang=en] {
  font-size: 11pt;
}
[lang=grc] {
  font-size: 13.2pt;
}
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says, </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν, θαρρεῖν φοβεῖσθαι, ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν. ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι. </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.

line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
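The arithmetic behind these figures can be reproduced with a short Python sketch. The function names `leading_range` and `lead_per_side` are ours; the 20 and 45 percent bounds are the ones quoted above.

```python
def leading_range(font_size_pt, min_extra=0.20, max_extra=0.45):
    """Return the (tightest, loosest) line height in points for a type size."""
    return (
        round(font_size_pt * (1 + min_extra), 2),
        round(font_size_pt * (1 + max_extra), 2),
    )


def lead_per_side(font_size_pt, line_height_pt):
    """Lead added above and below each line, in points."""
    return round((line_height_pt - font_size_pt) / 2, 2)


tight, loose = leading_range(10)
print(f"10 pt type: line height {tight} to {loose} pt")
print(f"lead per side: {lead_per_side(10, tight)} to {lead_per_side(10, loose)} pt")
```

For 10 pt type this gives a line height of 12 to 14.5 pt and 1 to 2.25 pt of lead on each side, matching the numbers in the text.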

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well.

Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

This is a sidenote. Sidenotes enliven the page and are easy for the reader to find.

          Size in inches    Page proportions
A4        8.27 × 11.7       1 : √2       1.41421
B5        6.93 × 9.84       1 : √2       1.41421
Letter    8 1/2 × 11        1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.
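The page proportions listed in Table 3.1 can be recomputed from the sizes in inches. The Python sketch below does so and compares each result with √2; the dictionary and function names are ours, and the sizes are the rounded values from the table, so small deviations are to be expected.

```python
import math

# Paper sizes in inches (width, height), as listed in Table 3.1.
PAPER_SIZES_IN = {
    "A4": (8.27, 11.7),
    "B5": (6.93, 9.84),
    "Letter": (8.5, 11.0),
}


def page_proportion(width, height):
    """Height of the page expressed as a multiple of its width."""
    return height / width


for name, (width, height) in PAPER_SIZES_IN.items():
    ratio = page_proportion(width, height)
    print(f"{name}: 1 : {ratio:.4f} (difference from sqrt(2): {ratio - math.sqrt(2):+.4f})")
```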

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
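One way to satisfy the whole-multiple rule is to pad each heading with just enough vertical space. The following Python sketch computes that padding; the function name `heading_padding` and the sample sizes (an 18 pt heading line over 12 pt body leading) are assumptions chosen for illustration.

```python
import math


def heading_padding(heading_line_height_pt, body_line_height_pt):
    """Extra vertical space that pads a heading to whole body text lines."""
    # Number of body text lines the heading has to occupy.
    rows = math.ceil(heading_line_height_pt / body_line_height_pt)
    return rows * body_line_height_pt - heading_line_height_pt


print(heading_padding(18, 12))
```

How the padding is split between the space above and below the heading is left to the designer.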

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
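The three forms of notes can be tried out side by side in LaTeX. This is a minimal sketch: \marginpar and \footnote are standard LaTeX commands, while the endnotes package and the sample sentences are assumptions chosen for the example.

```latex
\documentclass[10pt]{article}
\usepackage{endnotes} % provides \endnote and \theendnotes

\begin{document}
Sidenotes sit next to the passage they comment on.%
\marginpar{\footnotesize\raggedright This is a sidenote.}
Footnotes are collected at the bottom of the page.\footnote{This is
a footnote.}
Endnotes are deferred to the end of the document.\endnote{This is
an endnote.}

\theendnotes % print the collected endnotes here
\end{document}
```

In the 10 pt article class, \footnotesize corresponds to 8 pt type, the lower end of the size range mentioned above.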

3.2.5 Quotations

Quotations repeat what has already been expressed somewhere else before and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer’s viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

    This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    – William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].
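Both forms of quotation can be sketched in LaTeX as follows. The quote environment and \small are standard LaTeX; the shortened sample quotation and the flush-right attribution are assumptions made for this example rather than the layout used in the source.

```latex
\documentclass[10pt]{article}
\begin{document}

% Run-in quotation: stays inside the paragraph, set off by the
% quotation marks of the language of the paragraph.
A run-in quotation, ``Jesters do oft prove prophets,'' needs no
special treatment from the designer.

% Block quotation: a separate, indented block in smaller type with
% vertical space above and below, supplied by the quote environment.
\begin{quote}
  \small
  This is the excellent foppery of the world, that, when we are sick
  in fortune \ldots\ we make guilty of our disasters the sun, the
  moon, and the stars.\par
  \raggedleft -- William Shakespeare, \emph{King Lear}\par
\end{quote}

\end{document}
```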

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin filled with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
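In LaTeX, a text height that holds a whole number of lines can be requested through the geometry package, whose lines option specifies the height of the text block as a line count. The sketch below is only an illustration: the A4 paper, the 24 pc text width, and the 40 lines are assumed values, not recommendations from the source.

```latex
\documentclass[10pt]{article}
% The 10 pt article class sets the body text on a 12 pt line height.
% lines=40 asks geometry for a text block that holds 40 such lines.
\usepackage[a4paper, textwidth=24pc, lines=40]{geometry}

\begin{document}
% Print the resulting dimensions so that they can be inspected.
The text block is \the\textwidth\ wide and \the\textheight\ high;
the line height is \the\baselineskip.
\end{document}
```

Changing the line count and recompiling makes it easy to compare the resulting proportions of the text block with those of the page.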

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487
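A single secondary color reserved for emphasis can be sketched in LaTeX with the xcolor package. The color name accent and its RGB values are hypothetical, picked only for this example.

```latex
\documentclass[10pt]{article}
\usepackage{xcolor}

% One secondary color; the RGB values are an arbitrary dark red.
\definecolor{accent}{RGB}{150,30,30}

\begin{document}
Body text stays black on a white page, and only
\textcolor{accent}{\textbf{an emphasized phrase}} carries the
secondary color.
\end{document}
```

Pairing the color with a bold face is a design choice of this sketch: the emphasis then survives even when the reader cannot tell the color apart from black.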

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972.

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1986.

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1.

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6.

[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993.

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996.

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996.

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015).

[12] Unicode Consortium. Unicode Technical Standard 10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008.

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012.

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006.

[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124.

[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993.

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O’Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6.

[20] Unicode Consortium. Unicode Technical Standard 18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O’Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986.

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3.

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12.

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4.

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typo/glosy/typo101.pdf (visited on 10/20/2015).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5.

[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014.

Acronyms

ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Declaration
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe’s Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML, New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language



Chapter 3

Design

After a manuscript has been written and marked up, it is time to create a visual system that will emphasize the internal structure and the character of the document. In print design, this involves the selection of one or several typefaces that are well-suited to both the document and each other; the design and the positioning of the structural elements of the document, such as headings, tables, figures, and lists; and the choice of the paper size and the page layout. In web design and multi-target publishing, several visual systems may have to be created to accommodate various display devices.

3.1 Fonts

When choosing typefaces for a document, legibility should be of foremost concern. The body text should be set with a typeface at a size of at least 10 pt if the document is aimed at adult readers, or 12 pt if visually impaired readers and elementary-school students are a part of the audience [53, para. 13–15]. The target medium also needs to be taken into consideration. A faithful copy of a typeface designed for the letterpress will look lighter than originally intended when printed digitally. This may hamper its legibility if it contains hairline strokes [54, sec. 6.1.2]. In printed documents, typefaces with serifs are more familiar to the reader and therefore more suitable for long-distance reading than their sans-serif counterparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Beside the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].

The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says: φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
    Ligatures=TeX]}{}
\newunicodechar{…}{\raisebox{.8ex}{…}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says: φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford’s From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below), and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters, and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.

<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url(AlegreyaSans-Regular.ttf)
         format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
                   U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url(GFSNeohellenic.otf) format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
                   U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
                 sans-serif;
    line-height: 14pt;
  }
  [lang=en] {
    font-size: 11pt;
  }
  [lang=gr] {
    font-size: 13.2pt;
  }
</style>
<p><span lang=en>The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says: </span><span lang=gr>φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang=en>“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
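The two ends of this range can be compared directly in LaTeX, where \fontsize takes the type size and the line height as its two arguments. For 10 pt type, 120 % gives 10 pt × 1.20 = 12 pt and 145 % gives 10 pt × 1.45 = 14.5 pt; half of the difference from the type size, (12 − 10) / 2 = 1 pt and (14.5 − 10) / 2 = 2.25 pt, is the lead above and below each line. The sample sentences below are assumptions written for this sketch.

```latex
\documentclass[10pt]{article}
\begin{document}

% 10 pt type on a 12 pt line height (120 percent of the type size).
{\fontsize{10pt}{12pt}\selectfont
Tight leading suits light typefaces and text with few capitals,
accents, subscripts, or superscripts. A second sentence makes the
paragraph long enough to wrap onto another line.\par}

% 10 pt type on a 14.5 pt line height (145 percent of the type size).
{\fontsize{10pt}{14.5pt}\selectfont
Loose leading suits dark, bulky typefaces and text that is riddled
with accents or full capitals. A second sentence makes the
paragraph long enough to wrap onto another line.\par}

\end{document}
```

The \par before each closing brace matters: TeX applies the line height that is in effect when the paragraph ends.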

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample outdented paragraphs are an intriguingoption as well iexcl Paragraphs can also be separated by graphicalsymbols such as pilcrows bullets or boxes A plain horizon-tal space that is at least 3 em wide can likewise act as a paragraphseparator [56 ch 2 p 16]Block paragraphs exchange indentation and horizontal separatorsfor additional vertical space above and below the paragraph Injustified block paragraphs this space can be omitted as well al-though the typesetter then has to manually ensure that the lastline of each paragraph offers enough horizontal space to act asa separator In short documents and limited spans of text blockparagraphs are an attractive option [54 sec 232]
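The two most common styles, indented paragraphs and block paragraphs, can be sketched in LaTeX as follows. This is an illustration under assumed values (a 1 em indent and one blank line between block paragraphs), not the setup used for this book.

```latex
\documentclass[10pt]{article}
\begin{document}
% Indented paragraphs: a 1 em first-line indent, no extra vertical space.
\setlength{\parindent}{1em}
\setlength{\parskip}{0pt}
\noindent The first paragraph after a heading needs no indent, because
there is nothing it could be confused with.

The second paragraph is indented by 1 em, which separates it from the
paragraph above.

% Block paragraphs: no indent, one line of vertical space instead.
\setlength{\parindent}{0pt}
\setlength{\parskip}{\baselineskip}
A block paragraph starts flush with the left edge of the text block.

The next block paragraph follows after one blank line.
\end{document}
```

Setting the vertical space to exactly one line height keeps the lines of the page on a regular grid.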

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].

          Sizes in inches    Page proportions
A4        8.27 × 11.7        2 : √2       1.41421
B5        6.93 × 9.84        1 : √2       0.707
Letter    8 1/2 × 11         1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing

Sidenote: This is a side-note. Sidenotes enliven the page and are easy for the reader to find.

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the number of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
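A table in this restrained style can be sketched in LaTeX with the booktabs package, which provides three horizontal rules and discourages vertical ones. The choice of package is an assumption made for this illustration; the paper sizes are those of Table 3.1.

```latex
\documentclass{article}
\usepackage{booktabs} % horizontal rules only, no boxes or vertical lines
\begin{document}
\begin{table}
  \centering
  \begin{tabular}{ll}
    \toprule
    Paper  & Size in inches \\
    \midrule
    A4     & $8.27 \times 11.7$ \\
    B5     & $6.93 \times 9.84$ \\
    Letter & $8.5 \times 11$ \\
    \bottomrule
  \end{tabular}
  \caption{Common paper sizes.}
\end{table}
\end{document}
```

The table uses the typeface of the surrounding text, and the rules are limited to the top, the bottom, and the line below the column headings.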

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are delegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are delegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
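The first two forms are available in LaTeX without any additional packages. The sketch below assumes the standard article class and uses placeholder text; endnotes are left out, because they require an extra package.

```latex
\documentclass{article}
\begin{document}
Footnotes are linked to the main text by a superscript number.\footnote{%
  The footnote text is set at the bottom of the page in a smaller size.}
Sidenotes sit in the margin next to the passage they
annotate.\marginpar{\footnotesize\raggedright
  A sidenote, set ragged because the margin is narrow.}
\end{document}
```

The sidenote is set ragged for the reason given in Section 3.2.1: a margin is usually too narrow for justified lines.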

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

3.2.5 Quotations

Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly into the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer’s viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

    This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
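Both forms can be sketched in LaTeX as follows. The example assumes the standard article class and English quotation marks; the block quotation is shortened with an ellipsis, and the smaller type size is an optional choice, not a requirement.

```latex
\documentclass{article}
\begin{document}
A run-in quotation stays inside the paragraph: ``Jesters do oft prove
prophets.''

% The quote environment indents the block on both sides and adds
% vertical space above and below it.
\begin{quote}
  \small
  This is the excellent foppery of the world \ldots

  \hfill William Shakespeare, \emph{King Lear}
\end{quote}
\end{document}
```

The quotation marks are typed as two grave accents and two apostrophes, which TeX replaces with the proper opening and closing glyphs of the typeface.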

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
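In LaTeX, the geometry package can derive the text height from a number of lines, so that the text block always holds whole lines. The sketch below uses assumed values (an A4 page, a 300 pt wide text block, 40 lines per page) that are chosen for illustration only.

```latex
\documentclass[10pt]{article}
% Assumed values: A4 paper, a 300 pt wide text block, 40 lines per page.
% The `lines` option makes geometry compute a text height that holds
% exactly that many lines of body text.
\usepackage[a4paper, textwidth=300pt, lines=40]{geometry}
\begin{document}
The text block is \the\textwidth\ wide and \the\textheight\ high;
the line height of the body text is \the\baselineskip.
\end{document}
```

Printing the three lengths makes it easy to check the proportions of the text block against the proportions of the page before committing to a layout.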

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487
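A single secondary color for emphasis can be sketched in LaTeX with the xcolor package. The color name rubric and its RGB values are hypothetical; any dark red with sufficient contrast against the page would serve.

```latex
\documentclass{article}
\usepackage{xcolor}
% A hypothetical dark red; the name and the values are assumptions.
\definecolor{rubric}{rgb}{0.70, 0.10, 0.10}
\begin{document}
The body text stays black, and \textcolor{rubric}{only the emphasized
words} are set in the secondary color.
\end{document}
```

Defining the color once and referring to it by name keeps the visual system consistent across the document.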

Bibliography

[1] Mary Brandel. "1963: The debut of ASCII". In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972.

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1986.

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1.

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6.

[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993.

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996.

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996.

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015).

[12] Unicode Consortium. Unicode Technical Standard 10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008.

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012.

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006.

[17] Noam Chomsky. "Three models for the description of language". In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124.

[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993.

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6.

[20] Unicode Consortium. Unicode Technical Standard 18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015).

[23] Charles F. Goldfarb. "The Roots of SGML – A Personal Recollection". In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015).

[24] Charles F. Goldfarb. "SGML: The Reason Why and the First Published Hint". In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015).

[25] Charles F. Goldfarb. "Introduction to Generalized Markup". In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986.

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3.

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. "GODDAG: A Data Structure for Overlapping Hierarchies". In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12.

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028/ (visited on 07/31/2015).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4.

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5.

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014.

Acronyms

ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Declaration
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language


Page 43: Electronic Document Preparation Pocket Primer

terparts. On low-resolution screens, however, simple low-contrast typefaces with slab or no serifs will often yield the best result.

A typeface should also contain all the letters and symbols that will appear in the document. If the manuscript is multilingual and contains passages in both Latin and non-Latin writing systems, it may be necessary to combine several typefaces. If the multilingual manuscript only contains Latin characters, but several accented characters are missing from the body text typeface, they may be constructed by combining the body text typeface with diacritical marks from another font family. If certain punctuation marks and other symbols are missing from the body text typeface, they may likewise be borrowed from other font families. The typefaces should be consonant in their spirit and structure, unless the text would benefit from the dissonance [54, sec. 5.1.2].

Besides the body text typeface, several other typefaces may appear in a document: a bold face, an italic face, or perhaps several sizes of the body text typeface for use in the structural elements. The natural instinct is to pick these typefaces from a single font family, but some families may not offer all typefaces that the design requires. In those cases, the typefaces may again have to be borrowed from other font families.

3.2 Structural Elements

3.2.1 Paragraphs and Stanzas

As the base units of linguistic thought in prose, paragraphs split the text into coherent portions ready for consumption. A line in a paragraph of the body text should be 45–75 characters long on a single-column page, or 40–50 characters long on a multi-column page, and justified (spread horizontally to fit the column width). Extended passages of lines wider than 80 characters strain the eye of the reader, whereas justified lines that are too narrow to accommodate 40 characters may make the word spacing entirely too loose. In the latter case, the text should be set ragged instead, as seen in the sidenotes throughout this book [54, sec. 2.1.2].
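A text width that respects these limits can be estimated in LaTeX from the width of the lowercase alphabet in the body text typeface. The following is a rough sketch only: the length name \alphabetlength and the factor of 2.5 alphabets (65 characters) are assumptions made for this illustration, and the estimate ignores the fact that spaces and punctuation are narrower than the average letter.

```latex
\documentclass[10pt]{article}
% Hypothetical helper length that holds the width of 26 lowercase letters.
\newlength{\alphabetlength}
\settowidth{\alphabetlength}{abcdefghijklmnopqrstuvwxyz}
% 2.5 alphabets = 65 characters, inside the 45--75 character range.
\setlength{\textwidth}{2.5\alphabetlength}
\begin{document}
The text block is \the\textwidth\ wide, which is room for roughly
65 characters per line in the body text typeface.
\end{document}
```

Because the width is derived from the typeface, the estimate adapts automatically when the body text typeface or its size changes. The margins are not recentered by this sketch; a page layout package would be needed for that.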

Vertically, the lines of a paragraph should be separated by approximately twenty to forty-five percent of the typeface size [55]. If the size of the body text typeface is 10 pt, then the body text

The second function of Soul – knowing – was not at first distinguished from motion. Aristotle says, φαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι. “The soul is said to feel pain and joy, confidence and fear, and again to be angry, to perceive, and to think; and all these states are held to be movements, which might lead one to suppose that soul itself is moved.”

\documentclass[11pt]{article}
\usepackage{fontspec, leading, newunicodechar}
\usepackage[Latin, Greek]{ucharclasses}
\setTransitionsForLatin
  {\fontspec{AlegreyaSans-Regular.ttf}[Ligatures=TeX]}{}
\setTransitionsForGreek
  {\fontspec{GFSNeohellenic.otf}[Scale=1.2, WordSpace=0.5,
                                 Ligatures=TeX]}{}
\newunicodechar{·}{\raisebox{.8ex}{.}}
\frenchspacing
\leading{14pt}
\begin{document}
The second function of Soul -- knowing -- was not at
first distinguished from motion. Aristotle says, φαμὲν
γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι
δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι· ταῦτα
δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν
αὐτὴν κινεῖσθαι.
``The soul is said to feel pain and joy, confidence and
fear, and again to be angry, to perceive, and to think;
and all these states are held to be movements, which
might lead one to suppose that soul itself is moved.''
\end{document}

Figure 3.1: An excerpt from F. M. Cornford’s From Religion to Philosophy: A Study in the Origins of Western Speculation as a text marked up in TeX using LaTeX macros and the primitives of XeTeX (below) and the output document (above). Note that two typefaces were used: the regular typeface of Alegreya Sans at the size of 11 pt for the Latin characters and the regular typeface of GFS Neohellenic at the size of 13.2 pt for the Greek characters.

44 CHAPTER 3 DESIGN

<style>
  @font-face {
    font-family: "Alegreya Sans";
    src: url("AlegreyaSans-Regular.ttf")
      format("truetype");
    unicode-range: U+00-24F, U+1E00-1EFF, U+2000-206F,
      U+2C60-2C7F, U+A720-A7FF, U+FB00-FB4F;
  }
  @font-face {
    font-family: "GFS Neohellenic";
    src: url("GFSNeohellenic.otf") format("opentype");
    unicode-range: U+2C80-2CFF, U+370-3FF, U+1F00-1FFF,
      U+102E0-102FF;
  }
  p {
    font-family: "Alegreya Sans", "GFS Neohellenic",
      sans-serif;
    line-height: 14pt;
  }
  /* Language tags follow BCP 47: "grc" denotes Ancient Greek. */
  [lang=en] {
    font-size: 11pt;
  }
  [lang=grc] {
    font-size: 13.2pt;
  }
</style>
<p><span lang="en">The second function of Soul – knowing
– was not at first distinguished from motion. Aristotle
says </span><span lang="grc">φαμὲν γὰρ τὴν ψυχὴν
λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί
τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα
κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν
κινεῖσθαι </span><span lang="en">“The soul is said to
feel pain and joy, confidence and fear, and again to be
angry, to perceive, and to think; and all these states
are held to be movements, which might lead one to suppose
that soul itself is moved.”</span></p>

Figure 3.2: The document from Figure 3.1 reformulated in HTML5 and CSS3.
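The way the unicode-range descriptors in Figure 3.2 route each character to a typeface can be imitated in a few lines of Python. This is only a rough sketch: the ranges are copied from the figure, while the names FONT_RANGES, font_for, and split_runs are hypothetical, and the sketch ignores whether a font actually contains a glyph for the character, which a real browser also checks.

```python
# Unicode ranges copied from the two @font-face rules in Figure 3.2.
# The order mirrors the font-family list of the paragraph rule.
FONT_RANGES = [
    ("Alegreya Sans", [(0x0000, 0x024F), (0x1E00, 0x1EFF), (0x2000, 0x206F),
                       (0x2C60, 0x2C7F), (0xA720, 0xA7FF), (0xFB00, 0xFB4F)]),
    ("GFS Neohellenic", [(0x2C80, 0x2CFF), (0x0370, 0x03FF), (0x1F00, 0x1FFF),
                         (0x102E0, 0x102FF)]),
]
FALLBACK = "sans-serif"  # generic family at the end of the font-family list


def font_for(char):
    """Return the first listed family whose ranges cover the character."""
    code_point = ord(char)
    for family, ranges in FONT_RANGES:
        if any(low <= code_point <= high for low, high in ranges):
            return family
    return FALLBACK


def split_runs(text):
    """Group consecutive characters that resolve to the same family."""
    runs = []
    for char in text:
        family = font_for(char)
        if runs and runs[-1][0] == family:
            runs[-1] = (family, runs[-1][1] + char)
        else:
            runs.append((family, char))
    return runs
```

Calling split_runs on a mixed Latin and Greek string yields one run per script, which is the same switching that the ucharclasses transitions perform in Figure 3.1.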


line height (also known as the leading) would be between 12 and 14.5 pt, adding 1 to 2.25 pt of lead above and below each line. As a general guideline, dark and bulky typefaces require more leading, as do texts riddled with accents, full capital letters, subscripts, and superscripts [54, sec. 2.2.1]. The body text of this book is set in 10 pt Palatino with the leading of 12 pt. To allow for such minimal leading, all acronyms and other strings of upper-case letters are set as small capitals (capital letters whose height matches the lower case).
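The arithmetic behind these figures is simple enough to sketch. The snippet below assumes a type size of 10 pt, which is inferred from the numbers above (a 12 pt line height leaves 1 pt of lead on each side) and matches the body text of this book; the function name lead_per_side is hypothetical.

```python
def lead_per_side(type_size_pt, line_height_pt):
    """Lead added above and below each line, in points.

    The line height is the type size plus the total lead, and the
    lead is assumed to be split evenly above and below the line.
    """
    if line_height_pt < type_size_pt:
        raise ValueError("line height is smaller than the type size")
    return (line_height_pt - type_size_pt) / 2


# Assumed type size of 10 pt with the two line heights quoted in the text.
for line_height in (12, 14.5):
    print(line_height, "pt line height:", lead_per_side(10, line_height), "pt of lead per side")
```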

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph with one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface, or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

         Sizes in inches   Page proportions
A4       8.27 × 11.7       2 : √2      1.41421
B5       6.93 × 9.84       1 : √2      0.707
Letter   8 1/2 × 11        1 : 1.294   1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing.

This is a side-note. Sidenotes enliven the page and are easy for the reader to find.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
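One way to satisfy the whole-multiple rule is to pad each heading with just enough vertical space to reach the next line of the body text grid. The following sketch illustrates the calculation; the function name heading_block and the sample sizes are hypothetical and not taken from the cited sources.

```python
import math


def heading_block(heading_height_pt, line_height_pt):
    """Return (lines, padding) for a heading set on the body text grid.

    lines   -- number of body text lines the heading block occupies
    padding -- extra vertical space in points needed so that the block
               is a whole multiple of the body text line height
    """
    if heading_height_pt <= 0 or line_height_pt <= 0:
        raise ValueError("heights must be positive")
    lines = math.ceil(heading_height_pt / line_height_pt)
    padding = lines * line_height_pt - heading_height_pt
    return lines, padding


# A heading that needs 18 pt of height over body text with 12 pt leading
# occupies two lines and needs 6 pt of padding around it.
print(heading_block(18, 12))
```

How the padding is distributed above and below the heading is a separate design decision that the sketch leaves open.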

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
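The proportions listed in Table 3.1 can be recomputed from the sizes in inches. The sketch below does so under the assumption that a proportion is the page height divided by the page width; the dictionary PAPER_SIZES_IN and the function proportion are hypothetical names. Because the sizes in the table are rounded, the results only approximate the ideal value of √2 for the A and B series.

```python
# Width and height in inches, as listed in Table 3.1.
PAPER_SIZES_IN = {
    "A4": (8.27, 11.7),
    "B5": (6.93, 9.84),
    "Letter": (8.5, 11.0),
}


def proportion(name):
    """Height-to-width ratio of the named paper size."""
    width, height = PAPER_SIZES_IN[name]
    return height / width


for name in PAPER_SIZES_IN:
    print(f"{name}: 1 : {proportion(name):.4f}")
```

The value 0.707 given for B5 in the table is the reciprocal view of the same proportion, that is, width divided by height.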

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

3.2.5 Quotations

Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs, and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

    This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity; fools by heavenly compulsion; knaves, thieves, and treachers by spherical predominance; drunkards, liars, and adulterers by an enforced obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    – William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin crowded with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
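As a rough illustration of deriving the text height from the text width, the sketch below multiplies the width by a chosen proportion and then snaps the result to a whole number of lines. The function name text_height and the sample values (a 312 pt measure, a proportion of 1.618, and 12 pt leading) are hypothetical and serve only as an example.

```python
def text_height(text_width_pt, proportion, line_height_pt):
    """Return (lines, height_pt) for a text block on the body text grid.

    The target height is text_width_pt * proportion; it is rounded to
    the nearest whole number of lines so that the text block can be
    filled completely with lines of body text.
    """
    if min(text_width_pt, proportion, line_height_pt) <= 0:
        raise ValueError("all arguments must be positive")
    target = text_width_pt * proportion
    lines = max(1, round(target / line_height_pt))
    return lines, lines * line_height_pt


# A 312 pt wide text block with a height-to-width proportion of 1.618
# and 12 pt leading holds 42 lines and is 504 pt tall.
print(text_height(312, 1.618, 12))
```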

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

The general guidelines are to use colored typefaces only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
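The text does not say how to measure whether a contrast is sufficient. One widely used measure, which is an assumption brought in here rather than something the cited sources prescribe, is the contrast ratio defined in the Web Content Accessibility Guidelines (WCAG) 2.0, which asks for at least 4.5 : 1 for ordinary body text. The sketch below computes that ratio for two sRGB colors; the function names are hypothetical.

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to its linear-light value."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color as defined by WCAG 2.0."""
    r, g, b = (_linear(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG 2.0 contrast ratio between two colors, from 1 up to 21."""
    first = relative_luminance(foreground)
    second = relative_luminance(background)
    lighter, darker = max(first, second), min(first, second)
    return (lighter + 0.05) / (darker + 0.05)


# Black on white reaches the maximum ratio; pure red on white falls
# short of 4.5 : 1, which supports reserving it for emphasis.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))
print(round(contrast_ratio((255, 0, 0), (255, 255, 255)), 2))
```

The ratio says nothing about color blindness, so distinct hues still need to be checked separately for readers with impaired color vision.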

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standards Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standards Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006 (cit. on p. 13).

[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986 (cit. on p. 22).

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK       The ACKnowledgement character
API       Application Programming Interface
ASA       The American Standards Association
ASCII     The American Standard Code for Information Interchange
AT&T      The American Telephone and Telegraph corporation
BEL       The BELl character
BMP       The Basic Multilingual Plane
BRE       The Basic Regular Expressions
BS        The BackSpace character
BSD       The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA        California
CAN       The CANcel character
CERN      The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR      The Common Locale Data Repository
CLI       Command Line Interface
COBOL     The COmmon Business-Oriented Language
CR        The Carriage Return character
CSS       The Cascading Style Sheets language
DC        The Dublin Core
DC1       The Device Control character No. 1
DC2       The Device Control character No. 2
DC3       The Device Control character No. 3
DC4       The Device Control character No. 4
DEL       The DELete character
DLE       The Data Link Escape character
DPS       Document Preparation System
DTD       Document Type Definition
DTP       DeskTop Publishing
EBCDIC    The Extended Binary Coded Decimal Interchange Code
ECMA      The European Computer Manufacturers Association
EM        The End of Medium
EMACS     The Eventually Munches All Computer Storage editor
ENQ       The ENQuiry character
EOT       The End Of Transmission
ERE       The Extended Regular Expressions
ESC       The ESCape character
ETB       The End of Transmission Block
ETX       The End of TeXt
EUC       The Extended Unix Code
FF        The Form Feed character
FOAF      Friend Of A Friend
FORTRAN   The FORmula TRANslator
FS        The File Separator
FSM       The Free Software Movement
GML       The General Markup Language
GNU       GNU is Not Unix
GS        The Group Separator
GUI       Graphical User Interface
HT        The Horizontal Tab
HTML      The HyperText Markup Language
IBM       The International Business Machines Corporation
IEC       The International Electrotechnical Commission
IME       Input Method Editor
IRI       The Internationalized Resource Identifier
ISO       The International Organization for Standardization
JIS       The Japanese Industrial Standards encoding
JOE       The Joe's Own Editor
JSON      The JavaScript Object Notation
JSON-LD   JSON for LD
JTC       A Joint TC
LD        Linked Data
LF        The Line Feed
MA        Massachusetts
MathML    The Mathematical Markup Language
NAK       The Negative-AcKnowledgement character
NUL       The NULl character
NY        New York
OCR       Optical Character Recognition
ODF       The Open Document Format for office applications
OOXML     The Office Open XML format
OWL       The Web Ontology Language
PC        The IBM Personal Computer
PDF       The Portable Document Format
PICO      The PIne COmposer
POSIX     The Portable Operating System Interface
RDF       The Resource Description Framework
RDFa      RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC       A Request For Comments
RS        The Record Separator
SC        A SubCommittee
SGML      The Standard Generalized Markup Language
SI        The Shift In character
SO        The Shift Out character
SOH       The Start of Heading
SR        Sound Recognition
STX       The Start of Text
SUB       The SUBstitute character
SVG       The Scalable Vector Graphics language
SVN       SubVersioN
SYN       The SYNchronous Idle character
TC        A Technical Committee
TEI       The Text Encoding Initiative
TRON      The Real-time Operating system Nucleus
UCS       The Universal multiple-octet coded Character Set
US        The Unit Separator
USA       The United States of America
UTF       The UCS Transformation Format
VCS       Version Control Systems
VI        The Visual Interactive editor
VIM       VI IMproved
VT        The Vertical Tab
W3C       The World Wide Web Consortium
WG        A Working Group
WYSIWYG   What You See Is What You Get
XHTML     The eXtensible HyperText Markup Language
XML       The eXtensible Markup Language


Page 44: Electronic Document Preparation Pocket Primer

32 STRUCTURAL ELEMENTS 43

ThesecondfunctionofSoulndashknowingndashwasnotatfirstdistinguishedfrommotionAristotle saysφαμὲν γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαιἔτι δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα

κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν κινεῖσθαι ldquoThe soul issaid to feel pain and joy confidence and fear and again to be angry to perceive and tothink and all these states are held to bemovements whichmight lead one to supposethat soul itself ismovedrdquo

1

documentclass[11pt]article

usepackagefontspec leading newunicodechar

usepackage[Latin Greek]ucharclasses

setTransitionsForLatin

fontspecAlegreyaSans-Regularttf[Ligatures=TeX]

setTransitionsForGreek

fontspecGFSNeohellenicotf[Scale=12 WordSpace=05

Ligatures=TeX]

newunicodecharraisebox8ex

frenchspacing

leading14pt

begindocument

The second function of Soul -- knowing -- was not at

first distinguished from motion Aristotle says φαμὲν

γὰρ τὴν ψυχὴν λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι

δὲ ὸργίζεσθαί τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα

δὲ πάντα κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν

αὐτὴν κινεῖσθαι

``The soul is said to feel pain and joy confidence and

fear and again to be angry to perceive and to think

and all these states are held to be movements which

might lead one to suppose that soul itself is moved

enddocument

Figure 31 An excerpt from F M Cornfordrsquos From Religion to Philos-ophy A Study in the Origins of Western Speculation as a text markedup in TEX using LATEX macros and the primitives of XƎTEX (below)and the output document (above) Note that two typefaces wereused the regular typeface of Alegreya Sans at the size of 11 pt forthe Latin characters and the regular typeface of GFS Neohellenicat the size of 132 pt for the Greek characters

44 CHAPTER 3 DESIGN

ltstylegt

font-face

font-family Alegreya Sans

src url(AlegreyaSans-Regularttf)

format(truetype)

unicode-range U+00-24F U+1E00-1EFF U+2000-206F

U+2C60-2C7F U+A720-A7FF U+FB00-FB4F

font-face

font-family GFS Neohellenic

src url(GFSNeohellenicotf) format(opentype)

unicode-range U+2C80-2CFF U+370-3FF U+1F00-1FFF

U+102E0-102FF

p

font-family Alegreya Sans GFS Neohellenic

sans-serif

line-height 14pt

[lang=en]

font-size 11pt

[lang=gr]

font-size 132pt

ltstylegt

ltpgtltspan lang=engtThe second function of Soul ndash knowing

ndash was not at first distinguished from motion Aristotle

says ltspangtltspan lang=grgtφαμὲν γὰρ τὴν ψυχὴν

λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί

τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα

κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν

κινεῖσθαι ltspangtltspan lang=engtldquoThe soul is said to

feel pain and joy confidence and fear and again to be

angry to perceive and to think and all these states

are held to be movements which might lead one to suppose

that soul itself is movedrdquoltspangtltpgt

Figure 32 The document from Figure 31 reformulated in html5and css3

32 STRUCTURAL ELEMENTS 45

line height (also known as the leading) would be between 12 and145 pt adding 1 to 225 pt of lead above and below each line As ageneral guideline dark and bulky typefaces require more leadingas do texts riddled with accents full capital letters subscripts andsuperscripts [54 sec 221] The body text of this book is set in10 pt Palatino with the leading of 12 pt To allow for such minimalleading all acronyms and other strings of upper-case letters areset as small capitals (capital letters whose height matches the lowercase)

Two adjacent paragraphs should be visibly separated without distracting the reader from the text. A predominant method is to indent the initial line of a paragraph by one half (1 en) to three times (3 em) the typeface size. The indent is unnecessary when there is no ambiguity, such as in the first paragraph following a heading [54, sec. 2.3].
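In a web document, the indentation rule might be expressed as follows. This is a sketch under the assumption that paragraphs are marked up as plain `p` elements; the 1 em indent is one arbitrary value from the range given above.

```css
/* No vertical gap and no indent by default. */
p {
  margin: 0;
  text-indent: 0;
}

/* Indent only a paragraph that directly follows another paragraph. */
p + p {
  text-indent: 1em;
}
```

Because the `p + p` selector matches only a paragraph preceded by a sibling paragraph, the first paragraph after a heading stays flush left without any extra rule.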

If the margins are ample, outdented paragraphs are an intriguing option as well. Paragraphs can also be separated by graphical symbols, such as pilcrows, bullets, or boxes. A plain horizontal space that is at least 3 em wide can likewise act as a paragraph separator [56, ch. 2, p. 16].

Block paragraphs exchange indentation and horizontal separators for additional vertical space above and below the paragraph. In justified block paragraphs, this space can be omitted as well, although the typesetter then has to manually ensure that the last line of each paragraph offers enough horizontal space to act as a separator. In short documents and limited spans of text, block paragraphs are an attractive option [54, sec. 2.3.2].
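Two of the alternatives above, block paragraphs and outdented paragraphs, can be sketched in CSS. The class names `block` and `outdented` are hypothetical, and the 12 pt gap assumes a body line height of 12 pt.

```css
/* Block paragraphs: no indent, one line (12 pt) of space after each paragraph. */
.block p {
  text-indent: 0;
  margin: 0 0 12pt;
}

/* Outdented paragraphs: the first line starts 1.5em to the left of the rest. */
.outdented p {
  margin: 0;
  padding-left: 1.5em;
  text-indent: -1.5em;
}
```

The outdent is built from a negative first-line indent that is cancelled out by an equal amount of left padding, so the first line never leaves the container.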

Being the verse counterpart to the paragraph, the stanza is a collection of lines rather than of sentences. Due to this structural difference, stanzas are typically only justified when the individual lines are long enough to fill up the column, and ragged otherwise. Much like in the case of prose, short-form poetry benefits from having the stanzas set in block paragraph style.
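A stanza set ragged and in block style could look like this in CSS. The sketch assumes that each stanza is a `p` element with the hypothetical class `stanza` and that the lines of verse are separated by line breaks in the markup.

```css
p.stanza {
  white-space: pre-line; /* keep the line breaks written in the markup */
  text-align: left;      /* ragged right rather than justified */
  text-indent: 0;
  margin: 12pt 0;        /* vertical space separates the stanzas */
}
```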

              Sizes in inches    Page proportions
A4            8.27 × 11.7        2 : √2       1.41421
B5            6.93 × 9.84        1 : √2       0.707
Letter        8½ × 11            1 : 1.294    1.2941

Table 3.1: An overview of common paper sizes used for commercial and industrial printing

This is a side-note. Sidenotes enliven the page and are easy for the reader to find.

3.2.2 Headings

Another fundamental structural element is the heading. The function of a heading is to delimit and name the individual sections of a document. To ease navigation, headings should be a prominent presence on a page. This can be achieved by using a larger variant of the body text typeface or by including the text of the latest heading in the margin or the header of the page [54, sec. 4.2.1], as seen throughout this book.

The hierarchy of the headings can be expressed through the variation of typefaces, indentation, alignment, and numbering, although varying the size of the body text typeface is sufficient for many types of documents. In documents that are bound in codex form and read two pages at a time, the height of headings should be a whole multiple of the line height of the body text, so that the headings do not disrupt the alignment of lines on the facing pages [53, para. 33].
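One way to keep headings on the line grid is to derive all their vertical measurements from a single line-height value. The sketch below assumes a 12 pt body line height stored in a custom property; the property name `--line` and the chosen sizes are illustrative only.

```css
:root {
  --line: 12pt; /* body text line height; the value is an assumption */
}

body {
  font-size: 10pt;
  line-height: var(--line);
}

h2 {
  font-size: 14pt;
  line-height: calc(2 * var(--line)); /* each heading line takes two body lines */
  margin: var(--line) 0 0;            /* plus one empty body line above */
}
```

A one-line heading then occupies exactly three body lines, and a heading that wraps still adds a whole number of lines.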

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the amount of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].
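For an HTML table, this advice might translate into a style sheet along the following lines. It is a sketch rather than a recommendation from the cited sources; the spacing values are arbitrary.

```css
/* Inherit the typeface and size of the surrounding text. */
table {
  border-collapse: collapse;
  font: inherit;
}

/* No boxes; columns are separated by white space only. */
th, td {
  border: 0;
  padding: 0 1.5em 0 0;
  text-align: left;
  font-weight: normal;
}

/* A single thin rule under the column headings. */
thead th {
  border-bottom: 0.5pt solid currentColor;
}
```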

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.
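On the web, sidenotes are commonly approximated by floating a note into a reserved margin. The following sketch assumes a wrapper with the hypothetical class `main` around the text and notes marked with the hypothetical class `sidenote`; all measurements are illustrative.

```css
/* Reserve a 130 pt margin on the right for the notes. */
.main {
  padding-right: 130pt;
}

/* Float each note to the right and pull it into the reserved margin. */
.sidenote {
  float: right;
  clear: right;
  width: 100pt;
  margin-right: -120pt; /* leaves a 20 pt gap between text and note */
  font-size: 8pt;
  line-height: 12pt;    /* same line height as the body text */
}
```

Keeping the line height of the notes equal to that of the body text lets their lines align with the lines of the main text.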

3.2.5 Quotations

Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer's viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that, when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence; and all that we are evil in, by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

William Shakespeare, King Lear

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.
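A block quotation along these lines could be styled as shown below. The sketch assumes the quotation is marked up with the HTML `blockquote` element and that the body text uses a 12 pt line height; the smaller type size and the 2 em indent are optional choices.

```css
/* One body line of space above and below, indented on both sides. */
blockquote {
  margin: 12pt 2em;
  font-size: 9pt;
  line-height: 12pt; /* keep the rhythm of the body text */
}

/* Paragraphs inside the quotation are set in block style. */
blockquote p {
  margin: 0;
  text-indent: 0;
}
```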

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
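As a worked example, the following CSS paged-media sketch sets up an A4 page whose text height is exactly 54 body lines. All margin values are assumptions chosen to make the arithmetic come out; they are not taken from the cited sources.

```css
@page {
  size: A4;              /* 210 mm × 297 mm */
  margin-top: 28.4mm;
  margin-bottom: 40mm;   /* text height: 297 − 28.4 − 40 = 228.6 mm */
  margin-left: 20mm;
  margin-right: 28.4mm;  /* text width: 210 − 20 − 28.4 = 161.6 mm */
}

body {
  margin: 0;
  font-size: 10pt;
  line-height: 12pt;     /* 54 lines × 12 pt = 648 pt = 9 in = 228.6 mm */
}
```

With these values, the text block is about 1.41 times as tall as it is wide, which is close to the 1 : √2 proportion of the A4 page itself.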

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487

The general guidelines are to use colored type only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
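A restrained color scheme of this kind might be written in CSS as follows. The class name `rubric` and the particular shade of red are assumptions made for the sake of the example.

```css
/* Body text: black on white. */
body {
  color: #000;
  background-color: #fff;
}

/* Emphasis only: a dark red, paired with italics as a second cue. */
.rubric {
  color: #b00000;
  font-style: italic;
}
```

Pairing the color with a second, non-color cue such as italics is one way to keep the emphasis visible to readers who cannot tell the red from the black.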

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015).

[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972.

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986.

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1.

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6.

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993.

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996.

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996.

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015).

[12] Unicode Consortium. Unicode Technical Standard 10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016).

[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008.

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012.

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006.

[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124.

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993.

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6.

[20] Unicode Consortium. Unicode Technical Standard 18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986.

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3.

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for WebSGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12.

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015).

[39] Ecma International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4.

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5.

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014.

Acronyms

ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standard Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
EMACS: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
PICO: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML, New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
VI: The Visual Interactive editor
VIM: VI IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language


Page 45: Electronic Document Preparation Pocket Primer

44 CHAPTER 3 DESIGN

ltstylegt

font-face

font-family Alegreya Sans

src url(AlegreyaSans-Regularttf)

format(truetype)

unicode-range U+00-24F U+1E00-1EFF U+2000-206F

U+2C60-2C7F U+A720-A7FF U+FB00-FB4F

font-face

font-family GFS Neohellenic

src url(GFSNeohellenicotf) format(opentype)

unicode-range U+2C80-2CFF U+370-3FF U+1F00-1FFF

U+102E0-102FF

p

font-family Alegreya Sans GFS Neohellenic

sans-serif

line-height 14pt

[lang=en]

font-size 11pt

[lang=gr]

font-size 132pt

ltstylegt

ltpgtltspan lang=engtThe second function of Soul ndash knowing

ndash was not at first distinguished from motion Aristotle

says ltspangtltspan lang=grgtφαμὲν γὰρ τὴν ψυχὴν

λυπεῖσθαι χαίρειν θαρρεῖν φοβεῖσθαι ἔτι δὲ ὸργίζεσθαί

τε καὶ αἰσθάνεσθαι καὶ διανοεῖσθαι ταῦτα δὲ πάντα

κινήσεις εἶναι δοκοῦσιν ὅθεν οἰηθείη τις ἂν αὐτὴν

κινεῖσθαι ltspangtltspan lang=engtldquoThe soul is said to

feel pain and joy confidence and fear and again to be

angry to perceive and to think and all these states

are held to be movements which might lead one to suppose

that soul itself is movedrdquoltspangtltpgt

Figure 32 The document from Figure 31 reformulated in html5and css3

32 STRUCTURAL ELEMENTS 45

line height (also known as the leading) would be between 12 and145 pt adding 1 to 225 pt of lead above and below each line As ageneral guideline dark and bulky typefaces require more leadingas do texts riddled with accents full capital letters subscripts andsuperscripts [54 sec 221] The body text of this book is set in10 pt Palatino with the leading of 12 pt To allow for such minimalleading all acronyms and other strings of upper-case letters areset as small capitals (capital letters whose height matches the lowercase)

Two adjacent paragraphs should be visibly separated withoutdistracting the reader from the text A predominant method is toindent the initial line of a paragraph with one half (1 en) to threetimes (3 em) the typeface size The indent is unnecessary whenthere is no ambiguitymdashsuch as in the first paragraph following aheading [54 sec 23]

If the margins are ample outdented paragraphs are an intriguingoption as well iexcl Paragraphs can also be separated by graphicalsymbols such as pilcrows bullets or boxes A plain horizon-tal space that is at least 3 em wide can likewise act as a paragraphseparator [56 ch 2 p 16]Block paragraphs exchange indentation and horizontal separatorsfor additional vertical space above and below the paragraph Injustified block paragraphs this space can be omitted as well al-though the typesetter then has to manually ensure that the lastline of each paragraph offers enough horizontal space to act asa separator In short documents and limited spans of text blockparagraphs are an attractive option [54 sec 232]

Being the verse counterpart to the paragraph the stanza is acollection of lines rather than of sentences Due to this structuraldifference stanzas are typically only justified when the individuallines are long enough to fill up the column and ragged otherwiseMuch like in the case of prose short-form poetry benefits fromhaving the stanzas set in block paragraph style

322 HeadingsAnother fundamental structural element is the heading The func-tion of a heading is to delimit and name the individual sections ofa document To alleviate navigation headings should be a promi-nent presence on a page This can be achieved by using a larger

46 CHAPTER 3 DESIGN

Sizes in inches Page proportionsA4 827 times 117 2 ∶ radic2 141421B5 693 times 984 1 ∶ radic2 0707Letter 8 1

2 times 11 1 ∶ 1294 12941

Table 31 An overview of commonpaper sizes used for commercialand industrial printing

This is a side-note Sidenotesenliven the pageand are easy for

the reader to find

variant of the body text typeface or by including the text of the lat-est heading in the margin or the header of the page [54 sec 421]as seen throughout this book

The hierarchy of the headings can be expressed through thevariation of typefaces indentation alignment and numberingalthough alternating the size of the body text typeface is sufficientfor many types of documents In documents that are bound incodex form and read two pages at a time the height of headingsshould be a whole multiple of the line height of the body textso that the headings do not disrupt the alignment of lines on thefacing pages [53 para 33]

3.2.3 Tables and Lists

Tables and lists are structural elements that should fit seamlessly into the surrounding text and avoid unnecessary visual clutter. Use the same typeface the surrounding text does, treat the columns of tables the same way you treat columns in the text, and keep the number of rules, boxes, dots, and extraneous spacing to a bare minimum (see Table 3.1) [54, sec. 2.1.10 and 4.4].

3.2.4 Notes

Notes provide commentary on a specified passage of the main text and can take three different forms:

1. Sidenotes are displayed in the horizontal margins next to the relevant passage of the main text, as seen throughout this book. Unless the horizontal margins are very wide, sidenotes are unsuitable for the inclusion of bibliographical references, a common use for notes in academic writing.

2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

3.2.5 Quotations

Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks, in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer’s viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and optionally also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3]:

    This is the excellent foppery of the world, that when we are sick in fortune, often the surfeit of our own behavior, we make guilty of our disasters the sun, the moon, and the stars; as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

    – William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
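Deriving the text height from the text width while keeping it a multiple of the line height can be sketched as follows. This is only one possible approach, written in Python; the function name `text_height`, the nearest-multiple rounding, and the sample values (a 300 pt text width, a √2 proportion, a 12 pt line height) are assumptions made for illustration.

```python
import math


def text_height(text_width_pt: float, proportion: float, line_height_pt: float):
    """Return (lines, height) for a text block.

    The ideal height is text_width_pt * proportion; it is then rounded to
    the nearest whole number of body text lines so that the text block can
    be filled completely.
    """
    ideal_height = text_width_pt * proportion
    lines = max(1, round(ideal_height / line_height_pt))
    return lines, lines * line_height_pt


# Hypothetical example: a 300 pt wide text block that echoes a sqrt(2) page.
lines, height = text_height(300, math.sqrt(2), 12)
print(f"{lines} lines, {height} pt high")
```

Rounding to the nearest line keeps the block as close as possible to the intended proportion; rounding down instead would guarantee that the block never exceeds the ideal height.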

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
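The text does not say how to measure whether a contrast is sufficient. One common yardstick, which is an assumption brought in here and not something the book cites, is the contrast ratio defined by the W3C Web Content Accessibility Guidelines (WCAG 2.x). A minimal Python sketch for 8-bit sRGB colors follows; the function names are illustrative.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear-light value."""
    c = channel / 255
    # WCAG 2.x originally states the threshold as 0.03928; for 8-bit
    # channels both thresholds give the same result.
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb) -> float:
    """Relative luminance of an (R, G, B) color, from 0 (black) to 1 (white)."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background) -> float:
    """Contrast ratio between two colors, from 1 (identical) to 21."""
    lum_fg = relative_luminance(foreground)
    lum_bg = relative_luminance(background)
    lighter, darker = max(lum_fg, lum_bg), min(lum_fg, lum_bg)
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical check: red emphasis text on a white background.
print(round(contrast_ratio((255, 0, 0), (255, 255, 255)), 2))
```

WCAG asks for a ratio of at least 4.5 : 1 for normal-sized text at its AA level. The ratio only captures differences in lightness, so it does not replace a check of the design with color-blind readers in mind.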

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: The International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: The American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – The Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: The International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: The International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard 10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: The International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: The International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: The International Organization for Standardization, Dec. 2006 (cit. on p. 13).

[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O’Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard 18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O’Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986 (cit. on p. 22).

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles: 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe’s Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML Next Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language

Index

ACK 6
Adobe FrameMaker 14
Adobe InDesign 14, 39
alignment
  justified 42
  ragged 42
Anton Koberger 49
Apache OpenOffice 13, 20, 39
API 55
ASA 51
ASCII 5–9, 11, 12, 14, 51
AsciiDoc 39
AT&T 35
Atom 13
awk 16, 17

Bazaar 17
BEL 6
BMP 8, 9, 14
Bob Bemer 5
body text 41
BRE 14–16
  alternation operator 15
  backreference 15
  escape character 15
  matching list expression 15
  non-matching list expression 15
  repetition operator 15
  subexpression 15
BS 6
BSD 13

CA 52
CAN 6
CERN 28
character code 5
character encoding 5
Chomsky hierarchy 14
Christian Morgenstern 4
CLDR 52
CLI 13, 16
code page 7
code point 8
Compose key 11
CONCUR 27
control code 5
CR 6
Creole 39
CSS 23, 29–32, 44

DC 32, 33
DC1 6
DC2 6
DC3 6
DC4 6
DEL 6
DLE 6
Donald Knuth 36
DPS 13, 17, 18, 32, 35, 36, 39
  batch-oriented 35
  interactive 13, 35
    desktop publishing 36
    word processing 36
DTD 23, 25–27
DTP 36

EBCDIC 5
ECMA 55
Edgar Allan Poe 37
Elements of Style 3
EM 6
Emacs 13
endianity 10
endnote 47
ENQ 6
EOT 6
ERE 14–16
  alternation operator 15
  backreference 15
  escape character 15
  matching list expression 15
  non-matching list expression 15
  repetition operator 15
  subexpression 15
ESC 6
ETB 6
ε-TeX 38
ETX 6
EUC 5

F. M. Cornford 43
FF 6
FOAF 32, 33
footnote 47
formal grammar 14
FORTRAN 4
From Religion to Philosophy: A Study in the Origins of Western Speculation 43
FS 6
FSM 35

Git 17
GML 22
GNU 13, 14, 35
  Linux 13
  nano 13
Google Documents 18
Google Pinyin 11
grep 16, 17
groff, see troff
GS 6
GUI 13, 35

Han Unification 9
heading 45
Henrik Ibsen 27
HT 6
HTML 28–32, 34, 39, 44, 55

IBM 5, 12, 22
iconv 10
IEC 7, 10, 51–54
IME 12
IRI 27, 28, 31, 32, 54
ISO 7, 10, 51–54

JavaScript 29
Jeffrey E. F. Friedl 14
JIS 5
JOE 13
JScript 29
JSON 32
JSON-LD 32, 56
JTC 51–54
justification, see alignment

King Lear 48

LaTeX 36, 43
Latin Vulgate Bible 49
LD 31, 32, 55
leading, see line spacing
Leafpad 13
LF 6
lightweight markup language 39
line height 45
list 46

MA 51
MakeDoc 39
Markdown 39
markup
  logical 21, 29, 30, 35, 36
  presentation 21, 29, 30, 35, 36
MathML 28, 31
Mercurial 17
microformatting 32
Microsoft Word 14, 20, 39

N-Triples 32, 33
NAK 6
Noam Chomsky 14
  hierarchy 14
note 46
Notepad++ 13
Notepad 13
nroff, see troff
NUL 6
NY 51

OCR 12
ODF 13
OOXML 13
OWL 32, 56

paragraph 42
  block 47
  indented 45
  outdented 45
paragraphs
  block 45
PC 5, 11
PDF 13
pdfTeX 38
Peer Gynt 27
Perl 14
PICO 13
pinyin 11
plain TeX 38
POSIX 53
printable character 5
Punycode 8

QuarkXPress 14
quotation
  block 47
  run-in 47

rag, see alignment
RDF 28, 31–35, 56
  literal 32
  object 31
  ontology 32
  predicate 31
  resource 31
  subject 31
  triplet 31
RDFa 32, 34, 56
regex, see regular expression
regular expression 13, 14
regular grammar 14
RELAX NG 23, 25
RFC 54, 55
RS 6

sans-serif 41
SC 51–54
Scribus 13, 14, 39
sed 16, 17
serif 41
Setext 39
SGML 22, 23, 25, 27–29, 39, 53, 54
  application 23
  attribute 22
  element 22
  entity 22
  node 22
  tag 22
SGML: The Reason Why and the First Published Hint 22
SI 6
sidenote 46
small capitals 45
SO 6
SOH 6
SR 12
STX 6
style guide 3
SUB 6
Sublime Text 13
surrogate pair 8
SVG 28, 31
SVN 17–20
SYN 6

table 46
TC 51, 52
TEI 28
text editor 13
text file 4
text processing 4
TextEdit 13, 14
The Art of Computer Programming 36
The Cask of Amontillado 37
The Chicago Manual of Style 3
The Oxford Style Manual 3
The Subversion book 17
Tim Berners-Lee 31
Timothy John Berners-Lee 28
TortoiseSVN 18, 20
Trichter 4
troff 35
  man 36
  me 36
  mom 36
TRON 9
Turtle 32, 33
typeface 41

UCS 6, 8–12, 14, 16, 51, 52
  block 8
  UCS-4 8
Unicode
  case conversion 10
  normalization 10
US 6
USA 51, 52
UTF 6, 8–10, 52
  UTF-16 8, 52
  UTF-32 8
  UTF-7 8
  UTF-8 8, 52

VBScript 29
VCS 17–20
  centralized 17
  decentralized 17
version control 13
VI 13
VIM 13
VT 6

W3C 23, 28, 29, 31, 32, 54–56
WG 54
Wikicode 39
William Shakespeare 48
William Strunk 3
writing rules
  grammar 3
  orthography 3
  typography 4
WYSIWYG 35

X Window System 11
XeTeX 43
XHTML 28, 31, 32, 55, 56
XML 23–29, 31–33, 39, 54, 55
  application 23
  DocBook 28
  format 23
  language 23
  namespace 27
  schema language 23
  Schema 23, 26
  validity 23
  well-formedness 23
xmllint 26
XPath 23
XPointer 23
XQuery 23



If the margins are ample outdented paragraphs are an intriguingoption as well iexcl Paragraphs can also be separated by graphicalsymbols such as pilcrows bullets or boxes A plain horizon-tal space that is at least 3 em wide can likewise act as a paragraphseparator [56 ch 2 p 16]Block paragraphs exchange indentation and horizontal separatorsfor additional vertical space above and below the paragraph Injustified block paragraphs this space can be omitted as well al-though the typesetter then has to manually ensure that the lastline of each paragraph offers enough horizontal space to act asa separator In short documents and limited spans of text blockparagraphs are an attractive option [54 sec 232]

Being the verse counterpart to the paragraph the stanza is acollection of lines rather than of sentences Due to this structuraldifference stanzas are typically only justified when the individuallines are long enough to fill up the column and ragged otherwiseMuch like in the case of prose short-form poetry benefits fromhaving the stanzas set in block paragraph style

322 HeadingsAnother fundamental structural element is the heading The func-tion of a heading is to delimit and name the individual sections ofa document To alleviate navigation headings should be a promi-nent presence on a page This can be achieved by using a larger

46 CHAPTER 3 DESIGN

Sizes in inches Page proportionsA4 827 times 117 2 ∶ radic2 141421B5 693 times 984 1 ∶ radic2 0707Letter 8 1

2 times 11 1 ∶ 1294 12941

Table 31 An overview of commonpaper sizes used for commercialand industrial printing

This is a side-note Sidenotesenliven the pageand are easy for

the reader to find

variant of the body text typeface or by including the text of the lat-est heading in the margin or the header of the page [54 sec 421]as seen throughout this book

The hierarchy of the headings can be expressed through thevariation of typefaces indentation alignment and numberingalthough alternating the size of the body text typeface is sufficientfor many types of documents In documents that are bound incodex form and read two pages at a time the height of headingsshould be a whole multiple of the line height of the body textso that the headings do not disrupt the alignment of lines on thefacing pages [53 para 33]

323 Tables and ListsTables and lists are structural elements that should fit seamlesslyinto the surrounding text and avoid unnecessary visual clutter Usethe same typeface the surrounding text does treat the columnsof tables the same way you treat columns in the text and keepthe amount of rules boxes dots and extraneous spacing to a bareminimum (see Table 31) [54 sec 2110 and 44]

324 NotesNotes provide commentary on a specified passage of the main textand can take three different forms

1 Sidenotes are displayed in the horizontal margins next to the rele-vant passage of themain text as seen throughout this book Unlessthe horizontal margins are very wide sidenotes are unsuitablefor the inclusion of bibliographical referencesmdasha common use fornotes in academic writing

32 STRUCTURAL ELEMENTS 47

2 Footnotes are delegated to the bottom of the page and linked to therelevant passage of the main text through symbols or superscriptnumbers1 Compared to side notes they are more difficult for thereader to find Footnotes should align with the bottom of the textblock not stick out into the bottom margin [53 para 48]

3 Endnotes are delegated to the end of a section or the entire doc-ument and are linked to the relevant passage of the body textthrough superscript numbers They are the easiest of the three totypeset but also the hardest for the reader to find

Notes are typically typeset in sizes from 8pt up to the body texttypeface size depending on their frequency importance and aver-age length [54 sec 43] If several categories of notes are presentin the document it may be desirable to give each a different form

325 QuotationsQuotations repeat what has already been expressed somewhereelse before and can take two different forms [54 sec 54]

1 Run-in quotations are included directly into the paragraph andset off from the surrounding text using quotation marks in accor-dance with the orthographic rules on the use of punctuation inthe language of the paragraph ldquoJesters do oft prove prophetsrdquoFrom the designerrsquos viewpoint run-in quotations require no spe-cial treatment although it is crucial that the body text typefacecontains the required quotation marks

2 Block quotations are set as block paragraphs that are clearly sepa-rated from the surrounding text This involves adding a verticalspace above and below the block paragraphs and optionally alsochanging the typeface its size or the indentation of the para-graphs [54 sec 233]

This is the excellent foppery of the world that when we are sick in for-tunemdashoften the surfeit of our own behaviormdashwe make guilty of ourdisasters the sun the moon and the stars as if we were villains by ne-cessity fools by heavenly compulsion knaves thieves and treachers byspherical predominance drunkards liars and adulterers by an enforced

1 This is a footnote Due to their width footnotes can comfortably accommodate fullbibliographical references which makes them popular in academic writing

A footnote can also contain multiple paragraphs of text although long foot-notes are tedious to read if the size of the typeface is small [54 sec 431]

48 CHAPTER 3 DESIGN

obedience of planetary influence and all that we are evil in by a divinethrusting-on An admirable evasion of whoremaster man to lay his goat-ish disposition to the charge of a star

mdashWilliam Shakespeare King Lear

Block quotations are ideal for longer quotations and for quotationsthat should carry more weight that run-in quotations

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size (as described in Section 3.2.1), as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin ripe with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
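The two constraints above (a text height that is a whole number of lines, and proportions derived from the text width) can be sketched in a few lines of Python. This is a minimal illustration, not a method prescribed by the book: the function name `text_height`, its parameters, and the sample measurements in points are assumptions chosen for the example.

```python
def text_height(text_width: float, line_height: float, height_to_width: float):
    """Return (line_count, height) for a text block.

    The height is the whole multiple of `line_height` that lies closest
    to the target `text_width * height_to_width`. All lengths share one
    unit (for example, points). Names and units are illustrative only.
    """
    if text_width <= 0 or line_height <= 0 or height_to_width <= 0:
        raise ValueError("all measurements must be positive")
    target = text_width * height_to_width
    # Snap the target height to a whole number of lines (at least one).
    line_count = max(1, round(target / line_height))
    return line_count, line_count * line_height


if __name__ == "__main__":
    lines, height = text_height(300, 14, 1.5)
    print(f"{lines} lines, text height {height} pt")
```

For instance, a 300 pt wide text block set on a 14 pt line height with a target proportion of 1 : 1.5 yields 32 lines and a text height of 448 pt, rather than the unaligned 450 pt.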

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue of color may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

The general guidelines are to only use colored typefaces for emphasis, not for the body text, and on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
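The text leaves “sufficient contrast” undefined. One commonly used measure, assumed here purely for illustration, is the contrast ratio from the W3C Web Content Accessibility Guidelines (WCAG) 2.x, which recommend a ratio of at least 4.5:1 for normal-size text. The sketch below computes that ratio for two sRGB colors; the function names are hypothetical and not taken from the book.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) triple with 0-255 channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG contrast ratio, from 1 (no contrast) up to 21 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    for name, color in [("black", (0, 0, 0)), ("red", (255, 0, 0))]:
        ratio = contrast_ratio(color, (255, 255, 255))
        print(f"{name} on white: {ratio:.2f}:1")
```

Under this assumed measure, black on white reaches the maximum ratio of 21:1, whereas pure red on white comes out at roughly 4:1 and would fall short of the 4.5:1 threshold. This is consistent with the advice to reserve colored type for emphasis rather than body text.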

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII.” In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standards Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standards Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org/ (visited on 09/17/2016) (cit. on p. 10).

[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).

[17] Noam Chomsky. “Three models for the description of language.” In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O’Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O’Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection.” In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint.” In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup.” In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies.” In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Manu Sporny et al. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom. 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standards Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
EMACS: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The General Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: Joe’s Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MATHML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
PICO: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFA: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
VI: The Visual Interactive editor
VIM: VI IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language





[34] Norman Walsh DocBook 5 The Definitive Guide Apr 2010url httpwwwdocbookorgtdgenhtmldocbookhtml(visited on 08182015) (cit on p 28)

BIBLIOGRAPHY 55

[35] Tim Berners-Lee Information Management A Proposal Techrep Mar 1989 url httpwwww3orgHistory1989proposalhtml (visited on 08312015) (cit on p 28)

[36] T Berners-Lee Hypertext Markup Language ndash 20 rfc 1866rfc Editor Nov 1995 url httptoolsietforghtmlrfc1866 (visited on 07312015) (cit on p 28)

[37] Jon Postel DoD standard Transmission Control Protocol rfc761 rfc Editor Jan 1980 url httptoolsietforghtmlrfc761 (visited on 09162016) (cit on p 28)

[38] Ian Hickson et al html5 A vocabulary and associated apisfor html and xhtml Recommendation w3c Oct 2014 urlhttpwwww3orgTR2014REC-html5-20141028 (visitedon 07312015) (cit on p 29)

[39] ecma International Standard ecma-262 - ecmaScript LanguageSpecification Tech rep June 1997 url httpwwwecma-internationalorgpublicationsfilesECMA-ST-ARCH

ECMA-262201st20edition20June201997pdf (visitedon 07312015) (cit on p 29)

[40] Netscape Communications Netscape and Sun announce Java-Script the open cross-platform object scripting language for en-terprise networks and the Internet Dec 1995 url httpwpnetscapecomnewsrefprnewsrelease67html (visited on02132008) (cit on p 29)

[41] Dave Raggett et al Reformulating html in xml w3c Recom-mendation w3c Dec 1998 url httpwwww3orgTR1998WD-html-in-xml-19981205 (visited on 08202015)(cit on p 31)

[42] Steven Pemberton et al xhtmltrade 10 The Extensible HyperTextMarkup Language w3c Recommendation w3c Jan 2000url httpwwww3orgTR2000REC-xhtml1-20000126(visited on 08202015) (cit on p 31)

[43] T Berners-Lee Linked Data Tech rep 2006 url httpswwww3orgDesignIssuesLinkedDatahtml (visited on09172016) (cit on p 31)

56 BIBLIOGRAPHY

[44] Ora Lassila and Ralph R Swick Resource Description Frame-work (rdf) Model and Syntax Specification w3c Recommen-dation w3c Feb 1999 url httpwwww3orgTR1999REC-rdf-syntax-19990222 (visited on 08182015) (cit onpp 31 32)

[45] Dan Brickley and R V Guha rdf Vocabulary DescriptionLanguage 10 rdf Schema w3c Recommendation w3c Feb2004 url httpwwww3orgTR2004REC-rdf-schema-20040210 (visited on 08182015) (cit on p 32)

[46] Deborah L McGuinness and Frank van Harmelen owl WebOntology Language w3c Recommendation w3c Feb 2004url httpwwww3orgTR2004REC-owl-features-20040210 (visited on 08182015) (cit on p 32)

[47] Dan Brickley and R V Guha json-ld 10 A JSON-basedSerialization for Linked Data w3c Recommendation w3cJan 2014 url httpwwww3orgTR2014REC-json-ld-20140116 (visited on 08192015) (cit on p 32)

[48] David Beckett et al rdf 11 Turtle w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-turtle-20140225 (visited on 08292015) (cit on p 32)

[49] David Beckett rdf 11 N-Triples w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-n-triples-20140225 (visited on 08192015) (cit on p 32)

[50] Ben Adida et al rdfa in xhtml Syntax and Processing w3cRecommendation w3c Oct 2008 url httpwwww3org TR 2008 REC - rdfa - syntax - 20081014 (visited on08192015) (cit on p 32)

[51] Peter Schaffter What exactly is mom 2015 url httpwwwschafftercamommom-01html (visited on 09162016)(cit on p 37)

[52] Donald Ervin Knuth Digital Typography The Center for theStudy of Language and Information Publications 1998 i sbn978-0-387-98269-4 (cit on p 36)

[53] Albert Kapr Sto a jedna věta ke knižniacute uacutepravě Trans by An-toniacuten Rambousek Lacerta 1999 url httpwwwsazbacztypoglosytypo101pdf (visited on 10202015) (cit onpp 41 46 47)

BIBLIOGRAPHY 57

[54] Robert Bringhurst the Elements of Typographic Style PointRoberts andWashHartleyampMarks 1992 i sbn 0-88179-110-5(cit on pp 41 42 45ndash48)

[55] Matthew Butterick Butterickrsquos Practical Typography Line spac-ing url httppracticaltypographycomline-spacinghtml (visited on 11022015) (cit on p 42)

[56] Vladimiacuter Beran et al Aktualizovanyacute typografickyacute manuaacutel6th ed Kafka Design 2014 (cit on p 45)

Acronyms

ack The ACKnowledgement characterapi Application Programming Interfaceasa The American Standard Associationascii The American Standard Code for Information Interchangeatampt The American Telephone and Telegraph corporationbel The BELl characterbmp The Basic Multilingual Planebre The Basic Regular Expressionsbs The BackSpace characterbsd The Berkeley Software Distribution Also known as the Berke-ley Unixca Californiacan The CANcel charactercern The European Organization for Nuclear Research (la ConseilEuropeacuteen pour la Recherche Nucleacuteaire)cldr The Common Locale Data Repositorycli Command Line Interfacecobol The COmmon Business-Oriented Languagecr The Carriage Return charactercss The Cascading Style Sheets languagedc The Dublin Coredc1 The Device Control character No 1dc2 The Device Control character No 2dc3 The Device Control character No 3dc4 The Device Control character No 4del The DELete characterdle The Data Link Escape characterdps Document Preparation System

60 ACRONYMS

dtd Document Type Declarationdtp DeskTop Publishingebcdic The Extended Binary Coded Decimal Interchange Codeecma The European Computer Manufacturers Associationem The End of Mediumemacs The Eventually Munches All Computer Storage editorenq The ENQuiry charactereot The End Of Transmissionere The Extended Regular Expressionsesc The ESCape characteretb The End of Transmission Blocketx The End of TeXteuc The Extended Unix Codeff The Form Feed characterfoaf Friend Or A Foefortran The FORmula TRANslatorfs The File Separatorfsm The Free Software Movementgml The General Markup Languagegnu gnu is Not Unixgs The Group Separatorgui Graphical User Interfaceht The Horizontal Tabhtml The HyperText Markup Languageibm The International Business Machines Corporationiec The International Electrotechnical Commissionime Input Method Editoriri The Internationalized Resource Identifieriso The International Organization for Standardizationj is The Japanese Industrial Standards encodingjoe The Joersquos Own Editorjson The JavaScript Object Notationjson-ld json for ldjtc A Joint tcld Linked Datalf The Line Feedma Massachusettsmathml The Mathematical Markup Languagenak The Negative-AcKnowledgement characternul The NULl character

ACRONYMS 61

ny New Yorkocr Optical Character Recognitionodf The Open Document Format for office applicationsooxml The Office Open XML formatowl The Web Ontology Languagepc The ibm Personal Computerpdf The Portable Document Formatpico The PIne COmposerposix The Portable Operating System Interfacerdf The Resource Description Frameworkrdfa rdf in attributesrelax ng The REgular LAnguage for xml New Generationrfc A Request For Commentsrs The Record Separatorsc A SubCommitteesgml The Standard General Markup Languagesi The Shift In characterso The Shift Out charactersoh The Start of Headingsr Sound Recognitionstx The Start of Textsub The SUBstitute charactersvg The Scalable Vector Graphics languagesvn SubVersioNsyn The SYNchronous Idle charactertc A Technical Committeetei The Text Encoding Initiativetron The Real-time Operating system Nucleusucs The Universal multiple-octet coded Character Setus The Unit Separatorusa The United States of Americautf The ucs Transformation Formatvcs Version Control Systemsvi The Visual Interactive editorvim vi IMprovedvt The Vertical Tabw3c The World Wide Web Consortiumwg AWorking Groupwysiwyg What You See Is What You Getxhtml The eXtensible HyperText Markup Language

62 ACRONYMS

xml The eXtensible Markup Language

Index

ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42

Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17

sect

Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

bre 14ndash16bs 6bsd 13

sect

ca 52can 6cern 28

character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44

sect

dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35




2. Footnotes are relegated to the bottom of the page and linked to the relevant passage of the main text through symbols or superscript numbers.¹ Compared to sidenotes, they are more difficult for the reader to find. Footnotes should align with the bottom of the text block, not stick out into the bottom margin [53, para. 48].

3. Endnotes are relegated to the end of a section or of the entire document and are linked to the relevant passage of the body text through superscript numbers. They are the easiest of the three to typeset, but also the hardest for the reader to find.

Notes are typically typeset in sizes from 8 pt up to the body text typeface size, depending on their frequency, importance, and average length [54, sec. 4.3]. If several categories of notes are present in the document, it may be desirable to give each a different form.

3.2.5 Quotations

Quotations repeat what has already been expressed elsewhere and can take two different forms [54, sec. 5.4]:

1. Run-in quotations are included directly in the paragraph and set off from the surrounding text using quotation marks, in accordance with the orthographic rules on the use of punctuation in the language of the paragraph: “Jesters do oft prove prophets.” From the designer’s viewpoint, run-in quotations require no special treatment, although it is crucial that the body text typeface contains the required quotation marks.

2. Block quotations are set as block paragraphs that are clearly separated from the surrounding text. This involves adding a vertical space above and below the block paragraphs and, optionally, also changing the typeface, its size, or the indentation of the paragraphs [54, sec. 2.3.3].

This is the excellent foppery of the world, that when we are sick in fortune (often the surfeit of our own behavior) we make guilty of our disasters the sun, the moon, and the stars, as if we were villains by necessity, fools by heavenly compulsion, knaves, thieves, and treachers by spherical predominance, drunkards, liars, and adulterers by an enforced obedience of planetary influence, and all that we are evil in by a divine thrusting-on. An admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

William Shakespeare, King Lear

¹ This is a footnote. Due to their width, footnotes can comfortably accommodate full bibliographical references, which makes them popular in academic writing.

A footnote can also contain multiple paragraphs of text, although long footnotes are tedious to read if the size of the typeface is small [54, sec. 4.3.1].

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size (as described in Section 3.2.1), as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin rife with photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
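The rule that the text height must be a whole multiple of the body text line height can be sketched as a short calculation. The function name `snap_text_height`, the proportion of 1.5, and the sample measurements are illustrative assumptions, not values taken from this book.

```python
import math


def snap_text_height(text_width_pt, proportion, line_height_pt):
    """Derive a text height from the text width and round it down
    to a whole number of body text lines.

    All three parameters are hypothetical inputs chosen for illustration.
    """
    if line_height_pt <= 0:
        raise ValueError("line height must be positive")
    # Height suggested by the chosen width-to-height proportion.
    target_height = text_width_pt * proportion
    # Number of whole lines that fit into the target height.
    lines = math.floor(target_height / line_height_pt)
    # A height of exactly `lines` line heights can be filled completely.
    return lines, lines * line_height_pt


# A 300 pt wide text block, a 2:3 width-to-height proportion, 12 pt line height.
lines, height = snap_text_height(300, 1.5, 12)
print(lines, height)
```

With these sample values, the target height of 450 pt is rounded down to 37 lines, that is, 444 pt. Rounding down rather than up is a design choice of this sketch; it keeps the text block from growing into the vertical margins.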

3.4 Color

In both print and web design, it is perfectly reasonable to use either just the combination of black and white or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

The general guidelines are to use colored typefaces only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.
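One possible way to put a number on "sufficient contrast" between a typeface color and its background is the contrast ratio defined in the W3C Web Content Accessibility Guidelines (WCAG) 2.x. This measure and the 4.5:1 threshold are assumptions brought in for illustration; the book itself does not prescribe a particular metric, and the function names below are hypothetical.

```python
def _linearize(channel):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color with 0-255 channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio, from 1 (identical colors) up to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


AA_NORMAL_TEXT = 4.5  # WCAG 2.x level AA threshold for normal-size text

black, white, red = (0, 0, 0), (255, 255, 255), (255, 0, 0)
print(round(contrast_ratio(black, white), 2))
print(contrast_ratio(red, white) >= AA_NORMAL_TEXT)
```

Black on white reaches the maximum ratio of 21, while pure red on white comes out just under 4 and therefore falls short of the assumed threshold. The ratio is based on luminance only, so it does not by itself guarantee that two hues stay distinct for a color-blind reader.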

Bibliography

[1] Mary Brandel. “1963: The debut of ASCII”. In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963 (visited on 01/28/2015) (cit. on p. 5).

[3] ISO TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standard Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard 10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).

[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O’Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard 18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O’Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48 (7 July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standard Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (la Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
EMACS  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Or A Foe
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The General Markup Language
GNU  GNU is Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
JOE  The Joe’s Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
PICO  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML New Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
VI  The Visual Interactive editor
VIM  VI IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language

Index

ACK 6
Adobe FrameMaker 14
Adobe InDesign 14, 39
alignment
  justified 42
  ragged 42
Anton Koberger 49
Apache OpenOffice 13, 20, 39
API 55
ASA 51
ASCII 5–9, 11, 12, 14, 51
AsciiDoc 39
AT&T 35
Atom 13
awk 16, 17

Bazaar 17
BEL 6
BMP 8, 9, 14
Bob Bemer 5
body text 41
BRE 14–16
  alternation operator 15
  backreference 15
  escape character 15
  matching list expression 15
  non-matching list expression 15
  repetition operator 15
  subexpression 15
BS 6
BSD 13

CA 52
CAN 6
CERN 28
character code 5
character encoding 5
Chomsky hierarchy 14
Christian Morgenstern 4
CLDR 52
CLI 13, 16
code page 7
code point 8
Compose key 11
CONCUR 27
control code 5
CR 6
Creole 39
CSS 23, 29–32, 44

DC 32, 33
DC1 6
DC2 6
DC3 6
DC4 6
DEL 6
DLE 6
Donald Knuth 36
DPS 13, 17, 18, 32, 35, 36, 39
  batch-oriented 35
  interactive 13, 35
    desktop publishing 36
    word processing 36
DTD 23, 25–27
DTP 36

EBCDIC 5
ECMA 55
Edgar Allan Poe 37

Elements of Style 3
EM 6
Emacs 13
endianity 10
endnote 47
ENQ 6
EOT 6
ERE 14–16
  alternation operator 15
  backreference 15
  escape character 15
  matching list expression 15
  non-matching list expression 15
  repetition operator 15
  subexpression 15
ESC 6
ETB 6
ε-TeX 38
ETX 6
EUC 5

F. M. Cornford 43
FF 6
FOAF 32, 33
footnote 47
formal grammar 14
FORTRAN 4
From Religion to Philosophy: A Study in the Origins of Western Speculation 43
FS 6
FSM 35

Git 17
GML 22
GNU 13, 14, 35
  Linux 13
  nano 13
Google Documents 18
Google Pinyin 11
grep 16, 17
groff see troff
GS 6
GUI 13, 35

Han Unification 9
heading 45
Henrik Ibsen 27
HT 6
HTML 28–32, 34, 39, 44, 55

IBM 5, 12, 22
iconv 10
IEC 7, 10, 51–54
IME 12
IRI 27, 28, 31, 32, 54
ISO 7, 10, 51–54

JavaScript 29
Jeffrey E. F. Friedl 14
JIS 5
JOE 13
JScript 29
JSON 32
JSON-LD 32, 56
JTC 51–54
justification see alignment

King Lear 48

LaTeX 36, 43
Latin Vulgate Bible 49
LD 31, 32, 55
leading see line spacing
Leafpad 13
LF 6
lightweight markup language 39
line height 45
list 46

MA 51
MakeDoc 39
Markdown 39
markup
  logical 21, 29, 30, 35, 36
  presentation 21, 29, 30, 35, 36
MathML 28, 31
Mercurial 17
microformatting 32
Microsoft Word 14, 20, 39

N-Triples 32, 33
NAK 6
Noam Chomsky 14
  hierarchy 14
note 46
Notepad++ 13
Notepad 13

nroff see troff
NUL 6
NY 51

OCR 12
ODF 13
OOXML 13
OWL 32, 56

paragraph 42
  block 47
  indented 45
  outdented 45
paragraphs
  block 45
PC 5, 11
PDF 13
pdfTeX 38
Peer Gynt 27
Perl 14
PICO 13
pinyin 11
plain TeX 38
POSIX 53
printable character 5
Punycode 8

QuarkXPress 14
quotation
  block 47
  run-in 47

rag see alignment
RDF 28, 31–35, 56
  literal 32
  object 31
  ontology 32
  predicate 31
  resource 31
  subject 31
  triplet 31
RDFa 32, 34, 56
regex see regular expression
regular expression 13, 14
regular grammar 14
RELAX NG 23, 25
RFC 54, 55
RS 6

sans-serif 41
SC 51–54
Scribus 13, 14, 39
sed 16, 17
serif 41
Setext 39
SGML 22, 23, 25, 27–29, 39, 53, 54
  application 23
  attribute 22
  element 22
  entity 22
  node 22
  tag 22
SGML: The Reason Why and the First Published Hint 22
SI 6
sidenote 46
small capitals 45
SO 6
SOH 6
SR 12
STX 6
style guide 3
SUB 6
Sublime Text 13
surrogate pair 8
SVG 28, 31
SVN 17–20
SYN 6

table 46
TC 51, 52
TEI 28
text editor 13
text file 4
text processing 4
TextEdit 13, 14
the Art of Computer Programming 36
the Cask of Amontillado 37
the Chicago Manual of Style 3
the Oxford Style Manual 3
the Subversion book 17
Tim Berners-Lee 31
Timothy John Berners-Lee 28
Tortoise SVN 18, 20
Trichter 4
troff 35
  man 36
  me 36
  mom 36
TRON 9
Turtle 32, 33
typeface 41

UCS 6, 8–12, 14, 16, 51, 52
  block 8
  UCS-4 8
Unicode
  case conversion 10
  normalization 10
US 6
USA 51, 52
UTF 6, 8–10, 52
  UTF-16 8, 52
  UTF-32 8
  UTF-7 8
  UTF-8 8, 52

VBScript 29
VCS 17–20
  centralized 17
  decentralized 17
version control 13
VI 13
VIM 13
VT 6

W3C 23, 28, 29, 31, 32, 54–56
WG 54
Wikicode 39
William Shakespeare 48
William Strunk 3
Word Online 18
writing rules
  grammar 3
  orthography 3
  typography 4
WYSIWYG 35

X Window System 11
XeTeX 43
XHTML 28, 31, 32, 55, 56
XML 23–29, 31–33, 39, 54, 55
  application 23
  DocBook 28
  format 23
  language 23
  namespace 27
  schema language 23
  Schema 23, 26
  validity 23
  well-formedness 23
xmllint 26
XPath 23
XPointer 23
XQuery 23


obedience of planetary influence; and all that we are evil in, by a divine thrusting-on: an admirable evasion of whoremaster man, to lay his goatish disposition to the charge of a star!

-- William Shakespeare, King Lear

Block quotations are ideal for longer quotations and for quotations that should carry more weight than run-in quotations.

3.3 Page Layout

The page consists of a text block surrounded by margins. The text width is largely determined by the number of columns and the body text size, as described in Section 3.2.1, as well as by our plans for the horizontal margins. A margin containing an occasional sidenote will require less space than a margin full of photographs, tables, and diagrams.

The vertical margins may contain additional navigational aids, such as the page numbers and running headers in this book. If you feel the horizontal margins are underutilized, you may also use them for this purpose [54, sec. 8.5.2].

In print design, and wherever else the page height is fixed, we also need to decide on the text height. The text height needs to be a multiple of the body text line height, so that it is possible to completely fill the text block with text. It is typical to derive the text height from the text width to achieve proportions that work well with the proportions of the page [54, sec. 8.4.2].
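The rule that the text height must be a whole multiple of the line height can be sketched in a few lines of Python. The function names, the use of points as the unit, and the 3:2 proportion in the usage note are illustrative assumptions, not values taken from this book.

```python
from math import floor


def snap_text_height(target_height_pt, line_height_pt):
    """Return (lines, height_pt): the largest whole number of lines that
    fits into target_height_pt and the text height that results."""
    if line_height_pt <= 0:
        raise ValueError("line height must be positive")
    lines = floor(target_height_pt / line_height_pt)
    return lines, lines * line_height_pt


def text_height_from_width(text_width_pt, proportion, line_height_pt):
    """Derive a target height from the text width and a chosen
    height-to-width proportion, then snap it to whole lines.
    The proportion itself is a design decision (hypothetical here)."""
    return snap_text_height(text_width_pt * proportion, line_height_pt)
```

For example, a 300 pt wide text block with an assumed 3:2 proportion and a 12 pt line height gives `text_height_from_width(300, 1.5, 12)`, that is 37 lines and a text height of 444 pt; the 6 pt that do not fit a whole line go to the vertical margins.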

3.4 Color

In both print and web design, it is perfectly reasonable to use just the combination of black and white, or shades of gray. A secondary color may be introduced to enliven the page if the design calls for such a measure; red has historically been used for this purpose (see Figure 3.3). More than one hue may be introduced, although each additional one makes it more difficult to establish a visual system that is intelligible to the reader.

Figure 3.3: An excerpt from the Latin Vulgate Bible printed by the German goldsmith, printer, and publisher Anton Koberger in 1487.

The general guidelines are to use colored typefaces only for emphasis, not for the body text, and only on backgrounds that are (ideally) colorless or of sufficient contrast with the typeface color. Distinct colors should stay distinct even for the color-blind reader, unless the lack of distinction between the colors does not impair understanding.
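One way to put a number on "sufficient contrast" is the contrast ratio defined by the W3C Web Content Accessibility Guidelines (WCAG) 2.x. WCAG is not cited in this book, so the sketch below is an outside assumption: it takes 8-bit sRGB triples and computes the WCAG relative luminance and contrast ratio. The function names are hypothetical.

```python
def _linearize(channel_8bit):
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) triple with values 0-255."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG contrast ratio, from 1 (identical colors) to 21 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


BLACK, WHITE, RED = (0, 0, 0), (255, 255, 255), (255, 0, 0)
print(round(contrast_ratio(BLACK, WHITE), 2))  # 21.0
print(round(contrast_ratio(RED, WHITE), 2))
```

WCAG level AA asks for a ratio of at least 4.5 for normal-size text. Pure red on white comes out slightly below that, which agrees with the advice to reserve color for emphasis and not for body text. The ratio does not test whether two colors stay distinct for color-blind readers; that needs a separate check.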

Bibliography

[1] Mary Brandel. "1963: The debut of ASCII". In: Computerworld (July 1999). URL: http://edition.cnn.com/TECH/computing/9907/06/1963.idg/ (visited on 09/06/2015) (cit. on p. 5).

[2] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1963. 10 East 40th Street, New York 16, NY, USA: the American Standards Association, June 1963. URL: http://worldpowersystems.com/J/codes/X3.4-1963/ (visited on 01/28/2015) (cit. on p. 5).

[3] ISO/TC97/SC2. Information technology – ISO 7-bit coded character set for information interchange. ISO 646:1972. Geneva, Switzerland: the International Organization for Standardization, 1972 (cit. on pp. 5, 7).

[4] ASA Sectional Committee on Computers and Information Processing. American Standard Code for Information Interchange. X3.4-1986. 10 East 40th Street, New York 16, NY, USA: the American Standards Association, June 1986 (cit. on p. 6).

[5] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 1. Reading, MA, USA: Addison-Wesley Developers Press, Oct. 1991. ISBN: 0-201-56788-1 (cit. on p. 8).

[6] Unicode Consortium. The Unicode Standard, Version 1.0. Vol. 2. Reading, MA, USA: Addison-Wesley Developers Press, June 1992. ISBN: 0-201-60845-6 (cit. on p. 8).

[7] ISO/IEC JTC1/SC2. Information technology – the Universal multiple-octet coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. Geneva, Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO/TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).

[17] Noam Chomsky. "Three models for the description of language". In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O'Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O'Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O'Reilly, 2002. URL: http://svnbook.red-bean.com (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. "The Roots of SGML – A Personal Recollection". In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. "SGML: The Reason Why and the First Published Hint". In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. "Introduction to Generalized Markup". In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).

[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. "GODDAG: A Data Structure for Overlapping Hierarchies". In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ack The ACKnowledgement characterapi Application Programming Interfaceasa The American Standard Associationascii The American Standard Code for Information Interchangeatampt The American Telephone and Telegraph corporationbel The BELl characterbmp The Basic Multilingual Planebre The Basic Regular Expressionsbs The BackSpace characterbsd The Berkeley Software Distribution Also known as the Berke-ley Unixca Californiacan The CANcel charactercern The European Organization for Nuclear Research (la ConseilEuropeacuteen pour la Recherche Nucleacuteaire)cldr The Common Locale Data Repositorycli Command Line Interfacecobol The COmmon Business-Oriented Languagecr The Carriage Return charactercss The Cascading Style Sheets languagedc The Dublin Coredc1 The Device Control character No 1dc2 The Device Control character No 2dc3 The Device Control character No 3dc4 The Device Control character No 4del The DELete characterdle The Data Link Escape characterdps Document Preparation System

60 ACRONYMS

dtd Document Type Declarationdtp DeskTop Publishingebcdic The Extended Binary Coded Decimal Interchange Codeecma The European Computer Manufacturers Associationem The End of Mediumemacs The Eventually Munches All Computer Storage editorenq The ENQuiry charactereot The End Of Transmissionere The Extended Regular Expressionsesc The ESCape characteretb The End of Transmission Blocketx The End of TeXteuc The Extended Unix Codeff The Form Feed characterfoaf Friend Or A Foefortran The FORmula TRANslatorfs The File Separatorfsm The Free Software Movementgml The General Markup Languagegnu gnu is Not Unixgs The Group Separatorgui Graphical User Interfaceht The Horizontal Tabhtml The HyperText Markup Languageibm The International Business Machines Corporationiec The International Electrotechnical Commissionime Input Method Editoriri The Internationalized Resource Identifieriso The International Organization for Standardizationj is The Japanese Industrial Standards encodingjoe The Joersquos Own Editorjson The JavaScript Object Notationjson-ld json for ldjtc A Joint tcld Linked Datalf The Line Feedma Massachusettsmathml The Mathematical Markup Languagenak The Negative-AcKnowledgement characternul The NULl character

ACRONYMS 61

ny New Yorkocr Optical Character Recognitionodf The Open Document Format for office applicationsooxml The Office Open XML formatowl The Web Ontology Languagepc The ibm Personal Computerpdf The Portable Document Formatpico The PIne COmposerposix The Portable Operating System Interfacerdf The Resource Description Frameworkrdfa rdf in attributesrelax ng The REgular LAnguage for xml New Generationrfc A Request For Commentsrs The Record Separatorsc A SubCommitteesgml The Standard General Markup Languagesi The Shift In characterso The Shift Out charactersoh The Start of Headingsr Sound Recognitionstx The Start of Textsub The SUBstitute charactersvg The Scalable Vector Graphics languagesvn SubVersioNsyn The SYNchronous Idle charactertc A Technical Committeetei The Text Encoding Initiativetron The Real-time Operating system Nucleusucs The Universal multiple-octet coded Character Setus The Unit Separatorusa The United States of Americautf The ucs Transformation Formatvcs Version Control Systemsvi The Visual Interactive editorvim vi IMprovedvt The Vertical Tabw3c The World Wide Web Consortiumwg AWorking Groupwysiwyg What You See Is What You Getxhtml The eXtensible HyperText Markup Language

62 ACRONYMS

xml The eXtensible Markup Language

Index

ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42

Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17

sect

Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

bre 14ndash16bs 6bsd 13

sect

ca 52can 6cern 28

character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44

sect

dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35

dps 13 17 18 32 35 36 39dtd 23 25ndash27dtp 36

sect

ebcdic 5ecma 55Edgar Allen Poe 37

64 INDEX

Elements of Style 3em 6Emacs 13endianity 10endnote 47enq 6eot 6erealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

ere 14ndash16esc 6etb 6120576-TEX 38etx 6euc 5

sectF M Cornford 43ff 6foaf 32 33footnote 47formal grammar 14fortran 4From Religion to Philosophy A Study in

the Origins of Western Speculation 43fs 6fsm 35

sectGit 17gml 22gnuLinux 13nano 13

gnu 13 14 35Google Documents 18Google Pinyin 11grep 16 17groff see troffgs 6gui 13 35

sectHan Unification 9heading 45Henrik Ibsen 27ht 6

html 28ndash32 34 39 44 55sect

ibm 5 12 22iconv 10iec 7 10 51ndash54ime 12ir i 27 28 31 32 54iso 7 10 51ndash54

sectJavaScript 29Jeffrey E F Friedl 14j is 5joe 13JScript 29json 32json-ld 32 56jtc 51ndash54justification see alignment

sectKing Lear 48

sectLATEX 36 43Latin Vulgate Bible 49ld 31 32 55leading see line spacingLeafpad 13lf 6lightweight markup language 39line height 45list 46

sectma 51MakeDoc 39Markdown 39markuplogical 21 29 30 35 36presentation 21 29 30 35 36

mathml 28 31Mercurial 17microformatting 32Microsoft Word 14 20 39

sectN-Triples 32 33nak 6Noam Chomskyhierarchy 14

Noam Chomsky 14note 46Notepad++ 13Notepad 13

INDEX 65

nroff see troffnul 6ny 51

sectocr 12odf 13ooxml 13owl 32 56

sectparagraphblock 47indented 45outdented 45

paragraph 42paragraphsblock 45

pc 5 11pdf 13pdfTEX 38Peer Gynt 27Perl 14pico 13pinyin 11plain TEX 38posix 53printable character 5Punycode 8

sectQuarkXPress 14quotationblock 47run-in 47

sectrag see alignmentrdfliteral 32object 31ontology 32predicate 31resource 31subject 31triplet 31

rdf 28 31ndash35 56rdfa 32 34 56regex see regular expressionregular expression 13 14regular grammar 14relax ng 23 25rfc 54 55rs 6

sectsans-serif 41sc 51ndash54Scribus 13 14 39sed 16 17serif 41Setext 39sgmlapplication 23attribute 22element 22entity 22node 22tag 22

sgml 22 23 25 27ndash29 39 53 54sgml The Reason Why and the First Pub-

lished Hint 22si 6sidenote 46small capitals 45so 6soh 6sr 12stx 6style guide 3sub 6Sublime Text 13surrogate pair 8svg 28 31svn 17ndash20syn 6

secttable 46tc 51 52tei 28text editor 13text file 4text processing 4TextEdit 13 14the Art of Computer Programming 36the Cask of Amontillado 37the Chicago Manual of Style 3the Oxford Style Manual 3the Subversion book 17Tim Berners-Lee 31Timothy John Berners-Lee 28Tortoise svn 18 20Trichter 4troff

man 36

66 INDEX

me 36mom 36

troff 35tron 9Turtle 32 33typeface 41

sectucsblock 8ucs-4 8

ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10

us 6usa 51 52utf

utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8

utf 6 8ndash10 52sect

VBScript 29vcscentralized 17decentralized 17

vcs 17ndash20version control 13vi 13vim 13

vt 6sect

w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4

wysiwyg 35sect

XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23

xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23

  • Introduction
  • Writing
    • Text Processing
      • Character Encoding
      • Text Input
      • Text Editors
      • Interactive Document Preparation Systems
      • Regular Expressions
        • Version Control
          • Markup
            • Meta Markup Languages
              • The General Markup Language
              • The Extensible Markup Language
                • Markup on the World Wide Web
                  • The Hypertext Markup Language
                  • The Extensible Hypertext Markup Language
                  • The Semantic Web and Linked Data
                    • Document Preparation Systems
                      • Batch-oriented Systems
                      • Interactive Systems
                        • Lightweight Markup Languages
                          • Design
                            • Fonts
                            • Structural Elements
                              • Paragraphs and Stanzas
                              • Headings
                              • Tables and Lists
                              • Notes
                              • Quotations
                                • Page Layout
                                • Color
                                  • Bibliography
                                  • Acronyms
                                  • Index
Page 50: Electronic Document Preparation Pocket Primer

34 COLOR 49

Figure 33 An excerpt from the Latin Vulgate Bible printed by theGerman goldsmith printer and publisher Anton Koberger in 1487

(ideally) colorless or of sufficient contrast with the typeface colorDistinct colors should stay distinct even for the color-blind readerunless the lack of distinction between the colors does not impairunderstanding

Bibliography

[1] Mary Brandel lsquolsquo1963 The debut of asci irsquorsquo InComputerworld(July 1999) url httpeditioncnncomTECHcomputing9907061963idg (visited on 09062015) (cit on p 5)

[2] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1963 10 East 40th Street New York 16 nyusa the American Standard Association June 1963 urlhttp worldpowersystems com J codes X3 4 - 1963

(visited on 01282015) (cit on p 5)[3] i so tc97sc2 Information technology ndash iso 7-bit coded character

set for information interchange i so 6461972 Geneva Switzer-land the International Organization for Standardization1972 (cit on pp 5 7)

[4] asa Sectional Committee on Computers and InformationProcessing American Standard Code for Information Inter-change X 34-1986 10 East 40th Street New York 16 ny usathe American Standard Association June 1986 (cit on p 6)

[5] Unicode Consortium the Unicode Standard Version 10 Vol 1Reading ma usa Addison-Wesley Developers Press Oct1991 isbn 0-201-56788-1 (cit on p 8)

[6] Unicode Consortium the Unicode Standard Version 10 Vol 2Reading ma usa Addison-Wesley Developers Press June1992 isbn 0-201-60845-6 (cit on p 8)

[7] isoiec jtc1sc2 Information technology ndash the Universalmultiple-octet coded Character Set (ucs) ndash Part 1 Architectureand Basic Multilingual Plane isoiec 10646-11993 Geneva

52 BIBLIOGRAPHY

Switzerland the International Organization for Standard-ization May 1993 (cit on p 8)

[8] i soiec jtc1sc2 Transformation Format for 16 planes of group00 (utf-16) isoiec 10646-11993Amd 11996 GenevaSwitzerland the International Organization for Standard-ization Oct 1996 (cit on p 8)

[9] isoiec jtc1sc2 ucs Transformation Format 8 (utf-8)isoiec 10646-11993Amd 21996 Geneva Switzerlandthe International Organization for Standardization Oct1996 (cit on p 8)

[10] Unicode Consortium the Unicode Standard Version 90 ndash CoreSpecification Tech rep Mountain View ca usa July 2016url httpwwwunicodeorgversionsUnicode900UnicodeStandard-90pdf (visited on 09172015) (cit onpp 8ndash10)

[11] Q-Success Usage of character encodings for websites urlhttpw3techscomtechnologiesoverviewcharacter_

encodingall (visited on 09102015) (cit on p 9)[12] Unicode Consortium Unicode Technical Standard 10 Version

900 Unicode Collation Algorithm Tech rep May 2016 urlhttpwwwunicodeorgreportstr10tr10-34html

(visited on 09172016) (cit on p 10)[13] Unicode Consortium Unicode cldr Project Tech rep url

httpcldrunicodeorg (visited on 09172016) (cit onp 10)

[14] iso tc171sc2 Document management ndash Portable documentformat iso 320002008 Geneva Switzerland the Interna-tional Organization for Standardization July 2008 (cit onp 13)

[15] isoiec jtc1sc34 Document description and processing lan-guages ndash Office Open XML File Formats isoiec 295002012Geneva Switzerland the International Organization forStandardization Oct 2012 (cit on p 13)

[16] isoiec jtc1sc34 Information technology ndash Open DocumentFormat for Office Applications (OpenDocument) v10 isoiec263002006 Geneva Switzerland the International Organi-zation for Standardization Dec 2006 (cit on p 13)


[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – The Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: The International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O’Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O’Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: The International Organization for Standardization, Oct. 1986 (cit. on p. 22).


[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000. Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).


[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).


[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).


[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK The ACKnowledgement character
API Application Programming Interface
ASA The American Standard Association
ASCII The American Standard Code for Information Interchange
AT&T The American Telephone and Telegraph corporation
BEL The BELl character
BMP The Basic Multilingual Plane
BRE The Basic Regular Expressions
BS The BackSpace character
BSD The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA California
CAN The CANcel character
CERN The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR The Common Locale Data Repository
CLI Command Line Interface
COBOL The COmmon Business-Oriented Language
CR The Carriage Return character
CSS The Cascading Style Sheets language
DC The Dublin Core
DC1 The Device Control character No. 1
DC2 The Device Control character No. 2
DC3 The Device Control character No. 3
DC4 The Device Control character No. 4
DEL The DELete character
DLE The Data Link Escape character
DPS Document Preparation System


DTD Document Type Declaration
DTP DeskTop Publishing
EBCDIC The Extended Binary Coded Decimal Interchange Code
ECMA The European Computer Manufacturers Association
EM The End of Medium
EMACS The Eventually Munches All Computer Storage editor
ENQ The ENQuiry character
EOT The End Of Transmission
ERE The Extended Regular Expressions
ESC The ESCape character
ETB The End of Transmission Block
ETX The End of TeXt
EUC The Extended Unix Code
FF The Form Feed character
FOAF Friend Of A Friend
FORTRAN The FORmula TRANslator
FS The File Separator
FSM The Free Software Movement
GML The General Markup Language
GNU GNU is Not Unix
GS The Group Separator
GUI Graphical User Interface
HT The Horizontal Tab
HTML The HyperText Markup Language
IBM The International Business Machines Corporation
IEC The International Electrotechnical Commission
IME Input Method Editor
IRI The Internationalized Resource Identifier
ISO The International Organization for Standardization
JIS The Japanese Industrial Standards encoding
JOE The Joe’s Own Editor
JSON The JavaScript Object Notation
JSON-LD JSON for LD
JTC A Joint TC
LD Linked Data
LF The Line Feed
MA Massachusetts
MathML The Mathematical Markup Language
NAK The Negative-AcKnowledgement character
NUL The NULl character


NY New York
OCR Optical Character Recognition
ODF The Open Document Format for office applications
OOXML The Office Open XML format
OWL The Web Ontology Language
PC The IBM Personal Computer
PDF The Portable Document Format
PICO The PIne COmposer
POSIX The Portable Operating System Interface
RDF The Resource Description Framework
RDFa RDF in attributes
RELAX NG The REgular LAnguage for XML New Generation
RFC A Request For Comments
RS The Record Separator
SC A SubCommittee
SGML The Standard Generalized Markup Language
SI The Shift In character
SO The Shift Out character
SOH The Start of Heading
SR Sound Recognition
STX The Start of Text
SUB The SUBstitute character
SVG The Scalable Vector Graphics language
SVN SubVersioN
SYN The SYNchronous Idle character
TC A Technical Committee
TEI The Text Encoding Initiative
TRON The Real-time Operating system Nucleus
UCS The Universal multiple-octet coded Character Set
US The Unit Separator
USA The United States of America
UTF The UCS Transformation Format
VCS Version Control Systems
VI The Visual Interactive editor
VIM VI IMproved
VT The Vertical Tab
W3C The World Wide Web Consortium
WG A Working Group
WYSIWYG What You See Is What You Get
XHTML The eXtensible HyperText Markup Language


XML The eXtensible Markup Language

Index

ACK 6
Adobe FrameMaker 14
Adobe InDesign 14, 39
alignment
  justified 42
  ragged 42
Anton Koberger 49
Apache OpenOffice 13, 20, 39
API 55
ASA 51
ASCII 5–9, 11, 12, 14, 51
AsciiDoc 39
AT&T 35
Atom 13
awk 16, 17

§

Bazaar 17
BEL 6
BMP 8, 9, 14
Bob Bemer 5
body text 41
BRE 14–16
  alternation operator 15
  backreference 15
  escape character 15
  matching list expression 15
  non-matching list expression 15
  repetition operator 15
  subexpression 15
BS 6
BSD 13

§

CA 52
CAN 6
CERN 28
character code 5
character encoding 5
Chomsky hierarchy 14
Christian Morgenstern 4
CLDR 52
CLI 13, 16
code page 7
code point 8
Compose key 11
CONCUR 27
control code 5
CR 6
Creole 39
CSS 23, 29–32, 44

§

DC 32, 33
DC1 6
DC2 6
DC3 6
DC4 6
DEL 6
DLE 6
Donald Knuth 36
DPS 13, 17, 18, 32, 35, 36, 39
  batch-oriented 35
  interactive 13, 35
    desktop publishing 36
    word processing 36
DTD 23, 25–27
DTP 36

§

EBCDIC 5
ECMA 55
Edgar Allan Poe 37


Elements of Style 3
EM 6
Emacs 13
endianity 10
endnote 47
ENQ 6
EOT 6
ERE 14–16
  alternation operator 15
  backreference 15
  escape character 15
  matching list expression 15
  non-matching list expression 15
  repetition operator 15
  subexpression 15
ESC 6
ETB 6
ε-TeX 38
ETX 6
EUC 5

§

F. M. Cornford 43
FF 6
FOAF 32, 33
footnote 47
formal grammar 14
FORTRAN 4
From Religion to Philosophy: A Study in the Origins of Western Speculation 43
FS 6
FSM 35

§

Git 17
GML 22
GNU 13, 14, 35
  Linux 13
  nano 13
Google Documents 18
Google Pinyin 11
grep 16, 17
groff see troff
GS 6
GUI 13, 35

§

Han Unification 9
heading 45
Henrik Ibsen 27
HT 6
HTML 28–32, 34, 39, 44, 55

§

IBM 5, 12, 22
iconv 10
IEC 7, 10, 51–54
IME 12
IRI 27, 28, 31, 32, 54
ISO 7, 10, 51–54

§

JavaScript 29
Jeffrey E. F. Friedl 14
JIS 5
JOE 13
JScript 29
JSON 32
JSON-LD 32, 56
JTC 51–54
justification see alignment

§

King Lear 48

§

LaTeX 36, 43
Latin Vulgate Bible 49
LD 31, 32, 55
leading see line spacing
Leafpad 13
LF 6
lightweight markup language 39
line height 45
list 46

§

MA 51
MakeDoc 39
Markdown 39
markup
  logical 21, 29, 30, 35, 36
  presentation 21, 29, 30, 35, 36
MathML 28, 31
Mercurial 17
microformatting 32
Microsoft Word 14, 20, 39

§

N-Triples 32, 33
NAK 6
Noam Chomsky 14
  hierarchy 14
note 46
Notepad++ 13
Notepad 13


nroff see troff
NUL 6
NY 51

§

OCR 12
ODF 13
OOXML 13
OWL 32, 56

§

paragraph 42
  block 47
  indented 45
  outdented 45
paragraphs
  block 45
PC 5, 11
PDF 13
pdfTeX 38
Peer Gynt 27
Perl 14
PICO 13
pinyin 11
plain TeX 38
POSIX 53
printable character 5
Punycode 8

§

QuarkXPress 14
quotation
  block 47
  run-in 47

§

rag see alignment
RDF 28, 31–35, 56
  literal 32
  object 31
  ontology 32
  predicate 31
  resource 31
  subject 31
  triplet 31
RDFa 32, 34, 56
regex see regular expression
regular expression 13, 14
regular grammar 14
RELAX NG 23, 25
RFC 54, 55
RS 6

§

sans-serif 41
SC 51–54
Scribus 13, 14, 39
sed 16, 17
serif 41
Setext 39
SGML 22, 23, 25, 27–29, 39, 53, 54
  application 23
  attribute 22
  element 22
  entity 22
  node 22
  tag 22
SGML: The Reason Why and the First Published Hint 22
SI 6
sidenote 46
small capitals 45
SO 6
SOH 6
SR 12
STX 6
style guide 3
SUB 6
Sublime Text 13
surrogate pair 8
SVG 28, 31
SVN 17–20
SYN 6

§

table 46
TC 51, 52
TEI 28
text editor 13
text file 4
text processing 4
TextEdit 13, 14
The Art of Computer Programming 36
The Cask of Amontillado 37
The Chicago Manual of Style 3
The Oxford Style Manual 3
The Subversion book 17
Tim Berners-Lee 31
Timothy John Berners-Lee 28
Tortoise SVN 18, 20
Trichter 4
troff 35
  man 36
  me 36
  mom 36
TRON 9
Turtle 32, 33
typeface 41

§

UCS 6, 8–12, 14, 16, 51, 52
  block 8
  UCS-4 8
Unicode
  case conversion 10
  normalization 10
US 6
USA 51, 52
UTF 6, 8–10, 52
  UTF-16 8, 52
  UTF-32 8
  UTF-7 8
  UTF-8 8, 52

§

VBScript 29
VCS 17–20
  centralized 17
  decentralized 17
version control 13
VI 13
VIM 13
VT 6

§

W3C 23, 28, 29, 31, 32, 54–56
WG 54
Wikicode 39
William Shakespeare 48
William Strunk 3
Word Online 18
writing rules
  grammar 3
  orthography 3
  typography 4
WYSIWYG 35

§

X Window System 11
XeTeX 43
XHTML 28, 31, 32, 55, 56
XML 23–29, 31–33, 39, 54, 55
  application 23
  DocBook 28
  format 23
  language 23
  namespace 27
  schema language 23
  Schema 23, 26
  validity 23
  well-formedness 23
xmllint 26
XPath 23
XPointer 23
XQuery 23


sectucsblock 8ucs-4 8

ucs 6 8ndash12 14 16 51 52Unicodecase conversion 10normalization 10

us 6usa 51 52utf

utf-16 52utf-16 8utf-32 8utf-7 8utf-8 52utf-8 8

utf 6 8ndash10 52sect

VBScript 29vcscentralized 17decentralized 17

vcs 17ndash20version control 13vi 13vim 13

vt 6sect

w3c 23 28 29 31 32 54ndash56wg 54Wikicode 39William Shakespeare 48William Strunk 3Word Online 18writing rulesgrammar 3ortography 3typography 4

wysiwyg 35sect

XWindow System 11XƎTEX 43xhtml 28 31 32 55 56xmlapplication 23DocBook 28format 23language 23namespace 27schema language 23Schema 23 26validity 23well-formedness 23

xml 23ndash29 31ndash33 39 54 55xmllint 26XPath 23XPointer 23XQuery 23


Switzerland: the International Organization for Standardization, May 1993 (cit. on p. 8).

[8] ISO/IEC JTC1/SC2. Transformation Format for 16 planes of group 00 (UTF-16). ISO/IEC 10646-1:1993/Amd 1:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[9] ISO/IEC JTC1/SC2. UCS Transformation Format 8 (UTF-8). ISO/IEC 10646-1:1993/Amd 2:1996. Geneva, Switzerland: the International Organization for Standardization, Oct. 1996 (cit. on p. 8).

[10] Unicode Consortium. The Unicode Standard, Version 9.0 – Core Specification. Tech. rep. Mountain View, CA, USA, July 2016. URL: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (visited on 09/17/2015) (cit. on pp. 8–10).

[11] Q-Success. Usage of character encodings for websites. URL: http://w3techs.com/technologies/overview/character_encoding/all (visited on 09/10/2015) (cit. on p. 9).

[12] Unicode Consortium. Unicode Technical Standard #10, Version 9.0.0: Unicode Collation Algorithm. Tech. rep. May 2016. URL: http://www.unicode.org/reports/tr10/tr10-34.html (visited on 09/17/2016) (cit. on p. 10).

[13] Unicode Consortium. Unicode CLDR Project. Tech. rep. URL: http://cldr.unicode.org (visited on 09/17/2016) (cit. on p. 10).

[14] ISO TC171/SC2. Document management – Portable document format. ISO 32000:2008. Geneva, Switzerland: the International Organization for Standardization, July 2008 (cit. on p. 13).

[15] ISO/IEC JTC1/SC34. Document description and processing languages – Office Open XML File Formats. ISO/IEC 29500:2012. Geneva, Switzerland: the International Organization for Standardization, Oct. 2012 (cit. on p. 13).

[16] ISO/IEC JTC1/SC34. Information technology – Open Document Format for Office Applications (OpenDocument) v1.0. ISO/IEC 26300:2006. Geneva, Switzerland: the International Organization for Standardization, Dec. 2006 (cit. on p. 13).


[17] Noam Chomsky. “Three models for the description of language”. In: Information Theory, IEEE Transactions on 2.3 (1956), pp. 113–124 (cit. on p. 14).

[18] ISO/IEC JTC1/SC22. Information technology – the Portable Operating System Interface – Part 2: Shell and Utilities. ISO/IEC 9945-2:1993. Geneva, Switzerland: the International Organization for Standardization, Dec. 1993 (cit. on p. 14).

[19] Jeffrey E. F. Friedl. Mastering Regular Expressions. 3rd ed. O’Reilly Media, 2006, p. 544. ISBN: 978-0-596-52812-6 (cit. on p. 14).

[20] Unicode Consortium. Unicode Technical Standard #18, Version 17: Unicode Regular Expressions. Tech. rep. Nov. 2013. URL: http://www.unicode.org/reports/tr18/tr18-17.html (visited on 09/26/2015) (cit. on p. 16).

[21] Dale Dougherty and Arnold Robbins. Sed & awk, Second Edition. O’Reilly Media, 1997. ISBN: 1565922255. URL: http://docstore.mik.ua/orelly/unix/sedawk/ (visited on 09/26/2015) (cit. on p. 16).

[22] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2002. URL: http://svnbook.red-bean.com/ (visited on 09/26/2015) (cit. on p. 17).

[23] Charles F. Goldfarb. “The Roots of SGML – A Personal Recollection”. In: (1996). URL: http://www.sgmlsource.com/history/roots.htm (visited on 07/29/2015) (cit. on p. 22).

[24] Charles F. Goldfarb. “SGML: The Reason Why and the First Published Hint”. In: Journal of the American Society for Information Science 48.7 (July 1997). URL: http://www.sgmlsource.com/history/jasis.htm (visited on 07/29/2015) (cit. on p. 22).

[25] Charles F. Goldfarb. “Introduction to Generalized Markup”. In: (1981). URL: http://www.sgmlsource.com/history/AnnexA.htm (visited on 07/29/2015) (cit. on p. 22).

[26] ISO/IEC JTC1/SC34. Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO/IEC 8879:1986. Geneva, Switzerland: the International Organization for Standardization, Oct. 1986 (cit. on p. 22).


[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. “GODDAG: A Data Structure for Overlapping Hierarchies”. In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. The Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).


[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Recommendation. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).


[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Dan Brickley and R. V. Guha. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).


[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick’s Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standards Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
Emacs: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The Generalized Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
joe: Joe’s Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
pico: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML Next Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control System
vi: The Visual Interactive editor
vim: vi IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language

Index

ACK 6
Adobe FrameMaker 14
Adobe InDesign 14, 39
alignment
  justified 42
  ragged 42
Anton Koberger 49
Apache OpenOffice 13, 20, 39
API 55
ASA 51
ASCII 5–9, 11, 12, 14, 51
AsciiDoc 39
AT&T 35
Atom 13
awk 16, 17

Bazaar 17
BEL 6
BMP 8, 9, 14
Bob Bemer 5
body text 41
BRE 14–16
  alternation operator 15
  backreference 15
  escape character 15
  matching list expression 15
  non-matching list expression 15
  repetition operator 15
  subexpression 15
BS 6
BSD 13

CA 52
CAN 6
CERN 28
character code 5
character encoding 5
Chomsky hierarchy 14
Christian Morgenstern 4
CLDR 52
CLI 13, 16
code page 7
code point 8
Compose key 11
CONCUR 27
control code 5
CR 6
Creole 39
CSS 23, 29–32, 44

DC 32, 33
DC1 6
DC2 6
DC3 6
DC4 6
DEL 6
DLE 6
Donald Knuth 36
DPS 13, 17, 18, 32, 35, 36, 39
  batch-oriented 35
  interactive 13, 35
    desktop publishing 36
    word processing 36
DTD 23, 25–27
DTP 36

EBCDIC 5
ECMA 55
Edgar Allan Poe 37
Elements of Style 3
EM 6
Emacs 13
endianity 10
endnote 47
ENQ 6
EOT 6
ERE 14–16
  alternation operator 15
  backreference 15
  escape character 15
  matching list expression 15
  non-matching list expression 15
  repetition operator 15
  subexpression 15
ESC 6
ETB 6
ε-TeX 38
ETX 6
EUC 5

F. M. Cornford 43
FF 6
FOAF 32, 33
footnote 47
formal grammar 14
FORTRAN 4
From Religion to Philosophy: A Study in the Origins of Western Speculation 43
FS 6
FSM 35

Git 17
GML 22
GNU 13, 14, 35
  Linux 13
  nano 13
Google Documents 18
Google Pinyin 11
grep 16, 17
groff, see troff
GS 6
GUI 13, 35

Han Unification 9
heading 45
Henrik Ibsen 27
HT 6
HTML 28–32, 34, 39, 44, 55

IBM 5, 12, 22
iconv 10
IEC 7, 10, 51–54
IME 12
IRI 27, 28, 31, 32, 54
ISO 7, 10, 51–54

JavaScript 29
Jeffrey E. F. Friedl 14
JIS 5
joe 13
JScript 29
JSON 32
JSON-LD 32, 56
JTC 51–54
justification, see alignment

King Lear 48

LaTeX 36, 43
Latin Vulgate Bible 49
LD 31, 32, 55
leading, see line spacing
Leafpad 13
LF 6
lightweight markup language 39
line height 45
list 46

MA 51
MakeDoc 39
Markdown 39
markup
  logical 21, 29, 30, 35, 36
  presentation 21, 29, 30, 35, 36
MathML 28, 31
Mercurial 17
microformatting 32
Microsoft Word 14, 20, 39

N-Triples 32, 33
NAK 6
Noam Chomsky 14
  hierarchy 14
note 46
Notepad++ 13
Notepad 13
nroff, see troff
NUL 6
NY 51

OCR 12
ODF 13
OOXML 13
OWL 32, 56

paragraph 42
  block 47
  indented 45
  outdented 45
paragraphs
  block 45
PC 5, 11
PDF 13
pdfTeX 38
Peer Gynt 27
Perl 14
pico 13
pinyin 11
plain TeX 38
POSIX 53
printable character 5
Punycode 8

QuarkXPress 14
quotation
  block 47
  run-in 47

rag, see alignment
RDF 28, 31–35, 56
  literal 32
  object 31
  ontology 32
  predicate 31
  resource 31
  subject 31
  triplet 31
RDFa 32, 34, 56
regex, see regular expression
regular expression 13, 14
regular grammar 14
RELAX NG 23, 25
RFC 54, 55
RS 6

sans-serif 41
SC 51–54
Scribus 13, 14, 39
sed 16, 17
serif 41
Setext 39
SGML 22, 23, 25, 27–29, 39, 53, 54
  application 23
  attribute 22
  element 22
  entity 22
  node 22
  tag 22
SGML: The Reason Why and the First Published Hint 22
SI 6
sidenote 46
small capitals 45
SO 6
SOH 6
SR 12
STX 6
style guide 3
SUB 6
Sublime Text 13
surrogate pair 8
SVG 28, 31
SVN 17–20
SYN 6

table 46
TC 51, 52
TEI 28
text editor 13
text file 4
text processing 4
TextEdit 13, 14
The Art of Computer Programming 36
The Cask of Amontillado 37
The Chicago Manual of Style 3
The Oxford Style Manual 3
the Subversion book 17
Tim Berners-Lee 31
Timothy John Berners-Lee 28
Tortoise SVN 18, 20
Trichter 4
troff 35
  man 36
  me 36
  mom 36
TRON 9
Turtle 32, 33
typeface 41

UCS 6, 8–12, 14, 16, 51, 52
  block 8
  UCS-4 8
Unicode
  case conversion 10
  normalization 10
US 6
USA 51, 52
UTF 6, 8–10, 52
  UTF-16 8, 52
  UTF-32 8
  UTF-7 8
  UTF-8 8, 52

VBScript 29
VCS 17–20
  centralized 17
  decentralized 17
version control 13
vi 13
vim 13
VT 6

W3C 23, 28, 29, 31, 32, 54–56
WG 54
Wikicode 39
William Shakespeare 48
William Strunk 3
Word Online 18
writing rules
  grammar 3
  orthography 3
  typography 4
WYSIWYG 35

X Window System 11
XeTeX 43
XHTML 28, 31, 32, 55, 56
XML 23–29, 31–33, 39, 54, 55
  application 23
  DocBook 28
  format 23
  language 23
  namespace 27
  schema language 23
  Schema 23, 26
  validity 23
  well-formedness 23
xmllint 26
XPath 23
XPointer 23
XQuery 23


BIBLIOGRAPHY 53

[17] Noam Chomsky lsquolsquoThree models for the description of lan-guagersquorsquo In Information Theory IEEE Transactions on 23 (1956)pp 113ndash124 (cit on p 14)

[18] isoiec jtc1sc22 Information technology ndash the Portable Op-erating System Interface ndash Part 2 Shell and Utilities isoiec9945-21993 Geneva Switzerland the International Organi-zation for Standardization Dec 1993 (cit on p 14)

[19] Jeffrey E F Friedl Mastering Regular Expressions 3rd edOrsquoReilly Media 2006 p 544 isbn 978-0-596-52812-6 (citon p 14)

[20] Unicode Consortium Unicode Technical Standard 18 Version17 Unicode Regular Expressions Tech rep Nov 2013 urlhttpwwwunicodeorgreportstr18tr18-17html

(visited on 09262015) (cit on p 16)[21] Dale Dougherty and Arnold Robbins Sed amp awk Second

Edition OrsquoReilly Media 1997 i sbn 1565922255 url http docstore mik ua orelly unix sedawk (visited on09262015) (cit on p 16)

[22] Ben Collins-Sussman Brian W Fitzpatrick and C MichaelPilato Version Control with Subversion OrsquoReilly 2002 urlhttpsvnbookred-beancom (visited on 09262015)(cit on p 17)

[23] Charles F Goldfarb lsquolsquothe Roots of sgml ndash A Personal Rec-ollectionrsquorsquo In (1996) url httpwwwsgmlsourcecomhistoryrootshtm (visited on 07292015) (cit on p 22)

[24] Charles F Goldfarb lsquolsquosgml The Reason Why and the FirstPublishedHintrsquorsquo In Journal of the American Society for Informa-tion Science 48 (7 July 1997) url httpwwwsgmlsourcecomhistoryjasishtm (visited on 07292015) (cit onp 22)

[25] Charles F Goldfarb lsquolsquoIntroduction to Generalized MarkuprsquorsquoIn (1981) url http www sgmlsource com history AnnexAhtm (visited on 07292015) (cit on p 22)

[26] i soiecjtc1sc34 Information processing ndash Text and office sys-tems ndash Standard Generalized Markup Language (sgml) i soiec88791986 Geneva Switzerland the International Organi-zation for Standardization Oct 1986 (cit on p 22)

54 BIBLIOGRAPHY

[27] Charles F Goldfarb the sgml Handbook New York NY USAOxford University Press Inc 1990 i sbn 978-0-198-53737-3(cit on p 22)

[28] Jean Paoli Tim Bray and Michael Sperberg-McQueen Ex-tensible Markup Language (xml) 10 w3c Recommendationw3c Feb 1998 url httpwwww3orgTR1998REC-xml-19980210 (visited on 07312015) (cit on pp 23 31)

[29] isoiec jtc1sc18wg8 Proposed TC for Web sgml Adap-tations for sgml isoiec N1929 the International Organi-zation for Standardization June 1997 url httpxmlcoverpagesorgwg8-n1929-ghtml (visited on 07312015)(cit on p 23)

[30] Haringkon Wium Lie and Bert Bos Cascading Style Sheets level1 Recommendation w3c Dec 1996 url httpwwww3orgTRREC-CSS1-961217 (visited on 07312015) (cit onpp 23 29)

[31] C M Sperberg-McQueen and Claus Huitfeldt lsquolsquogoddagA Data Structure for Overlapping Hierarchiesrsquorsquo In DigitalDocuments Systems and Principles 8th International Confer-ence on Digital Documents and Electronic Publishing DDEP2000 5th International Workshop on the Principles of DigitalDocument Processing PODDP 2000 Munich Germany Sep-tember 13-15 2000 Revised Papers Ed by Peter King andEthan V Munson Berlin Heidelberg Springer Berlin Hei-delberg 2004 pp 139ndash160 isbn 978-3-540-39916-2 doi101007978-3-540-39916-2_12 (cit on p 27)

[32] TimBray DaveHollander andAndrewLaymanNamespacesin xml w3c Recommendation w3c Jan 1999 url httpwwww3orgTR1999REC-xml-names-19990114 (visitedon 08212015) (cit on p 27)

[33] M Duerst the Internationalized Resource Identifiers (iris) rfc3987 rfc Editor Jan 2005 url httptoolsietforghtmlrfc3987 (visited on 08312015) (cit on p 27)

[34] Norman Walsh DocBook 5 The Definitive Guide Apr 2010url httpwwwdocbookorgtdgenhtmldocbookhtml(visited on 08182015) (cit on p 28)

BIBLIOGRAPHY 55

[35] Tim Berners-Lee Information Management A Proposal Techrep Mar 1989 url httpwwww3orgHistory1989proposalhtml (visited on 08312015) (cit on p 28)

[36] T Berners-Lee Hypertext Markup Language ndash 20 rfc 1866rfc Editor Nov 1995 url httptoolsietforghtmlrfc1866 (visited on 07312015) (cit on p 28)

[37] Jon Postel DoD standard Transmission Control Protocol rfc761 rfc Editor Jan 1980 url httptoolsietforghtmlrfc761 (visited on 09162016) (cit on p 28)

[38] Ian Hickson et al html5 A vocabulary and associated apisfor html and xhtml Recommendation w3c Oct 2014 urlhttpwwww3orgTR2014REC-html5-20141028 (visitedon 07312015) (cit on p 29)

[39] ecma International Standard ecma-262 - ecmaScript LanguageSpecification Tech rep June 1997 url httpwwwecma-internationalorgpublicationsfilesECMA-ST-ARCH

ECMA-262201st20edition20June201997pdf (visitedon 07312015) (cit on p 29)

[40] Netscape Communications Netscape and Sun announce Java-Script the open cross-platform object scripting language for en-terprise networks and the Internet Dec 1995 url httpwpnetscapecomnewsrefprnewsrelease67html (visited on02132008) (cit on p 29)

[41] Dave Raggett et al Reformulating html in xml w3c Recom-mendation w3c Dec 1998 url httpwwww3orgTR1998WD-html-in-xml-19981205 (visited on 08202015)(cit on p 31)

[42] Steven Pemberton et al xhtmltrade 10 The Extensible HyperTextMarkup Language w3c Recommendation w3c Jan 2000url httpwwww3orgTR2000REC-xhtml1-20000126(visited on 08202015) (cit on p 31)

[43] T Berners-Lee Linked Data Tech rep 2006 url httpswwww3orgDesignIssuesLinkedDatahtml (visited on09172016) (cit on p 31)

56 BIBLIOGRAPHY

[44] Ora Lassila and Ralph R Swick Resource Description Frame-work (rdf) Model and Syntax Specification w3c Recommen-dation w3c Feb 1999 url httpwwww3orgTR1999REC-rdf-syntax-19990222 (visited on 08182015) (cit onpp 31 32)

[45] Dan Brickley and R V Guha rdf Vocabulary DescriptionLanguage 10 rdf Schema w3c Recommendation w3c Feb2004 url httpwwww3orgTR2004REC-rdf-schema-20040210 (visited on 08182015) (cit on p 32)

[46] Deborah L McGuinness and Frank van Harmelen owl WebOntology Language w3c Recommendation w3c Feb 2004url httpwwww3orgTR2004REC-owl-features-20040210 (visited on 08182015) (cit on p 32)

[47] Dan Brickley and R V Guha json-ld 10 A JSON-basedSerialization for Linked Data w3c Recommendation w3cJan 2014 url httpwwww3orgTR2014REC-json-ld-20140116 (visited on 08192015) (cit on p 32)

[48] David Beckett et al rdf 11 Turtle w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-turtle-20140225 (visited on 08292015) (cit on p 32)

[49] David Beckett rdf 11 N-Triples w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-n-triples-20140225 (visited on 08192015) (cit on p 32)

[50] Ben Adida et al rdfa in xhtml Syntax and Processing w3cRecommendation w3c Oct 2008 url httpwwww3org TR 2008 REC - rdfa - syntax - 20081014 (visited on08192015) (cit on p 32)

[51] Peter Schaffter What exactly is mom 2015 url httpwwwschafftercamommom-01html (visited on 09162016)(cit on p 37)

[52] Donald Ervin Knuth Digital Typography The Center for theStudy of Language and Information Publications 1998 i sbn978-0-387-98269-4 (cit on p 36)

[53] Albert Kapr Sto a jedna věta ke knižniacute uacutepravě Trans by An-toniacuten Rambousek Lacerta 1999 url httpwwwsazbacztypoglosytypo101pdf (visited on 10202015) (cit onpp 41 46 47)

BIBLIOGRAPHY 57

[54] Robert Bringhurst the Elements of Typographic Style PointRoberts andWashHartleyampMarks 1992 i sbn 0-88179-110-5(cit on pp 41 42 45ndash48)

[55] Matthew Butterick Butterickrsquos Practical Typography Line spac-ing url httppracticaltypographycomline-spacinghtml (visited on 11022015) (cit on p 42)

[56] Vladimiacuter Beran et al Aktualizovanyacute typografickyacute manuaacutel6th ed Kafka Design 2014 (cit on p 45)

Acronyms

ACK: The ACKnowledgement character
API: Application Programming Interface
ASA: The American Standards Association
ASCII: The American Standard Code for Information Interchange
AT&T: The American Telephone and Telegraph corporation
BEL: The BELl character
BMP: The Basic Multilingual Plane
BRE: The Basic Regular Expressions
BS: The BackSpace character
BSD: The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA: California
CAN: The CANcel character
CERN: The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR: The Common Locale Data Repository
CLI: Command Line Interface
COBOL: The COmmon Business-Oriented Language
CR: The Carriage Return character
CSS: The Cascading Style Sheets language
DC: The Dublin Core
DC1: The Device Control character No. 1
DC2: The Device Control character No. 2
DC3: The Device Control character No. 3
DC4: The Device Control character No. 4
DEL: The DELete character
DLE: The Data Link Escape character
DPS: Document Preparation System
DTD: Document Type Definition
DTP: DeskTop Publishing
EBCDIC: The Extended Binary Coded Decimal Interchange Code
ECMA: The European Computer Manufacturers Association
EM: The End of Medium
EMACS: The Eventually Munches All Computer Storage editor
ENQ: The ENQuiry character
EOT: The End Of Transmission
ERE: The Extended Regular Expressions
ESC: The ESCape character
ETB: The End of Transmission Block
ETX: The End of TeXt
EUC: The Extended Unix Code
FF: The Form Feed character
FOAF: Friend Of A Friend
FORTRAN: The FORmula TRANslator
FS: The File Separator
FSM: The Free Software Movement
GML: The Generalized Markup Language
GNU: GNU is Not Unix
GS: The Group Separator
GUI: Graphical User Interface
HT: The Horizontal Tab
HTML: The HyperText Markup Language
IBM: The International Business Machines Corporation
IEC: The International Electrotechnical Commission
IME: Input Method Editor
IRI: The Internationalized Resource Identifier
ISO: The International Organization for Standardization
JIS: The Japanese Industrial Standards encoding
JOE: The Joe's Own Editor
JSON: The JavaScript Object Notation
JSON-LD: JSON for LD
JTC: A Joint TC
LD: Linked Data
LF: The Line Feed
MA: Massachusetts
MathML: The Mathematical Markup Language
NAK: The Negative-AcKnowledgement character
NUL: The NULl character
NY: New York
OCR: Optical Character Recognition
ODF: The Open Document Format for office applications
OOXML: The Office Open XML format
OWL: The Web Ontology Language
PC: The IBM Personal Computer
PDF: The Portable Document Format
PICO: The PIne COmposer
POSIX: The Portable Operating System Interface
RDF: The Resource Description Framework
RDFa: RDF in attributes
RELAX NG: The REgular LAnguage for XML New Generation
RFC: A Request For Comments
RS: The Record Separator
SC: A SubCommittee
SGML: The Standard Generalized Markup Language
SI: The Shift In character
SO: The Shift Out character
SOH: The Start of Heading
SR: Sound Recognition
STX: The Start of Text
SUB: The SUBstitute character
SVG: The Scalable Vector Graphics language
SVN: SubVersioN
SYN: The SYNchronous Idle character
TC: A Technical Committee
TEI: The Text Encoding Initiative
TRON: The Real-time Operating system Nucleus
UCS: The Universal multiple-octet coded Character Set
US: The Unit Separator
USA: The United States of America
UTF: The UCS Transformation Format
VCS: Version Control Systems
VI: The Visual Interactive editor
VIM: VI IMproved
VT: The Vertical Tab
W3C: The World Wide Web Consortium
WG: A Working Group
WYSIWYG: What You See Is What You Get
XHTML: The eXtensible HyperText Markup Language
XML: The eXtensible Markup Language

Index

ACK, 6
Adobe FrameMaker, 14
Adobe InDesign, 14, 39
alignment
  justified, 42
  ragged, 42
Anton Koberger, 49
Apache OpenOffice, 13, 20, 39
API, 55
ASA, 51
ASCII, 5–9, 11, 12, 14, 51
AsciiDoc, 39
AT&T, 35
Atom, 13
awk, 16, 17

Bazaar, 17
BEL, 6
BMP, 8, 9, 14
Bob Bemer, 5
body text, 41
BRE, 14–16
  alternation operator, 15
  backreference, 15
  escape character, 15
  matching list expression, 15
  non-matching list expression, 15
  repetition operator, 15
  subexpression, 15
BS, 6
BSD, 13


CA, 52
CAN, 6
CERN, 28
character code, 5
character encoding, 5
Chomsky hierarchy, 14
Christian Morgenstern, 4
CLDR, 52
CLI, 13, 16
code page, 7
code point, 8
Compose key, 11
CONCUR, 27
control code, 5
CR, 6
Creole, 39
CSS, 23, 29–32, 44

DC, 32, 33
DC1, 6
DC2, 6
DC3, 6
DC4, 6
DEL, 6
DLE, 6
Donald Knuth, 36
DPS, 13, 17, 18, 32, 35, 36, 39
  batch-oriented, 35
  interactive, 13, 35
    desktop publishing, 36
    word processing, 36
DTD, 23, 25–27
DTP, 36


EBCDIC, 5
ECMA, 55
Edgar Allan Poe, 37
Elements of Style, 3
EM, 6
Emacs, 13
endianity, 10
endnote, 47
ENQ, 6
EOT, 6
ERE, 14–16
  alternation operator, 15
  backreference, 15
  escape character, 15
  matching list expression, 15
  non-matching list expression, 15
  repetition operator, 15
  subexpression, 15
ESC, 6
ETB, 6
ε-TeX, 38
ETX, 6
EUC, 5

F. M. Cornford, 43
FF, 6
FOAF, 32, 33
footnote, 47
formal grammar, 14
FORTRAN, 4
From Religion to Philosophy: A Study in the Origins of Western Speculation, 43
FS, 6
FSM, 35

Git, 17
GML, 22
GNU, 13, 14, 35
  Linux, 13
  nano, 13
Google Documents, 18
Google Pinyin, 11
grep, 16, 17
groff, see troff
GS, 6
GUI, 13, 35


Han Unification, 9
heading, 45
Henrik Ibsen, 27
HT, 6
HTML, 28–32, 34, 39, 44, 55

IBM, 5, 12, 22
iconv, 10
IEC, 7, 10, 51–54
IME, 12
IRI, 27, 28, 31, 32, 54
ISO, 7, 10, 51–54

JavaScript, 29
Jeffrey E. F. Friedl, 14
JIS, 5
JOE, 13
JScript, 29
JSON, 32
JSON-LD, 32, 56
JTC, 51–54
justification, see alignment

King Lear, 48

LaTeX, 36, 43
Latin Vulgate Bible, 49
LD, 31, 32, 55
leading, see line spacing
Leafpad, 13
LF, 6
lightweight markup language, 39
line height, 45
list, 46

MA, 51
MakeDoc, 39
Markdown, 39
markup
  logical, 21, 29, 30, 35, 36
  presentation, 21, 29, 30, 35, 36
MathML, 28, 31
Mercurial, 17
microformatting, 32
Microsoft Word, 14, 20, 39

N-Triples, 32, 33
NAK, 6
Noam Chomsky, 14
  hierarchy, 14
note, 46
Notepad++, 13
Notepad, 13
nroff, see troff
NUL, 6
NY, 51


OCR, 12
ODF, 13
OOXML, 13
OWL, 32, 56

paragraph, 42
  block, 45, 47
  indented, 45
  outdented, 45
PC, 5, 11
PDF, 13
pdfTeX, 38
Peer Gynt, 27
Perl, 14
PICO, 13
pinyin, 11
plain TeX, 38
POSIX, 53
printable character, 5
Punycode, 8

QuarkXPress, 14
quotation
  block, 47
  run-in, 47

rag, see alignment
RDF, 28, 31–35, 56
  literal, 32
  object, 31
  ontology, 32
  predicate, 31
  resource, 31
  subject, 31
  triplet, 31
RDFa, 32, 34, 56
regex, see regular expression
regular expression, 13, 14
regular grammar, 14
RELAX NG, 23, 25
RFC, 54, 55
RS, 6

sans-serif, 41
SC, 51–54
Scribus, 13, 14, 39
sed, 16, 17
serif, 41
Setext, 39
SGML, 22, 23, 25, 27–29, 39, 53, 54
  application, 23
  attribute, 22
  element, 22
  entity, 22
  node, 22
  tag, 22
SGML: The Reason Why and the First Published Hint, 22
SI, 6
sidenote, 46
small capitals, 45
SO, 6
SOH, 6
SR, 12
STX, 6
style guide, 3
SUB, 6
Sublime Text, 13
surrogate pair, 8
SVG, 28, 31
SVN, 17–20
SYN, 6


table, 46
TC, 51, 52
TEI, 28
text editor, 13
text file, 4
text processing, 4
TextEdit, 13, 14
The Art of Computer Programming, 36
The Cask of Amontillado, 37
The Chicago Manual of Style, 3
The Oxford Style Manual, 3
The Subversion book, 17
Tim Berners-Lee, 31
Timothy John Berners-Lee, 28
TortoiseSVN, 18, 20
Trichter, 4
troff, 35
  man, 36
  me, 36
  mom, 36
TRON, 9
Turtle, 32, 33
typeface, 41

UCS, 6, 8–12, 14, 16, 51, 52
  block, 8
  UCS-4, 8
Unicode
  case conversion, 10
  normalization, 10
US, 6
USA, 51, 52
UTF, 6, 8–10, 52
  UTF-16, 8, 52
  UTF-32, 8
  UTF-7, 8
  UTF-8, 8, 52

VBScript, 29
VCS, 17–20
  centralized, 17
  decentralized, 17
version control, 13
VI, 13
VIM, 13
VT, 6

W3C, 23, 28, 29, 31, 32, 54–56
WG, 54
Wikicode, 39
William Shakespeare, 48
William Strunk, 3
Word Online, 18
writing rules
  grammar, 3
  orthography, 3
  typography, 4
WYSIWYG, 35

X Window System, 11
XeTeX, 43
XHTML, 28, 31, 32, 55, 56
XML, 23–29, 31–33, 39, 54, 55
  application, 23
  DocBook, 28
  format, 23
  language, 23
  namespace, 27
  schema language, 23
  Schema, 23, 26
  validity, 23
  well-formedness, 23
xmllint, 26
XPath, 23
XPointer, 23
XQuery, 23


[27] Charles F. Goldfarb. The SGML Handbook. New York, NY, USA: Oxford University Press, Inc., 1990. ISBN: 978-0-198-53737-3 (cit. on p. 22).

[28] Jean Paoli, Tim Bray, and Michael Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation. W3C, Feb. 1998. URL: http://www.w3.org/TR/1998/REC-xml-19980210 (visited on 07/31/2015) (cit. on pp. 23, 31).

[29] ISO/IEC JTC1/SC18/WG8. Proposed TC for Web SGML Adaptations for SGML. ISO/IEC N1929. The International Organization for Standardization, June 1997. URL: http://xml.coverpages.org/wg8-n1929-g.html (visited on 07/31/2015) (cit. on p. 23).

[30] Håkon Wium Lie and Bert Bos. Cascading Style Sheets, level 1. Recommendation. W3C, Dec. 1996. URL: http://www.w3.org/TR/REC-CSS1-961217 (visited on 07/31/2015) (cit. on pp. 23, 29).

[31] C. M. Sperberg-McQueen and Claus Huitfeldt. "GODDAG: A Data Structure for Overlapping Hierarchies". In: Digital Documents: Systems and Principles. 8th International Conference on Digital Documents and Electronic Publishing, DDEP 2000, 5th International Workshop on the Principles of Digital Document Processing, PODDP 2000, Munich, Germany, September 13-15, 2000, Revised Papers. Ed. by Peter King and Ethan V. Munson. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 139–160. ISBN: 978-3-540-39916-2. DOI: 10.1007/978-3-540-39916-2_12 (cit. on p. 27).

[32] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. W3C Recommendation. W3C, Jan. 1999. URL: http://www.w3.org/TR/1999/REC-xml-names-19990114 (visited on 08/21/2015) (cit. on p. 27).

[33] M. Duerst. Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, Jan. 2005. URL: http://tools.ietf.org/html/rfc3987 (visited on 08/31/2015) (cit. on p. 27).

[34] Norman Walsh. DocBook 5: The Definitive Guide. Apr. 2010. URL: http://www.docbook.org/tdg/en/html/docbook.html (visited on 08/18/2015) (cit. on p. 28).

[35] Tim Berners-Lee. Information Management: A Proposal. Tech. rep. Mar. 1989. URL: http://www.w3.org/History/1989/proposal.html (visited on 08/31/2015) (cit. on p. 28).

[36] T. Berners-Lee. Hypertext Markup Language – 2.0. RFC 1866. RFC Editor, Nov. 1995. URL: http://tools.ietf.org/html/rfc1866 (visited on 07/31/2015) (cit. on p. 28).

[37] Jon Postel. DoD standard Transmission Control Protocol. RFC 761. RFC Editor, Jan. 1980. URL: http://tools.ietf.org/html/rfc761 (visited on 09/16/2016) (cit. on p. 28).

[38] Ian Hickson et al. HTML5: A vocabulary and associated APIs for HTML and XHTML. Recommendation. W3C, Oct. 2014. URL: http://www.w3.org/TR/2014/REC-html5-20141028 (visited on 07/31/2015) (cit. on p. 29).

[39] ECMA International. Standard ECMA-262 - ECMAScript Language Specification. Tech. rep. June 1997. URL: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf (visited on 07/31/2015) (cit. on p. 29).

[40] Netscape Communications. Netscape and Sun announce JavaScript, the open, cross-platform object scripting language for enterprise networks and the Internet. Dec. 1995. URL: http://wp.netscape.com/newsref/pr/newsrelease67.html (visited on 02/13/2008) (cit. on p. 29).

[41] Dave Raggett et al. Reformulating HTML in XML. W3C Working Draft. W3C, Dec. 1998. URL: http://www.w3.org/TR/1998/WD-html-in-xml-19981205 (visited on 08/20/2015) (cit. on p. 31).

[42] Steven Pemberton et al. XHTML™ 1.0: The Extensible HyperText Markup Language. W3C Recommendation. W3C, Jan. 2000. URL: http://www.w3.org/TR/2000/REC-xhtml1-20000126 (visited on 08/20/2015) (cit. on p. 31).

[43] T. Berners-Lee. Linked Data. Tech. rep. 2006. URL: https://www.w3.org/DesignIssues/LinkedData.html (visited on 09/17/2016) (cit. on p. 31).

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Manu Sporny et al. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typoglosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Page 55: Electronic Document Preparation Pocket Primer

BIBLIOGRAPHY 55

[35] Tim Berners-Lee Information Management A Proposal Techrep Mar 1989 url httpwwww3orgHistory1989proposalhtml (visited on 08312015) (cit on p 28)

[36] T Berners-Lee Hypertext Markup Language ndash 20 rfc 1866rfc Editor Nov 1995 url httptoolsietforghtmlrfc1866 (visited on 07312015) (cit on p 28)

[37] Jon Postel DoD standard Transmission Control Protocol rfc761 rfc Editor Jan 1980 url httptoolsietforghtmlrfc761 (visited on 09162016) (cit on p 28)

[38] Ian Hickson et al html5 A vocabulary and associated apisfor html and xhtml Recommendation w3c Oct 2014 urlhttpwwww3orgTR2014REC-html5-20141028 (visitedon 07312015) (cit on p 29)

[39] ecma International Standard ecma-262 - ecmaScript LanguageSpecification Tech rep June 1997 url httpwwwecma-internationalorgpublicationsfilesECMA-ST-ARCH

ECMA-262201st20edition20June201997pdf (visitedon 07312015) (cit on p 29)

[40] Netscape Communications Netscape and Sun announce Java-Script the open cross-platform object scripting language for en-terprise networks and the Internet Dec 1995 url httpwpnetscapecomnewsrefprnewsrelease67html (visited on02132008) (cit on p 29)

[41] Dave Raggett et al Reformulating html in xml w3c Recom-mendation w3c Dec 1998 url httpwwww3orgTR1998WD-html-in-xml-19981205 (visited on 08202015)(cit on p 31)

[42] Steven Pemberton et al xhtmltrade 10 The Extensible HyperTextMarkup Language w3c Recommendation w3c Jan 2000url httpwwww3orgTR2000REC-xhtml1-20000126(visited on 08202015) (cit on p 31)

[43] T Berners-Lee Linked Data Tech rep 2006 url httpswwww3orgDesignIssuesLinkedDatahtml (visited on09172016) (cit on p 31)

56 BIBLIOGRAPHY

[44] Ora Lassila and Ralph R Swick Resource Description Frame-work (rdf) Model and Syntax Specification w3c Recommen-dation w3c Feb 1999 url httpwwww3orgTR1999REC-rdf-syntax-19990222 (visited on 08182015) (cit onpp 31 32)

[45] Dan Brickley and R V Guha rdf Vocabulary DescriptionLanguage 10 rdf Schema w3c Recommendation w3c Feb2004 url httpwwww3orgTR2004REC-rdf-schema-20040210 (visited on 08182015) (cit on p 32)

[46] Deborah L McGuinness and Frank van Harmelen owl WebOntology Language w3c Recommendation w3c Feb 2004url httpwwww3orgTR2004REC-owl-features-20040210 (visited on 08182015) (cit on p 32)

[47] Dan Brickley and R V Guha json-ld 10 A JSON-basedSerialization for Linked Data w3c Recommendation w3cJan 2014 url httpwwww3orgTR2014REC-json-ld-20140116 (visited on 08192015) (cit on p 32)

[48] David Beckett et al rdf 11 Turtle w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-turtle-20140225 (visited on 08292015) (cit on p 32)

[49] David Beckett rdf 11 N-Triples w3c Recommendationw3c Feb 2014 url httpwwww3orgTR2014REC-n-triples-20140225 (visited on 08192015) (cit on p 32)

[50] Ben Adida et al rdfa in xhtml Syntax and Processing w3cRecommendation w3c Oct 2008 url httpwwww3org TR 2008 REC - rdfa - syntax - 20081014 (visited on08192015) (cit on p 32)

[51] Peter Schaffter What exactly is mom 2015 url httpwwwschafftercamommom-01html (visited on 09162016)(cit on p 37)

[52] Donald Ervin Knuth Digital Typography The Center for theStudy of Language and Information Publications 1998 i sbn978-0-387-98269-4 (cit on p 36)

[53] Albert Kapr Sto a jedna věta ke knižniacute uacutepravě Trans by An-toniacuten Rambousek Lacerta 1999 url httpwwwsazbacztypoglosytypo101pdf (visited on 10202015) (cit onpp 41 46 47)

BIBLIOGRAPHY 57

[54] Robert Bringhurst the Elements of Typographic Style PointRoberts andWashHartleyampMarks 1992 i sbn 0-88179-110-5(cit on pp 41 42 45ndash48)

[55] Matthew Butterick Butterickrsquos Practical Typography Line spac-ing url httppracticaltypographycomline-spacinghtml (visited on 11022015) (cit on p 42)

[56] Vladimiacuter Beran et al Aktualizovanyacute typografickyacute manuaacutel6th ed Kafka Design 2014 (cit on p 45)

Acronyms

ack The ACKnowledgement characterapi Application Programming Interfaceasa The American Standard Associationascii The American Standard Code for Information Interchangeatampt The American Telephone and Telegraph corporationbel The BELl characterbmp The Basic Multilingual Planebre The Basic Regular Expressionsbs The BackSpace characterbsd The Berkeley Software Distribution Also known as the Berke-ley Unixca Californiacan The CANcel charactercern The European Organization for Nuclear Research (la ConseilEuropeacuteen pour la Recherche Nucleacuteaire)cldr The Common Locale Data Repositorycli Command Line Interfacecobol The COmmon Business-Oriented Languagecr The Carriage Return charactercss The Cascading Style Sheets languagedc The Dublin Coredc1 The Device Control character No 1dc2 The Device Control character No 2dc3 The Device Control character No 3dc4 The Device Control character No 4del The DELete characterdle The Data Link Escape characterdps Document Preparation System

60 ACRONYMS

dtd Document Type Declarationdtp DeskTop Publishingebcdic The Extended Binary Coded Decimal Interchange Codeecma The European Computer Manufacturers Associationem The End of Mediumemacs The Eventually Munches All Computer Storage editorenq The ENQuiry charactereot The End Of Transmissionere The Extended Regular Expressionsesc The ESCape characteretb The End of Transmission Blocketx The End of TeXteuc The Extended Unix Codeff The Form Feed characterfoaf Friend Or A Foefortran The FORmula TRANslatorfs The File Separatorfsm The Free Software Movementgml The General Markup Languagegnu gnu is Not Unixgs The Group Separatorgui Graphical User Interfaceht The Horizontal Tabhtml The HyperText Markup Languageibm The International Business Machines Corporationiec The International Electrotechnical Commissionime Input Method Editoriri The Internationalized Resource Identifieriso The International Organization for Standardizationj is The Japanese Industrial Standards encodingjoe The Joersquos Own Editorjson The JavaScript Object Notationjson-ld json for ldjtc A Joint tcld Linked Datalf The Line Feedma Massachusettsmathml The Mathematical Markup Languagenak The Negative-AcKnowledgement characternul The NULl character

ACRONYMS 61

ny New Yorkocr Optical Character Recognitionodf The Open Document Format for office applicationsooxml The Office Open XML formatowl The Web Ontology Languagepc The ibm Personal Computerpdf The Portable Document Formatpico The PIne COmposerposix The Portable Operating System Interfacerdf The Resource Description Frameworkrdfa rdf in attributesrelax ng The REgular LAnguage for xml New Generationrfc A Request For Commentsrs The Record Separatorsc A SubCommitteesgml The Standard General Markup Languagesi The Shift In characterso The Shift Out charactersoh The Start of Headingsr Sound Recognitionstx The Start of Textsub The SUBstitute charactersvg The Scalable Vector Graphics languagesvn SubVersioNsyn The SYNchronous Idle charactertc A Technical Committeetei The Text Encoding Initiativetron The Real-time Operating system Nucleusucs The Universal multiple-octet coded Character Setus The Unit Separatorusa The United States of Americautf The ucs Transformation Formatvcs Version Control Systemsvi The Visual Interactive editorvim vi IMprovedvt The Vertical Tabw3c The World Wide Web Consortiumwg AWorking Groupwysiwyg What You See Is What You Getxhtml The eXtensible HyperText Markup Language

62 ACRONYMS

xml The eXtensible Markup Language

Index

ack 6Adobe FrameMaker 14Adobe InDesign 14 39alignmentjustified 42ragged 42

Anton Koberger 49Apache OpenOffice 13 20 39api 55asa 51asci i 5ndash9 11 12 14 51AsciiDoc 39atampt 35Atom 13awk 16 17

sect

Bazaar 17bel 6bmp 8 9 14Bob Berner 5body text 41brealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

bre 14ndash16bs 6bsd 13

sect

ca 52can 6cern 28

character code 5character encoding 5Chomsky hierarchy 14Christian Morgenstern 4cldr 52cli 13 16code page 7code point 8Compose key 11CONCUR 27control code 5cr 6Creole 39css 23 29ndash32 44

sect

dc 32 33dc1 6dc2 6dc3 6dc4 6del 6dle 6Donald Knuth 36dpsbatch-oriented 35interactivedesktop publishing 36word processing 36interactive 13 35

dps 13 17 18 32 35 36 39dtd 23 25ndash27dtp 36

sect

ebcdic 5ecma 55Edgar Allen Poe 37

64 INDEX

Elements of Style 3em 6Emacs 13endianity 10endnote 47enq 6eot 6erealternation operator 15backreference 15escape character 15matching list expression 15non-matching list expression 15repetition operator 15subexpression 15

ere 14ndash16esc 6etb 6120576-TEX 38etx 6euc 5

sectF M Cornford 43ff 6foaf 32 33footnote 47formal grammar 14fortran 4From Religion to Philosophy A Study in

the Origins of Western Speculation 43fs 6fsm 35

sectGit 17gml 22gnuLinux 13nano 13

[44] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. W3C, Feb. 1999. URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 (visited on 08/18/2015) (cit. on pp. 31, 32).

[45] Dan Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210 (visited on 08/18/2015) (cit. on p. 32).

[46] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210 (visited on 08/18/2015) (cit. on p. 32).

[47] Manu Sporny, Gregg Kellogg, and Markus Lanthaler. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116 (visited on 08/19/2015) (cit. on p. 32).

[48] David Beckett et al. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225 (visited on 08/29/2015) (cit. on p. 32).

[49] David Beckett. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-n-triples-20140225 (visited on 08/19/2015) (cit. on p. 32).

[50] Ben Adida et al. RDFa in XHTML: Syntax and Processing. W3C Recommendation. W3C, Oct. 2008. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 (visited on 08/19/2015) (cit. on p. 32).

[51] Peter Schaffter. What exactly is mom? 2015. URL: http://www.schaffter.ca/mom/mom-01.html (visited on 09/16/2016) (cit. on p. 37).

[52] Donald Ervin Knuth. Digital Typography. The Center for the Study of Language and Information Publications, 1998. ISBN: 978-0-387-98269-4 (cit. on p. 36).

[53] Albert Kapr. Sto a jedna věta ke knižní úpravě. Trans. by Antonín Rambousek. Lacerta, 1999. URL: http://www.sazba.cz/typo/glosy/typo101.pdf (visited on 10/20/2015) (cit. on pp. 41, 46, 47).

[54] Robert Bringhurst. The Elements of Typographic Style. Point Roberts, Wash.: Hartley & Marks, 1992. ISBN: 0-88179-110-5 (cit. on pp. 41, 42, 45–48).

[55] Matthew Butterick. Butterick's Practical Typography: Line spacing. URL: http://practicaltypography.com/line-spacing.html (visited on 11/02/2015) (cit. on p. 42).

[56] Vladimír Beran et al. Aktualizovaný typografický manuál. 6th ed. Kafka Design, 2014 (cit. on p. 45).

Acronyms

ACK  The ACKnowledgement character
API  Application Programming Interface
ASA  The American Standards Association
ASCII  The American Standard Code for Information Interchange
AT&T  The American Telephone and Telegraph corporation
BEL  The BELl character
BMP  The Basic Multilingual Plane
BRE  The Basic Regular Expressions
BS  The BackSpace character
BSD  The Berkeley Software Distribution. Also known as the Berkeley Unix.
CA  California
CAN  The CANcel character
CERN  The European Organization for Nuclear Research (le Conseil Européen pour la Recherche Nucléaire)
CLDR  The Common Locale Data Repository
CLI  Command Line Interface
COBOL  The COmmon Business-Oriented Language
CR  The Carriage Return character
CSS  The Cascading Style Sheets language
DC  The Dublin Core
DC1  The Device Control character No. 1
DC2  The Device Control character No. 2
DC3  The Device Control character No. 3
DC4  The Device Control character No. 4
DEL  The DELete character
DLE  The Data Link Escape character
DPS  Document Preparation System
DTD  Document Type Definition
DTP  DeskTop Publishing
EBCDIC  The Extended Binary Coded Decimal Interchange Code
ECMA  The European Computer Manufacturers Association
EM  The End of Medium
Emacs  The Eventually Munches All Computer Storage editor
ENQ  The ENQuiry character
EOT  The End Of Transmission
ERE  The Extended Regular Expressions
ESC  The ESCape character
ETB  The End of Transmission Block
ETX  The End of TeXt
EUC  The Extended Unix Code
FF  The Form Feed character
FOAF  Friend Of A Friend
FORTRAN  The FORmula TRANslator
FS  The File Separator
FSM  The Free Software Movement
GML  The Generalized Markup Language
GNU  GNU's Not Unix
GS  The Group Separator
GUI  Graphical User Interface
HT  The Horizontal Tab
HTML  The HyperText Markup Language
IBM  The International Business Machines Corporation
IEC  The International Electrotechnical Commission
IME  Input Method Editor
IRI  The Internationalized Resource Identifier
ISO  The International Organization for Standardization
JIS  The Japanese Industrial Standards encoding
joe  Joe's Own Editor
JSON  The JavaScript Object Notation
JSON-LD  JSON for LD
JTC  A Joint TC
LD  Linked Data
LF  The Line Feed
MA  Massachusetts
MathML  The Mathematical Markup Language
NAK  The Negative-AcKnowledgement character
NUL  The NULl character
NY  New York
OCR  Optical Character Recognition
ODF  The Open Document Format for office applications
OOXML  The Office Open XML format
OWL  The Web Ontology Language
PC  The IBM Personal Computer
PDF  The Portable Document Format
pico  The PIne COmposer
POSIX  The Portable Operating System Interface
RDF  The Resource Description Framework
RDFa  RDF in attributes
RELAX NG  The REgular LAnguage for XML, Next Generation
RFC  A Request For Comments
RS  The Record Separator
SC  A SubCommittee
SGML  The Standard Generalized Markup Language
SI  The Shift In character
SO  The Shift Out character
SOH  The Start of Heading
SR  Sound Recognition
STX  The Start of Text
SUB  The SUBstitute character
SVG  The Scalable Vector Graphics language
SVN  SubVersioN
SYN  The SYNchronous Idle character
TC  A Technical Committee
TEI  The Text Encoding Initiative
TRON  The Real-time Operating system Nucleus
UCS  The Universal multiple-octet coded Character Set
US  The Unit Separator
USA  The United States of America
UTF  The UCS Transformation Format
VCS  Version Control Systems
vi  The Visual Interactive editor
vim  vi IMproved
VT  The Vertical Tab
W3C  The World Wide Web Consortium
WG  A Working Group
WYSIWYG  What You See Is What You Get
XHTML  The eXtensible HyperText Markup Language
XML  The eXtensible Markup Language

Index

ACK 6
Adobe FrameMaker 14
Adobe InDesign 14, 39
alignment
  justified 42
  ragged 42
Anton Koberger 49
Apache OpenOffice 13, 20, 39
API 55
ASA 51
ASCII 5–9, 11, 12, 14, 51
AsciiDoc 39
AT&T 35
Atom 13
awk 16, 17

Bazaar 17
BEL 6
BMP 8, 9, 14
Bob Bemer 5
body text 41
BRE 14–16
  alternation operator 15
  backreference 15
  escape character 15
  matching list expression 15
  non-matching list expression 15
  repetition operator 15
  subexpression 15
BS 6
BSD 13

CA 52
CAN 6
CERN 28
character code 5
character encoding 5
Chomsky hierarchy 14
Christian Morgenstern 4
CLDR 52
CLI 13, 16
code page 7
code point 8
Compose key 11
CONCUR 27
control code 5
CR 6
Creole 39
CSS 23, 29–32, 44

DC 32, 33
DC1 6
DC2 6
DC3 6
DC4 6
DEL 6
DLE 6
Donald Knuth 36
DPS 13, 17, 18, 32, 35, 36, 39
  batch-oriented 35
  interactive 13, 35
    desktop publishing 36
    word processing 36
DTD 23, 25–27
DTP 36

EBCDIC 5
ECMA 55
Edgar Allan Poe 37
Elements of Style 3
EM 6
Emacs 13
endianity 10
endnote 47
ENQ 6
EOT 6
ERE 14–16
  alternation operator 15
  backreference 15
  escape character 15
  matching list expression 15
  non-matching list expression 15
  repetition operator 15
  subexpression 15
ESC 6
ETB 6
ε-TeX 38
ETX 6
EUC 5

F. M. Cornford 43
FF 6
FOAF 32, 33
footnote 47
formal grammar 14
FORTRAN 4
From Religion to Philosophy: A Study in the Origins of Western Speculation 43
FS 6
FSM 35

Git 17
GML 22
GNU 13, 14, 35
  Linux 13
  nano 13
Google Documents 18
Google Pinyin 11
grep 16, 17
groff, see troff
GS 6
GUI 13, 35

Han Unification 9
heading 45
Henrik Ibsen 27
HT 6
HTML 28–32, 34, 39, 44, 55

IBM 5, 12, 22
iconv 10
IEC 7, 10, 51–54
IME 12
IRI 27, 28, 31, 32, 54
ISO 7, 10, 51–54

JavaScript 29
Jeffrey E. F. Friedl 14
JIS 5
joe 13
JScript 29
JSON 32
JSON-LD 32, 56
JTC 51–54
justification, see alignment

King Lear 48

LaTeX 36, 43
Latin Vulgate Bible 49
LD 31, 32, 55
leading, see line spacing
Leafpad 13
LF 6
lightweight markup language 39
line height 45
list 46

MA 51
MakeDoc 39
Markdown 39
markup
  logical 21, 29, 30, 35, 36
  presentation 21, 29, 30, 35, 36
MathML 28, 31
Mercurial 17
microformatting 32
Microsoft Word 14, 20, 39

N-Triples 32, 33
NAK 6
Noam Chomsky 14
  hierarchy 14
note 46
Notepad++ 13
Notepad 13
nroff, see troff
NUL 6
NY 51

OCR 12
ODF 13
OOXML 13
OWL 32, 56

paragraph 42
  block 47
  indented 45
  outdented 45
paragraphs
  block 45
PC 5, 11
PDF 13
pdfTeX 38
Peer Gynt 27
Perl 14
pico 13
pinyin 11
plain TeX 38
POSIX 53
printable character 5
Punycode 8

QuarkXPress 14
quotation
  block 47
  run-in 47

rag, see alignment
RDF 28, 31–35, 56
  literal 32
  object 31
  ontology 32
  predicate 31
  resource 31
  subject 31
  triplet 31
RDFa 32, 34, 56
regex, see regular expression
regular expression 13, 14
regular grammar 14
RELAX NG 23, 25
RFC 54, 55
RS 6

sans-serif 41
SC 51–54
Scribus 13, 14, 39
sed 16, 17
serif 41
Setext 39
SGML 22, 23, 25, 27–29, 39, 53, 54
  application 23
  attribute 22
  element 22
  entity 22
  node 22
  tag 22
SGML: The Reason Why and the First Published Hint 22
SI 6
sidenote 46
small capitals 45
SO 6
SOH 6
SR 12
STX 6
style guide 3
SUB 6
Sublime Text 13
surrogate pair 8
SVG 28, 31
SVN 17–20
SYN 6

table 46
TC 51, 52
TEI 28
text editor 13
text file 4
text processing 4
TextEdit 13, 14
The Art of Computer Programming 36
The Cask of Amontillado 37
The Chicago Manual of Style 3
The Oxford Style Manual 3
the Subversion book 17
Tim Berners-Lee 31
Timothy John Berners-Lee 28
Tortoise SVN 18, 20
Trichter 4
troff 35
  man 36
  me 36
  mom 36
TRON 9
Turtle 32, 33
typeface 41

UCS 6, 8–12, 14, 16, 51, 52
  block 8
  UCS-4 8
Unicode
  case conversion 10
  normalization 10
US 6
USA 51, 52
UTF 6, 8–10, 52
  UTF-16 8, 52
  UTF-32 8
  UTF-7 8
  UTF-8 8, 52

VBScript 29
VCS 17–20
  centralized 17
  decentralized 17
version control 13
vi 13
vim 13
VT 6

W3C 23, 28, 29, 31, 32, 54–56
WG 54
Wikicode 39
William Shakespeare 48
William Strunk 3
Word Online 18
writing rules
  grammar 3
  orthography 3
  typography 4
WYSIWYG 35

X Window System 11
XeTeX 43
XHTML 28, 31, 32, 55, 56
XML 23–29, 31–33, 39, 54, 55
  application 23
  DocBook 28
  format 23
  language 23
  namespace 27
  schema language 23
  Schema 23, 26
  validity 23
  well-formedness 23
xmllint 26
XPath 23
XPointer 23
XQuery 23

Page 59: Electronic Document Preparation Pocket Primer

60 ACRONYMS

DTD Document Type Definition
DTP DeskTop Publishing
EBCDIC The Extended Binary Coded Decimal Interchange Code
ECMA The European Computer Manufacturers Association
EM The End of Medium
Emacs The Eventually Munches All Computer Storage editor
ENQ The ENQuiry character
EOT The End Of Transmission
ERE The Extended Regular Expressions
ESC The ESCape character
ETB The End of Transmission Block
ETX The End of TeXt
EUC The Extended Unix Code
FF The Form Feed character
FOAF Friend Of A Friend
FORTRAN The FORmula TRANslator
FS The File Separator
FSM The Free Software Movement
GML The General Markup Language
GNU GNU is Not Unix
GS The Group Separator
GUI Graphical User Interface
HT The Horizontal Tab
HTML The HyperText Markup Language
IBM The International Business Machines Corporation
IEC The International Electrotechnical Commission
IME Input Method Editor
IRI The Internationalized Resource Identifier
ISO The International Organization for Standardization
JIS The Japanese Industrial Standards encoding
JOE The Joe's Own Editor
JSON The JavaScript Object Notation
JSON-LD JSON for LD
JTC A Joint TC
LD Linked Data
LF The Line Feed
MA Massachusetts
MathML The Mathematical Markup Language
NAK The Negative-AcKnowledgement character
NUL The NULl character
NY New York
OCR Optical Character Recognition
ODF The Open Document Format for office applications
OOXML The Office Open XML format
OWL The Web Ontology Language
PC The IBM Personal Computer
PDF The Portable Document Format
Pico The PIne COmposer
POSIX The Portable Operating System Interface
RDF The Resource Description Framework
RDFa RDF in attributes
RELAX NG The REgular LAnguage for XML New Generation
RFC A Request For Comments
RS The Record Separator
SC A SubCommittee
SGML The Standard General Markup Language
SI The Shift In character
SO The Shift Out character
SOH The Start of Heading
SR Sound Recognition
STX The Start of Text
SUB The SUBstitute character
SVG The Scalable Vector Graphics language
SVN SubVersioN
SYN The SYNchronous Idle character
TC A Technical Committee
TEI The Text Encoding Initiative
TRON The Real-time Operating system Nucleus
UCS The Universal multiple-octet coded Character Set
US The Unit Separator
USA The United States of America
UTF The UCS Transformation Format
VCS Version Control Systems
vi The Visual Interactive editor
Vim vi IMproved
VT The Vertical Tab
W3C The World Wide Web Consortium
WG A Working Group
WYSIWYG What You See Is What You Get
XHTML The eXtensible HyperText Markup Language
XML The eXtensible Markup Language

Index

ACK 6
Adobe FrameMaker 14
Adobe InDesign 14, 39
alignment
  justified 42
  ragged 42
Anton Koberger 49
Apache OpenOffice 13, 20, 39
API 55
ASA 51
ASCII 5–9, 11, 12, 14, 51
AsciiDoc 39
AT&T 35
Atom 13
awk 16, 17

Bazaar 17
BEL 6
BMP 8, 9, 14
Bob Bemer 5
body text 41
BRE 14–16
  alternation operator 15
  backreference 15
  escape character 15
  matching list expression 15
  non-matching list expression 15
  repetition operator 15
  subexpression 15
BS 6
BSD 13

CA 52
CAN 6
CERN 28
character code 5
character encoding 5
Chomsky hierarchy 14
Christian Morgenstern 4
CLDR 52
CLI 13, 16
code page 7
code point 8
Compose key 11
CONCUR 27
control code 5
CR 6
Creole 39
CSS 23, 29–32, 44

DC 32, 33
DC1 6
DC2 6
DC3 6
DC4 6
DEL 6
DLE 6
Donald Knuth 36
DPS 13, 17, 18, 32, 35, 36, 39
  batch-oriented 35
  interactive 13, 35
    desktop publishing 36
    word processing 36
DTD 23, 25–27
DTP 36

EBCDIC 5
ECMA 55
Edgar Allan Poe 37
Elements of Style 3
EM 6
Emacs 13
endianity 10
endnote 47
ENQ 6
EOT 6
ERE 14–16
  alternation operator 15
  backreference 15
  escape character 15
  matching list expression 15
  non-matching list expression 15
  repetition operator 15
  subexpression 15
ESC 6
ETB 6
ε-TeX 38
ETX 6
EUC 5

F. M. Cornford 43
FF 6
FOAF 32, 33
footnote 47
formal grammar 14
FORTRAN 4
From Religion to Philosophy: A Study in the Origins of Western Speculation 43
FS 6
FSM 35

Git 17
GML 22
GNU 13, 14, 35
  Linux 13
  nano 13
Google Documents 18
Google Pinyin 11
grep 16, 17
groff see troff
GS 6
GUI 13, 35

Han Unification 9
heading 45
Henrik Ibsen 27
HT 6
HTML 28–32, 34, 39, 44, 55

IBM 5, 12, 22
iconv 10
IEC 7, 10, 51–54
IME 12
IRI 27, 28, 31, 32, 54
ISO 7, 10, 51–54

JavaScript 29
Jeffrey E. F. Friedl 14
JIS 5
JOE 13
JScript 29
JSON 32
JSON-LD 32, 56
JTC 51–54
justification see alignment

King Lear 48

LaTeX 36, 43
Latin Vulgate Bible 49
LD 31, 32, 55
leading see line spacing
Leafpad 13
LF 6
lightweight markup language 39
line height 45
list 46

MA 51
MakeDoc 39
Markdown 39
markup
  logical 21, 29, 30, 35, 36
  presentation 21, 29, 30, 35, 36
MathML 28, 31
Mercurial 17
microformatting 32
Microsoft Word 14, 20, 39

N-Triples 32, 33
NAK 6
Noam Chomsky 14
  hierarchy 14
note 46
Notepad++ 13
Notepad 13
nroff see troff
NUL 6
NY 51

OCR 12
ODF 13
OOXML 13
OWL 32, 56

paragraph 42
  block 47
  indented 45
  outdented 45
paragraphs
  block 45
PC 5, 11
PDF 13
pdfTeX 38
Peer Gynt 27
Perl 14
Pico 13
pinyin 11
plain TeX 38
POSIX 53
printable character 5
Punycode 8

QuarkXPress 14
quotation
  block 47
  run-in 47

rag see alignment
RDF 28, 31–35, 56
  literal 32
  object 31
  ontology 32
  predicate 31
  resource 31
  subject 31
  triplet 31
RDFa 32, 34, 56
regex see regular expression
regular expression 13, 14
regular grammar 14
RELAX NG 23, 25
RFC 54, 55
RS 6

sans-serif 41
SC 51–54
Scribus 13, 14, 39
sed 16, 17
serif 41
Setext 39
SGML 22, 23, 25, 27–29, 39, 53, 54
  application 23
  attribute 22
  element 22
  entity 22
  node 22
  tag 22
SGML: The Reason Why and the First Published Hint 22
SI 6
sidenote 46
small capitals 45
SO 6
SOH 6
SR 12
STX 6
style guide 3
SUB 6
Sublime Text 13
surrogate pair 8
SVG 28, 31
SVN 17–20
SYN 6

table 46
TC 51, 52
TEI 28
text editor 13
text file 4
text processing 4
TextEdit 13, 14
the Art of Computer Programming 36
the Cask of Amontillado 37
the Chicago Manual of Style 3
the Oxford Style Manual 3
the Subversion book 17
Tim Berners-Lee 31
Timothy John Berners-Lee 28
TortoiseSVN 18, 20
Trichter 4
troff 35
  man 36
  me 36
  mom 36
TRON 9
Turtle 32, 33
typeface 41

UCS 6, 8–12, 14, 16, 51, 52
  block 8
  UCS-4 8
Unicode
  case conversion 10
  normalization 10
US 6
USA 51, 52
UTF 6, 8–10, 52
  UTF-16 8, 52
  UTF-32 8
  UTF-7 8
  UTF-8 8, 52

VBScript 29
VCS 17–20
  centralized 17
  decentralized 17
version control 13
vi 13
Vim 13
VT 6

W3C 23, 28, 29, 31, 32, 54–56
WG 54
Wikicode 39
William Shakespeare 48
William Strunk 3
Word Online 18
writing rules
  grammar 3
  orthography 3
  typography 4
WYSIWYG 35

X Window System 11
XeTeX 43
XHTML 28, 31, 32, 55, 56
XML 23–29, 31–33, 39, 54, 55
  application 23
  DocBook 28
  format 23
  language 23
  namespace 27
  schema language 23
  Schema 23, 26
  validity 23
  well-formedness 23
xmllint 26
XPath 23
XPointer 23
XQuery 23






  • Introduction
  • Writing
    • Text Processing
      • Character Encoding
      • Text Input
      • Text Editors
      • Interactive Document Preparation Systems
      • Regular Expressions
    • Version Control
  • Markup
    • Meta Markup Languages
      • The General Markup Language
      • The Extensible Markup Language
    • Markup on the World Wide Web
      • The Hypertext Markup Language
      • The Extensible Hypertext Markup Language
      • The Semantic Web and Linked Data
    • Document Preparation Systems
      • Batch-oriented Systems
      • Interactive Systems
    • Lightweight Markup Languages
  • Design
    • Fonts
    • Structural Elements
      • Paragraphs and Stanzas
      • Headings
      • Tables and Lists
      • Notes
      • Quotations
    • Page Layout
    • Color
  • Bibliography
  • Acronyms
  • Index