

ISO/IEC 8859-1

ISO/IEC 8859-1:1998

MIME: ISO-8859-1

Alias(es): iso-ir-100, csISOLatin1, latin1, l1, IBM819, CP819

Standard: ISO/IEC 8859


ISO/IEC 8859-1:1998, Information technology - 8-bit single-byte coded graphic character sets - Part 1: Latin alphabet No. 1, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings; its first edition was published in 1987. It is generally intended for "Western European" languages (see below for a list). It is the basis for most popular 8-bit character sets, including Windows-1252 and the first block of characters in Unicode.

ISO-8859-1 is the IANA preferred name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. The following other aliases are registered for ISO-8859-1: iso-ir-100, csISOLatin1, latin1, l1, IBM819, CP819.

The Windows-1252 codepage coincides with ISO-8859-1 for all codes except the range 128 to 159 (hex 80 to 9F), where the little-used C1 controls are replaced with additional characters, including all the missing characters provided by ISO-8859-15. Code page 28591, also known as Windows-28591, is the actual ISO-8859-1 codepage.
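The relationship between the aliases and the Windows-1252 difference can be checked with a short script. This is only a sketch that uses Python's built-in codec registry as one convenient implementation; the helper names `canonical_names` and `differing_bytes` are made up for this illustration, and the handling of the byte values that Python's `cp1252` codec leaves undefined (mapped here to U+FFFD) is a choice of this sketch, not part of either standard.

```python
import codecs

# Labels taken from the text above; Python's codec registry happens to know them.
ALIASES = ["ISO-8859-1", "iso-ir-100", "csISOLatin1", "latin1", "l1", "IBM819", "CP819"]


def canonical_names(labels):
    """Return the set of codec names that the given labels resolve to."""
    return {codecs.lookup(label).name for label in labels}


def differing_bytes():
    """Return the byte values that ISO-8859-1 and Windows-1252 decode differently."""
    differing = []
    for value in range(256):
        raw = bytes([value])
        latin1_char = raw.decode("iso-8859-1")
        # Python's cp1252 codec leaves a few byte values undefined;
        # errors="replace" turns those into U+FFFD instead of raising.
        cp1252_char = raw.decode("cp1252", errors="replace")
        if latin1_char != cp1252_char:
            differing.append(value)
    return differing


if __name__ == "__main__":
    print(canonical_names(ALIASES))
    diff = differing_bytes()
    print(f"{len(diff)} differing byte values, from {diff[0]:#04x} to {diff[-1]:#04x}")
    # The same byte is a C1 control in ISO-8859-1 and the euro sign in Windows-1252.
    print(repr(b"\x80".decode("iso-8859-1")), repr(b"\x80".decode("cp1252")))
```

All seven labels resolve to a single codec, and the two encodings disagree only on the 32 byte values from hex 80 to 9F.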

Coverage

ISO 8859-1 encodes what it refers to as "Latin alphabet no. 1," consisting of 191 characters from the Latin script. This character-encoding scheme is used throughout the Americas, Western Europe, Oceania, and much of Africa. It is also commonly used in most standard romanizations of East-Asian languages.

Each character is encoded as a single eight-bit code value. These code values can be used in almost any data interchange system to communicate in the following European languages (with a few exceptions due to missing characters, as noted):

Languages with complete coverage

• Afrikaans
• Albanian
• Basque
• Breton
• Catalan
• Corsican
• Danish
• English (UK and US)
• Faroese
• Galician
• German
• Icelandic
• Indonesian
• Irish (new orthography)
• Italian
• Latin (basic classical orthography)
• Leonese
• Luxembourgish (basic classical orthography)
• Malay
• Manx
• Norwegian (Bokmål and Nynorsk)
• Occitan
• Portuguese
• Rhaeto-Romanic
• Scottish Gaelic
• Spanish
• Swahili
• Swedish
• Walloon


Languages commonly supported but with incomplete coverage

| Language | Missing characters | Typical workaround | Supported by |
| --- | --- | --- | --- |
| Catalan | Ŀ, ŀ (deprecated) | L·, l· | |
| Czech | Č, č, Ř, ř, Š, š, Ž, ž, ch | digraph ch | ISO-8859-2, Windows-1250 |
| Dutch | Ĳ, ĳ | digraphs IJ, ij | |
| Estonian | Š, š, Ž, ž (only present in loanwords) | Sh, sh, Zh, zh | ISO-8859-15, Windows-1252 |
| Finnish | Š, š, Ž, ž (only present in loanwords) | Sh, sh, Zh, zh | ISO-8859-15, Windows-1252 |
| French | Œ, œ, and the very rare Ÿ | digraphs OE, oe, and Y without the diaeresis | ISO-8859-15, Windows-1252 |
| Hungarian | Ő, ő, Ű, ű | Õ, õ (or Ô, ô; sometimes Ö, ö), Û, û (sometimes Ü, ü) | ISO-8859-2, Windows-1250 |
| Irish (traditional orthography) | Ḃ, ḃ, Ċ, ċ, Ḋ, ḋ, Ḟ, ḟ, Ġ, ġ, Ṁ, ṁ, Ṡ, ṡ, Ṫ, ṫ | Bh, bh, Ch, ch, Dh, dh, Fh, fh, Gh, gh, Mh, mh, Sh, sh, Th, th | ISO-8859-14 |
| Latin with macrons | Ā, ā, Ē, ē, Ī, ī, Ō, ō, Ū, ū | | ISO-8859-13, Windows-1257 |
| Māori | Ā, ā, Ē, ē, Ī, ī, Ō, ō, Ū, ū | Ä, ä, Ë, ë, Ï, ï, Ö, ö, Ü, ü | ISO-8859-13, Windows-1257 |
| Turkish | İ, ı, Ğ, ğ, Ş, ş | I, i, G, g, S, s | ISO-8859-3, ISO-8859-9, Windows-1254 |
| Welsh | Ẁ, ẁ, Ẃ, ẃ, Ŵ, ŵ, Ŷ, ŷ | | ISO-8859-14 |
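Whether a given text falls inside the repertoire can be tested mechanically. The following is a minimal sketch using Python's standard codecs; `missing_characters`, `apply_workarounds`, and the `WORKAROUNDS` table are hypothetical names introduced here, and the table covers only a few of the workarounds listed above.

```python
def missing_characters(text, encoding="iso-8859-1"):
    """Return the characters of text (in order of first appearance)
    that the given encoding cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


# Hypothetical substitution table built from a few rows of the
# "typical workaround" column (French, Estonian, Finnish).
WORKAROUNDS = {
    "Œ": "OE", "œ": "oe", "Ÿ": "Y",
    "Š": "Sh", "š": "sh", "Ž": "Zh", "ž": "zh",
}


def apply_workarounds(text, table=WORKAROUNDS):
    """Replace characters that ISO-8859-1 lacks with their usual stand-ins."""
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(missing_characters("Español, Português, Føroyskt"))  # fully covered
    print(missing_characters("cœur"))                          # French oe ligature
    print(missing_characters("cœur", "iso-8859-15"))           # covered by ISO-8859-15
    print(apply_workarounds("cœur").encode("iso-8859-1"))
```

The substitution loses information (the original spelling cannot be recovered from the workaround), which is why the last column of the table points to encodings that contain the characters directly.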

Quotation marks

For some languages listed above the correct typographical quotation marks are missing, as only « », " ", and ' ' are included. This scheme also does not provide for oriented (6- or 9-shaped) single or double quotation marks. Some fonts will display the spacing grave accent (0x60) and the apostrophe (0x27) as a matching pair of oriented single quotation marks; however, this is not considered part of the modern standard.
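The situation for quotation marks can be illustrated with a few lookups. This sketch assumes Python's standard codecs; `byte_or_none` is a helper name invented for the example, and Windows-1252 is included only as a point of comparison.

```python
def byte_or_none(ch, encoding):
    """Return the single byte value encoding ch, or None if ch is not in the repertoire."""
    try:
        return ch.encode(encoding)[0]
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    # Quotation-like characters that ISO-8859-1 does contain.
    for ch in "«»\"'`":
        print(ch, hex(byte_or_none(ch, "iso-8859-1")))
    # Oriented (curly) quotation marks: absent from ISO-8859-1,
    # present in Windows-1252 within the hex 80 to 9F range.
    for ch in "\u2018\u2019\u201c\u201d":
        print(ch, byte_or_none(ch, "iso-8859-1"), hex(byte_or_none(ch, "cp1252")))
```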

History

ISO 8859-1 was based on the Multinational Character Set used by Digital Equipment Corporation in the popular VT220 terminal. It was developed within ECMA, the European Computer Manufacturers Association, and published in March 1985 as ECMA-94, by which name it is still sometimes known. The second edition of ECMA-94 [2] (June 1986) also included ISO 8859-2, ISO 8859-3, and ISO 8859-4 as part of the specification.

In 1985 Commodore adopted ISO 8859-1 for its new AmigaOS operating system. The Seikosha MP-1300AI impact dot-matrix printer, used with the Amiga 1000, included this encoding. [citation needed]

In 1992, the IANA registered the character map ISO_8859-1:1987, more commonly known by its preferred MIME name of ISO-8859-1 (note the extra hyphen compared with ISO 8859-1), a superset of ISO 8859-1, for use on the Internet. This map assigns the C0 and C1 control characters to the unassigned code values and thus provides 256 characters, one for every possible 8-bit value.

ISO-8859-1 is (according to the standards at least) the default encoding of documents delivered via HTTP with a MIME type beginning with "text/" (however, the draft HTML 5 specification requires that documents advertised as ISO-8859-1 actually be parsed with the Windows-1252 encoding [3]). It is the default encoding of the values of certain descriptive HTTP headers, and defines the repertoire of characters allowed in HTML 3.2 documents (HTML 4.0, however, is based on Unicode). It and Windows-1252 are often assumed to be the encoding of text on Unix and Microsoft Windows in the absence of locale or other information; this is only gradually being replaced with a Unicode encoding such as UTF-8 or UTF-16.
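Because the IANA character map assigns a character to every 8-bit value, any byte sequence decodes without error, and each byte maps to the Unicode code point with the same number. A minimal sketch of that property, assuming Python's `iso-8859-1` codec (which includes the C0 and C1 controls); the function names are illustrative only:

```python
def decode_all_bytes():
    """Decode every possible byte value once and return the resulting string."""
    return bytes(range(256)).decode("iso-8859-1")


def utf8_accepts(raw):
    """Report whether raw is valid UTF-8, for contrast with ISO-8859-1."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    text = decode_all_bytes()
    # Byte value n becomes code point n, and the round trip is lossless.
    print(all(ord(ch) == n for n, ch in enumerate(text)))
    print(text.encode("iso-8859-1") == bytes(range(256)))
    # The single byte E9 is "é" in ISO-8859-1 but is not valid UTF-8 on its own.
    print(b"\xe9".decode("iso-8859-1"), utf8_accepts(b"\xe9"), "é".encode("utf-8"))
```

This is one reason the encoding is often used as a fallback assumption: decoding never fails, although the result is only meaningful if the data really is ISO-8859-1.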

Codepage layout

ISO/IEC 8859-1

    _0   _1   _2   _3   _4   _5   _6   _7   _8   _9   _A   _B   _C   _D   _E   _F
0_  (unassigned)
1_  (unassigned)
2_  SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3_  0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4_  @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5_  P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6_  `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7_  p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~
8_  (unassigned)
9_  (unassigned)
A_  NBSP ¡    ¢    £    ¤    ¥    ¦    §    ¨    ©    ª    «    ¬    SHY  ®    ¯
B_  °    ±    ²    ³    ´    µ    ¶    ·    ¸    ¹    º    »    ¼    ½    ¾    ¿
C_  À    Á    Â    Ã    Ä    Å    Æ    Ç    È    É    Ê    Ë    Ì    Í    Î    Ï
D_  Ð    Ñ    Ò    Ó    Ô    Õ    Ö    ×    Ø    Ù    Ú    Û    Ü    Ý    Þ    ß
E_  à    á    â    ã    ä    å    æ    ç    è    é    ê    ë    ì    í    î    ï
F_  ð    ñ    ò    ó    ô    õ    ö    ÷    ø    ù    ú    û    ü    ý    þ    ÿ

The code value of each character is its row digit followed by its column digit in hexadecimal, and the Unicode code point is the same number. For example, A (row 4_, column _1) is hex 41, Unicode 0041, decimal 65; ÿ (row F_, column _F) is hex FF, Unicode 00FF, decimal 255.
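The cells of the layout table can be regenerated from the code values themselves, which also confirms the count of 191 graphic characters. The sketch below assumes Python's `iso-8859-1` codec; `is_graphic` and `cell` are names chosen for this example.

```python
def is_graphic(value):
    """True for the code values that the table assigns a graphic character to:
    hex 20 to 7E (G0) and hex A0 to FF (G1)."""
    return 0x20 <= value <= 0x7E or 0xA0 <= value <= 0xFF


def cell(value):
    """Return (character, Unicode code point in hex, decimal value) for one table cell."""
    ch = bytes([value]).decode("iso-8859-1")
    return ch, f"{ord(ch):04X}", value


GRAPHIC_VALUES = [value for value in range(256) if is_graphic(value)]

if __name__ == "__main__":
    print(len(GRAPHIC_VALUES))  # 95 in G0 plus 96 in G1
    for value in (0x41, 0xE9, 0xFF):
        print(cell(value))
```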


Similar character sets

ISO-8859-1 was incorporated as the first 256 code points of ISO/IEC 10646 and Unicode.

The lower range 32 to 126 (hex 20 to 7E, the G0 subset) maps exactly to the same coded G0 subset of the ISO 646 US variant (commonly known as ASCII), whose ISO 2022 standard switch sequence is "ESC ( B". The higher range 160 to 255 (hex A0 to FF, the G1 subset) maps exactly to the same subset initiated by the ISO 2022 standard switch sequence "ESC . A".

ISO/IEC 8859-1 is missing some characters for French and Finnish text and the euro sign. In order to provide some of these characters, ISO/IEC 8859-15 was developed as an update of ISO/IEC 8859-1. This required, however, the removal of some infrequently used characters from ISO/IEC 8859-1, including fraction symbols and letter-free diacritics: ¤, ¦, ¨, ´, ¸, ¼, ½, and ¾.

The popular Windows-1252 character set adds all the missing characters provided by ISO/IEC 8859-15, plus a number of typographic symbols, by replacing the rarely used C1 controls in the range 128 to 159 (hex 80 to 9F). It is very common to mislabel text data with the charset label ISO-8859-1, even though the data is really Windows-1252 encoded. Many web browsers and e-mail clients will interpret ISO-8859-1 control codes as Windows-1252 characters in order to accommodate such mislabeling, but it is not standard behaviour, and care should be taken to avoid generating these characters in ISO-8859-1 labeled content.

The Apple Macintosh computer introduced a character encoding called Mac Roman, or Mac-Roman, in 1984. It was meant to be suitable for Western European desktop publishing. It is a superset of ASCII, like ISO-8859-1, and has most of the characters that are in ISO-8859-1 but in a totally different arrangement. A later version, registered with IANA as "Macintosh", replaced the generic currency sign ¤ with the euro sign €. The few printable characters that are in ISO 8859-1 but not in this set are often a source of trouble when editing text on websites using older Macintosh browsers (including the last version of Internet Explorer for Mac). However, the extra characters that Windows-1252 has in the C1 codepoint range are all supported in MacRoman.

DOS had code page 850, which had all printable characters that ISO-8859-1 had (albeit in a totally different arrangement) plus the most widely used graphic characters from code page 437.
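The eight positions that ISO/IEC 8859-15 reassigns can be found by comparing the two encodings byte by byte. This is a sketch using Python's standard codecs; `changed_positions` is a name introduced for the illustration.

```python
def changed_positions():
    """Map each byte value that differs between ISO-8859-1 and ISO-8859-15
    to the pair (ISO-8859-1 character, ISO-8859-15 character)."""
    changes = {}
    for value in range(0xA0, 0x100):
        raw = bytes([value])
        old = raw.decode("iso-8859-1")
        new = raw.decode("iso-8859-15")
        if old != new:
            changes[value] = (old, new)
    return changes


if __name__ == "__main__":
    changes = changed_positions()
    for value, (old, new) in changes.items():
        print(f"{value:#04x}: {old} -> {new}")
    # Windows-1252 keeps the ISO-8859-1 characters and places the
    # replacements in the former C1 range instead.
    replacements = "".join(new for _, new in changes.values())
    print([hex(b) for b in replacements.encode("cp1252")])
```

The removed characters match the list given above, and every replacement character is encodable in Windows-1252 at a position between hex 80 and 9F.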

References

[1] http://en.wikipedia.org/w/index.php?title=Template:Infobox_character_encoding&action=edit
[2] http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-094.pdf
[3] HTML 5 Draft Recommendation, 12 April 2010, 8.1 Character encodings (http://dev.w3.org/html5/spec/Overview.html#character-encodings-0), retrieved 2010-04-12.

External links

• ISO/IEC 8859-1:1998 (http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=28245&ICS1=35&ICS2=40&ICS3=)
• ISO/IEC 8859-1:1998 (ftp://std.dkuug.dk/JTC1/sc2/wg3/docs/n411.pdf) - 8-bit single-byte coded graphic character sets, Part 1: Latin alphabet No. 1 (draft dated February 12, 1998, published April 15, 1998)
• Standard ECMA-94 (http://www.ecma-international.org/publications/standards/Ecma-094.htm): 8-Bit Single Byte Coded Graphic Character Sets - Latin Alphabets No. 1 to No. 4 2nd edition (June 1986)
• ISO-IR 100 (http://www.itscj.ipsj.or.jp/ISO-IR/100.pdf) Right-Hand Part of Latin Alphabet No.1 (February 1, 1986)
• Windows Code pages (http://msdn.microsoft.com/goglobal/bb964656)
• Differences between ANSI, ISO-8859-1 and MacRoman Character Sets (http://www.alanwood.net/demos/charsetdiffs.html)
• The Letter Database (http://www.eki.ee/letter/)
• The ISO 8859 Alphabet Soup (http://czyborra.com/charsets/iso8859.html) - Roman Czyborra's summary of ISO character sets

Article Sources and Contributors

ISO/IEC 8859-1  Source: http://en.wikipedia.org/w/index.php?oldid=589926607  Contributors: Achurch, Adam78, Adelton, Al shopov, Alxeedo, Amakuha, Andre Engels, Anon user, Anthony, Anárion, Athantor, Auslli, Avjewe, Babak info, Barklund, Basil.bourque, Bearcat, Ben morphett, Bennylin, Bgwhite, BiT, Brion VIBBER, Brycen, Bukzor, Burzuchius, Caoimhin, Ceplm, Cfsenel, Choster, ChrisGualtieri, Christian List, Chrullrich, Circular17, Conversion script, Copyeditor42, Crissov, Curps, CyberSkull, Dakart, DanielPharos, Dbachmann, Deh, Denelson83, Diberri, Docu, Don4of4, Droll, Dthomsen8, Dtobias, Dysprosia, Elektron, Ellmist, Emk (ja), Evertype, Fool4jesus, Furrykef, GPHemsley, Gaius Cornelius, Goh wz, GregorB, Gwinkless, Gyopi, Götz, Harris7, Harryboyles, Here, Icairns, Incnis Mrsi, Indefatigable, IronGargoyle, Ixfd64, JTN, Jasen betts, Jeronimo, Jkl, John, Jor, Keka, Khukri, Konxykogure, Kooo, Ksn, Kwamikagami, Kwi, LauraALo, Lee Daniel Crocker, Liftarn, Liliana-60, LittleBenW, Livajo, Lmatt, Loadmaster, LoveEncounterFlow, Madacs, Magioladitis, ManuelGR, Martin.Budden, Mat cross, Matthiaspaul, Michael Peter Fustumum, Mikeo, Miles, Mjb, Monedula, Mxn, Mzajac, Naohiro19, Natural Cut, NatusRoma, Nbarth, Nickj, Nikevich, Nikola Smolenski, Nohat, Nsaa, OwenBlacker, Oz1cz, Paddu, Patrick, Paul Magnussen, Pengo, Perey, Phenry, Phil Boswell, PierreAbbat, Pjacobi, Plugwash, Pne, Poccil, Polluks, Poogis, Prof Wrong, Proxyma, QuartierLatin1968, Quota, R'n'B, RARPSL, Raffaele Megabyte, Raise exception, Rama, Red King, RedWolf, Rgrg, Rick Block, RickBeton, Rje, RoToRa, Rogper, Ruhrjung, Ruud Koot, Sandrarossi, Saric, Sburke, Shaun, Simo Kaupinmäki, Sl, Sladen, Smb1001, Some jerk on the Internet, Spitzak, Stuartyeates, Stubblyhead, Suruena, TJRC, Tamfos, Tedickey, Telfordbuck, Tevildo, The Nut, Theopolisme, Thistheman, TimR, Timc, Tobias Conradi, Toby Bartels, Torzsmokus, Tox, Truthflux, UTF-8, Urhixidur, Vanisaac, Wavelength, Woohookitty, WorldlyWebster, Wrp103, Yop83, ZanderSchubert, ZeroUm, Zundark, Ævar Arnfjörð Bjarmason, 에멜무지로, 170 anonymous edits

License

Creative Commons Attribution-Share Alike 3.0
//creativecommons.org/licenses/by-sa/3.0/