ISO 8859 (Latin) character encoding

From De Vliegende Brigade

Background

ASCII

Like many stories about computer character sets, this one starts with ASCII, the American Standard Code for Information Interchange:

  • Actually, a bunch of character sets preceded ASCII, especially telegraph codes, but let's not get distracted here
  • ASCII is a 7-bit coding system. The 8th bit was often used for parity. ASCII also dates from the early 1960s, when grouping bits in groups of 8 was not yet universal
  • With 7-bit encoding, there are only 128 code points
  • Of these 128 code points, only 95 are printable characters (including the space); the remaining 33 are control codes
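
The numbers in the list above can be checked with a small Python sketch. The helper names `printable_ascii` and `with_even_parity` are made up for this illustration, and even parity is just one of the parity conventions that were in use, chosen here as an assumption.

```python
def printable_ascii():
    """Return the ASCII code points (0-127) that are printable characters."""
    # str.isprintable() counts the space (0x20) as printable and
    # rejects the control characters 0x00-0x1F and DEL (0x7F).
    return [cp for cp in range(128) if chr(cp).isprintable()]


def with_even_parity(code_point):
    """Put an even-parity bit in the 8th bit of a 7-bit code point."""
    if not 0 <= code_point < 128:
        raise ValueError("not a 7-bit code point")
    ones = bin(code_point).count("1")
    # Set the top bit only when the number of 1-bits is odd,
    # so that the total number of 1-bits in the byte becomes even.
    return code_point | ((ones % 2) << 7)


print(2 ** 7)                  # number of 7-bit code points
print(len(printable_ascii()))  # number of printable characters
print(f"{with_even_parity(ord('C')):08b}")  # 'C' with its parity bit
```

The parity helper shows why the 8th bit was not available for extra characters on such links: it is derived from the other seven bits.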

MCS & ECMA-94

  • The Multinational Character Set (DMCS or MCS) was a character set developed in 1983 by Digital Equipment Corporation (DEC) for use in its VT220 terminal. It was an 8-bit extension of ASCII.
  • In turn, MCS was the inspiration for ECMA-94

ISO/IEC 8859

  • ISO/IEC 8859 (ISO 8859 for short) is a series of 8-bit character encodings, consisting of 15 parts, named ISO 8859-1, ISO 8859-2, etc.
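  • The numbering runs up to ISO 8859-16, because part 12 was never published. Part 16 (Latin-10) is a later addition that does include the euro symbol, as does part 15 (Latin-9), a revision of Latin-1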
  • Parts 1 to 4 were based on ECMA-94.
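
As an illustration of what "a series of 8-bit encodings" means in practice, here is a minimal Python sketch. It assumes the codec names that Python's standard library uses (`iso8859_1`, `iso8859_2`, and so on); the helper names `available_parts` and `decode_in_part` are invented for this example. It lists which parts the local Python can decode and shows the same byte value decoded under three different parts.

```python
import codecs


def available_parts():
    """Return the ISO 8859 part numbers that this Python build can decode."""
    parts = []
    for n in range(1, 17):
        try:
            codecs.lookup(f"iso8859_{n}")
        except LookupError:
            continue  # no codec registered under this name
        parts.append(n)
    return parts


def decode_in_part(byte_value, part):
    """Decode a single byte according to one ISO 8859 part."""
    return bytes([byte_value]).decode(f"iso8859_{part}")


print(available_parts())
# The same byte value means different things in different parts.
for part in (1, 15, 16):
    print(part, decode_in_part(0xA4, part))
```

Byte 0xA4 is the generic currency sign in part 1 and the euro sign in parts 15 and 16, which is one concrete reason a file's encoding label matters.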

Parts of ISO 8859

See the table here for a description of all parts (numbered 1 to 16) and how different parts have different names, including Latin-1, Latin-4, Latin/Cyrillic and Latin-10 South-Eastern European (part 16).

Latin-1

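ISO 8859-1, also known as ISO 8859 part 1 or Latin-1, was the default encoding for HTML. In HTML 5, this has been replaced by Windows-1252. The two are largely interchangeable: they differ only in the byte range 0x80-0x9F, where ISO 8859-1 has control codes and Windows-1252 has printable characters such as the euro sign and curly quotes.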
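
How interchangeable the two encodings are can be measured directly. The following is a minimal Python sketch, assuming Python's built-in `latin-1` and `cp1252` codecs; the function name `differing_bytes` is hypothetical. It compares how every possible byte value decodes under both encodings.

```python
def differing_bytes(encoding_a, encoding_b):
    """Return the byte values that two single-byte encodings decode differently."""
    differing = []
    for value in range(256):
        raw = bytes([value])
        # errors="replace" turns bytes that an encoding leaves
        # undefined into U+FFFD instead of raising an exception.
        a = raw.decode(encoding_a, errors="replace")
        b = raw.decode(encoding_b, errors="replace")
        if a != b:
            differing.append(value)
    return differing


diff = differing_bytes("latin-1", "cp1252")
print(len(diff), hex(min(diff)), hex(max(diff)))
print(repr(b"\x80".decode("latin-1")), repr(b"\x80".decode("cp1252")))
```

Text that stays outside the reported range, which covers most Western European prose, decodes identically under both encodings.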

Latin-1 vs Latin-3

I expected Latin-1 (ISO 8859-1) and Latin-3 (ISO 8859-3) to be so close that the difference would be negligible. Nope! See this example of a French text, first in Latin-1 and then in Latin-3:

French text encoded in Latin1. Note characters 'é', '«' and '»'
French text encoded in Latin3. Note the missing characters 'é', '«' and '»'
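
The effect can be reproduced without screenshots. The sketch below is an illustration in Python, assuming the standard library codecs `iso8859_1` and `iso8859_3`; the function name `reinterpret` and the sample text are made up for this example. It writes a short French string as Latin-1 bytes and then reads the same bytes back as Latin-3.

```python
def reinterpret(text, written_as, read_as):
    """Encode text in one single-byte encoding and decode the bytes in another."""
    raw = text.encode(written_as)
    # errors="replace" shows U+FFFD for bytes that the
    # second encoding leaves undefined, instead of raising.
    return raw.decode(read_as, errors="replace")


sample = "« Bonjour »"
print(sample.encode("iso8859_1").hex())               # the raw Latin-1 bytes
print(reinterpret(sample, "iso8859_1", "iso8859_1"))  # read back correctly
print(reinterpret(sample, "iso8859_1", "iso8859_3"))  # read back as Latin-3
```

In Python's codec tables, bytes 0xAB and 0xBB are the guillemets in Latin-1 but the letters Ğ and ğ in Latin-3, so the quotation marks turn into letters when the wrong part is assumed.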

Sources