Map Development Area (MDA)
unicode
Types:
-
Comment:
-
Unicode is mainly used in document processing and in XML and SGML applications
-
example
-
Euro symbol has code U+20AC
-
motivation
-
solve document exchange problem -- include all characters of all languages (on this planet only) -- living or dead -- natural or invented -- 250 writing systems and thousands of languages
-
syntax
-
U+XXXX, where each X is a hexadecimal digit (four to six digits, e.g. U+20AC or U+1D11E)
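As a small illustration of this notation, the following sketch converts between a character and its U+XXXX form in Python. The helper names `to_notation` and `from_notation` are made up for this example and are not part of any standard library.

```python
def to_notation(ch: str) -> str:
    """Return the U+XXXX notation of one character (at least four hex digits)."""
    return f"U+{ord(ch):04X}"


def from_notation(text: str) -> str:
    """Parse a string such as 'U+20AC' back into the character it names."""
    if not text.upper().startswith("U+"):
        raise ValueError(f"not in U+XXXX form: {text!r}")
    return chr(int(text[2:], 16))


euro = "\u20AC"                          # the Euro symbol from the example above
print(to_notation(euro))                 # U+20AC
print(from_notation("U+20AC") == euro)   # True
```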
-
Comment:
-
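code points both below AND ABOVE 65535 are used (NOT 16 bits only!!) -- Unicode now uses 21 bits (U+0000 to U+10FFFF) -- UCS was originally defined with a 31-bit code space, stored in 32 bits by UCS-4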
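The point about 16 bits can be checked directly. The sketch below assumes Python 3.3 or later, where `sys.maxunicode` is the highest code point; the sample characters are chosen only for illustration.

```python
import sys

# U+0041 (Latin A), U+20AC (Euro sign), U+1D11E (musical G clef)
samples = ["\u0041", "\u20AC", "\U0001D11E"]

for ch in samples:
    cp = ord(ch)
    fits = cp <= 0xFFFF   # True only if the code point fits in 16 bits
    print(f"U+{cp:04X}: needs {cp.bit_length()} bits, fits in 16 bits: {fits}")

# The highest code point, U+10FFFF, needs 21 bits.
highest = sys.maxunicode
print(f"U+{highest:04X}: needs {highest.bit_length()} bits")
```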
-
Comment:
-
classifies characters into letters, numbers, punctuation, accents, ... -- maps cases (a to A) -- defines how to display characters -- how to combine characters -- how to treat bidirectional text -- algorithms for sorting, case folding, regular expressions
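Several of these properties can be inspected with Python's standard `unicodedata` module. This is only a sketch of a few lookups (classification, bidirectional class, combining, case mapping and folding); it does not cover sorting or regular expressions, and the sample characters are arbitrary choices.

```python
import unicodedata

# Letter, digit, punctuation, combining acute accent, Hebrew letter alef
samples = ["a", "7", "!", "\u0301", "\u05D0"]

for ch in samples:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),        # general category, e.g. Ll, Nd, Po, Mn, Lo
        unicodedata.bidirectional(ch),   # bidirectional class, e.g. L, EN, ON, NSM, R
    )

# Case mapping (a to A) and case folding for caseless comparison
print("a".upper())
print("Stra\u00DFe".casefold())

# Combining: "e" followed by a combining acute accent composes to U+00E9 under NFC
composed = unicodedata.normalize("NFC", "e\u0301")
print(f"U+{ord(composed):04X}", len(composed))
```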
-
history
-
v2.0: 38,885 assigned characters -- v3.0: 49,194 -- v3.2: 95,156 -- v4.0: 96,382
-
Comment:
-
ISO 8859-1 (Latin-1) is embedded as the first 256 code points (U+0000 to U+00FF)
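One way to see the embedding is to decode every possible byte value with Python's `latin-1` codec and compare the resulting code points with the byte values. This is a sketch that relies on that codec's behaviour, which maps each byte to the code point with the same number.

```python
# All 256 possible byte values, 0x00 through 0xFF
all_bytes = bytes(range(256))

# Decode them as ISO 8859-1; each byte becomes exactly one character
decoded = all_bytes.decode("latin-1")

# Every character's code point equals the byte value it came from
code_points = [ord(ch) for ch in decoded]
print(code_points == list(range(256)))
```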
-
Comment:
-
coordinated code points: the Unicode Standard by the Unicode Consortium -- ISO 10646, Universal Multiple-Octet Coded Character Set (UCS) by ISO -- "synchronized standards"
-
encoding
-
every code point could be encoded in a fixed 4 bytes, but this is wasteful for most text -- hence several different encodings: UCS-2, UCS-4, UTF-8, UTF-16, UTF-32
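To make the size trade-off concrete, the sketch below counts the bytes each encoding needs for three sample characters. It uses the big-endian codec names (`utf-16-be`, `utf-32-be`) so that no byte order mark is added to the counts; the sample characters are arbitrary.

```python
# U+0041 (Latin A), U+20AC (Euro sign), U+1D11E (musical G clef)
samples = ["\u0041", "\u20AC", "\U0001D11E"]
encodings = ["utf-8", "utf-16-be", "utf-32-be"]

for ch in samples:
    # Number of bytes each encoding needs for this single character
    sizes = {enc: len(ch.encode(enc)) for enc in encodings}
    print(f"U+{ord(ch):04X}", sizes)
```

UTF-32 always takes 4 bytes per character, while UTF-8 and UTF-16 use fewer bytes for low code points and more for high ones.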