Combining character


Combining characters are characters intended to modify other characters. The most common combining characters in the Latin script are the combining diacritical marks (including combining accents). In Unicode, the main block of combining diacritics for European languages and the International Phonetic Alphabet is U+0300–U+036F; combining diacritical marks also appear in many other Unicode blocks. In Unicode, a combining diacritic always follows the base character it modifies, and several diacritics can be added to the same base character.
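As an illustration of the ordering rule, the following minimal Python sketch builds strings from a base letter followed by one or two combining marks and inspects them with the standard unicodedata module. The helper name describe is hypothetical and chosen only for this example.

```python
import unicodedata

base = "e"
acute = "\u0301"      # COMBINING ACUTE ACCENT, inside U+0300-U+036F
dot_below = "\u0323"  # COMBINING DOT BELOW, inside U+0300-U+036F

# Combining marks come after the base character they modify.
one_mark = base + acute
# Several marks can follow the same base character.
two_marks = base + dot_below + acute


def describe(text):
    """Return (code point, name, combining class) for each character.

    A combining class of 0 marks an ordinary (base) character;
    a non-zero class marks a combining character.
    """
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in text
    ]


for entry in describe(two_marks):
    print(entry)
```

Although two_marks is displayed as a single accented letter by most renderers, it is stored as three separate code points.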

Unicode also contains many precomposed characters, so in many cases the same text can be written with either combining diacritics or precomposed characters, at the user's or application's choice. As a result, two Unicode strings must be normalised before they are compared, and encoding converters must be designed carefully so that every valid Unicode representation of a character maps correctly to the legacy encoding; otherwise data is lost. For example, Windows-1258 uses combining diacritics whereas VISCII has a large selection of precomposed characters, so a converter that simply maps code values to Unicode code points one-to-one will corrupt text when converting between the two.
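The two requirements above can be sketched in Python with unicodedata.normalize and the cp1258 codec. This is a minimal sketch, not a complete converter: the function names same_text and decode_cp1258_precomposed are hypothetical, NFC is assumed as the normalisation form, and the VISCII encoding step is not attempted.

```python
import unicodedata

precomposed = "\u00e9"  # é as a single precomposed code point
combining = "e\u0301"   # e followed by COMBINING ACUTE ACCENT


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalising both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def decode_cp1258_precomposed(data: bytes) -> str:
    """Decode Windows-1258 bytes, then compose base + combining marks.

    The composed result is the form a precomposed-only target
    encoding would need as input.
    """
    return unicodedata.normalize("NFC", data.decode("cp1258"))


# Windows-1258 stores this letter as ê plus a combining acute accent.
cp1258_bytes = "\u00ea\u0301".encode("cp1258")

naive = cp1258_bytes.decode("cp1258")              # two code points
composed = decode_cp1258_precomposed(cp1258_bytes)  # one code point

print(precomposed == combining)           # raw comparison
print(same_text(precomposed, combining))  # comparison after NFC
print(len(naive), len(composed))
```

A converter that passed naive straight to a code-point-by-code-point mapping would have no precomposed target for the separate accent, which is the data-loss case described above; composing first avoids it for sequences that have a precomposed form.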



From Wikipedia, the Free Encyclopedia. Original article here. Support Wikipedia by contributing or donating.
All text is available under the terms of the GNU Free Documentation License. See Wikipedia Copyrights for details.
