Standards, Environments, and Macros iconvunicode(5)
NAME
iconvunicode - code set conversion tables for Unicode
DESCRIPTION
The following code set conversions are supported:
CODE SET CONVERSIONS SUPPORTED
-------------------------------------------------------------------------------
FROM Code Set                            TO Code Set
Code                    FROM Filename    Target Code             TO Filename
                        Element                                  Element
ISO 8859-1 (Latin 1)    8859-1           UTF-8                   UTF-8
ISO 8859-2 (Latin 2)    8859-2           UTF-8                   UTF-8
ISO 8859-3 (Latin 3)    8859-3           UTF-8                   UTF-8
ISO 8859-4 (Latin 4)    8859-4           UTF-8                   UTF-8
ISO 8859-5 (Cyrillic)   8859-5           UTF-8                   UTF-8
ISO 8859-6 (Arabic)     8859-6           UTF-8                   UTF-8
ISO 8859-7 (Greek)      8859-7           UTF-8                   UTF-8
ISO 8859-8 (Hebrew)     8859-8           UTF-8                   UTF-8
ISO 8859-9 (Latin 5)    8859-9           UTF-8                   UTF-8
ISO 8859-10 (Latin 6)   8859-10          UTF-8                   UTF-8
Japanese EUC            eucJP            UTF-8                   UTF-8
Chinese/PRC EUC         gb2312           UTF-8                   UTF-8
  (GB 2312-1980)
ISO-2022                iso2022          UTF-8                   UTF-8
Korean EUC              koKR-euc         Korean UTF-8            koKR-UTF-8
ISO-2022-KR             koKR-iso2022-7   Korean UTF-8            koKR-UTF-8
Korean Johap            koKR-johap       Korean UTF-8            koKR-UTF-8
  (KS C 5601-1987)
Korean Johap            koKR-johap92     Korean UTF-8            koKR-UTF-8
  (KS C 5601-1992)
Korean UTF-8            koKR-UTF-8       Korean EUC              koKR-euc
Korean UTF-8            koKR-UTF-8       Korean Johap            koKR-johap
                                           (KS C 5601-1987)
Korean UTF-8            koKR-UTF-8       Korean Johap            koKR-johap92
                                           (KS C 5601-1992)
KOI8-R (Cyrillic)       KOI8-R           UCS-2                   UCS-2
KOI8-R (Cyrillic)       KOI8-R           UTF-8                   UTF-8
PC Kanji (SJIS)         PCK              UTF-8                   UTF-8
PC Kanji (SJIS)         SJIS             UTF-8                   UTF-8
UCS-2                   UCS-2            KOI8-R (Cyrillic)       KOI8-R
UCS-2                   UCS-2            UCS-4                   UCS-4
UCS-2                   UCS-2            UTF-7                   UTF-7
UCS-2                   UCS-2            UTF-8                   UTF-8
UCS-4                   UCS-4            UCS-2                   UCS-2
UCS-4                   UCS-4            UTF-16                  UTF-16
UCS-4                   UCS-4            UTF-7                   UTF-7
UCS-4                   UCS-4            UTF-8                   UTF-8
UTF-16                  UTF-16           UCS-4                   UCS-4
UTF-16                  UTF-16           UTF-8                   UTF-8
UTF-7                   UTF-7            UCS-2                   UCS-2
UTF-7                   UTF-7            UCS-4                   UCS-4
UTF-7                   UTF-7            UTF-8                   UTF-8
UTF-8                   UTF-8            ISO 8859-1 (Latin 1)    8859-1
UTF-8                   UTF-8            ISO 8859-2 (Latin 2)    8859-2
UTF-8                   UTF-8            ISO 8859-3 (Latin 3)    8859-3
UTF-8                   UTF-8            ISO 8859-4 (Latin 4)    8859-4
UTF-8                   UTF-8            ISO 8859-5 (Cyrillic)   8859-5
UTF-8                   UTF-8            ISO 8859-6 (Arabic)     8859-6
UTF-8                   UTF-8            ISO 8859-7 (Greek)      8859-7
UTF-8                   UTF-8            ISO 8859-8 (Hebrew)     8859-8
UTF-8                   UTF-8            ISO 8859-9 (Latin 5)    8859-9
UTF-8                   UTF-8            ISO 8859-10 (Latin 6)   8859-10
UTF-8                   UTF-8            Japanese EUC            eucJP
UTF-8                   UTF-8            Chinese/PRC EUC         gb2312
                                           (GB 2312-1980)
UTF-8                   UTF-8            ISO-2022                iso2022
UTF-8                   UTF-8            KOI8-R (Cyrillic)       KOI8-R
UTF-8                   UTF-8            PC Kanji (SJIS)         PCK
UTF-8                   UTF-8            PC Kanji (SJIS)         SJIS
UTF-8                   UTF-8            UCS-2                   UCS-2
UTF-8                   UTF-8            UCS-4                   UCS-4
UTF-8                   UTF-8            UTF-16                  UTF-16
UTF-8                   UTF-8            UTF-7                   UTF-7
UTF-8                   UTF-8            Chinese/PRC EUC         zhCN.euc
                                           (GB 2312-1980)
UTF-8                   UTF-8            ISO 2022-CN             zhCN.iso2022-7
UTF-8                   UTF-8            Chinese/Taiwan Big5     zhTW-big5
UTF-8                   UTF-8            Chinese/Taiwan EUC      zhTW-euc
                                           (CNS 11643-1992)
UTF-8                   UTF-8            ISO 2022-TW             zhTW-iso2022-7
Chinese/PRC EUC         zhCN.euc         UTF-8                   UTF-8
  (GB 2312-1980)
ISO 2022-CN             zhCN.iso2022-7   UTF-8                   UTF-8
Chinese/Taiwan Big5     zhTW-big5        UTF-8                   UTF-8
Chinese/Taiwan EUC      zhTW-euc         UTF-8                   UTF-8
  (CNS 11643-1992)
ISO 2022-TW             zhTW-iso2022-7   UTF-8                   UTF-8
EXAMPLES
Example 1 The library module filename
In the conversion library, /usr/lib/iconv (see iconv(3C)),
the library module filename is composed of two symbolic
elements separated by the percent sign (%). The first symbol
specifies the code set that is being converted; the second
symbol specifies the target code, that is, the code set to
which the first one is being converted.
In the conversion table above, the first symbol is termed
the "FROM Filename Element". The second symbol, representing
the target code set, is the "TO Filename Element".
For example, the library module filename to convert from the
Korean EUC code set to the Korean UTF-8 code set is
koKR-euc%koKR-UTF-8
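The naming rule described in this example can be sketched in a few lines of Python. This is a minimal illustration only and is not part of the conversion library: the helper names module_filename and module_path are hypothetical, and the assumption that a module file is the composed name followed by a .so suffix is inferred from the FILES entry on this page.

```python
import posixpath

# Conversion library directory named on this page.
ICONV_DIR = "/usr/lib/iconv"


def module_filename(from_element: str, to_element: str) -> str:
    """Join the FROM and TO filename elements with a percent sign."""
    # The percent sign is the separator, so it cannot appear in an element.
    if "%" in from_element or "%" in to_element:
        raise ValueError("filename elements must not contain '%'")
    return f"{from_element}%{to_element}"


def module_path(from_element: str, to_element: str) -> str:
    """Hypothetical helper: assumes the module is the composed name plus '.so'."""
    name = module_filename(from_element, to_element) + ".so"
    return posixpath.join(ICONV_DIR, name)


# Korean EUC to Korean UTF-8, using the filename elements from the table.
print(module_filename("koKR-euc", "koKR-UTF-8"))
```

The two arguments are the "FROM Filename Element" and "TO Filename Element" columns of the conversion table, in that order.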
FILES
/usr/lib/iconv/*.so conversion modules
SEE ALSO
iconv(1), iconv(3C), iconv(5)
Chernov, A., Registration of a Cyrillic Character Set, RFC
1489, RELCOM Development Team, July 1993.
Chon, K., H. Je Park, and U. Choi, Korean Character Encoding
for Internet Messages, RFC 1557, Solvit Chosun Media,
December 1993.
Goldsmith, D., and M. Davis, UTF-7 - A Mail-Safe Transformation
Format of Unicode, RFC 1642, Taligent, Inc., July 1994.
Lee, F., HZ - A Data Format for Exchanging Files of Arbitrarily
Mixed Chinese and ASCII characters, RFC 1843, Stanford
University, August 1995.
Murai, J., M. Crispin, and E. van der Poel, Japanese Character
Encoding for Internet Messages, RFC 1468, Keio University,
Panda Programming, June 1993.
Nussbacher, H., and Y. Bourvine, Hebrew Character Encoding
for Internet Messages, RFC 1555, Israeli Inter-University,
Hebrew University, December 1993.
Ohta, M., Character Sets ISO-10646 and ISO-10646-J-1, RFC
1815, Tokyo Institute of Technology, July 1995.
Ohta, M., and K. Handa, ISO-2022-JP-2: Multilingual Extension
of ISO-2022-JP, RFC 1554, Tokyo Institute of Technology,
December 1993.
Reynolds, J., and J. Postel, ASSIGNED NUMBERS, RFC 1700,
University of Southern California/Information Sciences
Institute, October 1994.
Simonsen, K., Character Mnemonics & Character Sets, RFC
1345, Rationel Almen Planlaegning, June 1992.
Spinellis, D., Greek Character Encoding for Electronic Mail
Messages, RFC 1947, SENA S.A., May 1996.
The Unicode Consortium, The Unicode Standard, Version 2.0,
Addison Wesley Developers Press, July 1996.
Wei, Y., Y. Zhang, J. Li, J. Ding, and Y. Jiang, ASCII
Printable Characters-Based Chinese Character Encoding for
Internet Messages, RFC 1842, AsiaInfo Services Inc., Harvard
University, Rice University, University of Maryland, August
1995.
Yergeau, F., UTF-8, a transformation format of Unicode and
ISO 10646, RFC 2044, Alis Technologies, October 1996.
Zhu, H., D. Hu, Z. Wang, T. Kao, W. Chang, and M. Crispin,
Chinese Character Encoding for Internet Messages, RFC 1922,
Tsinghua University, China Information Technology
Standardization Technical Committee (CITS), Institute for
Information Industry (III), University of Washington, March 1996.
NOTES
ISO 8859 character sets using Latin alphabetic characters
are distinguished as follows:
ISO 8859-1 (Latin 1)     For most West European languages, including:
                         Albanian    Finnish      Italian
                         Catalan     French       Norwegian
                         Danish      German       Portuguese
                         Dutch       Galician     Spanish
                         English     Irish        Swedish
                         Faeroese    Icelandic
ISO 8859-2 (Latin 2)     For most Latin-written Slavic and Central
                         European languages:
                         Czech       Polish       Slovak
                         German      Rumanian     Slovene
                         Hungarian   Croatian
ISO 8859-3 (Latin 3)     Popularly used for Esperanto, Galician,
                         Maltese, and Turkish.
ISO 8859-4 (Latin 4)     Introduces letters for Estonian, Latvian, and
                         Lithuanian. It is an incomplete predecessor of
                         ISO 8859-10 (Latin 6).
ISO 8859-9 (Latin 5)     Replaces the rarely needed Icelandic letters in
                         ISO 8859-1 (Latin 1) with the Turkish ones.
ISO 8859-10 (Latin 6)    Adds the last Inuit (Greenlandic) and Sami
                         (Lappish) letters that were not included in
                         ISO 8859-4 (Latin 4) to complete coverage of
                         the Nordic area.
SunOS 5.11 Last change: 18 Apr 1997