Standards, Environments, and Macros              iconvunicode(5)



NAME
     iconvunicode - code set conversion tables for Unicode

DESCRIPTION
     The following code set conversions are supported:

                           CODE SET CONVERSIONS SUPPORTED
                           ------------------------------
         FROM Code Set                               TO Code Set
             Code              FROM          Target Code            TO
                               Filename                             Filename
                               Element                              Element

       ISO 8859-1 (Latin 1)    8859-1            UTF-8               UTF-8
       ISO 8859-2 (Latin 2)    8859-2            UTF-8               UTF-8
       ISO 8859-3 (Latin 3)    8859-3            UTF-8               UTF-8
       ISO 8859-4 (Latin 4)    8859-4            UTF-8               UTF-8
       ISO 8859-5 (Cyrillic)   8859-5            UTF-8               UTF-8
       ISO 8859-6 (Arabic)     8859-6            UTF-8               UTF-8
       ISO 8859-7 (Greek)      8859-7            UTF-8               UTF-8
       ISO 8859-8 (Hebrew)     8859-8            UTF-8               UTF-8
       ISO 8859-9 (Latin 5)    8859-9            UTF-8               UTF-8
       ISO 8859-10 (Latin 6)   8859-10           UTF-8               UTF-8
       Japanese EUC            eucJP             UTF-8               UTF-8
       Chinese/PRC EUC
       (GB 2312-1980)          gb2312            UTF-8               UTF-8
       ISO-2022                iso2022           UTF-8               UTF-8
       Korean EUC              koKR-euc          Korean UTF-8        koKR-UTF-8
       ISO-2022-KR             koKR-iso2022-7    Korean UTF-8        koKR-UTF-8
       Korean Johap
       (KS C 5601-1987)        koKR-johap        Korean UTF-8        koKR-UTF-8
       Korean Johap
       (KS C 5601-1992)        koKR-johap92      Korean UTF-8        koKR-UTF-8
       Korean UTF-8            koKR-UTF-8        Korean EUC          koKR-euc
       Korean UTF-8            koKR-UTF-8        Korean Johap        koKR-johap
                                                 (KS C 5601-1987)
       Korean UTF-8            koKR-UTF-8        Korean Johap        koKR-johap92
                                                 (KS C 5601-1992)
       KOI8-R (Cyrillic)       KOI8-R            UCS-2               UCS-2
       KOI8-R (Cyrillic)       KOI8-R            UTF-8               UTF-8
       PC Kanji (SJIS)         PCK               UTF-8               UTF-8
       PC Kanji (SJIS)         SJIS              UTF-8               UTF-8
       UCS-2                   UCS-2             KOI8-R (Cyrillic)   KOI8-R
       UCS-2                   UCS-2             UCS-4               UCS-4



                           CODE SET CONVERSIONS SUPPORTED
                           ------------------------------
         FROM Code Set                               TO Code Set
             Code              FROM          Target Code            TO
                               Filename                             Filename
                               Element                              Element

       UCS-2              UCS-2           UTF-7                   UTF-7
       UCS-2              UCS-2           UTF-8                   UTF-8
       UCS-4              UCS-4           UCS-2                   UCS-2
       UCS-4              UCS-4           UTF-16                  UTF-16
       UCS-4              UCS-4           UTF-7                   UTF-7
       UCS-4              UCS-4           UTF-8                   UTF-8
       UTF-16             UTF-16          UCS-4                   UCS-4
       UTF-16             UTF-16          UTF-8                   UTF-8
       UTF-7              UTF-7           UCS-2                   UCS-2
       UTF-7              UTF-7           UCS-4                   UCS-4
       UTF-7              UTF-7           UTF-8                   UTF-8
       UTF-8              UTF-8           ISO 8859-1 (Latin 1)    8859-1
       UTF-8              UTF-8           ISO 8859-2 (Latin 2)    8859-2
       UTF-8              UTF-8           ISO 8859-3 (Latin 3)    8859-3
       UTF-8              UTF-8           ISO 8859-4 (Latin 4)    8859-4
       UTF-8              UTF-8           ISO 8859-5 (Cyrillic)   8859-5
       UTF-8              UTF-8           ISO 8859-6 (Arabic)     8859-6
       UTF-8              UTF-8           ISO 8859-7 (Greek)      8859-7
       UTF-8              UTF-8           ISO 8859-8 (Hebrew)     8859-8
       UTF-8              UTF-8           ISO 8859-9 (Latin 5)    8859-9
       UTF-8              UTF-8           ISO 8859-10 (Latin 6)   8859-10
       UTF-8              UTF-8           Japanese EUC            eucJP
       UTF-8              UTF-8           Chinese/PRC EUC         gb2312
                                          (GB 2312-1980)
       UTF-8              UTF-8           ISO-2022                iso2022
       UTF-8              UTF-8           KOI8-R (Cyrillic)       KOI8-R
       UTF-8              UTF-8           PC Kanji (SJIS)         PCK
       UTF-8              UTF-8           PC Kanji (SJIS)         SJIS
       UTF-8              UTF-8           UCS-2                   UCS-2
       UTF-8              UTF-8           UCS-4                   UCS-4
       UTF-8              UTF-8           UTF-16                  UTF-16
       UTF-8              UTF-8           UTF-7                   UTF-7
       UTF-8              UTF-8           Chinese/PRC EUC         zhCN.euc
                                          (GB 2312-1980)



                           CODE SET CONVERSIONS SUPPORTED
                           ------------------------------
         FROM Code Set                               TO Code Set
             Code              FROM          Target Code            TO
                               Filename                             Filename
                               Element                              Element

       UTF-8                 UTF-8             ISO 2022-CN           zhCN.iso2022-7
       UTF-8                 UTF-8             Chinese/Taiwan Big5   zhTW-big5
       UTF-8                 UTF-8             Chinese/Taiwan  EUC   zhTW-euc
                                               (CNS 11643-1992)
       UTF-8                 UTF-8             ISO 2022-TW           zhTW-iso2022-7
       Chinese/PRC EUC       zhCN.euc          UTF-8                 UTF-8
       (GB 2312-1980)
       ISO 2022-CN           zhCN.iso2022-7    UTF-8                 UTF-8
       Chinese/Taiwan Big5   zhTW-big5         UTF-8                 UTF-8
       Chinese/Taiwan EUC    zhTW-euc          UTF-8                 UTF-8
       (CNS 11643-1992)
       ISO 2022-TW           zhTW-iso2022-7    UTF-8                 UTF-8
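

     As a rough illustration of how the Unicode  forms  listed  in
     the  tables  differ on the wire, the sketch below encodes one
     short string several ways. It uses Python's own codec  names,
     not  the  filename elements above. Two assumptions are made:
     "utf-16-be" stands in for UCS-2 and "utf-32-be" stands in for
     UCS-4,  which  holds  only  for  characters  in  the  Basic
     Multilingual Plane and ignores byte-order marks.

```python
# Encode one string in several Unicode forms and compare the byte sequences.
# Codec names are Python's, chosen as stand-ins (an assumption of this sketch).
text = "A\u00e9\u20ac"  # 'A', U+00E9 (e with acute), U+20AC (euro sign)

codecs_to_try = ["utf-8", "utf-7", "utf-16-be", "utf-32-be"]
encoded = {name: text.encode(name) for name in codecs_to_try}

for name, data in encoded.items():
    # Show the size and the raw bytes in hexadecimal.
    print(f"{name:10} {len(data):2d} bytes  {data.hex()}")
    # Each form must decode back to the original text.
    assert data.decode(name) == text
```

     The byte counts differ because UTF-8 and UTF-7  use  variable
     length  sequences,  while  the  16-bit  and 32-bit forms use
     fixed-width units for these characters.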



EXAMPLES
     Example 1 The library module filename


     In the conversion library, /usr/lib/iconv  (see  iconv(3C)),
     the library module filename is composed of two symbolic ele-
     ments separated by the percent sign (%).  The  first  symbol
     specifies  the  code set that is being converted; the second
     symbol specifies the target code, that is, the code  set  to
     which the first one is being converted.



     In the conversion table above, the first  symbol  is  termed
     the "FROM Filename Element". The second symbol, representing
     the target code set, is the "TO Filename Element".



     For example, the library module filename to convert from the
     Korean EUC code set to the Korean UTF-8 code set is



     koKR-euc%koKR-UTF-8
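

     The naming rule above can be sketched in a few lines. This is
     an illustration only: the helper names are hypothetical,  and
     appending  ".so"  under  /usr/lib/iconv  is an assumption in-
     ferred from the FILES section, not  something  this  page
     states outright.

```python
import posixpath

ICONV_DIR = "/usr/lib/iconv"  # directory named in the FILES section


def module_filename(from_element, to_element):
    """Join the FROM and TO filename elements with a percent sign."""
    return f"{from_element}%{to_element}"


def module_path(from_element, to_element, suffix=".so"):
    # Assumption: the module file is the composed name plus ".so" in ICONV_DIR.
    name = module_filename(from_element, to_element) + suffix
    return posixpath.join(ICONV_DIR, name)


def split_module_filename(name):
    """Recover (FROM, TO) from a composed name; exactly one '%' is expected."""
    from_element, sep, to_element = name.partition("%")
    if not sep or "%" in to_element or not from_element or not to_element:
        raise ValueError(f"not a FROM%TO module name: {name!r}")
    return from_element, to_element


print(module_filename("koKR-euc", "koKR-UTF-8"))
print(module_path("koKR-euc", "koKR-UTF-8"))
```

     The first call reproduces the Korean EUC to Korean UTF-8 name
     given in the example above.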


FILES
     /usr/lib/iconv/*.so    conversion modules


SEE ALSO
     iconv(1), iconv(3C), iconv(5)


     Chernov, A., Registration of a Cyrillic Character  Set,  RFC
     1489, RELCOM Development Team, July 1993.


     Chon, K., H. Je Park, and U. Choi, Korean Character Encoding
     for  Internet  Messages,  RFC  1557,  Solvit  Chosun  Media,
     December 1993.




     Goldsmith, D., and M. Davis, UTF-7 - A Mail-Safe Transforma-
     tion Format of Unicode, RFC 1642, Taligent, Inc., July 1994.


     Lee, F., HZ - A Data Format for Exchanging  Files  of  Arbi-
     trarily  Mixed Chinese and ASCII characters, RFC 1843, Stan-
     ford University, August 1995.


     Murai, J., M. Crispin, and E. van der Poel, Japanese Charac-
     ter  Encoding  for Internet Messages, RFC 1468, Keio Univer-
     sity, Panda Programming, June 1993.


     Nussbacher, H., and Y. Bourvine, Hebrew  Character  Encoding
     for  Internet  Messages, RFC 1555, Israeli Inter-University,
     Hebrew University, December 1993.


     Ohta, M., Character Sets ISO-10646  and  ISO-10646-J-1,  RFC
     1815, Tokyo Institute of Technology, July 1995.


     Ohta, M., and K. Handa, ISO-2022-JP-2:  Multilingual  Exten-
     sion  of  ISO-2022-JP, RFC 1554, Tokyo Institute of Technol-
     ogy, December 1993.


     Reynolds, J., and J. Postel,  ASSIGNED NUMBERS,  RFC  1700,
     University   of   Southern  California/Information  Sciences
     Institute, October 1994.


     Simonson, K., Character  Mnemonics  &  Character  Sets,  RFC
     1345, Rationel Almen Planlaegning, June 1992.


     Spinellis, D., Greek Character Encoding for Electronic  Mail
     Messages, RFC 1947, SENA S.A., May 1996.


     The Unicode Consortium, The Unicode Standard,  Version  2.0,
     Addison Wesley Developers Press, July 1996.


     Wei, Y., Y. Zhang, J. Li,  J.  Ding,  and  Y.  Jiang,  ASCII
     Printable  Characters-Based  Chinese  Character Encoding for
     Internet Messages, RFC 1842, AsiaInfo Services Inc., Harvard
     University,  Rice University, University of Maryland, August
     1995.





     Yergeau, F., UTF-8, a transformation format of  Unicode  and
     ISO 10646, RFC 2044, Alis Technologies, October 1996.


     Zhu, H., D. Hu, Z. Wang, T. Kao, W. Chang, and  M.  Crispin,
     Chinese  Character Encoding for Internet Messages, RFC 1922,
     Tsinghua University, China Information Technology Standardi-
     zation Technical Committee (CITS), Institute for Information
     Industry (III), University of Washington, March 1996.

NOTES
     ISO 8859 character sets using  Latin  alphabetic  characters
     are distinguished as follows:

     ISO 8859-1 (Latin 1)     For most West  European  languages,
                              including:



                              Albanian             Finnish               Italian
                              Catalan              French                Norwegian
                              Danish               German                Portuguese
                              Dutch                Galician              Spanish
                              English              Irish                 Swedish
                              Faeroese             Icelandic



     ISO 8859-2 (Latin 2)     For most Latin-written  Slavic  and
                              Central European languages:



                              Czech                Polish                Slovak
                              German               Rumanian              Slovene
                              Hungarian            Croatian



     ISO 8859-3 (Latin 3)     Popularly used for Esperanto, Gali-
                              cian, Maltese, and Turkish.


     ISO 8859-4 (Latin 4)     Introduces  letters  for  Estonian,
                              Latvian,  and  Lithuanian. It is an
                              incomplete   predecessor   of   ISO
                              8859-10 (Latin 6).


     ISO 8859-9 (Latin 5)     Replaces  the  rarely  needed  Ice-
                              landic letters in ISO 8859-1 (Latin
                              1) with the Turkish ones.



     ISO 8859-10 (Latin 6)    Adds the last  Inuit  (Greenlandic)
                              and  Sami  (Lappish)  letters  that
                              were not  included  in  ISO  8859-4
                              (Latin  4)  to complete coverage of
                              the Nordic area.
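

     The relationship between Latin 1 and Latin 5 described  above
     can  be  checked  directly.  The  sketch below uses Python's
     "latin-1" and "iso8859-9"  codecs  as  stand-ins  for  the
     8859-1  and  8859-9  tables;  that these codecs match the con-
     version modules on this system is an assumption.

```python
# The six byte values where ISO 8859-9 (Latin 5) is expected to differ
# from ISO 8859-1 (Latin 1).
raw = bytes([0xD0, 0xDD, 0xDE, 0xF0, 0xFD, 0xFE])

as_latin1 = raw.decode("latin-1")    # Icelandic letters in Latin 1
as_latin5 = raw.decode("iso8859-9")  # Turkish letters in Latin 5

print(as_latin1)
print(as_latin5)

# Scan all 256 byte values to list every position where the two sets differ.
differing = [
    b for b in range(256)
    if bytes([b]).decode("latin-1") != bytes([b]).decode("iso8859-9")
]
print([hex(b) for b in differing])
```

     Every other byte value decodes to the same character in  both
     sets,  which  is why text in most West European languages sur-
     vives being read as Latin 5.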



SunOS 5.11          Last change: 18 Apr 1997                    6


