This API is used to convert codepage or character encoded data to and from UTF-16. You can open a converter with {@link ucnv_open() }. With that converter, you can get its properties, set options, convert your data and close the converter.
Since many software programs recognize different converter names for different types of converters, there are other functions in this API to iterate over the converter aliases. The functions {@link ucnv_getAvailableName() }, {@link ucnv_getAlias() } and {@link ucnv_getStandardName() } are some of the more frequently used alias functions to get this information.
When a converter encounters an illegal, irregular, invalid or unmappable character, its default behavior is to use a substitution character to replace the bad byte sequence. This behavior can be changed by using {@link ucnv_setFromUCallBack() } or {@link ucnv_setToUCallBack() } on the converter. The header ucnv_err.h defines many other callback actions that can be used instead of a character substitution.
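The default substitution behavior described above can be pictured with a small stand-alone sketch. It is not ICU code and does not call ICU: the function name `utf16_to_latin1_subst`, the choice of ISO-8859-1 as the target, and the use of 0x1A as the substitution byte are all assumptions made for illustration. In ICU the substitution character belongs to the individual converter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 0x1A (ASCII SUB) as the substitution byte, for illustration. */
#define SUBSTITUTION_BYTE 0x1A

/*
 * Hypothetical stand-in, not ICU code: converts UTF-16 to ISO-8859-1.
 * Each code point that has no ISO-8859-1 mapping (including a surrogate
 * pair or an unpaired surrogate) is replaced by one substitution byte.
 * Returns the number of bytes written; stops early when dst is full.
 */
size_t utf16_to_latin1_subst(const uint16_t *src, size_t src_len,
                             unsigned char *dst, size_t dst_cap)
{
    assert(src != NULL && dst != NULL);
    size_t out = 0;
    for (size_t i = 0; i < src_len && out < dst_cap; ++i) {
        uint16_t u = src[i];
        if (u <= 0xFF) {
            dst[out++] = (unsigned char)u;      /* direct mapping */
            continue;
        }
        /* Treat a well-formed surrogate pair as a single bad character. */
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < src_len &&
            src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            ++i;
        }
        dst[out++] = SUBSTITUTION_BYTE;
    }
    return out;
}
```

A callback installed with ucnv_setFromUCallBack() replaces exactly this "write a substitution" step with another action, such as stopping with an error or skipping the character.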
More information about this API can be found in our User Guide.
NULL
A converter name for ICU 1.5 and above may contain options like a locale specification to control the specific behavior of the newly instantiated converter. The meaning of the options depends on the particular converter. If an option is not defined for or recognized by a given converter, then it is ignored.
Options are appended to the converter name string, with a UCNV_OPTION_SEP_CHAR between the name and the first option and also between adjacent options.
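The naming rule above can be sketched as plain string building. This is a minimal illustration that does not call ICU: `append_converter_option` is a hypothetical helper, `OPTION_SEP_CHAR` is a local macro assumed to mirror the value of UCNV_OPTION_SEP_CHAR (a comma), and the converter name and options in the usage note are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: mirrors the value of ICU's UCNV_OPTION_SEP_CHAR (a comma). */
#define OPTION_SEP_CHAR ','

/*
 * Hypothetical helper: appends one option to the NUL-terminated converter
 * name held in buf. Returns 0 on success, or -1 if the result would not
 * fit in cap bytes (buf is left unchanged in that case).
 */
int append_converter_option(char *buf, size_t cap, const char *option)
{
    assert(buf != NULL && option != NULL);
    size_t name_len = strlen(buf);
    size_t opt_len = strlen(option);

    /* Need room for the separator, the option, and the terminating NUL. */
    if (name_len + 1 + opt_len + 1 > cap) {
        return -1;
    }
    buf[name_len] = OPTION_SEP_CHAR;
    memcpy(buf + name_len + 1, option, opt_len + 1); /* copies the NUL too */
    return 0;
}
```

Starting from "ISO_2022" and appending "locale=ja" and then "version=1" yields "ISO_2022,locale=ja,version=1", a string of the form that would be passed to ucnv_open() as the converter name.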
If the alias is ambiguous, then the preferred converter is used and the status is set to U_AMBIGUOUS_ALIAS_WARNING.
The conversion behavior and names can vary between platforms. ICU may convert some characters differently from other platforms. Details on this topic are in the User Guide. Aliases starting with a "cp" prefix have no specific meaning other than that they are aliases starting with the letters "cp". Please do not associate any meaning with these aliases.
See ucnv_open for the complete details
Creates a UConverter object specified from a packageName and a converterName.
The packageName and converterName must point to an ICU udata object, as defined by udata_open( packageName, "cnv", converterName, err) or equivalent. Typically, packageName will refer to a (.dat) file, or to a package registered with udata_setAppData(). Using a full file or directory pathname for packageName is deprecated.
The name will NOT be looked up in the alias mechanism, nor will the converter be stored in the converter cache or the alias table. The only way to open further converters is to call this function multiple times, or to use the ucnv_clone() function to clone a 'primary' converter.
A future version of ICU may add alias table lookups and/or caching to this function.
Example use:
cnv = ucnv_openPackage("myapp", "myconverter", &err);
U_BUFFER_OVERFLOW_ERROR
Handling of surrogate pairs and supplementary-plane code points: There are two different kinds of codepages that provide mappings for surrogate characters:
U_INDEX_OUTOFBOUNDS_ERROR
ucnv_countAliases()
const char *
ucnv_getStandardName
uenum_close
 * Example alias table:
 * conv alias1 { STANDARD1 } alias2 { STANDARD1* }
 *
 * Result of ucnv_getStandardName("conv", "STANDARD1") from example
 * alias table:
 * "alias2"
 *
 * @param name original converter name
 * @param standard name of the standard governing the names; MIME and IANA
 *        are such standards
 * @param pErrorCode result of operation
 * @return returns the standard converter name;
 *         if a standard converter name cannot be determined,
 *         then NULL is returned. Owned by the library.
 * @stable ICU 2.0
 */
U_CAPI const char * U_EXPORT2
ucnv_getStandardName(const char *name, const char *standard, UErrorCode *pErrorCode);

/**
 * This function will return the internal canonical converter name of the
 * tagged alias. This is the opposite of ucnv_openStandardNames, which
 * returns the tagged alias given the canonical name.
 *
 * Example alias table:
 * conv alias1 { STANDARD1 } alias2 { STANDARD1* }
 *
 * Result of ucnv_getCanonicalName("alias1", "STANDARD1") from example
 * alias table:
 * "conv"
 *
 * @return returns the canonical converter name;
 *         if a standard or alias name cannot be determined,
 *         then NULL is returned. The returned string is
 *         owned by the library.
 * @see ucnv_getStandardName
 * @stable ICU 2.4
 */
U_CAPI const char * U_EXPORT2
ucnv_getCanonicalName(const char *alias, const char *standard, UErrorCode *pErrorCode);

/**
 * Returns the current default converter name. If you want to open
 * a default converter, you do not need to use this function.
 * It is faster if you pass a NULL argument to ucnv_open to open the
 * default converter.
 *
 * If U_CHARSET_IS_UTF8 is defined to 1 in utypes.h then this function
 * always returns "UTF-8".
 *
 * @return returns the current default converter name.
 *         Storage owned by the library
 * @see ucnv_setDefaultName
 * @stable ICU 2.0
 */
U_CAPI const char * U_EXPORT2
ucnv_getDefaultName(void);

#ifndef U_HIDE_SYSTEM_API
/**
 * This function is not thread safe. DO NOT call this function when ANY ICU
 * function is being used from more than one thread! This function sets the
 * current default converter name. If this function needs to be called, it
 * should be called during application initialization. Most of the time, the
 * results from ucnv_getDefaultName() or ucnv_open with a NULL string argument
 * are sufficient for your application.
 *
 * If U_CHARSET_IS_UTF8 is defined to 1 in utypes.h then this function
 * does nothing.
 *
 * @param name the converter name to be the default (must be known by ICU).
 * @see ucnv_getDefaultName
 * @system
 * @stable ICU 2.0
 */
U_CAPI void U_EXPORT2
ucnv_setDefaultName(const char *name);
#endif  /* U_HIDE_SYSTEM_API */

/**
 * Fixes the backslash character mismapping. For example, in SJIS, the backslash
 * character in the ASCII portion is also used to represent the yen currency sign.
 * When mapping from Unicode character 0x005C, it's unclear whether to map the
 * character back to yen or backslash in SJIS. This function will take the input
 * buffer and replace all the yen sign characters with backslash. This is necessary
 * when the user tries to open a file with the input buffer on Windows.
 * This function will test the converter to see whether such mapping is
 * required. You can sometimes avoid using this function by using the correct version
 * of Shift-JIS.
 *
 * @param cnv The converter representing the target codepage.
 * @param source the input buffer to be fixed
 * @param sourceLen the length of the input buffer
 * @see ucnv_isAmbiguous
 * @stable ICU 2.0
 */
U_CAPI void U_EXPORT2
ucnv_fixFileSeparator(const UConverter *cnv, UChar *source, int32_t sourceLen);

/**
 * Determines if the converter contains ambiguous mappings of the same
 * character or not.
 * @param cnv the converter to be tested
 * @return true if the converter contains ambiguous mapping of the same
 * character, false otherwise.
 * @stable ICU 2.0
 */
U_CAPI UBool U_EXPORT2
ucnv_isAmbiguous(const UConverter *cnv);

/**
 * Sets the converter to use fallback mappings or not.
 * Regardless of this flag, the converter will always use
 * fallbacks from Unicode Private Use code points, as well as
 * reverse fallbacks (to Unicode).
 * For details see ".ucm File Format"
 * in the Conversion Data chapter of the ICU User Guide:
 * https://unicode-org.github.io/icu/userguide/conversion/data.html#ucm-file-format
 *
 * @param cnv The converter to set the fallback mapping usage on.
 * @param usesFallback true if the user wants the converter to take advantage of the fallback
 * mapping, false otherwise.
 * @stable ICU 2.0
 * @see ucnv_usesFallback
 */
U_CAPI void U_EXPORT2
ucnv_setFallback(UConverter *cnv, UBool usesFallback);

/**
 * Determines if the converter uses fallback mappings or not.
 * This flag has restrictions, see ucnv_setFallback().
 *
 * @param cnv The converter to be tested
 * @return true if the converter uses fallback, false otherwise.
 * @stable ICU 2.0
 * @see ucnv_setFallback
 */
U_CAPI UBool U_EXPORT2
ucnv_usesFallback(const UConverter *cnv);

/**
 * Detects Unicode signature byte sequences at the start of the byte stream
 * and returns the charset name of the indicated Unicode charset.
 * NULL is returned when no Unicode signature is recognized.
 * The number of bytes in the signature is output as well.
 *
 * The caller can ucnv_open() a converter using the charset name.
 * The first code unit (UChar) from the start of the stream will be U+FEFF
 * (the Unicode BOM/signature character) and can usually be ignored.
 *
 * For most Unicode charsets it is also possible to ignore the indicated
 * number of initial stream bytes and start converting after them.
 * However, there are stateful Unicode charsets (UTF-7 and BOCU-1) for which
 * this will not work. Therefore, it is best to ignore the first output UChar
 * instead of the input signature bytes.
 *
 * Usage:
 * \snippet samples/ucnv/convsamp.cpp ucnv_detectUnicodeSignature
 *
 * @param source The source string in which the signature should be detected.
 * @param sourceLength Length of the input string, or -1 if terminated with a NUL byte.
 * @param signatureLength A pointer to int32_t to receive the number of bytes that make up the signature
 *                        of the detected UTF. 0 if not detected.
 *                        Can be a NULL pointer.
 * @param pErrorCode ICU error code in/out parameter.
 *                   Must fulfill U_SUCCESS before the function call.
 * @return The name of the encoding detected. NULL if encoding is not detected.
 * @stable ICU 2.4
 */
U_CAPI const char* U_EXPORT2
ucnv_detectUnicodeSignature(const char* source,
                            int32_t sourceLength,
                            int32_t *signatureLength,
                            UErrorCode *pErrorCode);

/**
 * Returns the number of UChars held in the converter's internal state
 * because more input is needed for completing the conversion. This function is
 * useful for mapping semantics of ICU's converter interface to those of iconv,
 * and this information is not needed for normal conversion.
 * @param cnv The converter in which the input is held
 * @param status ICU error code in/out parameter.
 *               Must fulfill U_SUCCESS before the function call.
 * @return The number of UChars in the state. -1 if an error is encountered.
 * @stable ICU 3.4
 */
U_CAPI int32_t U_EXPORT2
ucnv_fromUCountPending(const UConverter* cnv, UErrorCode* status);

/**
 * Returns the number of chars held in the converter's internal state
 * because more input is needed for completing the conversion. This function is
 * useful for mapping semantics of ICU's converter interface to those of iconv,
 * and this information is not needed for normal conversion.
 * @param cnv The converter in which the input is held as internal state
 * @param status ICU error code in/out parameter.
 *               Must fulfill U_SUCCESS before the function call.
 * @return The number of chars in the state. -1 if an error is encountered.
 * @stable ICU 3.4
 */
U_CAPI int32_t U_EXPORT2
ucnv_toUCountPending(const UConverter* cnv, UErrorCode* status);

/**
 * Returns whether or not the charset of the converter has a fixed number of bytes
 * per charset character.
 * Examples are converters of the type UCNV_SBCS or UCNV_DBCS.
 * Another example is UTF-32 which is always 4 bytes per character.
 * A Unicode code point may be represented by more than one UTF-8 or UTF-16 code unit
 * but a UTF-32 converter encodes each code point with 4 bytes.
 * Note: This method is not intended to be used to determine whether the charset has a
 * fixed ratio of bytes to Unicode code units for any particular Unicode encoding form.
 * false is returned with the UErrorCode if error occurs or cnv is NULL.
 * @param cnv The converter to be tested
 * @param status ICU error code in/out parameter
 * @return true if the converter is fixed-width
 * @stable ICU 4.8
 */
U_CAPI UBool U_EXPORT2
ucnv_isFixedWidth(UConverter *cnv, UErrorCode *status);

#endif

#endif
/*_UCNV*/
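
The signature detection documented for ucnv_detectUnicodeSignature() can be pictured with a stand-alone sketch. It is not ICU code and does not call ICU: `detect_bom` is a hypothetical function, and it is assumed here to cover only the UTF-8, UTF-16 and UTF-32 signatures, a subset of the Unicode charsets mentioned in the comment above (UTF-7 and BOCU-1 are left out).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in, not ICU code: recognizes only the UTF-8, UTF-16 and
 * UTF-32 signatures. Returns a charset name or NULL, and stores the signature
 * length in *sig_len when sig_len is not NULL (0 if nothing is recognized).
 */
const char *detect_bom(const unsigned char *src, size_t len, int *sig_len)
{
    static const struct {
        const char *name;
        unsigned char bytes[4];
        int n;
    } table[] = {
        /* UTF-32LE must be tested before UTF-16LE: both start with FF FE. */
        { "UTF-32LE", { 0xFF, 0xFE, 0x00, 0x00 }, 4 },
        { "UTF-32BE", { 0x00, 0x00, 0xFE, 0xFF }, 4 },
        { "UTF-8",    { 0xEF, 0xBB, 0xBF, 0x00 }, 3 },
        { "UTF-16LE", { 0xFF, 0xFE, 0x00, 0x00 }, 2 },
        { "UTF-16BE", { 0xFE, 0xFF, 0x00, 0x00 }, 2 },
    };
    assert(src != NULL || len == 0);

    for (size_t i = 0; i < sizeof table / sizeof table[0]; ++i) {
        /* Only the first n bytes of each entry are part of the signature. */
        if (len >= (size_t)table[i].n &&
            memcmp(src, table[i].bytes, (size_t)table[i].n) == 0) {
            if (sig_len != NULL) {
                *sig_len = table[i].n;
            }
            return table[i].name;
        }
    }
    if (sig_len != NULL) {
        *sig_len = 0;
    }
    return NULL;
}
```

As the comment above recommends, a caller would normally open a converter for the returned name and drop the first output U+FEFF, rather than skip the signature bytes in the input.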