Standard C Library Functions uconv_u16tou32(3C)
NAME
uconv_u16tou32, uconv_u16tou8, uconv_u32tou16,
uconv_u32tou8, uconv_u8tou16, uconv_u8tou32 - Unicode encoding
conversion functions
SYNOPSIS
#include <sys/types.h>
#include <sys/errno.h>
#include <sys/u8_textprep.h>
int uconv_u16tou32(const uint16_t *utf16str, size_t *utf16len,
    uint32_t *utf32str, size_t *utf32len, int flag);
int uconv_u16tou8(const uint16_t *utf16str, size_t *utf16len,
    uchar_t *utf8str, size_t *utf8len, int flag);
int uconv_u32tou16(const uint32_t *utf32str, size_t *utf32len,
    uint16_t *utf16str, size_t *utf16len, int flag);
int uconv_u32tou8(const uint32_t *utf32str, size_t *utf32len,
    uchar_t *utf8str, size_t *utf8len, int flag);
int uconv_u8tou16(const uchar_t *utf8str, size_t *utf8len,
    uint16_t *utf16str, size_t *utf16len, int flag);
int uconv_u8tou32(const uchar_t *utf8str, size_t *utf8len,
    uint32_t *utf32str, size_t *utf32len, int flag);
PARAMETERS
utf16str A pointer to a UTF-16 character string.
utf16len As an input parameter, the number of 16-bit
unsigned integers in utf16str as UTF-16 charac-
ters to be converted or saved.
As an output parameter, the number of 16-bit
unsigned integers in utf16str consumed or saved
during conversion.
utf32str A pointer to a UTF-32 character string.
utf32len As an input parameter, the number of 32-bit
unsigned integers in utf32str as UTF-32
characters to be converted or saved.
As an output parameter, the number of 32-bit
unsigned integers in utf32str consumed or saved
during conversion.
utf8str A pointer to a UTF-8 character string.
utf8len As an input parameter, the number of bytes in
utf8str as UTF-8 characters to be converted or
saved.
As an output parameter, the number of bytes in
utf8str consumed or saved during conversion.
flag The possible conversion options that are con-
structed by a bitwise-inclusive-OR of the fol-
lowing values:
UCONV_IN_BIG_ENDIAN
The input parameter is in big endian byte
ordering.
UCONV_OUT_BIG_ENDIAN
The output parameter should be in big endian
byte ordering.
UCONV_IN_SYSTEM_ENDIAN
The input parameter is in the default byte
ordering of the current system.
UCONV_OUT_SYSTEM_ENDIAN
The output parameter should be in the
default byte ordering of the current system.
UCONV_IN_LITTLE_ENDIAN
The input parameter is in little endian byte
ordering.
UCONV_OUT_LITTLE_ENDIAN
The output parameter should be in little
endian byte ordering.
UCONV_IGNORE_NULL
The null or U+0000 character should not stop
the conversion.
UCONV_IN_ACCEPT_BOM
If the Byte Order Mark (BOM, U+FEFF) character
exists as the first character of the
input parameter, interpret it as the BOM
character.
UCONV_OUT_EMIT_BOM
Start the output parameter with the Byte Order
Mark (BOM, U+FEFF) character to indicate the
byte ordering if the output parameter is in
UTF-16 or UTF-32.
DESCRIPTION
The uconv_u16tou32() function reads the given utf16str in
UTF-16 until U+0000 (zero) in utf16str is encountered as a
character or until the number of 16-bit unsigned integers
specified in utf16len is read. The UTF-16 characters that
are read are converted into UTF-32 and the result is saved
at utf32str. After the successful conversion, utf32len con-
tains the number of 32-bit unsigned integers saved at
utf32str as UTF-32 characters.
The uconv_u16tou8() function reads the given utf16str in
UTF-16 until U+0000 (zero) in utf16str is encountered as a
character or until the number of 16-bit unsigned integers
specified in utf16len is read. The UTF-16 characters that
are read are converted into UTF-8 and the result is saved at
utf8str. After the successful conversion, utf8len contains
the number of bytes saved at utf8str as UTF-8 characters.
The uconv_u32tou16() function reads the given utf32str in
UTF-32 until U+0000 (zero) in utf32str is encountered as a
character or until the number of 32-bit unsigned integers
specified in utf32len is read. The UTF-32 characters that
are read are converted into UTF-16 and the result is saved
at utf16str. After the successful conversion, utf16len con-
tains the number of 16-bit unsigned integers saved at
utf16str as UTF-16 characters.
The uconv_u32tou8() function reads the given utf32str in
UTF-32 until U+0000 (zero) in utf32str is encountered as a
character or until the number of 32-bit unsigned integers
specified in utf32len is read. The UTF-32 characters that
are read are converted into UTF-8 and the result is saved at
utf8str. After the successful conversion, utf8len contains
the number of bytes saved at utf8str as UTF-8 characters.
The uconv_u8tou16() function reads the given utf8str in
UTF-8 until the null ('\0') byte in utf8str is encountered
or until the number of bytes specified in utf8len is read.
The UTF-8 characters that are read are converted into UTF-16
and the result is saved at utf16str. After the successful
conversion, utf16len contains the number of 16-bit unsigned
integers saved at utf16str as UTF-16 characters.
The uconv_u8tou32() function reads the given utf8str in
UTF-8 until the null ('\0') byte in utf8str is encountered
or until the number of bytes specified in utf8len is read.
The UTF-8 characters that are read are converted into UTF-32
and the result is saved at utf32str. After the successful
conversion, utf32len contains the number of 32-bit unsigned
integers saved at utf32str as UTF-32 characters.
During the conversion, the input and the output parameters
are treated with byte orderings specified in the flag param-
eter. When not specified, the default byte ordering of the
system is used. The byte ordering flag value that is speci-
fied for UTF-8 is ignored.
When UCONV_IN_ACCEPT_BOM is specified in flag and the
first character of the string pointed to by the input parameter
is the BOM character, the value of the BOM character
dictates the byte ordering of the subsequent characters in
the string pointed to by the input parameter, regardless of
the supplied input parameter byte ordering option flag
values. If UCONV_IN_ACCEPT_BOM is not specified, the BOM
as the first character is treated as a regular Unicode character:
the Zero Width No Break Space (ZWNBSP) character.
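The BOM rule above can be illustrated with a minimal sketch. This is not the library's implementation: the function name detect_utf16_bom and the enum bom_result are hypothetical, and the sketch assumes that the 16-bit units were loaded in the system's native byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type; not part of the uconv interface. */
enum bom_result { BOM_NONE, BOM_NATIVE, BOM_SWAPPED };

/*
 * Inspect the first 16-bit unit of a UTF-16 buffer that was loaded
 * in the system's native byte order.  U+FEFF read as 0xFEFF means the
 * data is already in native order; read as 0xFFFE it means the data
 * is in the opposite order and each unit would need a byte swap.
 */
enum bom_result
detect_utf16_bom(const uint16_t *s, size_t len)
{
	if (len == 0)
		return (BOM_NONE);
	if (s[0] == 0xFEFF)
		return (BOM_NATIVE);
	if (s[0] == 0xFFFE)
		return (BOM_SWAPPED);
	return (BOM_NONE);
}
```

A caller that gets BOM_NONE would fall back to the byte ordering requested through the flag argument, which mirrors the behavior described for an input without a BOM.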
When UCONV_IGNORE_NULL is specified, the conversion does not
stop at U+0000 or a null byte in the input parameter. It
continues until the number of input elements specified in
utf16len, utf32len, or utf8len is entirely consumed.
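The effect of the null-handling option on how much input is read can be modeled with a short sketch. This is a simplified model and not the library code: the function utf16_input_units and its ignore_nul argument are hypothetical, and the sketch assumes the terminating U+0000 itself is not counted.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model: return how many UTF-16 units of s would be read
 * as input.  When ignore_nul is zero the scan stops at the first
 * U+0000 (which is not counted); otherwise all len units are read.
 */
size_t
utf16_input_units(const uint16_t *s, size_t len, int ignore_nul)
{
	size_t i;

	if (ignore_nul)
		return (len);
	for (i = 0; i < len; i++) {
		if (s[i] == 0)
			break;
	}
	return (i);
}
```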
As output parameters, utf16len, utf32len, and utf8len are
not changed if conversion fails for any reason.
RETURN VALUES
Upon successful conversion, the functions return 0. Upon
failure, the functions return one of the following errno
values:
EILSEQ The conversion detected an illegal or out of bound
character value in the input parameter.
E2BIG The conversion cannot finish because the size
specified in the output parameter is too small.
EINVAL The conversion stops due to an incomplete charac-
ter at the end of the input string.
EBADF Conflicting byte-ordering option flag values are
detected.
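Because the functions return the error number directly rather than setting errno, a caller can switch on the return value. The following is a minimal sketch: the helper name uconv_result_text is hypothetical, and the messages paraphrase the list above.

```c
#include <errno.h>

/*
 * Hypothetical helper: map the return value of a conversion function
 * to a short description taken from the RETURN VALUES list.
 */
const char *
uconv_result_text(int ret)
{
	switch (ret) {
	case 0:
		return ("success");
	case EILSEQ:
		return ("illegal or out-of-bound character in input");
	case E2BIG:
		return ("output buffer too small");
	case EINVAL:
		return ("incomplete character at end of input");
	case EBADF:
		return ("conflicting byte-ordering flags");
	default:
		return ("unexpected return value");
	}
}
```

Of these, only E2BIG is normally worth a retry, with a larger output buffer, since the other errors depend on the input data or the flag argument.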
EXAMPLES
Example 1 Convert a UTF-16 string in little-endian byte
ordering into a UTF-8 string.
#include <sys/types.h>
#include <sys/errno.h>
#include <sys/u8_textprep.h>
.
.
.
uint16_t u16s[MAXNAMELEN + 1];
uchar_t u8s[MAXNAMELEN + 1];
size_t u16len, u8len;
int ret;
.
.
.
u16len = u8len = MAXNAMELEN;
ret = uconv_u16tou8(u16s, &u16len, u8s, &u8len,
    UCONV_IN_LITTLE_ENDIAN);
if (ret != 0) {
/* Conversion error occurred. */
return (ret);
}
.
.
.
Example 2 Convert a UTF-32 string in big endian byte order-
ing into little endian UTF-16.
#include <sys/types.h>
#include <sys/errno.h>
#include <sys/u8_textprep.h>
.
.
.
/*
 * A UTF-32 character can be mapped to a UTF-16 character with
 * two 16-bit integer entities as a "surrogate pair."
*/
uint32_t u32s[101];
uint16_t u16s[101];
int ret;
size_t u32len, u16len;
.
.
.
u32len = u16len = 100;
ret = uconv_u32tou16(u32s, &u32len, u16s, &u16len,
    UCONV_IN_BIG_ENDIAN | UCONV_OUT_LITTLE_ENDIAN);
if (ret == 0) {
return (0);
} else if (ret == E2BIG) {
/* Use bigger output parameter and try just one more time. */
    uint16_t u16s2[201];
    u16len = 200;
    ret = uconv_u32tou16(u32s, &u32len, u16s2, &u16len,
        UCONV_IN_BIG_ENDIAN | UCONV_OUT_LITTLE_ENDIAN);
if (ret == 0)
return (0);
}
/* Otherwise, return -1 to indicate an error condition. */
return (-1);
Example 3 Convert a UTF-8 string into UTF-16 in little-
endian byte ordering.
Convert a UTF-8 string into UTF-16 in little-endian byte
ordering with a Byte Order Mark (BOM) character at the
beginning of the output parameter.
#include <sys/types.h>
#include <sys/errno.h>
#include <sys/u8_textprep.h>
.
.
.
uchar_t u8s[MAXNAMELEN + 1];
uint16_t u16s[MAXNAMELEN + 1];
size_t u8len, u16len;
int ret;
.
.
.
u8len = u16len = MAXNAMELEN;
ret = uconv_u8tou16(u8s, &u8len, u16s, &u16len,
    UCONV_OUT_LITTLE_ENDIAN | UCONV_OUT_EMIT_BOM);
if (ret != 0) {
/* Conversion error occurred. */
return (ret);
}
.
.
.
ATTRIBUTES
See attributes(5) for descriptions of the following attri-
butes:
ATTRIBUTE TYPE          ATTRIBUTE VALUE
Interface Stability     Committed
MT-Level                MT-Safe
SEE ALSO
attributes(5), uconv_u16tou32(9F)
The Unicode Standard (http://www.unicode.org)
NOTES
Each UTF-16 or UTF-32 character maps to a UTF-8 character
that needs from one to a maximum of four bytes.
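The one-to-four byte range follows from the standard UTF-8 encoding boundaries, which can be sketched as below. The helper name utf8_encoded_len is hypothetical and is not part of the uconv interface; it may be useful when sizing an output buffer for utf8str.

```c
#include <stdint.h>

/*
 * Return the number of UTF-8 bytes needed for one Unicode scalar
 * value, or 0 if the value is a surrogate code point or is above
 * U+10FFFF and therefore cannot be encoded.
 */
int
utf8_encoded_len(uint32_t cp)
{
	if (cp < 0x80)
		return (1);
	if (cp < 0x800)
		return (2);
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return (0);
	if (cp < 0x10000)
		return (3);
	if (cp <= 0x10FFFF)
		return (4);
	return (0);
}
```

A conservative output size for a conversion to UTF-8 is therefore four bytes per input character.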
One UTF-32 or UTF-8 character can yield two 16-bit unsigned
integers as a UTF-16 character, which is a surrogate pair, if
the Unicode scalar value is bigger than U+FFFF.
Ill-formed UTF-16 surrogate pairs are seen as illegal char-
acters during the conversion.
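The surrogate-pair arithmetic behind these notes can be sketched as follows. This illustrates the standard UTF-16 encoding form rather than the library's own code; the names utf16_encode and utf16_decode_pair are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one scalar value as UTF-16.  Writes one or two units to out
 * and returns the count, or 0 if cp is a surrogate code point or is
 * above U+10FFFF.
 */
size_t
utf16_encode(uint32_t cp, uint16_t out[2])
{
	if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
		return (0);
	if (cp <= 0xFFFF) {
		out[0] = (uint16_t)cp;
		return (1);
	}
	cp -= 0x10000;
	out[0] = (uint16_t)(0xD800 + (cp >> 10));	/* high surrogate */
	out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));	/* low surrogate */
	return (2);
}

/*
 * Decode a surrogate pair.  Returns 0 and stores the scalar value in
 * *cp when hi and lo form a well-formed pair; returns -1 otherwise.
 */
int
utf16_decode_pair(uint16_t hi, uint16_t lo, uint32_t *cp)
{
	if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF)
		return (-1);
	*cp = 0x10000 + (((uint32_t)(hi - 0xD800) << 10) |
	    (uint32_t)(lo - 0xDC00));
	return (0);
}
```

A pair in the wrong order, or a high surrogate that is not followed by a low surrogate, fails the range check. That is the kind of ill-formed input the conversion functions report as an illegal character.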
SunOS 5.11 Last change: 18 Sep 2007 8