Standard C Library Functions u8_textprep_str(3C)
NAME
u8_textprep_str - string-based UTF-8 text preparation function
SYNOPSIS
#include <sys/u8_textprep.h>
size_t u8_textprep_str(char *inarray, size_t *inlen,
    char *outarray, size_t *outlen, int flag,
    size_t unicode_version, int *errnum);
PARAMETERS
inarray A pointer to a byte array containing a
sequence of UTF-8 character bytes to be
prepared.
inlen On input, the number of bytes to be
prepared in inarray. On output, the
number of bytes in inarray that have
not yet been consumed.
outarray A pointer to a byte array where prepared
UTF-8 character bytes can be saved.
outlen On input, the number of bytes available
at outarray where prepared character
bytes can be saved. On output, after
the preparation, the number of bytes
still available at outarray.
flag The preparation options, constructed by a
bitwise-inclusive-OR of the following values:
U8_TEXTPREP_IGNORE_NULL
Normally u8_textprep_str() stops the
preparation when it encounters a null
byte, even if the value pointed to by
inlen is still greater than zero.
With this option, a null byte does not
stop the preparation, which continues
until all inlen bytes of inarray have
been consumed or an error occurs.
U8_TEXTPREP_IGNORE_INVALID
Normally u8_textprep_str() stops the
preparation when it encounters an illegal
or incomplete character and sets the
corresponding errnum value.
When this option is set,
u8_textprep_str() does not stop the
preparation and instead treats such
characters as requiring no preparation.
U8_TEXTPREP_TOUPPER
Map lowercase characters to uppercase
characters if applicable.
U8_TEXTPREP_TOLOWER
Map uppercase characters to lowercase
characters if applicable.
U8_TEXTPREP_NFD
Apply Unicode Normalization Form D.
U8_TEXTPREP_NFC
Apply Unicode Normalization Form C.
U8_TEXTPREP_NFKD
Apply Unicode Normalization Form KD.
U8_TEXTPREP_NFKC
Apply Unicode Normalization Form KC.
Only one case folding option is allowed.
Only one Unicode Normalization option is
allowed.
When a case folding option and a Unicode
Normalization option are specified
together, UTF-8 text preparation is done
by doing case folding first and then
Unicode Normalization.
If no option is specified, no processing
occurs except the simple copying of
bytes from input to output.
unicode_version The version of Unicode data that should
be used during UTF-8 text preparation.
The following values are supported:
U8_UNICODE_320
Use Unicode 3.2.0 data during preparation.
U8_UNICODE_500
Use Unicode 5.0.0 data during preparation.
U8_UNICODE_LATEST
Use the latest Unicode version data
available, which is currently Unicode 5.0.0.
errnum The error value when preparation is not
completed or fails. The following values
are supported:
E2BIG Text preparation stopped due
to lack of space in the output
array.
EBADF Specified option values are
conflicting and cannot be supported.
EILSEQ Text preparation stopped due
to an input byte that does not
belong to UTF-8.
EINVAL Text preparation stopped due
to an incomplete UTF-8 character
at the end of the input array.
ERANGE The specified Unicode version
value is not a supported version.
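The rules in the flag description (at most one case folding option and at most one normalization option) can be checked before the function is called. The following is a minimal sketch, not part of the library. The DEMO_* bit values are hypothetical stand-ins chosen only for illustration; real code would use the U8_TEXTPREP_* constants from the header instead. The helper demo_check_flags() is likewise an assumed name. It returns EBADF for a conflicting combination, mirroring the errnum value documented above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in bits; real code uses the U8_TEXTPREP_* constants. */
#define DEMO_TOUPPER 0x01
#define DEMO_TOLOWER 0x02
#define DEMO_NFD     0x04
#define DEMO_NFC     0x08
#define DEMO_NFKD    0x10
#define DEMO_NFKC    0x20

#define DEMO_CASE_MASK (DEMO_TOUPPER | DEMO_TOLOWER)
#define DEMO_NORM_MASK (DEMO_NFD | DEMO_NFC | DEMO_NFKD | DEMO_NFKC)

/* Return nonzero if more than one bit is set in v. */
int
demo_more_than_one_bit(unsigned int v)
{
    return ((v & (v - 1)) != 0);
}

/*
 * Return 0 if the flag combination is acceptable, or EBADF if it
 * selects two case folding options or two normalization forms.
 */
int
demo_check_flags(int flag)
{
    unsigned int f = (unsigned int)flag;

    /* The two option groups must not share any bit. */
    assert((DEMO_CASE_MASK & DEMO_NORM_MASK) == 0);

    if (demo_more_than_one_bit(f & DEMO_CASE_MASK))
        return (EBADF);
    if (demo_more_than_one_bit(f & DEMO_NORM_MASK))
        return (EBADF);
    return (0);
}
```

One case folding bit combined with one normalization bit passes the check, as does a flag value of 0, which corresponds to the plain byte copy case.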
DESCRIPTION
The u8_textprep_str() function prepares the sequence of
UTF-8 characters in the array specified by inarray and stores
the corresponding prepared UTF-8 characters in the array
specified by outarray. The inarray argument points to the
first byte of the input array, and inlen indicates the number
of bytes from there to the end of the array that are to be
prepared. The outarray argument points to the first available
byte in the output array, and outlen indicates the number of
bytes available from there to the end of the array. Unless
flag includes U8_TEXTPREP_IGNORE_NULL, u8_textprep_str()
stops when it encounters a null byte in the input array,
regardless of the current inlen value.
If flag does not include U8_TEXTPREP_IGNORE_INVALID and a
sequence of input bytes does not form a valid UTF-8 character,
preparation stops after the previous successfully prepared
character. If flag does not include U8_TEXTPREP_IGNORE_INVALID
and the input array ends with an incomplete UTF-8 character,
preparation stops after the previous successfully prepared
bytes. If the output array is not large enough to hold the
entire prepared text, preparation stops just prior to the input
bytes that would cause the output array to overflow. The value
pointed to by inlen is decremented to reflect the number of
bytes still not prepared in the input array. The value pointed
to by outlen is decremented to reflect the number of bytes
still available in the output array.
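The length bookkeeping described above can be illustrated without the library. The sketch below is a hypothetical model, demo_copy_prep(), of only the no-option case (plain byte copying). It stops before a null byte, stops with E2BIG when the output is full, and decrements the values pointed to by inlen and outlen as it goes. The function name and its byte-at-a-time behavior are assumptions made for illustration; it is not the real implementation, and it performs no UTF-8 validation, case folding, or normalization.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model of the no-option case: copy bytes from inarray to
 * outarray, stopping before a null byte or when the output is full.
 * Returns 0 on success, or (size_t)-1 with *errnum set to E2BIG.
 */
size_t
demo_copy_prep(const char *inarray, size_t *inlen, char *outarray,
    size_t *outlen, int *errnum)
{
    assert(inarray != NULL && inlen != NULL);
    assert(outarray != NULL && outlen != NULL && errnum != NULL);

    while (*inlen > 0 && *inarray != '\0') {
        if (*outlen == 0) {
            *errnum = E2BIG;    /* no room for the next byte */
            return ((size_t)-1);
        }
        *outarray++ = *inarray++;
        (*inlen)--;     /* one fewer byte left to prepare */
        (*outlen)--;    /* one fewer byte still available */
    }
    return (0);
}
```

With this convention, a caller can compute the number of bytes written as the original output size minus the value left in outlen, and can tell how much input remains from the value left in inlen.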
RETURN VALUES
The u8_textprep_str() function updates the values pointed to
by the inlen and outlen arguments to reflect the extent of the
preparation. When U8_TEXTPREP_IGNORE_INVALID is specified,
u8_textprep_str() returns the number of illegal or incomplete
characters found during the text preparation. When
U8_TEXTPREP_IGNORE_INVALID is not specified and the text
preparation is entirely successful, the function returns 0.
If the entire string in the input array is prepared, the
value pointed to by inlen will be 0. If the text preparation
is stopped due to any of the conditions mentioned above, the
value pointed to by inlen will be non-zero and errnum is set to
indicate the error. If such an error or any other error occurs,
u8_textprep_str() returns (size_t)-1 and sets errnum to
indicate the error.
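Callers often want a readable reason when the function returns (size_t)-1. The helper below is a hypothetical convenience, not part of the library. It takes the return value and the errnum value, reports whether the call failed, and maps the errnum values listed in this page to short descriptions using only the standard errno constants. The name demo_prep_failed() and the message strings are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Return 1 and set *reason if ret signals failure ((size_t)-1);
 * otherwise return 0. Any other ret value is 0 or, when
 * U8_TEXTPREP_IGNORE_INVALID was specified, a count of skipped
 * characters, and is not treated as an error here.
 */
int
demo_prep_failed(size_t ret, int errnum, const char **reason)
{
    assert(reason != NULL);

    if (ret != (size_t)-1) {
        *reason = "no error";
        return (0);
    }
    switch (errnum) {
    case E2BIG:
        *reason = "output array too small";
        break;
    case EBADF:
        *reason = "conflicting option values";
        break;
    case EILSEQ:
        *reason = "input byte does not belong to UTF-8";
        break;
    case EINVAL:
        *reason = "incomplete UTF-8 character at end of input";
        break;
    case ERANGE:
        *reason = "unsupported Unicode version";
        break;
    default:
        *reason = "unknown error";
        break;
    }
    return (1);
}
```

Checking the return value before looking at errnum matters because errnum is only documented as meaningful when preparation is not completed or fails.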
EXAMPLES
Example 1 Simple UTF-8 text preparation
#include <sys/param.h>
#include <sys/u8_textprep.h>
#include <errno.h>
#include <string.h>
.
.
.
size_t ret;
char ib[MAXPATHLEN];
char ob[MAXPATHLEN];
size_t il, ol;
int err;
.
.
.
/*
 * We got a UTF-8 pathname from somewhere.
 *
 * Calculate the length of input string including the terminating
 * NUL byte and prepare other arguments.
 */
(void) strlcpy(ib, pathname, MAXPATHLEN);
il = strlen(ib) + 1;
ol = MAXPATHLEN;
/*
 * Do toupper case folding, apply Unicode Normalization Form D,
 * ignore NUL byte, and ignore any illegal/incomplete characters.
 */
ret = u8_textprep_str(ib, &il, ob, &ol,
    (U8_TEXTPREP_IGNORE_NULL | U8_TEXTPREP_IGNORE_INVALID |
    U8_TEXTPREP_TOUPPER | U8_TEXTPREP_NFD), U8_UNICODE_LATEST, &err);
if (ret == (size_t)-1) {
    if (err == E2BIG)
        return (-1);
    if (err == EBADF)
        return (-2);
    if (err == ERANGE)
        return (-3);
    return (-4);
}
ATTRIBUTES
See attributes(5) for descriptions of the following attributes:
ATTRIBUTE TYPE          ATTRIBUTE VALUE
Interface Stability     Committed
MT-Level                MT-Safe
SEE ALSO
u8_strcmp(3C), u8_validate(3C), attributes(5),
u8_strcmp(9F), u8_textprep_str(9F), u8_validate(9F)
The Unicode Standard (http://www.unicode.org)
NOTES
After the text preparation, the number of prepared UTF-8
characters and the total number of bytes may be smaller or
larger than the corresponding numbers for the input buffer.
Case conversions are performed using Unicode data of the
corresponding version. No locale-specific case conversions
are performed.
SunOS 5.11 Last change: 18 Sep 2007