Standard C Library Functions                u8_textprep_str(3C)



NAME
     u8_textprep_str  -  string-based  UTF-8  text   preparation
     function

SYNOPSIS
     #include <sys/u8_textprep.h>

     size_t u8_textprep_str(char *inarray, size_t *inlen,
          char *outarray, size_t *outlen, int flag,
          size_t unicode_version, int *errnum);


PARAMETERS
     inarray             A pointer to a byte array  containing  a
                         sequence  of UTF-8 character bytes to be
                         prepared.


     inlen               As input argument, the number  of  bytes
                         to  be  prepared  in  inarray. As output
                         argument, the number of bytes in inarray
                         still not consumed.


     outarray            A pointer to a byte array where prepared
                         UTF-8 character bytes can be saved.


     outlen              As input argument, the number of  avail-
                         able  bytes  at  outarray where prepared
                         character bytes can be saved.  As output
                         argument,   after  the  conversion,  the
                         number  of  bytes  still  available   at
                         outarray.


     flag                The possible  preparation  options  con-
                         structed  by  a  bitwise-inclusive-OR of
                         the following values:

                         U8_TEXTPREP_IGNORE_NULL

                             Normally u8_textprep_str() stops the
                             preparation if it encounters a  null
                             byte, even if the value  pointed  to
                             by  inlen  is  still  greater   than
                             zero.

                             With this option, a null  byte  does
                             not  stop  the  preparation, and the
                             preparation continues until all  the
                             inarray bytes specified by inlen are
                             consumed or an error occurs.


                         U8_TEXTPREP_IGNORE_INVALID

                             Normally u8_textprep_str() stops the
                             preparation if it encounters illegal
                             or incomplete  characters  and  sets
                             errnum to the corresponding value.

                             When    this    option    is    set,
                             u8_textprep_str() does not stop  the
                             preparation and instead treats  such
                             characters as not needing any prepa-
                             ration.


                         U8_TEXTPREP_TOUPPER

                             Map lowercase characters  to  upper-
                             case characters if applicable.


                         U8_TEXTPREP_TOLOWER

                             Map uppercase characters  to  lower-
                             case characters if applicable.


                         U8_TEXTPREP_NFD

                             Apply Unicode Normalization Form D.


                         U8_TEXTPREP_NFC

                             Apply Unicode Normalization Form C.


                         U8_TEXTPREP_NFKD

                             Apply Unicode Normalization Form KD.


                         U8_TEXTPREP_NFKC

                             Apply Unicode Normalization Form KC.

                         Only one case folding option is allowed.
                         Only one Unicode Normalization option is
                         allowed.



                         When a case folding option and a Unicode
                         Normalization   option   are   specified
                         together, UTF-8 text preparation is done
                         by  doing  case  folding  first and then
                         Unicode Normalization.

                         If no option is specified, no processing
                         occurs  except  the  simple  copying  of
                         bytes from input to output.
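
                         The two "only one" rules  above  can  be
                         checked before the call. The sketch that
                         follows is a stand-alone illustration and
                         not library code: the SK_ bit values and
                         the sk_flags_ok() function are hypotheti-
                         cal placeholders chosen so that the block
                         compiles without  <sys/u8_textprep.h>.  A
                         real caller would presumably build the
                         two masks from the U8_TEXTPREP_ constants
                         instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical placeholder bits; NOT the values from the real header. */
enum {
	SK_TOUPPER = 1 << 0,
	SK_TOLOWER = 1 << 1,
	SK_NFD     = 1 << 2,
	SK_NFC     = 1 << 3,
	SK_NFKD    = 1 << 4,
	SK_NFKC    = 1 << 5
};

#define	SK_CASE_MASK	(SK_TOUPPER | SK_TOLOWER)
#define	SK_NORM_MASK	(SK_NFD | SK_NFC | SK_NFKD | SK_NFKC)

/* True if at most one bit of x is set (zero bits also qualifies). */
bool
sk_at_most_one_bit(unsigned int x)
{
	return ((x & (x - 1)) == 0);
}

/*
 * True if flag selects at most one case folding option and at most
 * one normalization option, which is the combination rule stated above.
 */
bool
sk_flags_ok(int flag)
{
	assert(flag >= 0);
	return (sk_at_most_one_bit((unsigned int)flag & SK_CASE_MASK) &&
	    sk_at_most_one_bit((unsigned int)flag & SK_NORM_MASK));
}
```

                         A combination  that  fails  this  kind  of
                         check is the case  that  errnum  reports
                         as EBADF.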


     unicode_version     The version of Unicode data that  should
                         be  used  during UTF-8 text preparation.
                         The following values are supported:

                         U8_UNICODE_320

                             Use Unicode 3.2.0 data during prepa-
                             ration.


                         U8_UNICODE_500

                             Use Unicode 5.0.0 data during prepa-
                             ration.


                         U8_UNICODE_LATEST

                             Use the latest Unicode version  data
                             available,  which  is currently Uni-
                             code 5.0.0.



     errnum              The error value when preparation is  not
                         completed or fails. The following values
                         are supported:

                         E2BIG     Text preparation  stopped  due
                                   to lack of space in the output
                                   array.


                         EBADF     Specified  option  values  are
                                   conflicting and cannot be sup-
                                   ported.


                         EILSEQ    Text preparation  stopped  due
                                   to an input byte that does not
                                   belong to UTF-8.



                         EINVAL    Text preparation  stopped  due
                                   to an incomplete UTF-8 charac-
                                   ter at the end  of  the  input
                                   array.


                         ERANGE    The specified Unicode  version
                                   value  is not a supported ver-
                                   sion.



DESCRIPTION
     The u8_textprep_str() function  prepares  the  sequence  of
     UTF-8  characters  in the array specified by inarray and
     stores the corresponding prepared UTF-8  characters  in  the
     array  specified by outarray. The inarray argument points to
     the first byte of the input array, and inlen  indicates  the
     number  of  bytes from there to the end of the array that are
     to be prepared. The outarray argument points  to  the  first
     available  byte  of  the output array, and outlen indicates
     the number of available bytes from there to the end  of  the
     array.  Unless  flag   includes   U8_TEXTPREP_IGNORE_NULL,
     u8_textprep_str() stops when it encounters a null  byte  in
     the input array, regardless of the current inlen value.


     Unless flag includes U8_TEXTPREP_IGNORE_INVALID, preparation
     stops after the previous successfully prepared character if
     a sequence of input bytes does not form a valid UTF-8  char-
     acter,  and  after the previous successfully prepared bytes
     if the input array ends with an incomplete UTF-8  character.
     If the output array is not large enough to hold  the  entire
     prepared  text,  preparation  stops  just prior to the input
     bytes that would cause the output  array  to  overflow.  The
     value pointed to by inlen is decremented to reflect the num-
     ber of bytes still not prepared  in  the  input  array.  The
     value  pointed  to  by  outlen is decremented to reflect the
     number of bytes still available in the output array.
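
     The bookkeeping described above can be  illustrated  with  a
     small  stand-in.  The  sk_copy_prep()  and  sk_bytes_left()
     functions below are hypothetical names  invented  for  this
     sketch;  they are not u8_textprep_str() and do no case fold-
     ing or normalization. They only copy bytes and mirror the
     documented updates to the values pointed to  by  inlen  and
     outlen.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in, NOT u8_textprep_str(): copies bytes unchanged,
 * stops at a null byte, and stops with E2BIG when the output is full.
 * *inlen and *outlen are decremented once per byte copied.
 */
size_t
sk_copy_prep(const char *in, size_t *inlen, char *out, size_t *outlen,
    int *errnum)
{
	while (*inlen > 0 && *in != '\0') {
		if (*outlen == 0) {
			*errnum = E2BIG;
			return ((size_t)-1);
		}
		*out++ = *in++;
		(*inlen)--;
		(*outlen)--;
	}
	return (0);
}

/*
 * Usage sketch: returns the number of input bytes left unprepared
 * after processing n bytes of s into an output area of cap bytes.
 */
size_t
sk_bytes_left(const char *s, size_t n, size_t cap)
{
	char ob[64];
	size_t il = n, ol = cap;
	int err = 0;

	assert(cap <= sizeof (ob));
	(void) sk_copy_prep(s, &il, ob, &ol, &err);
	return (il);
}
```

     In this sketch a non-zero result from sk_bytes_left()  means
     that  the  loop stopped early, either at a null byte or be-
     cause the output area was full.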

RETURN VALUES
     The u8_textprep_str() function updates the values  pointed
     to  by  the inlen and outlen arguments to reflect the extent
     of the preparation. When U8_TEXTPREP_IGNORE_INVALID is spec-
     ified,  u8_textprep_str()  returns the number of illegal or
     incomplete characters found  during  the  text  preparation.
     When  U8_TEXTPREP_IGNORE_INVALID  is  not specified and the
     text  preparation  is  entirely  successful,  the   function
     returns  0.  If  the  entire  string  in  the input array is
     prepared, the value pointed to by inlen will be  0.  If  the
     text  preparation  is  stopped  due to any of the conditions
     mentioned above, the value pointed to by inlen will be  non-
     zero  and  errnum is set to indicate the error. If such an
     error or any  other  error  occurs,  u8_textprep_str()
     returns (size_t)-1 and sets errnum to indicate the error.
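
     A caller has to look at the return  value,  the  remaining
     input  length,  and errnum together. The following sketch
     shows one way to fold them into a  single  outcome.  The
     sk_outcome  enumeration  and  sk_classify()  function are
     hypothetical names introduced  for  illustration;  only  the
     errno values come from this page. Treating a non-error return
     with input left over as an early stop  (for  example  at  a
     null byte) is an assumption of the sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical outcome codes used by this sketch only. */
enum sk_outcome {
	SK_DONE,		/* all input consumed, nothing skipped */
	SK_DONE_WITH_INVALID,	/* finished; ret counts invalid characters */
	SK_INPUT_LEFT,		/* no error returned, but input remains */
	SK_NO_SPACE,		/* E2BIG */
	SK_BAD_INPUT,		/* EILSEQ or EINVAL */
	SK_BAD_ARGS		/* EBADF, ERANGE, or any other value */
};

/*
 * Classify the result of a preparation call from its return value,
 * the value left in *inlen, and the value stored in *errnum.
 */
enum sk_outcome
sk_classify(size_t ret, size_t inlen_left, int err)
{
	if (ret == (size_t)-1) {
		switch (err) {
		case E2BIG:
			return (SK_NO_SPACE);
		case EILSEQ:
		case EINVAL:
			return (SK_BAD_INPUT);
		default:
			return (SK_BAD_ARGS);
		}
	}
	if (inlen_left != 0)
		return (SK_INPUT_LEFT);
	return (ret > 0 ? SK_DONE_WITH_INVALID : SK_DONE);
}
```

     The error branch is checked first because errnum is  only
     described as meaningful when (size_t)-1 is returned.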

EXAMPLES
     Example 1 Simple UTF-8 text preparation

       #include <sys/param.h>         /* MAXPATHLEN */
       #include <errno.h>             /* E2BIG, EBADF, ERANGE */
       #include <string.h>            /* strlcpy(), strlen() */
       #include <sys/u8_textprep.h>
       .
       .
       .
       size_t ret;
       char ib[MAXPATHLEN];
       char ob[MAXPATHLEN];
       size_t il, ol;
       int err;
       .
       .
       .
       /*
        * We got a UTF-8 pathname from somewhere.
        *
        * Calculate the length of input string including the terminating
        * NUL byte and prepare other arguments.
        */
       (void) strlcpy(ib, pathname, MAXPATHLEN);
       il = strlen(ib) + 1;
       ol = MAXPATHLEN;

       /*
        * Fold to uppercase, apply Unicode Normalization Form D,
        * ignore NUL byte, and ignore any illegal/incomplete characters.
        */
       ret = u8_textprep_str(ib, &il, ob, &ol,
           (U8_TEXTPREP_IGNORE_NULL | U8_TEXTPREP_IGNORE_INVALID |
           U8_TEXTPREP_TOUPPER | U8_TEXTPREP_NFD), U8_UNICODE_LATEST,
           &err);
       if (ret == (size_t)-1) {
           /* Map each documented errnum value to a distinct code. */
           if (err == E2BIG)
               return (-1);
           if (err == EBADF)
               return (-2);
           if (err == ERANGE)
               return (-3);
           return (-4);
       }


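ATTRIBUTES
     See attributes(5) for descriptions of the  following  attri-
     butes:

     ____________________________________________________________
    |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE        |
    |_____________________________|______________________________|
    | Interface Stability         | Committed                    |
    |_____________________________|______________________________|
    | MT-Level                    | MT-Safe                      |
    |_____________________________|______________________________|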


SEE ALSO
     u8_strcmp(3C),       u8_validate(3C),        attributes(5),
     u8_strcmp(9F), u8_textprep_str(9F), u8_validate(9F)


     The Unicode Standard (http://www.unicode.org)

NOTES
     After the text preparation, the  number  of  prepared  UTF-8
     characters  and  the  total number of bytes may be smaller or
     larger than the corresponding numbers for the input buffer.


     Case conversions are performed using  Unicode  data  of  the
     corresponding  version.  Locale-specific case conversions are
     not supported.



SunOS 5.11          Last change: 18 Sep 2007


