Introduction to Library Functions                    PCREPOSIX(3)



NAME
     PCRE - Perl-compatible regular expressions.

SYNOPSIS OF POSIX API

     #include <pcreposix.h>

     int regcomp(regex_t *preg, const char *pattern,
          int cflags);

     int regexec(regex_t *preg, const char *string,
          size_t nmatch, regmatch_t pmatch[], int eflags);

     size_t regerror(int errcode, const regex_t *preg,
          char *errbuf, size_t errbuf_size);

     void regfree(regex_t *preg);

DESCRIPTION

     This set of functions provides a POSIX-style API to the PCRE
     regular  expression  package.  See the pcreapi documentation
     for a description of PCRE's native API, which contains  much
     additional functionality.

     The functions described here are just wrapper functions that
     ultimately  call  the  PCRE native API. Their prototypes are
     defined in the pcreposix.h header file, and on Unix  systems
     the library itself is called pcreposix.a, so can be accessed
     by adding -lpcreposix to the command for linking an applica-
     tion  that  uses  them. Because the POSIX functions call the
     native ones, it is also necessary to add -lpcre.

     I have implemented only those option bits that can  be  rea-
     sonably  mapped  to  PCRE  native  options. In addition, the
     option REG_EXTENDED is defined with the value zero. This has
     no  effect, but since programs that are written to the POSIX
     interface often use it, this makes it easier to slot in PCRE
     as  a  replacement library. Other POSIX options are not even
     defined.

     When PCRE is called via these functions, it is only the  API
     that is POSIX-like in style. The syntax and semantics of the
     regular expressions themselves are still those of Perl, sub-
     ject  to  the  setting of various PCRE options, as described
     below. "POSIX-like in style" means that the API approximates
     to  the  POSIX definition; it is not fully POSIX-compatible,
     and in multi-byte encoding domains it is probably even  less
     compatible.

     The header for these functions is supplied as pcreposix.h to
     avoid  any  potential  clash  with other POSIX libraries. It
     can, of course, be renamed or aliased as regex.h,  which  is
     the "correct" name. It provides two structure types, regex_t
     for compiled internal forms, and regmatch_t  for  returning
     captured  substrings.  It  also defines some constants whose
     names start with "REG_"; these are used for setting  options
     and identifying error codes.

COMPILING A PATTERN

     The function regcomp() is called to compile a  pattern  into
     an  internal form. The pattern is a C string terminated by a
     binary zero, and is passed in the argument pattern. The preg
     argument is a pointer to a regex_t structure that is used as
     a base for storing information about  the  compiled  regular
     expression.

     The argument cflags is either zero, or contains one or  more
     of the bits defined by the following macros:

       REG_DOTALL

     The PCRE_DOTALL option is set when the regular expression is
     passed  for  compilation  to  the native function. Note that
     REG_DOTALL is not part of the POSIX standard.

       REG_ICASE

     The PCRE_CASELESS option is set when the regular  expression
     is passed for compilation to the native function.

       REG_NEWLINE

     The PCRE_MULTILINE option is set when the regular expression
     is  passed for compilation to the native function. Note that
     this  does  not  mimic  the  defined  POSIX  behaviour   for
     REG_NEWLINE (see the following section).

       REG_NOSUB

     The PCRE_NO_AUTO_CAPTURE option  is  set  when  the  regular
     expression is passed for compilation to the native function.
     In addition, when a pattern that is compiled with this  flag
     is  passed  to regexec() for matching, the nmatch and pmatch
     arguments are ignored, and no captured strings are returned.

       REG_UTF8

     The PCRE_UTF8 option is set when the regular  expression  is
     passed  for  compilation to the native function. This causes
     the pattern itself and all data strings used for matching it
     to  be  treated  as UTF-8 strings. Note that REG_UTF8 is not
     part of the POSIX standard.

     In the absence of these flags, no options are passed to  the
     native  function. This means that the regex is compiled with
     PCRE default semantics. In particular, the  way  it  handles
     newline  characters  in  the subject string is the Perl way,
     not the POSIX way. Note that setting PCRE_MULTILINE has only
     some  of  the effects specified for REG_NEWLINE. It does not
     affect the way newlines are matched by . (they aren't) or by
     a negative class such as [^a] (they are).

     The yield of regcomp() is zero on success, and non-zero oth-
     erwise.  The preg structure is filled in on success, and one
     member of the structure  is  public:  re_nsub  contains  the
     number  of  capturing subpatterns in the regular expression.
     Various error codes are defined in the header file.

MATCHING NEWLINE CHARACTERS

     This area is not simple, because POSIX and  Perl  take  dif-
     ferent  views  of things.  It is not possible to get PCRE to
     obey POSIX semantics, but then PCRE was never intended to be
     a POSIX engine. The following table lists the different pos-
     sibilities for matching newline characters in PCRE:

                               Default   Change with

       . matches newline          no     PCRE_DOTALL
       newline matches [^a]       yes    not changeable
       $ matches \n at end        yes    PCRE_DOLLAR_ENDONLY
       $ matches \n in middle     no     PCRE_MULTILINE
       ^ matches \n in middle     no     PCRE_MULTILINE

     This is the equivalent table for POSIX:

                               Default   Change with

       . matches newline          yes    REG_NEWLINE
       newline matches [^a]       yes    REG_NEWLINE
       $ matches \n at end        no     REG_NEWLINE
       $ matches \n in middle     no     REG_NEWLINE
       ^ matches \n in middle     no     REG_NEWLINE

     PCRE's behaviour is the same as Perl's, except that there is
     no equivalent for PCRE_DOLLAR_ENDONLY in Perl. In both PCRE
     and Perl, there is no way  to  stop  newline  from  matching
     [^a].

     The default POSIX newline handling can be obtained  by  set-
     ting PCRE_DOTALL and PCRE_DOLLAR_ENDONLY, but there is no
     way to make PCRE behave exactly  as  for  the  REG_NEWLINE
     action.





MATCHING A PATTERN

     The function regexec() is called to match a compiled pattern
     preg  against a given string, which is by default terminated
     by a zero byte (but see REG_STARTEND below), subject to  the
     options in eflags. These can be:

       REG_NOTBOL

     The PCRE_NOTBOL option is set when calling  the  underlying
     PCRE matching function.

       REG_NOTEOL

     The PCRE_NOTEOL option is set when calling  the  underlying
     PCRE matching function.

       REG_STARTEND

     The  string  is  considered  to  start   at   string   +
     pmatch[0].rm_so and to have a terminating NUL  located  at
     string + pmatch[0].rm_eo (there need not actually be a NUL
     at  that  location), regardless of the value of nmatch. This
     is a BSD extension, compatible with  but  not  specified  by
     IEEE Standard 1003.2 (POSIX.2), and should be used with cau-
     tion in software intended to be portable to  other  systems.
     Note  that  a  non-zero  rm_so  does not imply REG_NOTBOL;
     REG_STARTEND affects only the location of the  string,  not
     how it is matched.

     If the pattern was compiled with the REG_NOSUB flag, no data
     about any matched strings is returned. The nmatch and pmatch
     arguments of regexec() are ignored.

     Otherwise, the portion of the string that was matched,  and
     also  any  captured  substrings, are returned via the pmatch
     argument, which points to an array of nmatch  structures  of
     type regmatch_t, containing the members rm_so  and  rm_eo.
     These contain the offset to the first character of each sub-
     string  and  the offset to the first character after the end
     of each substring, respectively. The 0th element of the vec-
     tor  relates  to  the  entire  portion  of  string  that was
     matched; subsequent elements relate to the capturing subpat-
     terns of the regular expression. Unused entries in the array
     have both structure members set to -1.

     A successful match yields a zero return; various error codes
     are  defined in the header file, of which REG_NOMATCH is the
     "expected" failure code.

ERROR MESSAGES

     The regerror() function maps a non-zero  error  code  from
     either  regcomp()  or  regexec()  to a printable message. If
     preg is not NULL, the error should have arisen from the use
     of  that structure. A message terminated by a binary zero is
     placed in errbuf. The length of the message,  including  the
     zero, is limited to errbuf_size. The yield of the function
     is the size of buffer needed to hold the whole message.

MEMORY USAGE

     Compiling a regular expression causes memory to be allocated
     and associated with the preg structure. The function
     regfree() frees all such memory, after which preg may no
     longer be used as a compiled expression.

AUTHOR

     Philip Hazel
     University Computing Service
     Cambridge CB2 3QH, England.

REVISION

     Last updated: 05 April 2008
     Copyright (c) 1997-2008 University of Cambridge.

ATTRIBUTES
     See attributes(5) for descriptions of the  following  attri-
     butes:

     ATTRIBUTE TYPE         ATTRIBUTE VALUE

     Availability           SUNWpcre
     Interface Stability    Uncommitted

NOTES
     Source for PCRE is available on http://opensolaris.org.















SunOS 5.10                Last change:                          5


