Introduction to Library Functions PCREPOSIX(3)
NAME
PCRE - Perl-compatible regular expressions.
SYNOPSIS OF POSIX API
#include <pcreposix.h>
int regcomp(regex_t *preg, const char *pattern,
int cflags);
int regexec(regex_t *preg, const char *string,
size_t nmatch, regmatch_t pmatch[], int eflags);
size_t regerror(int errcode, const regex_t *preg,
char *errbuf, size_t errbuf_size);
void regfree(regex_t *preg);
DESCRIPTION
This set of functions provides a POSIX-style API to the PCRE
regular expression package. See the pcreapi documentation
for a description of PCRE's native API, which contains much
additional functionality.
The functions described here are just wrapper functions that
ultimately call the PCRE native API. Their prototypes are
defined in the pcreposix.h header file, and on Unix systems
the library itself is called pcreposix.a, so can be accessed
by adding -lpcreposix to the command for linking an applica-
tion that uses them. Because the POSIX functions call the
native ones, it is also necessary to add -lpcre.
I have implemented only those option bits that can be rea-
sonably mapped to PCRE native options. In addition, the
option REG_EXTENDED is defined with the value zero. This has
no effect, but since programs that are written to the POSIX
interface often use it, this makes it easier to slot in PCRE
as a replacement library. Other POSIX options are not even
defined.
When PCRE is called via these functions, it is only the API
that is POSIX-like in style. The syntax and semantics of the
regular expressions themselves are still those of Perl, sub-
ject to the setting of various PCRE options, as described
below. "POSIX-like in style" means that the API approximates
to the POSIX definition; it is not fully POSIX-compatible,
and in multi-byte encoding domains it is probably even less
compatible.
The header for these functions is supplied as pcreposix.h to
avoid any potential clash with other POSIX libraries. It
can, of course, be renamed or aliased as regex.h, which is
the "correct" name. It provides two structure types, regex_t
for compiled internal forms, and regmatch_t for returning
captured substrings. It also defines some constants whose
names start with "REG_"; these are used for setting options
and identifying error codes.
COMPILING A PATTERN
The function regcomp() is called to compile a pattern into
an internal form. The pattern is a C string terminated by a
binary zero, and is passed in the argument pattern. The preg
argument is a pointer to a regex_t structure that is used as
a base for storing information about the compiled regular
expression.
The argument cflags is either zero, or contains one or more
of the bits defined by the following macros:
REG_DOTALL
The PCRE_DOTALL option is set when the regular expression is
passed for compilation to the native function. Note that
REG_DOTALL is not part of the POSIX standard.
REG_ICASE
The PCRE_CASELESS option is set when the regular expression
is passed for compilation to the native function.
REG_NEWLINE
The PCRE_MULTILINE option is set when the regular expression
is passed for compilation to the native function. Note that
this does not mimic the defined POSIX behaviour for
REG_NEWLINE (see the following section).
REG_NOSUB
The PCRE_NO_AUTO_CAPTURE option is set when the regular
expression is passed for compilation to the native function.
In addition, when a pattern that is compiled with this flag
is passed to regexec() for matching, the nmatch and pmatch
arguments are ignored, and no captured strings are returned.
REG_UTF8
The PCRE_UTF8 option is set when the regular expression is
passed for compilation to the native function. This causes
the pattern itself and all data strings used for matching it
to be treated as UTF-8 strings. Note that REG_UTF8 is not
part of the POSIX standard.
In the absence of these flags, no options are passed to the
native function. This means that the regex is compiled with
PCRE default semantics. In particular, the way it handles
newline characters in the subject string is the Perl way,
not the POSIX way. Note that setting PCRE_MULTILINE has only
some of the effects specified for REG_NEWLINE. It does not
affect the way newlines are matched by . (they aren't) or by
a negative class such as [^a] (they are).
The yield of regcomp() is zero on success, and non-zero oth-
erwise. The preg structure is filled in on success, and one
member of the structure is public: re_nsub contains the
number of capturing subpatterns in the regular expression.
Various error codes are defined in the header file.
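As an illustration of the compile step described above, the
following is a minimal sketch, not taken from the PCRE
sources. The helper name count_capture_groups and the
USE_PCREPOSIX build macro are assumptions made for this
example; without the macro the sketch falls back to the
system regex.h, which declares the same functions.
REG_EXTENDED is passed because it is harmless (zero) under
pcreposix.h and needed by a native POSIX library.

```c
/* Assumption: define USE_PCREPOSIX (hypothetical macro) when building
   against the PCRE wrapper and link with -lpcreposix -lpcre. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>
#endif

/* Compile a pattern and report how many capturing subpatterns it has.
   Returns -1 if regcomp() yields a non-zero (error) code. */
int count_capture_groups(const char *pattern, int cflags)
{
    regex_t re;
    int groups;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;              /* preg was not filled in */

    groups = (int)re.re_nsub;   /* the one public member of regex_t */
    regfree(&re);               /* release memory tied to the pattern */
    return groups;
}
```

For example, count_capture_groups("(a)(b)", REG_EXTENDED |
REG_ICASE) reports two subpatterns, while an unbalanced
pattern such as "(a" yields -1.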
MATCHING NEWLINE CHARACTERS
This area is not simple, because POSIX and Perl take dif-
ferent views of things. It is not possible to get PCRE to
obey POSIX semantics, but then PCRE was never intended to be
a POSIX engine. The following table lists the different pos-
sibilities for matching newline characters in PCRE:
                          Default   Change with
. matches newline           no      PCRE_DOTALL
newline matches [^a]        yes     not changeable
$ matches \n at end         yes     PCRE_DOLLAR_ENDONLY
$ matches \n in middle      no      PCRE_MULTILINE
^ matches \n in middle      no      PCRE_MULTILINE
This is the equivalent table for POSIX:
                          Default   Change with
. matches newline           yes     REG_NEWLINE
newline matches [^a]        yes     REG_NEWLINE
$ matches \n at end         no      REG_NEWLINE
$ matches \n in middle      no      REG_NEWLINE
^ matches \n in middle      no      REG_NEWLINE
PCRE's behaviour is the same as Perl's, except that there is
no equivalent for PCRE_DOLLAR_ENDONLY in Perl. In both PCRE
and Perl, there is no way to stop newline from matching
[^a].
The default POSIX newline handling can be obtained by set-
ting PCRE_DOTALL and PCRE_DOLLAR_ENDONLY, but there is no
way to make PCRE behave exactly as for the REG_NEWLINE
action.
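The two "in middle" rows of the tables are the ones on which
PCRE and POSIX agree, and they can be checked with a small
sketch. The helper name pattern_matches and the
USE_PCREPOSIX build macro are assumptions made for this
example; the other rows are deliberately not exercised
because their results differ between the two engines.

```c
#include <stddef.h>

/* Assumption: define USE_PCREPOSIX (hypothetical macro) when building
   against the PCRE wrapper and link with -lpcreposix -lpcre. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>
#endif

/* Returns 1 if pattern matches subject, 0 if it does not,
   and -1 if the pattern fails to compile. */
int pattern_matches(const char *pattern, const char *subject, int cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, cflags | REG_EXTENDED) != 0)
        return -1;

    /* nmatch is 0, so no match offsets are requested. */
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With the subject "a\nb", the patterns "^b" and "a$" do not
match when cflags is zero, and both match when REG_NEWLINE is
given.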
MATCHING A PATTERN
The function regexec() is called to match a compiled pattern
preg against a given string, which is by default terminated
by a zero byte (but see REG_STARTEND below), subject to the
options in eflags. These can be:
REG_NOTBOL
The PCRE_NOTBOL option is set when calling the underlying
PCRE matching function.
REG_NOTEOL
The PCRE_NOTEOL option is set when calling the underlying
PCRE matching function.
REG_STARTEND
The string is considered to start at string +
pmatch[0].rm_so and to have a terminating NUL located at
string + pmatch[0].rm_eo (there need not actually be a NUL
at that location), regardless of the value of nmatch. This
is a BSD extension, compatible with but not specified by
IEEE Standard 1003.2 (POSIX.2), and should be used with cau-
tion in software intended to be portable to other systems.
Note that a non-zero rm_so does not imply REG_NOTBOL;
REG_STARTEND affects only the location of the string, not
how it is matched.
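The REG_STARTEND convention can be sketched as follows. The
helper name matches_in_range and the USE_PCREPOSIX build
macro are assumptions made for this example, and the sketch
only reports whether a match was found; it does not
interpret the offsets written back to pmatch. Because
REG_STARTEND is an extension, a fallback regex.h is assumed
to define it as well.

```c
#include <stddef.h>

/* Assumption: define USE_PCREPOSIX (hypothetical macro) when building
   against the PCRE wrapper and link with -lpcreposix -lpcre. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>
#endif

/* Match pattern against the bytes subject[start] .. subject[end - 1].
   Returns 1 on a match, 0 on no match, -1 if compilation fails. */
int matches_in_range(const char *pattern, const char *subject,
                     size_t start, size_t end)
{
    regex_t re;
    regmatch_t pm[1];
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* On input, pmatch[0] delimits the part of the string to search. */
    pm[0].rm_so = (regoff_t)start;
    pm[0].rm_eo = (regoff_t)end;

    rc = regexec(&re, subject, 1, pm, REG_STARTEND);
    regfree(&re);
    return rc == 0;
}
```

With the subject "cat dog", the range 4 to 7 covers only
"dog", so the pattern "dog" matches there and the pattern
"cat" does not.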
If the pattern was compiled with the REG_NOSUB flag, no data
about any matched strings is returned. The nmatch and pmatch
arguments of regexec() are ignored.
Otherwise, the portion of the string that was matched, and
also any captured substrings, are returned via the pmatch
argument, which points to an array of nmatch structures of
type regmatch_t, containing the members rm_so and rm_eo.
These contain the offset to the first character of each sub-
string and the offset to the first character after the end
of each substring, respectively. The 0th element of the vec-
tor relates to the entire portion of string that was
matched; subsequent elements relate to the capturing subpat-
terns of the regular expression. Unused entries in the array
have both structure members set to -1.
A successful match yields a zero return; various error codes
are defined in the header file, of which REG_NOMATCH is the
"expected" failure code.
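The way matched offsets are returned can be illustrated with
a short sketch. The helper name extract_group, the
MAX_GROUPS limit, and the USE_PCREPOSIX build macro are
assumptions made for this example and are not part of the
library.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: define USE_PCREPOSIX (hypothetical macro) when building
   against the PCRE wrapper and link with -lpcreposix -lpcre. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>
#endif

#define MAX_GROUPS 8   /* illustrative size of the pmatch array */

/* Copy substring number `group` of the first match into out
   (group 0 is the whole match). Returns the number of bytes
   copied, or -1 on compile failure, no match, or an unset group. */
int extract_group(const char *pattern, const char *subject,
                  size_t group, char *out, size_t out_size)
{
    regex_t re;
    regmatch_t pm[MAX_GROUPS];
    size_t len;
    int rc;

    if (group >= MAX_GROUPS || out_size == 0)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    rc = regexec(&re, subject, MAX_GROUPS, pm, 0);
    regfree(&re);

    /* Unused entries have both members set to -1. */
    if (rc != 0 || pm[group].rm_so == -1)
        return -1;

    /* rm_eo is the offset of the first character after the substring. */
    len = (size_t)(pm[group].rm_eo - pm[group].rm_so);
    if (len >= out_size)
        len = out_size - 1;     /* truncate to fit the caller's buffer */
    memcpy(out, subject + pm[group].rm_so, len);
    out[len] = '\0';
    return (int)len;
}
```

Given the pattern "([a-z]+)@([a-z]+)" and the subject "mail
bob@example now", group 0 is "bob@example", group 2 is
"example", and group 3 is reported as unset.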
ERROR MESSAGES
The regerror() function maps a non-zero error code from
either regcomp() or regexec() to a printable message. If
preg is not NULL, the error should have arisen from the use
of that structure. A message terminated by a binary zero is
placed in errbuf. The length of the message, including the
zero, is limited to errbuf_size. The yield of the function
is the size of buffer needed to hold the whole message.
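Because the yield is the buffer size needed, one possible
usage is to call regerror() twice: once to learn the size
and once to fetch the message. The sketch below assumes that
a zero errbuf_size causes nothing to be written to errbuf.
The helper name compile_error_message and the USE_PCREPOSIX
build macro are assumptions made for this example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumption: define USE_PCREPOSIX (hypothetical macro) when building
   against the PCRE wrapper and link with -lpcreposix -lpcre. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>
#endif

/* Try to compile pattern. Returns NULL if it compiles cleanly,
   otherwise a malloc()ed error message that the caller must free()
   (NULL is also returned if the allocation itself fails). */
char *compile_error_message(const char *pattern)
{
    regex_t re;
    size_t needed;
    char *msg;
    int rc;

    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc == 0) {
        regfree(&re);
        return NULL;
    }

    /* First call: errbuf_size is 0, so only the size is obtained. */
    needed = regerror(rc, &re, NULL, 0);

    msg = malloc(needed);
    if (msg != NULL)
        regerror(rc, &re, msg, needed);   /* second call fills the buffer */
    return msg;
}
```

A pattern with an unmatched parenthesis, such as "(a",
produces a non-empty message; the exact wording depends on
the library in use.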
MEMORY USAGE
Compiling a regular expression causes memory to be allocated
and associated with the preg structure. The function reg-
free() frees all such memory, after which preg may no longer
be used as a compiled expression.
AUTHOR
Philip Hazel
University Computing Service
Cambridge CB2 3QH, England.
REVISION
Last updated: 05 April 2008
Copyright (c) 1997-2008 University of Cambridge.
ATTRIBUTES
See attributes(5) for descriptions of the following attri-
butes:
ATTRIBUTE TYPE          ATTRIBUTE VALUE
Availability            SUNWpcre
Interface Stability     Uncommitted
NOTES
Source for PCRE is available on http://opensolaris.org.
SunOS 5.10 Last change: 5