Introduction to Library Functions PCRECALLOUT(3)
NAME
PCRE - Perl-compatible regular expressions
PCRE CALLOUTS
int (*pcre_callout)(pcre_callout_block *);
PCRE provides a feature called "callout", which is a means
of temporarily passing control to the caller of PCRE in the
middle of pattern matching. The caller of PCRE provides an
external function by putting its entry point in the global
variable pcre_callout. By default, this variable contains
NULL, which disables all calling out.
Within a regular expression, (?C) indicates the points at
which the external function is to be called. Different
callout points can be identified by putting a number less than
256 after the letter C. The default value is zero. For
example, this pattern has two callout points:
(?C1)abc(?C2)def
If the PCRE_AUTO_CALLOUT option bit is set when
pcre_compile() is called, PCRE automatically inserts
callouts, all with number 255, before each item in the pattern.
For example, if PCRE_AUTO_CALLOUT is used with the pattern
A(\d{2}|--)
it is processed as if it were
(?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
Notice that there is a callout before and after each
parenthesis and alternation bar. Automatic callouts can be
used for tracking the progress of pattern matching. The
pcretest command has an option that sets automatic callouts;
when it is used, the output indicates how the pattern is
matched. This is useful information when you are trying to
optimize the performance of a particular pattern.
MISSING CALLOUTS
You should be aware that, because of optimizations in the
way PCRE matches patterns, callouts sometimes do not happen.
For example, if the pattern is
ab(?C4)cd
PCRE knows that any matching string must contain the letter
"d". If the subject string is "abyz", the lack of "d" means
that matching doesn't ever start, and the callout is never
reached. However, with "abyd", though the result is still no
match, the callout is obeyed.
THE CALLOUT INTERFACE
During matching, when PCRE reaches a callout point, the
external function defined by pcre_callout is called (if it
is set). This applies to both the pcre_exec() and the
pcre_dfa_exec() matching functions. The only argument to the
callout function is a pointer to a pcre_callout block. This
structure contains the following fields:
int version;
int callout_number;
int *offset_vector;
const char *subject;
int subject_length;
int start_match;
int current_position;
int capture_top;
int capture_last;
void *callout_data;
int pattern_position;
int next_item_length;
The version field is an integer containing the version
number of the block format. The initial version was 0; the
current version is 1. The version number will change again
in future if additional fields are added, but the intention
is never to remove any of the existing fields.
The callout_number field contains the number of the callout,
as compiled into the pattern (that is, the number after ?C
for manual callouts, and 255 for automatically generated
callouts).
The offset_vector field is a pointer to the vector of
offsets that was passed by the caller to pcre_exec() or
pcre_dfa_exec(). When pcre_exec() is used, the contents can
be inspected in order to extract substrings that have been
matched so far, in the same way as for extracting substrings
after a match has completed. For pcre_dfa_exec() this field
is not useful.
The subject and subject_length fields contain copies of the
values that were passed to pcre_exec().
The start_match field normally contains the offset within
the subject at which the current match attempt started.
However, if the escape sequence \K has been encountered, this
value is changed to reflect the modified starting point. If
the pattern is not anchored, the callout function may be
called several times from the same point in the pattern for
different starting points in the subject.
The current_position field contains the offset within the
subject of the current match pointer.
When the pcre_exec() function is used, the capture_top field
contains one more than the number of the highest numbered
captured substring so far. If no substrings have been
captured, the value of capture_top is one. This is always the
case when pcre_dfa_exec() is used, because it does not
support captured substrings.
The capture_last field contains the number of the most
recently captured substring. If no substrings have been
captured, its value is -1. This is always the case when
pcre_dfa_exec() is used.
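As an illustration of how offset_vector and capture_top fit
together, the following is a minimal sketch of a helper that a
callout function might use to copy a captured substring. It is
not part of PCRE: the name copy_group and the stand-in type
callout_block_sketch are assumptions made so that the example
compiles without pcre.h. Real code would include <pcre.h> and
take a pcre_callout_block pointer instead. The sketch assumes
the usual pcre_exec() layout in which group n occupies elements
2n and 2n+1 of the vector and unset groups hold -1.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in mirroring the documented callout block fields so the
   sketch compiles on its own. Real code: pcre_callout_block. */
typedef struct {
    int version;
    int callout_number;
    int *offset_vector;
    const char *subject;
    int subject_length;
    int start_match;
    int current_position;
    int capture_top;
    int capture_last;
    void *callout_data;
    int pattern_position;
    int next_item_length;
} callout_block_sketch;

/* Hypothetical helper: copy capture group n (n >= 1) into buf as a
   NUL-terminated string. Returns the number of bytes copied, or -1
   if the group has not been captured so far or does not fit. */
int copy_group(const callout_block_sketch *cb, int n,
               char *buf, size_t bufsize)
{
    int start, end;
    size_t len;

    if (n < 1 || n >= cb->capture_top)
        return -1;                  /* beyond the highest captured group */
    start = cb->offset_vector[2 * n];
    end   = cb->offset_vector[2 * n + 1];
    if (start < 0 || end < start || end > cb->subject_length)
        return -1;                  /* unset or inconsistent offsets */
    len = (size_t)(end - start);
    if (len + 1 > bufsize)
        return -1;                  /* no room for the terminator */
    memcpy(buf, cb->subject + start, len);
    buf[len] = '\0';
    return (int)len;
}
```

The bounds checks are deliberately conservative, because the
vector is only partly filled in while a match is in progress.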
The callout_data field contains a value that is passed to
pcre_exec() or pcre_dfa_exec() specifically so that it can
be passed back in callouts. It is passed in the callout_data
field of the pcre_extra data structure. If no such data was
passed, the value of callout_data in a pcre_callout block is
NULL. There is a description of the pcre_extra structure in
the pcreapi documentation.
The pattern_position field is present from version 1 of the
pcre_callout structure. It contains the offset to the next
item to be matched in the pattern string.
The next_item_length field is present from version 1 of the
pcre_callout structure. It contains the length of the next
item to be matched in the pattern string. When the callout
immediately precedes an alternation bar, a closing
parenthesis, or the end of the pattern, the length is zero.
When the callout precedes an opening parenthesis, the length
is that of the entire subpattern.
The pattern_position and next_item_length fields are
intended to help in distinguishing between different
automatic callouts, which all have the same callout number.
However, they are set for all callouts.
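One way these two fields might be used to tell automatic
callouts apart is to slice the next item out of the pattern
text. The sketch below is illustrative only: next_item and
callout_block_sketch are hypothetical names, the struct is a
stand-in for pcre_callout_block so that the example compiles
without pcre.h, and the caller is assumed to supply the same
pattern string that was compiled (the callout block itself
does not carry it).

```c
#include <stddef.h>
#include <string.h>

/* Stand-in mirroring the documented callout block fields so the
   sketch compiles on its own. Real code: pcre_callout_block. */
typedef struct {
    int version;
    int callout_number;
    int *offset_vector;
    const char *subject;
    int subject_length;
    int start_match;
    int current_position;
    int capture_top;
    int capture_last;
    void *callout_data;
    int pattern_position;
    int next_item_length;
} callout_block_sketch;

/* Hypothetical helper: copy the pattern item that follows the callout
   into buf. Returns the item length (0 before an alternation bar, a
   closing parenthesis, or the end of the pattern), or -1 on error. */
int next_item(const char *pattern, const callout_block_sketch *cb,
              char *buf, size_t bufsize)
{
    size_t plen = strlen(pattern);
    size_t pos, len;

    if (cb->version < 1)
        return -1;                  /* fields absent in version 0 blocks */
    if (cb->pattern_position < 0 || cb->next_item_length < 0)
        return -1;
    pos = (size_t)cb->pattern_position;
    len = (size_t)cb->next_item_length;
    if (pos > plen || len > plen - pos || len + 1 > bufsize)
        return -1;                  /* outside the pattern or buffer */
    memcpy(buf, pattern + pos, len);
    buf[len] = '\0';
    return (int)len;
}
```

Checking the version field first keeps the helper safe if it is
ever handed a version 0 block, which lacks both fields.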
RETURN VALUES
The external callout function returns an integer to PCRE. If
the value is zero, matching proceeds as normal. If the value
is greater than zero, matching fails at the current point,
but the testing of other matching possibilities goes ahead,
just as if a lookahead assertion had failed. If the value is
less than zero, the match is abandoned, and pcre_exec() (or
pcre_dfa_exec()) returns the negative value.
Negative values should normally be chosen from the set of
PCRE_ERROR_xxx values. In particular, PCRE_ERROR_NOMATCH
forces a standard "no match" failure. The error number
PCRE_ERROR_CALLOUT is reserved for use by callout functions;
it will never be used by PCRE itself.
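The three kinds of return value can be seen together in a
small sketch of a callout function. Everything named here is
an assumption made for illustration: policy_callout,
callout_policy, and SKETCH_ABORT are hypothetical, and
callout_block_sketch is a stand-in for pcre_callout_block so
that the example compiles without pcre.h. In real code the
negative value would normally be PCRE_ERROR_CALLOUT or another
PCRE_ERROR_xxx value from pcre.h.

```c
#include <stddef.h>

/* Stand-in mirroring the documented callout block fields so the
   sketch compiles on its own. Real code: pcre_callout_block. */
typedef struct {
    int version;
    int callout_number;
    int *offset_vector;
    const char *subject;
    int subject_length;
    int start_match;
    int current_position;
    int capture_top;
    int capture_last;
    void *callout_data;
    int pattern_position;
    int next_item_length;
} callout_block_sketch;

/* Arbitrary negative placeholder for this sketch; real code would
   return PCRE_ERROR_CALLOUT (or another PCRE_ERROR_xxx value). */
#define SKETCH_ABORT (-100)

/* Hypothetical per-match state, handed over through callout_data. */
typedef struct {
    int calls;      /* callouts seen so far */
    int max_calls;  /* abandon the match after this many callouts */
    int max_span;   /* longest allowed distance from start_match */
} callout_policy;

int policy_callout(callout_block_sketch *cb)
{
    callout_policy *p = (callout_policy *)cb->callout_data;

    if (p == NULL)
        return 0;               /* no policy supplied: proceed as normal */
    if (++p->calls > p->max_calls)
        return SKETCH_ABORT;    /* < 0: abandon the whole match */
    if (cb->current_position - cb->start_match > p->max_span)
        return 1;               /* > 0: fail here, other paths are tried */
    return 0;                   /* 0: matching proceeds as normal */
}
```

Keeping the counters in a structure reached through
callout_data, rather than in globals, lets separate matches
carry separate limits.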
AUTHOR
Philip Hazel
University Computing Service
Cambridge CB2 3QH, England.
REVISION
Last updated: 29 May 2007
Copyright (c) 1997-2007 University of Cambridge.
ATTRIBUTES
See attributes(5) for descriptions of the following
attributes:
ATTRIBUTE TYPE          ATTRIBUTE VALUE
Availability            SUNWpcre
Interface Stability     Uncommitted
NOTES
Source for PCRE is available on http://opensolaris.org.
SunOS 5.10 Last change: 4