MyWebUniversity.com Home Page
 



OpenSolaris man pages main menu


Introduction to Library Functions                   PCRECOMPAT(3)



NAME
     PCRE - Perl-compatible regular expressions

DIFERENCES BETWEN PCRE AND PERL

     This document describes the differences  in  the  ways  that
     PCRE  and  Perl  handle regular expressions. The differences
     described here are mainly with respect to Perl  5.8,  though
     PCRE  versions  7.0 and later contain some features that are
     expected to be in the forthcoming Perl 5.10.

     1. PCRE has only a subset of Perl's UTF-8 and  Unicode  sup-
     port.  Details of what it does have are given in the section
     on UTF-8 support in the main pcre page.

     2. PCRE does  not  allow  repeat  quantifiers  on  lookahead
     assertions. Perl permits them, but they do not mean what you
     might think. For example, (?!a){3} does not assert that  the
     next  three characters are not "a". It just asserts that the
     next character is not "a" three times.

     3. Capturing subpatterns that occur inside  negative  looka-
     head  assertions  are  counted,  but  their  entries  in the
     offsets vector are never set. Perl sets its numerical  vari-
     ables  from  any  such  patterns that are matched before the
     assertion fails to match something (thereby succeeding), but
     only  if  the negative lookahead assertion contains just one
     branch.

     4. Though binary zero characters are supported in  the  sub-
     ject  string,  they  are  not  allowed  in  a pattern string
     because it is passed as a normal  C  string,  terminated  by
     zero.  The  escape sequence \0 can be used in the pattern to
     represent a binary zero.

     5. The following Perl escape sequences  are  not  supported:
     \l,  \u,  \L,  \U,  and \N. In fact these are implemented by
     Perl's general string-handling and are not part of its  pat-
     tern  matching  engine.  If  any of these are encountered by
     PCRE, an error is generated.

     6. The Perl escape sequences \p, \P, and  \X  are  supported
     only  if  PCRE is built with Unicode character property sup-
     port. The properties that can be tested with \p and  \P  are
     limited  to  the  general category properties such as Lu and
     Nd, script names such as Greek or Han, and the derived  pro-
     perties Any and L&.

     7. PCRE does support the \Q...\E  escape  for  quoting  sub-
     strings. Characters in between are treated as literals. This
     is slightly different from Perl in that $  and  @  are  also
     handled  as  literals inside the quotes. In Perl, they cause



SunOS 5.10                Last change:                          1






Introduction to Library Functions                   PCRECOMPAT(3)



     variable interpolation (but of course  PCRE  does  not  have
     variables). Note the following examples:

         Pattern            PCRE matches      Perl matches

         \Qabc$xyz\E        abc$xyz           abc followed by the
                                                contents of $xyz
         \Qabc\$xyz\E       abc\$xyz          abc\$xyz
         \Qabc\E\$\Qxyz\E   abc$xyz           abc$xyz

     The \Q...\E sequence is recognized both inside  and  outside
     character classes.

     8. Fairly obviously, PCRE does not support the (?{code}) and
     (??{code})  constructions.  However,  there  is  support for
     recursive patterns. This is not available in Perl  5.8,  but
     will  be  in  Perl  5.10.  Also,  the PCRE "callout" feature
     allows an external function  to  be  called  during  pattern
     matching. See the pcrecallout documentation for details.

     9. Subpatterns that are called recursively  or  as  "subrou-
     tines"  are always treated as atomic groups in PCRE. This is
     like Python, but unlike Perl.

     10. There are some differences that are concerned  with  the
     settings  of  captured  strings  when  part  of a pattern is
     repeated. For example, matching "aba"  against  the  pattern
     /^(a(b)?)]$/  in Perl leaves $2 unset, but in PCRE it is set
     to "b".

     11.  PCRE  does  support  Perl  5.10's  backtracking   verbs
     (*ACEPT),  (*FAIL), (*F), (*COMIT), (*PRUNE), (*SKIP), and
     (*THEN), but only in the forms  without  an  argument.  PCRE
     does  not  support (*MARK). If (*ACEPT) is within capturing
     parentheses, PCRE does not set that capture group;  this  is
     different to Perl.

     12. PCRE  provides  some  extensions  to  the  Perl  regular
     expression  facilities.  Perl 5.10 will include new features
     that are not in earlier versions, some  of  which  (such  as
     named  parentheses)  have  been  in PCRE for some time. This
     list is with respect to Perl 5.10:

     (a) Although lookbehind assertions must match  fixed  length
     strings,  each  alternative branch of a lookbehind assertion
     can match a different length of string. Perl  requires  them
     all to have the same length.

     (b) If PCREDOLARENDONLY is set and PCREMULTILINE is  not
     set,  the  $  meta-character matches only at the very end of
     the string.




SunOS 5.10                Last change:                          2






Introduction to Library Functions                   PCRECOMPAT(3)



     (c) If PCREXTRA is set, a backslash followed by  a  letter
     with  no  special  meaning is faulted. Otherwise, like Perl,
     the backslash is quietly ignored.   (Perl  can  be  made  to
     issue a warning.)

     (d) If PCREUNGREDY is set, the greediness of  the  repeti-
     tion  quantifiers  is inverted, that is, by default they are
     not greedy, but if followed by a question mark they are.

     (e) PCREANCHORED can be used at matching time  to  force  a
     pattern  to  be tried only at the first matching position in
     the subject string.

     (f)  The  PCRENOTBOL,   PCRENOTEOL,   PCRENOTEMPTY,   and
     PCRENOAUTOCAPTURE  options  for  pcreexec() have no Perl
     equivalents.

     (g) The \R escape sequence can be restricted to  match  only
     CR, LF, or CRLF by the PCREBSRANYCRLF option.

     (h) The callout facility is PCRE-specific.

     (i) The partial matching facility is PCRE-specific.

     (j) Patterns compiled by PCRE can be saved and re-used at  a
     later  time,  even  on  different  hosts that have the other
     endianness.

     (k)  The  alternative  matching  function  (pcredfaexec())
     matches in a different way and is not Perl-compatible.

     (l) PCRE recognizes some special sequences such as (*CR)  at
     the  start of a pattern that set overall options that cannot
     be changed within the pattern.

AUTHOR

     Philip Hazel
     University Computing Service
     Cambridge CB2 3QH, England.

REVISION

     Last updated: 11 September 2007
     Copyright (c) 1997-2007 University of Cambridge.

ATRIBUTES
     See attributes(5) for descriptions of the  following  attri-
     butes:






SunOS 5.10                Last change:                          3






Introduction to Library Functions                   PCRECOMPAT(3)



     
       ATRIBUTE TYPE     ATRIBUTE VALUE
    
     Availability         SUNWpcre       
    
     Interface Stability  Uncommitted    
    

NOTES
     Source for PCRE is available on http:/opensolaris.org.













































SunOS 5.10                Last change:                          4



OpenSolaris man pages main menu

Contact us      |       About us      |       Term of use      |       Copyright © 2000-2010 MyWebUniversity.com ™