Regular Expressions

In this section we show how regular expressions can achieve more sophisticated pattern matching to find, extract, and even replace complex substrings within a string.

While regular expressions provide capabilities beyond those described in the last section, complex pattern matching isn't as efficient as simple string comparisons. The functions described in the last section are more efficient than those that use regular expressions and should be used if complex pattern searches aren't required.

This section starts with a brief description of the POSIX regular expression syntax. This isn't a complete description of all the capabilities, but we do provide enough details to create quite powerful regular expressions. The second half of the section describes the functions that use POSIX regular expressions. Examples of regular expressions can be found in this section and in Chapter 7.

Regular Expression Syntax

A regular expression follows a strict syntax to describe patterns of characters. PHP has two sets of functions that use regular expressions: one set supports the Perl Compatible Regular Expression (PCRE) syntax, while the other supports the POSIX extended regular expression syntax. In this tutorial we use the POSIX functions.

To demonstrate the syntax of regular expressions, we introduce the function ereg():

boolean ereg(string pattern, string subject [, array var])

ereg() returns true if the regular expression pattern is found in the subject string. We discuss how the ereg() function can extract values into the optional array variable var later in this section.

The following trivial example shows how ereg() is called to find the literal pattern "cat" in the subject string "raining cats and dogs":

// prints "Found 'cat'"
if (ereg("cat", "raining cats and dogs"))
  echo "Found 'cat'";

The regular expression "cat" matches the subject string, and the fragment prints "Found 'cat'".

Characters and wildcards

To represent any character in a pattern, a period is used as a wildcard. The pattern "c.." matches any three-character string that begins with a lowercase "c"; for example, "cat", "cow", or "cop". To express a pattern that actually matches a period, escape it with the backslash character \; for example, "\.com" matches ".com" but not "xcom".
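The wildcard and the escaped period can be tried out with a short sketch. Because ereg() was removed in PHP 7.0, this sketch assumes a current PHP and swaps in preg_match() from the PCRE family; the simple patterns used here are written the same way in both syntaxes. The helper name matches() is hypothetical, not part of PHP.

```php
<?php
// Hypothetical helper (not a PHP built-in): reports whether $pattern is
// found anywhere in $subject, mirroring how ereg() is used in this section.
// It wraps the pattern in "/" delimiters, so it only suits patterns that
// do not themselves contain a "/".
function matches(string $pattern, string $subject): bool
{
    return preg_match('/' . $pattern . '/', $subject) === 1;
}

// "c.." needs a "c" followed by any two characters.
var_dump(matches('c..', 'cat'));    // bool(true)
var_dump(matches('c..', 'a cow'));  // bool(true): found inside the subject
var_dump(matches('c..', 'co'));     // bool(false): only one character after "c"

// "\." matches a literal period rather than any character.
var_dump(matches('\.com', '.com')); // bool(true)
var_dump(matches('\.com', 'xcom')); // bool(false)
```

The patterns are single-quoted so that the backslash reaches the regular expression engine unchanged.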

The use of the backslash in a regular expression can cause confusion. To include a literal backslash in a double-quoted string, you escape it with a second backslash. The following example shows how the regular expression pattern "\.com" is represented:

// Sets $found to true
// (the subject string here is an illustrative placeholder)
$found = ereg("\\.com", "www.example.com");

It's better to avoid the confusion and use single quotes when passing a string as a regular expression:

$found = ereg('\.com', "www.example.com");
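As a quick check of the quoting rules described above, the following sketch compares the two spellings of the pattern. It assumes a current PHP, where ereg() is no longer available, so the final match uses preg_match() instead; the subject "www.example.com" is an illustrative placeholder.

```php
<?php
// The same five-character pattern written with each quoting style.
$double = "\\.com";  // the doubled backslash collapses to one
$single = '\.com';   // the backslash is kept as written

var_dump($double === $single);  // bool(true)
var_dump(strlen($single));      // int(5): backslash, period, "c", "o", "m"

// Either spelling can be handed to the regular expression engine.
// preg_match() stands in for ereg(), which was removed in PHP 7.0.
$found = preg_match('/' . $single . '/', 'www.example.com') === 1;
var_dump($found);               // bool(true)
```

Both strings hold the same pattern, so the choice between them is only a matter of readability.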

Rather than using a wildcard that matches any character, a list of characters enclosed in brackets can be specified within a pattern. For example, to match a three-character string that starts with a "p", ends with a "p", and contains a vowel as the middle letter, the expression:

ereg("p[aeiou]p", $var)