What Are Regular Expressions?

A regular expression is a description of a pattern, typically written as a string. When you compare a string against the regular expression, the processing engine determines whether the string matches the expression (and if so, in what way). Instead of stepping through a phone number and checking individual characters ourselves, we can create a regular expression that says something more like "look for a series of 10 digits, possibly with parentheses around the first 3 digits and a dash between the sixth and seventh digits."
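To make the phone number description concrete, here is a minimal sketch of how it might be written as a pattern. It uses Python's re module purely for illustration, and the pattern sticks to syntax that the common dialects share. The names PHONE_PATTERN and looks_like_phone_number are hypothetical, and the sketch assumes that both the parentheses and the dash are optional and that no spaces are allowed.

```python
import re

# Hypothetical pattern for illustration: optional parentheses around the
# first 3 digits, 3 more digits, an optional dash, then the last 4 digits.
# Simplification: each parenthesis is optional on its own, so an
# unbalanced string such as "(555123-4567" is also accepted.
PHONE_PATTERN = re.compile(r"\(?[0-9]{3}\)?[0-9]{3}-?[0-9]{4}")


def looks_like_phone_number(text):
    """Return True if the whole string fits the phone number pattern."""
    # fullmatch requires the pattern to cover the entire string,
    # not just some portion of it.
    return PHONE_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["(555)123-4567", "5551234567", "555-1234", "555123456x"]:
        print(candidate, looks_like_phone_number(candidate))
```

The first two candidates fit the description, while the last two are rejected because they do not contain ten digits in the expected positions. A stricter pattern would be needed to insist that the parentheses appear as a pair.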

The syntax used to describe these regular expressions is powerful, flexible, and unfortunately somewhat dialectal. Several major implementations of regular expressions exist, and they differ slightly in their details. Fortunately, these differences are not major, and we can typically move from one system to another without too much trouble.

PHP provides programmers with two regular expression processing engines. The first is called the Perl Compatible Regular Expressions (PCRE) extension and is modeled on the processor used in Perl, an extremely powerful language that has regular expressions tightly integrated into its programming model. The second flavor is called POSIX Extended Regular Expressions and is based on the regular expression syntax defined by the POSIX 1003.2 standard.

Both extensions are enabled by default in PHP, and you can use them with a number of functions. However, this tutorial focuses entirely on the POSIX regular expressions for the following reasons:

  • PCRE is already extremely well documented in numerous places and has a remarkable amount of user support through the Perl community.

  • The POSIX regular expressions are multi-byte character set enabled in PHP, whereas the PCRE extension is not. Given that we are focusing our efforts largely on writing globalizable applications, we want to be sure foreign language characters can be properly processed.

This is not meant to be a judgment in favor of one regular expression engine over the other. The PCRE engine has a number of features that the POSIX engine lacks and that many programmers find invaluable, and the PCRE engine can be faster in a number of situations. If your application does not require multi-byte character set support and you can be sure that your input data is in a specific code page, the Perl-compatible regular expressions might be appropriate for you.