Other Regular Expression Functions

So far you have seen how to use the ereg function to match strings against regular expressions and to find out what those matches were. There are a few other functions worth mentioning, however, because they provide additional functionality you might find useful in your web applications.


One very powerful application of regular expressions is finding and replacing text within a string, which you can do with the ereg_replace function. It takes three parameters, in order:

  1. The regular expression to match

  2. The text to replace any matches with

  3. The string on which to apply the operation

The function returns a copy of the third parameter with every match of the pattern replaced.

The simplest usage is to specify a string or pattern and the text to replace it with, as in the following:

// replace all instances of "shoe" with "cat"
echo ereg_replace('shoe', 'cat',
                  'I like shoes and shoes like me.');
echo "<br/>\n";
// replace any USD monetary value with "(lots of money)"
echo ereg_replace('\$[0-9]+(\.[0-9]{1,2})?', '(lots of money)',
                  'John is paid $150453.44 each year!');

The output of this is as follows:

I like cats and cats like me.
John is paid (lots of money) each year!
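The ereg family was deprecated in PHP 5.3.0 and removed in PHP 7.0.0, so the code above will not run on current PHP versions. As a hedged sketch only, here are the same two replacements written with preg_replace (the PCRE equivalent), which differs mainly in requiring delimiters around the pattern; the variable names $cats and $salary are illustrative, not from the original text.

```php
<?php
// Sketch: the two replacements above, rewritten with preg_replace (PCRE),
// because ereg_replace no longer exists in PHP 7 and later.
// PCRE patterns need delimiters; '/' is used here.

// Replace every occurrence of "shoe" with "cat".
$cats = preg_replace('/shoe/', 'cat', 'I like shoes and shoes like me.');

// Replace a dollar amount (with optional one- or two-digit cents) with a phrase.
$salary = preg_replace('/\$[0-9]+(\.[0-9]{1,2})?/', '(lots of money)',
                       'John is paid $150453.44 each year!');

echo $cats, "<br/>\n", $salary, "\n";
```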

However, ereg_replace can perform far more powerful replacements than these. To take advantage of them, we need one more piece of knowledge about regular expressions, which lets us tell the function exactly what to replace.

Regular expressions support a feature known as back references. Every group in a regular expression (a part of the expression delimited by parentheses, ( and )) is given a number, and the replacement string passed to ereg_replace can use that number to refer to whatever the group matched. In the POSIX regular expressions in PHP5, the first group is referred to as \1, the second as \2, and so on. The number cannot exceed 9 in this implementation, and \0 refers to the entire matched text.
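To make the numbering concrete, here is a small sketch. Because ereg_replace was removed in PHP 7.0, it assumes preg_replace, which also understands \0 through \9 in the replacement string; the patterns, inputs, and variable names are made up for illustration.

```php
<?php
// Sketch using preg_replace in place of the removed ereg_replace.

// \1 and \2 refer to the first and second parenthesized groups,
// so this swaps the two words.
$swapped = preg_replace('/([a-z]+) ([a-z]+)/', '\2 \1', 'hello world');

// \0 refers to the entire text matched by the pattern,
// so each run of digits is wrapped in square brackets.
$bracketed = preg_replace('/[0-9]+/', '[\0]', 'rooms 12 and 7');

echo $swapped, "\n", $bracketed, "\n";
```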

For example, suppose we want to match the % and ; characters in an input string so that we can put a backslash in front of each of them. An expression that matches either character is as follows:

[%;]

Unfortunately, this gives us no way to use a back reference to whatever the expression matched. To solve that problem, we simply wrap it in parentheses to create a group, as follows:

([%;])

We can now refer to whatever this group matches as \1. We then use ereg_replace to replace each match of the group ([%;]) with a backslash character followed by the matched character:

$replaced = ereg_replace('([%;])', '\\\1', $in_string);

The first parameter tells ereg_replace to match (as a group) any % or ; character. The second parameter tells it to replace each match with a backslash (written \\ because the backslash itself has to be escaped in a PHP string) followed by the contents of that match (\1). We would thus see the input

Horatio %; DELETE FROM Users;

replaced with the following:

Horatio \%\; DELETE FROM Users\;
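On PHP 7 and later, where ereg_replace no longer exists, a comparable sketch uses preg_replace. One assumption worth flagging: preg_replace treats a doubled backslash in the replacement as a single literal backslash, so the ereg-style replacement '\\\1' would not give the same result here, and the backslashes have to be doubled once more.

```php
<?php
$in_string = 'Horatio %; DELETE FROM Users;';

// Group 1 captures a single % or ; character.
// preg_replace reads \\ in the replacement as one literal backslash, so the
// replacement text must be \\ followed by \1. As a single-quoted PHP literal
// that is '\\\\\1': each \\ in the literal yields one backslash, and \1 is
// left as-is because it is not an escape in single quotes.
$replaced = preg_replace('/([%;])/', '\\\\\1', $in_string);

echo $replaced, "\n";
```

For simply escaping a fixed set of characters, PHP's addcslashes function is another option that avoids regular expressions entirely.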

As a second example, suppose we want to clean up a phone number for output. We can write an extremely tolerant pattern for phone numbers that wraps each of the three digit sections (two of three digits and one of four) in grouping parentheses, such as the following:

.*([0-9]{3,3}).*([0-9]{3,3}).*([0-9]{4,4})

The three groups of digits then have the back references \1, \2, and \3, from left to right. So, to clean up our phone numbers, we could write the following code:

$pn = '       123-    456 -   - 7890';
$pn_regex = '.*([0-9]{3,3}).*([0-9]{3,3}).*([0-9]{4,4})';
$str = ereg_replace($pn_regex, '(\1)\2-\3', $pn);
echo $str;

The preceding code would output the following very lovely phone number:

(123)456-7890

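The same cleanup can be sketched with preg_replace for PHP 7 and later, assuming only that delimiters are added to the pattern. PCRE uses backtracking rather than POSIX leftmost-longest matching, so results can differ for some inputs; for this particular input the groups come out the same.

```php
<?php
$pn = '       123-    456 -   - 7890';

// Same tolerant pattern with PCRE delimiters added; {3} is shorthand for {3,3}.
$pn_regex = '/.*([0-9]{3}).*([0-9]{3}).*([0-9]{4})/';

// \1, \2 and \3 are the three digit groups, from left to right.
$str = preg_replace($pn_regex, '(\1)\2-\3', $pn);

echo $str, "\n";
```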
Note again that we always use single quotes when writing regular expressions and replacement strings. If we use double quotes, we have to put an extra backslash in front of each back reference so that PHP does not treat it as an escape sequence (in a double-quoted string, \1 is the octal escape for the character with code 1), as follows:

$str = ereg_replace($pn_regex, "(\\1)\\2-\\3", $pn);
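The difference is easy to check directly. This short sketch (the variable names $wrong and $right are illustrative) compares the double-quoted replacement strings with and without the extra backslashes against the single-quoted version.

```php
<?php
// In a double-quoted PHP string, "\1" is an octal escape: it becomes the
// single character with code 1, not the two characters \ and 1.
$wrong = "(\1)\2-\3";

// Doubling each backslash yields the literal text \1, \2 and \3.
$right = "(\\1)\\2-\\3";

var_dump($right === '(\1)\2-\3');   // bool(true)
var_dump(strlen($wrong));           // int(6)
```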