
Escaping special characters in regular expressions

The characters .\+*?[^]$(){}=!<>|:-# have special meanings in regular expressions. To match one of them literally, write a backslash in front of it in the pattern. For example, to match a “+” character, you write “\+” in the pattern. You can escape special characters manually by placing a backslash in front of each character you want to match, or you can use the preg_quote() function to escape them automatically. This tutorial also shows a technique that lets you avoid escaping special characters in some cases.

  1. How to avoid character escaping in a regular expression
  2. Escaping backslashes
  3. Escaping $ character
  4. Escaping with preg_quote() function

We’ve already discussed the need to escape characters that act as operators in a regular expression. Whether a character needs escaping, however, depends on where it is used. Escaping is done with the backslash character, as in the expression "2\+3", which matches the string "2+3". If the + isn’t escaped, the pattern matches one or more occurrences of the character 2 followed by the character 3.

Example: Find if 2+3 exists in the string:

Escape the + character in the pattern because it is a metacharacter:

<?php
 $string = 'Hi, 2+3 is equal to 5';

 # need to escape +
 $pattern = '/2\+3/';
 $found = preg_match($pattern, $string, $match); # 1 (match found)
 print_r($match); #Array ( [0] => 2+3 )
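
To see why the backslash matters, the following sketch compares the escaped and unescaped patterns. The sample strings are made up for illustration and are not part of the original example.

```php
<?php
// Unescaped: + is a quantifier, so the pattern means "one or more 2s, then a 3".
$unescaped = '/2+3/';
// Escaped: \+ matches a literal plus sign.
$escaped = '/2\+3/';

$r1 = preg_match($unescaped, '2+3');       // 0: a literal + sits between 2 and 3
$r2 = preg_match($unescaped, '2223', $m);  // 1: $m[0] is '2223'
$r3 = preg_match($escaped, '2+3');         // 1
$r4 = preg_match($escaped, '2223');        // 0

var_dump($r1, $r2, $r3, $r4, $m[0]);
```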

Read the preg_match() and preg_match_all() tutorial for more detail on pattern matching.

The following special characters must be escaped with a backslash when you want to match them literally. Read the Regular Expressions tutorial for more detail on these characters:

. \ + * ? [ ^ ] $ ( ) { } = ! < > | : - #

Example: Find if the area code is enclosed in parentheses

Parentheses are special characters, so escape them in the pattern to match them literally:

<?php
 // need to escape (  )
 $phone = '(01) 2345 67890';
 $pattern = '/^\([0-9]{2,3}\)/';
 $found = preg_match($pattern, $phone, $match); # 1 (match found)
 print_r($match); # Array ( [0] => (01) )
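
Unescaped parentheses keep their grouping role, so both forms can appear in one pattern. As a sketch built on the example above (the inner group is an addition for illustration, not part of the original pattern), the escaped pair matches the literal parentheses while the unescaped pair captures the digits between them:

```php
<?php
$phone = '(01) 2345 67890';
// \( and \) match literal parentheses; the inner ( ) is a capturing group.
$pattern = '/^\(([0-9]{2,3})\)/';
$found = preg_match($pattern, $phone, $match);
print_r($match); // [0] => (01), [1] => 01
```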

How to avoid escaping

Another way to write the "2\+3" expression is to place the + in a list of characters (a character class): "2[+]3". Because + has no special meaning inside a list, it doesn’t need to be escaped in that context. Using character lists in this way can improve readability. The following examples show how escaping is used and avoided:

<?php
 #$pattern = '/2\+3/';
 $pattern = '/2[+]3/';
 $string = 'Hi, 2+3 is equal to 5';
 $found = preg_match($pattern, $string, $match); # 1
 print_r($match); # Array ( [0] => 2+3 )

 // Don't need to escape the dot within [ ]
 $domain = 'BrainBell.com';
 $pattern = '/[.]com/';
 $found = preg_match($pattern, $domain, $match); # 1
 print_r($match); # Array ( [0] => .com )

 // No need to escape (*.+?)| within [ ]
 $special = 'List: .\+*?[^]$( ){ }=!< >|-#';
 $pattern = '/[$^(*.+?)|]/';
 $found = preg_match_all($pattern, $special, $match); # 9 (number of matches)
 print_r($match); // . + * ? ^ $ ( ) |
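
A character list does not remove every special meaning. Within [ ], a caret in first position negates the list, a hyphen between two characters forms a range, and the backslash and closing bracket still need escaping. The sketch below uses made-up inputs to show the caret and hyphen cases:

```php
<?php
// ^ first in the list negates it; anywhere else it is a literal caret.
$negated = preg_match('/[^+]/', '+');   // 0: "anything except +" cannot match '+'
$literal = preg_match('/[+^]/', '^');   // 1: the list holds + and ^

// A hyphen between two characters is a range; escaped, it is a literal hyphen.
$range  = preg_match('/[a-z]/', 'm');   // 1: m lies in the range a..z
$hyphen = preg_match('/[a\-z]/', 'm');  // 0: the list holds only a, - and z
$dash   = preg_match('/[a\-z]/', '-');  // 1

var_dump($negated, $literal, $range, $hyphen, $dash);
```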

Escaping backslashes

Another complication arises due to the fact that a regular expression is passed as a string to the regular expression functions. Strings in PHP can also use the backslash character to escape quotes, encode tabs, newlines, etc. Consider the following example, which matches a backslash character:

<?php
 # The backslash always needs to be escaped to match
 $backSlash = 'The backslash \ character';
 $pattern = '/^[a-zA-Z \\\\]*$/';
 $found = preg_match($pattern, $backSlash, $match); # 1
 print_r($match); # Array ( [0] => The backslash \ character )

The pattern needs four backslashes to match one because the backslash is unescaped twice: PHP first reduces \\\\ in the string literal to \\, and the regular expression engine then reads \\ as a single literal backslash.

Note: The backslash has a special meaning in both single-quoted and double-quoted PHP strings. Thus if \ has to be matched with the regular expression \\, then "\\\\" or '\\\\' must be used in PHP code.

https://www.php.net/manual/en/regexp.reference.escape.php
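
One way to check the two levels of escaping is to look at the string PHP actually hands to the regular expression engine. In this sketch the sample path is hypothetical; strlen() shows that four backslashes in the source become two in the string, which the engine then reads as one literal backslash:

```php
<?php
$pattern = '/\\\\/';          // PHP string: /\\/  -> regex: one literal backslash
echo strlen('\\\\'), "\n";    // 2
echo $pattern, "\n";          // /\\/

$path = 'C:\Users\demo';      // single quotes: \U and \d are not escape sequences
$count = preg_match_all($pattern, $path, $m);
echo $count, "\n";            // 2
```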

Escaping $ character

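The last complication is that, inside a double-quoted string, PHP may interpret the $ character as the beginning of a variable name. Because $ is also a regular expression metacharacter (the end-of-line anchor), a literal $ has to be escaped for the regular expression as well.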

<?php
 $string = '$100 + $150 = $250';

 # For a double-quoted pattern you need to
 # escape the $ character with a double backslash
 $pattern = "/\\$[0-9]+/";
 $found = preg_match($pattern, $string, $match);
 print_r($match); # Array ( [0] => $100 )

Using a single-quoted string can help make regular expressions easier to read and write.

<?php
 $string = '$100 + $150 = $250';
 # For a single-quoted pattern you need to
 # escape the $ character with a single backslash
 $pattern = '/\$[0-9]+/';
 $found = preg_match($pattern, $string, $match);
 print_r($match); # Array ( [0] => $100 )
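
The difference between the quoting styles is easiest to see when the escape is wrong. In the sketch below (the variable names are illustrative), a single backslash inside double quotes is consumed by PHP, so the regular expression engine receives a bare $ that acts as an end anchor, and the pattern cannot match:

```php
<?php
$string = '$100 + $150 = $250';

$broken = "/\$[0-9]+/";   // PHP turns \$ into $, so the regex is /$[0-9]+/
$fixed  = '/\$[0-9]+/';   // single quotes keep the backslash for the regex

$r1 = preg_match($broken, $string);               // 0: nothing follows the end anchor
$r2 = preg_match_all($fixed, $string, $matches);  // 3
print_r($matches[0]); // $100, $150, $250
```

Using preg_match_all() here also returns every amount in the string, not only the first one.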

preg_quote()

<?php
 //Syntax
 preg_quote(string $str, ?string $delimiter = null): string

The preg_quote() function takes two parameters:

  1. $str: The input string.
  2. $delimiter (optional): The specified delimiter will also be escaped.

Here are the special regular expression characters that preg_quote() escapes with a backslash (the # character is escaped as of PHP 7.3):

<?php
 $meta = '. \ + * ? [ ^ ] $ ( ) { } = ! < > | : - #';
 echo preg_quote($meta);
/*Prints:
 \. \\ \+ \* \? \[ \^ \] \$ \( \) \{ \} \= \! \< \> \| \: \- \#
*/

The preg_quote() function does not escape the / character which is the most commonly used delimiter. If you’re using the / character as your regular expression delimiter, pass preg_quote() an additional character to escape as a second argument. See the following example:

<?php
 $delimiter = '/';
 $special  = '/#-/';

 # Forward slashes not escaped
 echo preg_quote($special).'<br>';
 #Prints: /\#\-/

 # Forward slashes escaped when specified in 2nd argument
 echo preg_quote($special, $delimiter);
 #Prints: \/\#\-\/
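
To illustrate why the second argument matters, this sketch builds a pattern from a string that contains a slash (the sample values are invented). Without the delimiter argument, the slash ends the pattern early, and preg_match() returns false with an "Unknown modifier" warning, which is suppressed here with @ only to keep the output clean:

```php
<?php
$needle  = 'a/b';
$subject = 'path a/b here';

// Slash left unescaped: the pattern becomes /a/b/ and "b" is read as a modifier.
$bad = @preg_match('/' . preg_quote($needle) . '/', $subject);

// Slash escaped via the delimiter argument: the pattern becomes /a\/b/.
$good = preg_match('/' . preg_quote($needle, '/') . '/', $subject, $m);

var_dump($bad);   // bool(false)
var_dump($good);  // int(1)
```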

Example: Using preg_quote

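Escape the search string with the preg_quote() function before inserting it into the pattern. The / delimiter is passed as the second argument so that any slash in the search string is escaped too:

<?php
 $string = 'Hi, 2+3 is equal to 5';
 $find = preg_quote('2+3', '/');
 $pattern = '/'.$find.'/';
 $found = preg_match($pattern, $string, $match); # 1
 print_r($match); # Array ( [0] => 2+3 )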

The preg_quote() function is particularly useful when you dynamically insert a string into your regex pattern which may contain special characters that need escaping.
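
As a sketch of that use case, the helper below wraps preg_quote() so that arbitrary input is always matched literally. The function name contains_literal() and the sample strings are hypothetical, chosen only for this example:

```php
<?php
// Returns true if $needle occurs in $haystack as literal text.
function contains_literal(string $haystack, string $needle): bool
{
    // Escape regex metacharacters and the / delimiter in the input.
    $pattern = '/' . preg_quote($needle, '/') . '/';
    return preg_match($pattern, $haystack) === 1;
}

$hit  = contains_literal('Total: $5.00 (approx.)', '$5.00 (approx.)');
$miss = contains_literal('Total: $5x00', '$5.00');

var_dump($hit);   // bool(true)
var_dump($miss);  // bool(false): the escaped dot no longer matches "x"
```

For a plain substring check, strpos() would be enough; escaping with preg_quote() pays off when the literal text is only one part of a larger pattern.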

