Word Boundaries in Regular Expressions

This tutorial shows how to mark word boundaries in a regular expression so that a pattern matches a specific whole word only.

Matching Word Boundaries using the \b Escape Sequence

Example: Matching a word without using the word boundaries:

<?php
 $words = 'The tutors, BrainBell.com';
 $pattern = '/tutor/';
 $found = preg_match($pattern, $words);
 # returns 1 (match found)

 if ($found === 1)
  echo 'Word tutor found';
 else
  echo 'Word tutor not found';
 #Prints: Word tutor found

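A pattern without boundaries matches any text that contains those letters, so searching for “tutor” succeeds on “tutor”, “tutors”, and “tutorial” alike; in the example above it matched the “tutor” inside “tutors”. To match only the word “tutor”, we need a way to mark word boundaries. Regular expressions do this with the word boundary escape sequence “\b”, which matches the position between a word character (a letter, digit, or underscore) and a non-word character, or the start or end of the string.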

  • The pattern /\btutor/ matches a word beginning with the pattern tutor, and would match tutors, tutorial, or tutorials, but would not find mistutor.
  • The pattern /tutor\b/ matches a word ending with the pattern tutor, and would match mistutor or tutor, but not tutors.
  • The pattern /\btutor\b/ matches a word beginning and ending with the pattern tutor, and would match only the word tutor.

<?php
 $words = 'The tutors, BrainBell.com';
 $pattern = '/\btutor\b/';
 $found = preg_match($pattern, $words);
 # returns 0, no match found

 if ($found === 1)
  echo 'Word tutor found';
 else
  echo 'Word tutor not found';
 #Prints: Word tutor not found

Example 2:

<?php
 $words = 'The tutor, BrainBell.com';
 $pattern = '/\btutor\b/';
 $found = preg_match($pattern, $words);
 # returns 1, match found

 if ($found === 1)
  echo 'Word tutor found';
 else
  echo 'Word tutor not found';
 #Prints: Word tutor found
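
The three boundary placements from the bullet list can also be compared side by side. The following is a minimal sketch rather than a definitive test: the word list and the labels start, end, and both are assumptions chosen for illustration, and each word is tested on its own as the whole subject string.

```php
<?php
// Sample words taken from the bullet list above (illustrative only).
$words = ['tutor', 'tutors', 'tutorial', 'mistutor'];

// The three boundary placements; the array keys are hypothetical labels.
$patterns = [
    'start' => '/\btutor/',
    'end'   => '/tutor\b/',
    'both'  => '/\btutor\b/',
];

// $results[label] collects the words that each pattern matches.
$results = [];
foreach ($patterns as $label => $pattern) {
    $results[$label] = [];
    foreach ($words as $word) {
        if (preg_match($pattern, $word) === 1) {
            $results[$label][] = $word;
        }
    }
}

foreach ($results as $label => $matched) {
    echo $label, ': ', implode(', ', $matched), PHP_EOL;
}
# Prints:
# start: tutor, tutors, tutorial
# end: tutor, mistutor
# both: tutor
```

Only the pattern with \b on both sides rejects every longer word, which is why it is the one to use when searching for a whole word.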

Matching Word Boundaries using Character Classes

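You can also use the “[[:<:]]” and “[[:>:]]” anchors for a word’s left and right boundaries, respectively. The inner [:<:] and [:>:] must be wrapped in an outer pair of square brackets, as with POSIX character classes, but they are zero-width anchors like \b and do not consume a character. Older versions of the PCRE library do not recognize them, so check that your PHP build supports them before relying on this syntax.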

  • The pattern /[[:<:]]tutor/ matches a word beginning with the pattern tutor, and would match tutors, tutorial, or tutorials, but would not find mistutor.
  • The pattern /tutor[[:>:]]/ matches a word ending with the pattern tutor, and would match mistutor or tutor, but not tutors.
  • The pattern /[[:<:]]tutor[[:>:]]/ matches a word beginning and ending with the pattern tutor, and would match only the word tutor.

The following example uses both anchors together:

<?php
 $words = 'The tutors, BrainBell.com';
 $pattern = '/[[:<:]]tutor[[:>:]]/';
 $found = preg_match($pattern, $words);
 # returns 0, no match found

 $words = 'The tutor, BrainBell.com';
 $found = preg_match($pattern, $words);
 # returns 1, match found

The examples above show that “tutors” no longer matches once the pattern marks word boundaries on both sides of “tutor”.
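
Because the bracket anchors are not available in every PCRE build, it can help to check for them at run time and compare their results with \b. The sketch below is one possible way to do that, not an official recipe: the variable names are hypothetical, and it assumes that preg_match() returns false (with a warning, suppressed here by @) when a pattern fails to compile.

```php
<?php
// Subjects reused from the examples above.
$subjects = ['The tutors, BrainBell.com', 'The tutor, BrainBell.com'];

$escapePattern  = '/\btutor\b/';
$bracketPattern = '/[[:<:]]tutor[[:>:]]/';

// Assumption: an unsupported pattern makes preg_match() return false.
// The @ operator hides the compile warning so the result can be checked.
$bracketSupported = @preg_match($bracketPattern, '') !== false;

// Tracks whether both notations gave the same result for every subject.
$agree = true;
foreach ($subjects as $subject) {
    $withEscape = preg_match($escapePattern, $subject);
    echo $subject, ' => \b: ', $withEscape;
    if ($bracketSupported) {
        $withBracket = preg_match($bracketPattern, $subject);
        echo ', bracket anchors: ', $withBracket;
        $agree = $agree && ($withEscape === $withBracket);
    }
    echo PHP_EOL;
}

// PCRE_VERSION is a built-in constant that reports the linked PCRE library.
echo 'PCRE version: ', PCRE_VERSION, PHP_EOL;
```

For a pattern such as tutor, which begins and ends with word characters, both notations should give the same result. The \b form is the more portable choice because it does not depend on the PCRE version.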

