Matching Postal Codes

U.S. postal codes (ZIP codes) are straightforward to validate with regular expressions. A ZIP code is a sequence of five digits, optionally followed by what is called the "plus 4": a dash character followed by four more digits. A regular expression for this is as follows:

[0-9]{5,5}([- ]?[0-9]{4,4})?

The first part of this regular expression, [0-9]{5,5}, matches exactly five digits (at least five and at most five). The second part, ([- ]?[0-9]{4,4})?, might seem a little less obvious. We have grouped the entire "plus 4" sequence in parentheses and qualified the group with a ? character, which means the group can appear either not at all or exactly once. Inside the group, [- ]? says that it optionally starts with either a dash or a space (we are very forgiving), and [0-9]{4,4} says that exactly four more digits must follow.
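As a minimal sketch of how this expression might be applied, the following Python snippet wraps it in a validation function. Python, the name is_us_zip, and the use of fullmatch (so that the whole input must match rather than a substring) are assumptions made for illustration; they are not part of the original text.

```python
import re

# The pattern is copied from the text above. Nothing else is added to it;
# fullmatch() below is what forces the entire string to match.
ZIP_PATTERN = re.compile(r"[0-9]{5,5}([- ]?[0-9]{4,4})?")


def is_us_zip(text):
    """Return True if text is a five-digit ZIP code with an optional plus 4."""
    return ZIP_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["12345", "12345-6789", "12345 6789", "1234", "12345-678"]:
        print(candidate, is_us_zip(candidate))
```

Because the separator is optional, a run of nine digits such as 123456789 is also accepted. If that is too forgiving, the ? after [- ] could be dropped.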

Canadian postal codes are also quite straightforward to validate. They are always of the format X#X #X#, where # represents a digit and X a letter from the English alphabet. A regular expression for this would be as follows:


We have been a little forgiving and let the user put any number of whitespace characters (including none) between the two blocks of three.
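One way the X#X #X# format could be checked is sketched below in Python. The pattern shown is a hypothetical reconstruction from the description in the text, not necessarily the expression the author used. The function name is_canadian_postal_code and the choice to accept both uppercase and lowercase letters are likewise assumptions.

```python
import re

# Hypothetical pattern: letter, digit, letter, any amount of whitespace
# (including none), then digit, letter, digit.
POSTAL_PATTERN = re.compile(r"[A-Za-z][0-9][A-Za-z]\s*[0-9][A-Za-z][0-9]")


def is_canadian_postal_code(text):
    """Return True if text has the shape X#X #X# with optional whitespace."""
    return POSTAL_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["K1A 0B1", "K1A0B1", "K1A   0B1", "K1A-0B1", "1K1 A0B"]:
        print(candidate, is_canadian_postal_code(candidate))
```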

With a bit more research, however, we would find that not all letters are valid in Canadian postal codes. For the first letter, in fact, only the letters in [ABCEGHJKLMNPRSTVXY] are valid. We could rewrite our regular expression as follows:


(We have split the above regular expression onto two lines for formatting purposes only.)
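The stricter check could be sketched in Python as follows. Again, the pattern is a hypothetical reconstruction based only on the first-letter restriction described in the text, and the name is_valid_postal_code is an assumption. The re.VERBOSE flag lets the pattern be written across two lines with comments, while re.IGNORECASE and re.ASCII (also assumptions) accept lowercase input and keep matching limited to ASCII letters and whitespace.

```python
import re

# Hypothetical pattern: only the first letter is restricted; the remaining
# letter positions accept any letter A-Z.
STRICT_POSTAL_PATTERN = re.compile(
    r"""
    [ABCEGHJKLMNPRSTVXY][0-9][A-Z]   # first block, restricted first letter
    \s*[0-9][A-Z][0-9]               # optional whitespace, then second block
    """,
    re.VERBOSE | re.IGNORECASE | re.ASCII,
)


def is_valid_postal_code(text):
    """Return True if text is X#X #X# with a permitted first letter."""
    return STRICT_POSTAL_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["K1A 0B1", "V6B4Y8", "D1A 0B1", "W1A 0B1"]:
        print(candidate, is_valid_postal_code(candidate))
```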