Matching E-Mail Addresses

A much more complicated example comes when we consider matching e-mail addresses. These come in a number of formats, some of which are extremely complicated. We will want to write a regular expression to verify at least the most common formats.

An e-mail address consists of three basic parts: the username, the @ symbol, and the domain name with which that username is associated, in the form username@domain.


The username, in its basic form, can consist of ASCII alphanumeric characters, underscores, periods, and dashes, describable by the regular expression: [[:alnum:]._-]+. In more complicated formats, it can be any sequence of characters enclosed in double quotes, and it can even include backslashes to escape seemingly invalid characters such as spaces and other backslashes.
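As a minimal sketch of the basic username form, the expression can be tried out in Python. Python's re module does not support POSIX classes such as [[:alnum:]], so this example assumes the ASCII ranges A-Za-z0-9 as a stand-in; the function name is_basic_username is made up for illustration.

```python
import re

# ASCII stand-in for the POSIX expression [[:alnum:]._-]+
# (assumption: Python's re has no [[:alnum:]], so the ranges are spelled out).
# The dash comes last in the class so it is taken literally.
USERNAME = re.compile(r"[A-Za-z0-9._-]+")

def is_basic_username(text):
    """Return True if the whole string is a basic-form username."""
    return USERNAME.fullmatch(text) is not None

print(is_basic_username("jane.doe_99"))  # True
print(is_basic_username("jane doe"))     # False: a space is not allowed
print(is_basic_username(""))             # False: + needs at least one character
```

Quoted usernames and backslash escapes are deliberately not handled here.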

We will limit ourselves to the most basic scenario for this sample and invite readers to look at the documentation in RFC 3696 and RFC 2822 for complete details of all possible e-mail address formats.

The domain name is a series of alphanumeric words, separated by periods. There cannot be a period before the first word or after the last word. In addition to alphanumeric characters, the words can contain the dash character. The last word in the domain name, such as com, edu, org, jp, or biz, will not contain a dash. Our regular expression for this might be as follows:
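One pattern consistent with this description is [[:alnum:]-]+\.([[:alnum:]-]+\.)*[[:alpha:]]+, offered here as a reconstruction under assumptions rather than a definitive listing. The Python sketch below spells the POSIX classes out as ASCII ranges and assumes the last word is purely alphabetic; the name is_domain is hypothetical.

```python
import re

# First word and its dot, then an optional repeated (word + dot) block,
# then a last word with letters only (assumption: no digits, no dash).
DOMAIN = re.compile(r"[A-Za-z0-9-]+\.([A-Za-z0-9-]+\.)*[A-Za-z]+")

def is_domain(text):
    """Return True if the whole string fits the simplified domain pattern."""
    return DOMAIN.fullmatch(text) is not None

print(is_domain("my-site.org"))   # True
print(is_domain(".example.com"))  # False: period before the first word
print(is_domain("example.com."))  # False: period after the last word
print(is_domain("example.c-m"))   # False: dash in the last word
```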


The optional block in the middle, along with the * (which is the same as the {0,} quantifier), lets us insert an arbitrary number of subdomains and their associated dot characters into our domain name. The preceding regular expression correctly matches domains such as these:
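To see the optional block at work, the following sketch runs a few sample domains through two spellings of the same assumed pattern, one using * and one using {0,}. The sample domains and the pattern itself (ASCII ranges in place of POSIX classes, letters-only last word) are illustrative assumptions.

```python
import re

WORD = r"[A-Za-z0-9-]+"  # ASCII stand-in for [[:alnum:]-]+

# The same pattern written twice: once with *, once with {0,}.
STAR = re.compile(WORD + r"\.(" + WORD + r"\.)*[A-Za-z]+")
BRACES = re.compile(WORD + r"\.(" + WORD + r"\.){0,}[A-Za-z]+")

# Hypothetical sample domains with zero, one, and two extra subdomains.
samples = ["example.com", "www.example.com", "mail.cs.example.edu", "my-site.co.jp"]

for domain in samples:
    # Both spellings of the quantifier give the same answer for each domain.
    print(domain, STAR.fullmatch(domain) is not None, BRACES.fullmatch(domain) is not None)
```

Each line of output shows the domain followed by True for both patterns.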

So, with all of these pieces, we now have a complete regular expression to look for a well-formed (syntactically, at least) e-mail address:
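Joining the username expression, the @ symbol, and a domain expression gives something like [[:alnum:]._-]+@[[:alnum:]-]+\.([[:alnum:]-]+\.)*[[:alpha:]]+. The Python sketch below assembles that combination under the same assumptions as before: ASCII ranges replace the POSIX classes, the domain part is a reconstruction from the description, and is_well_formed_email is a hypothetical helper name.

```python
import re

USERNAME = r"[A-Za-z0-9._-]+"
# Assumed domain pattern: first word, optional subdomain block, letters-only last word.
DOMAIN = r"[A-Za-z0-9-]+\.([A-Za-z0-9-]+\.)*[A-Za-z]+"

# username, then a literal @, then the domain
EMAIL = re.compile(USERNAME + "@" + DOMAIN)

def is_well_formed_email(text):
    """Syntactic check only; it says nothing about whether the address exists."""
    return EMAIL.fullmatch(text) is not None

print(is_well_formed_email("jane.doe@example.com"))  # True
print(is_well_formed_email("jane.doe@example"))      # False: domain has no dot
print(is_well_formed_email("@example.com"))          # False: empty username
```

Using fullmatch rather than search keeps stray text before or after the address from slipping through.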


You are encouraged to try other regular expressions on your own to match things you see on a regular basis, such as URLs, credit card numbers, or license plate numbers in your home area. A key tip is to break your regular expression into subproblems and solve each of those before putting them together into one larger expression. If you try to solve the entire problem from the start, a small error is likely to sink your entire expression and be much more difficult to find.
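The subproblem approach might look like the following sketch. The plate format used here (three capital letters, a dash, four digits) is invented purely for illustration and does not describe any real jurisdiction; the helper name matches is also hypothetical.

```python
import re

# Hypothetical plate format, invented for this example: ABC-1234
LETTERS = r"[A-Z]{3}"
SEPARATOR = r"-"
DIGITS = r"[0-9]{4}"

def matches(pattern, text):
    """Return True if the pattern matches the whole string."""
    return re.fullmatch(pattern, text) is not None

# Solve and check each subproblem on its own first.
assert matches(LETTERS, "ABC") and not matches(LETTERS, "AB1")
assert matches(DIGITS, "1234") and not matches(DIGITS, "123")

# Only then put the pieces together into one larger expression.
PLATE = LETTERS + SEPARATOR + DIGITS

print(matches(PLATE, "ABC-1234"))  # True
print(matches(PLATE, "AB-12345"))  # False
```

If the combined expression misbehaves, the per-piece checks narrow down which part holds the error.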