Regular Expressions
Lesson 3: Regular expression reference
Objective: Write a regular expression that will catch the most common misspellings of your name.

Regular Expression Reference

Perl regular expressions are based on the standard egrep-style (so-called version 8) regexps. They perform pattern matching according to a set of rules, the basics of which are explained in this lesson.
For the purpose of the examples in this discussion, we will use the simple form of Perl's pattern-matching operator (m//).

For a review on the matching operator, see "The match operator" lesson from Module 3.
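As a quick refresher, here is a minimal sketch of the simple form of the match operator. The sample text and variable names are hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample text, used only for this illustration.
my $text = "The quick brown fox";

# m/fox/ and /fox/ are the same match; the leading m is optional
# when the delimiters are slashes.
my $with_m    = ($text =~ m/fox/) ? 1 : 0;
my $without_m = ($text =~ /fox/)  ? 1 : 0;
my $no_match  = ($text =~ /cat/)  ? 1 : 0;

print "with m:    $with_m\n";
print "without m: $without_m\n";
print "cat found: $no_match\n";
```

The =~ operator binds the string on the left to the pattern on the right and returns true when the pattern is found anywhere in the string.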

Pattern - Matching Rules

This lesson contains many details that we will use later in the module. Be sure to read each of the paragraphs below as well as the pages linked from this lesson. We will also apply the regular expressions discussed here to the yes/no if structure we examined in the previous lesson.

The first rule: any single character matches itself, unless it is one of the recognized metacharacters.

1) Perl Metacharacters Example

Note: By now you have noticed that some characters in regexes have a special meaning. These are called metacharacters. The following are the metacharacters that Perl regular expressions recognize:
{} [] () ^ $ . | * + ? \

If you want to match the literal version of any of those characters, you must precede it with a backslash (\). As you go through the chapter, the meaning of these metacharacters will become clear.

For example, $ is a metacharacter, so
/$15/
will not find the literal text $15 in this string:
Can I borrow $15?

If a metacharacter is present in your expression and you are specifically looking for that character, you must escape it so that it is matched literally. In the above search example, use:
/\$15/
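The following is a minimal sketch of escaping the dollar sign in the search above. It also shows Perl's built-in quotemeta function and the \Q...\E escape, which are not covered in this lesson but do the escaping for you; the variable names are hypothetical.

```perl
use strict;
use warnings;

# Single quotes keep Perl from interpolating $15 in the sample string.
my $text = 'Can I borrow $15?';

# \$ matches a literal dollar sign.
my $escaped = ($text =~ /\$15/) ? 1 : 0;

# quotemeta() backslashes every metacharacter in a string;
# \Q...\E does the same inside a pattern.
my $literal = '$15';
my $quoted  = quotemeta($literal);
my $via_q   = ($text =~ /\Q$literal\E/) ? 1 : 0;

print "escaped match:   $escaped\n";
print "quotemeta gives: $quoted\n";
print "Q..E match:      $via_q\n";
```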
You can also use special metacharacters to match the beginning or end of a line or string.

2) Perl Special Metacharacters

The following special metacharacters have these special meanings:
  1. ^ matches the beginning of the line or string.
  2. $ matches the end of the line or string.

When the special metacharacter ^ is used outside of a bracketed character class, it means "the beginning of a line or string." However, when ^ is the first character inside a bracketed character class, it negates the class, so the class matches any character that is not listed. Here is an example of how you would apply the special metacharacters to our yes/no if structure:

if($input=~/^[Yy](es)?$/)
   { print "Let's play!\n" }
else
   { print "Okay. Thanks anyway.\n" }
Let's examine the regular expression:
/^[Yy](es)?$/

  1. ^ matches the beginning of the string.
  2. [Yy] matches 1 character from a set of either Y or y.
  3. (es)? matches a pattern of es either 0 or 1 times.
  4. $ matches the end of the string.
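To see the effect of the two anchors, the pattern can be tried against a few inputs. This is only a sketch; the helper name wants_to_play and the sample inputs are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the anchored pattern from the lesson.
sub wants_to_play {
    my ($input) = @_;
    return ($input =~ /^[Yy](es)?$/) ? 1 : 0;
}

# Only the first two inputs match: ^ and $ together require the
# whole string to be y, Y, yes, or Yes.
for my $input ('y', 'Yes', 'yes please', 'oh yes', 'ye') {
    printf "%-14s %s\n", "'$input'",
        wants_to_play($input) ? 'match' : 'no match';
}
```

Note that $ also matches just before a trailing newline, so a line read from standard input can match this pattern even before it is chomped.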
Brackets are used to create your own class of characters.

Matching a Class of Characters

Here are some examples of matching a class of characters:
  1. [A-Z] will match any uppercase letter
  2. [0-9] will match any digit
  3. [Nn]o will match No or no
A negative class (anything except the class) can be created by placing the ^ character first inside the brackets.
  1. [^A-Z] will match any character except an uppercase letter
  2. [^0-9] will match any nondigit
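The classes listed above can be exercised with a short sketch like the following. The classify helper and the sample strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper that reports which of the lesson's classes
# are found in a (non-empty) string.
sub classify {
    my ($s) = @_;
    my @hits;
    push @hits, 'has uppercase' if $s =~ /[A-Z]/;
    push @hits, 'has digit'     if $s =~ /[0-9]/;
    push @hits, 'No or no'      if $s =~ /[Nn]o/;
    # No nondigit anywhere means every character is a digit.
    push @hits, 'all digits'    if $s !~ /[^0-9]/;
    return join(', ', @hits);
}

for my $s ('No', 'no', 'NO', 'Perl5', '2024') {
    printf "%-6s %s\n", $s, classify($s);
}
```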

Now let us apply this to our yes/no if structure:

if($input=~/^[Yy]es\b/)
   { print "Let us play!\n" }
else
   { print "Okay. Thanks anyway.\n" }

[Yy] matches either Y or y, so the whole search matches Yes or yes.
You could also use /i to ignore case, so
/^yes\b/i

would match Yes, yes, YEs, yES, yeS, and YES.
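A small sketch of the case-insensitive match, using hypothetical inputs, also shows what the \b word boundary contributes:

```perl
use strict;
use warnings;

# Hypothetical inputs; the last two should not match.
my @inputs = ('Yes', 'yes', 'YES', 'yEs', 'yesterday', 'no');

my %result;
for my $input (@inputs) {
    # /i ignores case; \b requires a word boundary right after "yes".
    $result{$input} = ($input =~ /^yes\b/i) ? 1 : 0;
    printf "%-10s %s\n", $input, $result{$input} ? 'match' : 'no match';
}
```

Here yesterday is rejected because the s is followed by another word character, so there is no word boundary after yes.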
Some further pattern-matching rules apply:
  1. The backslash (\) character is used to create special escape sequences for matching some nonalphanumerics and classes of characters.
  2. The period (.) matches any character (except \n). To match a period itself, use \. or [.].
  3. Alternate matches can be specified using | to separate them.
  4. Within a pattern, you can specify subpatterns for later reference by enclosing them in parentheses. You can refer to those subpatterns later in the same pattern by using \1, \2, and so on, where the number refers back to the nth subpattern. These are called back-references.
  5. You can repeat a pattern several times by following a character, class, or parenthesized expression with one of the repeat quantifiers.
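The period, alternation, and back-reference rules above can be sketched as follows. All strings and variable names are hypothetical examples.

```perl
use strict;
use warnings;

# The period matches any single character except a newline.
my $dot         = ('cat' =~ /c.t/)   ? 1 : 0;
# An escaped period only matches a real period.
my $dot_escaped = ('cat' =~ /c\.t/)  ? 1 : 0;
my $dot_class   = ('c.t' =~ /c[.]t/) ? 1 : 0;

# Alternation: either branch may match.
my $alt = ('I have a dog' =~ /cat|dog/) ? 1 : 0;

# Back-reference: \1 must repeat whatever the first group captured.
my $doubled     = ('hello hello world' =~ /\b(\w+) \1\b/) ? $1 : '';
my $not_doubled = ('hello world' =~ /\b(\w+) \1\b/) ? 1 : 0;

print "dot: $dot, escaped dot: $dot_escaped, class dot: $dot_class\n";
print "alternation: $alt\n";
print "doubled word: $doubled, not doubled: $not_doubled\n";
```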

Perl Repeat Quantifiers

Here is a list of the repeat quantifiers:
Character   Value
?           zero times or one time
*           zero or more times
+           one or more times
{x}         exactly x times
{x,y}       x to y times
{x,}        x or more times

For example,
/(foo )?fi fum/
 # matches foo fi fum or fi fum
 
/^\s+/
 # matches a line with leading
 # whitespace
 
/(\bif\b\s*){2,}/
 # matches a line with repeated ifs,
 # such as "if if"
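As a rough illustration, each quantifier from the table can be tried on a short string. The strings and variable names below are hypothetical, and the patterns are anchored with ^ and $ so the counts are easy to check.

```perl
use strict;
use warnings;

# ? : the group "foo " may appear zero times or one time.
my $optional  = ('fi fum' =~ /^(foo )?fi fum$/) ? 1 : 0;
# + : one or more whitespace characters at the start.
my $leading   = ('   indented' =~ /^\s+/) ? 1 : 0;
# * : zero or more digits after the letter.
my $star      = ('x' =~ /^x[0-9]*$/) ? 1 : 0;
# {x} : exactly five digits.
my $exactly   = ('12345' =~ /^[0-9]{5}$/) ? 1 : 0;
# {x,y} : between two and four lowercase letters.
my $in_range  = ('abc' =~ /^[a-z]{2,4}$/) ? 1 : 0;
my $too_long  = ('abcde' =~ /^[a-z]{2,4}$/) ? 1 : 0;
# {x,} : two or more a's.
my $at_least  = ('aaaa' =~ /^a{2,}$/) ? 1 : 0;

print "optional: $optional, leading: $leading, star: $star\n";
print "exactly: $exactly, in range: $in_range, too long: $too_long\n";
print "at least: $at_least\n";
```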

All of these quantifiers are greedy by default. That is, they will match the maximum number of characters that will not break the expression. You can change any of them to become nongreedy by using a ? immediately after the quantifier. For example,

/^(.*)\s.*/

This will put all of the characters up to the last whitespace in the subexpression.
/^(.*?)\s.*/

This will put all of the characters up to the first whitespace in the subexpression.
The concept of greediness will become more important when you learn about the substitution operator later in this module.
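The two patterns above can be compared side by side in a minimal sketch. The sample line is hypothetical; matching in list context returns the text captured by the parentheses.

```perl
use strict;
use warnings;

# Hypothetical sample line with two whitespace characters.
my $line = 'one two three';

# Greedy: (.*) takes as much as it can, so it stops at the LAST space.
my ($greedy)    = $line =~ /^(.*)\s.*/;

# Nongreedy: (.*?) takes as little as it can, so it stops at the FIRST space.
my ($nongreedy) = $line =~ /^(.*?)\s.*/;

print "greedy:    '$greedy'\n";      # greedy:    'one two'
print "nongreedy: '$nongreedy'\n";   # nongreedy: 'one'
```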

Using repeat quantifiers

Now that we have more tools, let us apply repeat quantifiers to our Yes/no if structure:

if($input=~/^[Yy](es)?\b/)
   { print "Let us play!\n" }
else
   { print "Okay. Thanks anyway.\n" }

In this example, (es)? matches "es" 0 or 1 times, so the whole expression would match Y, y, Yes, or yes. Notice how much cleaner this is than the "or" structure that you looked at in the alternative matching example:

if($input=~/^[Yy](es)?\b/)

versus
if($input=~/^([Yy]|[Yy]es)\b/)
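One reason the quantifier form is easier to get right is that | binds more loosely than anchors and ordinary characters, so the alternatives must be grouped before ^ and \b apply to both of them. The following sketch compares three variants on hypothetical inputs; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $optional  = qr/^[Yy](es)?\b/;
my $grouped   = qr/^([Yy]|[Yy]es)\b/;
# Without grouping this reads as: (^[Yy]) OR ([Yy]es\b)
my $ungrouped = qr/^[Yy]|[Yy]es\b/;

for my $input ('y', 'Yes', 'yellow', 'oh yes') {
    my $o = ($input =~ $optional)  ? 'match' : 'no';
    my $g = ($input =~ $grouped)   ? 'match' : 'no';
    my $u = ($input =~ $ungrouped) ? 'match' : 'no';
    printf "%-8s optional=%-6s grouped=%-6s ungrouped=%s\n",
        $input, $o, $g, $u;
}
```

The optional and grouped patterns agree on every input, while the ungrouped one also accepts yellow and oh yes.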


Perl Spell Check Name - Exercise

Click the exercise link below to write a regular expression that will catch misspellings of your name.
Perl Spell Check Name - Exercise
