Lesson 3 | Regular expression reference |
Objective | Write a regular expression that will catch most common misspellings of your name. |
Regular Expression Reference
Perl regular expressions are based on the standard egrep-style (so-called version 8) regexps.
These regexes perform pattern matching based on a set of rules. The basic set of rules is explained in this lesson.
For the examples in this discussion, we will use the simple form of Perl's pattern-matching operator (m//).
For a review of the matching operator, see "The match operator" lesson from Module 3.
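As a minimal sketch of that simple form, the snippet below binds a pattern to a string with =~. The variable names and the sample text are made up for illustration and do not come from the lesson.

```perl
use strict;
use warnings;

# Hypothetical sample text.
my $line = "The quick brown fox";

# m// with the default slash delimiters; the leading m is optional with slashes.
my $has_quick = ($line =~ m/quick/) ? 1 : 0;
my $has_slow  = ($line =~ /slow/)   ? 1 : 0;

print "quick: $has_quick\n";   # prints "quick: 1"
print "slow: $has_slow\n";     # prints "slow: 0"
```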
Pattern-Matching Rules
This lesson covers many details that we will use later in the module. Be sure to read each of the paragraphs below as well as the pages linked from this lesson. In addition, we will apply the regular expressions discussed to the yes/no if structure we examined in the previous lesson.
The first rule: any single character matches itself, unless it is one of the recognized metacharacters.
1) Perl Metacharacters Example
Note: By now you have noticed that some characters in regexes have a special meaning.
These are called metacharacters. The following are the metacharacters that Perl regular expressions recognize:
{} [] () ^ $ . | * + ? \
If you want to match the literal version of any of those characters, you must precede it with a backslash (\). As you go through the chapter, the meaning of these metacharacters will become clear.
These are the recognized metacharacters:
+ ? . * ^ $ ( ) [ ] { } | \
For example,
/$15/
will not match this string:
Can I borrow $15?
If a metacharacter appears in your expression and you want to match that character literally, you must escape it with a backslash. In the above search example, use:
/\$15/
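The escaping rule can be tried out with a short sketch like the one below. The variable names are illustrative. quotemeta is a Perl built-in that escapes metacharacters for you, shown here as an alternative to typing the backslash by hand.

```perl
use strict;
use warnings;

my $text = 'Can I borrow $15?';   # single quotes keep the $ literal

# \$ matches a literal dollar sign.
my $escaped = ($text =~ /\$15/) ? 1 : 0;

# quotemeta puts a backslash in front of every non-word character.
my $literal = quotemeta('$15');
my $quoted  = ($text =~ /$literal/) ? 1 : 0;

print "escaped: $escaped, quoted: $quoted\n";   # prints "escaped: 1, quoted: 1"
```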
You can also use the special metacharacters ^ and $ to match the beginning or end of a line or string.
2) Perl Special Metacharacters
if($input=~/^[Yy](es)?$/)
{ print "Let's play!\n" }
else
{ print "Okay. Thanks anyway.\n" }
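Let's examine the regular expression:
/^[Yy](es)?$/
The ^ anchors the match at the beginning of the string, [Yy] matches either Y or y, (es)? optionally matches es, and $ anchors the match at the end of the string.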
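To see the effect of the two anchors, here is a small sketch that wraps the pattern in a helper. The sub name is_yes and the sample answers are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the whole answer is Y, y, Yes, or yes.
sub is_yes {
    my ($input) = @_;
    return $input =~ /^[Yy](es)?$/ ? 1 : 0;
}

for my $answer ("yes", "Y", "yes please", "oh yes") {
    printf "%-14s %s\n", "'$answer'", is_yes($answer) ? "match" : "no match";
}
```

The first two answers match. "yes please" fails because of the $ anchor, and "oh yes" fails because of the ^ anchor. Note that $ also matches just before a trailing newline, so input that has not been chomped can still match.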
Brackets are used to create your own class of characters.
Matching a Class of Characters
Here are some examples of matching a class of characters:
[A-Z] will match any uppercase letter
[0-9] will match any digit
[Nn]o will match No or no
A negative class (anything except the class) can be created by placing the ^ character immediately after the opening bracket.
[^A-Z] will match anything except an uppercase letter
[^0-9] will match any nondigit
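The classes above can be checked against a few strings with a sketch like this one. The helper name classes_hit and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: lists which of the classes find a match somewhere in $s.
sub classes_hit {
    my ($s) = @_;
    my @hits;
    push @hits, '[A-Z]'  if $s =~ /[A-Z]/;    # contains an uppercase letter
    push @hits, '[0-9]'  if $s =~ /[0-9]/;    # contains a digit
    push @hits, '[Nn]o'  if $s =~ /[Nn]o/;    # contains No or no
    push @hits, '[^0-9]' if $s =~ /[^0-9]/;   # contains a nondigit
    return join " ", @hits;
}

for my $s ("No", "no", "NO", "42", "abc") {
    print $s, ": ", classes_hit($s), "\n";
}
```

Because none of these patterns is anchored, each one succeeds if the class matches anywhere in the string.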
Now let us apply this to our yes/no if structure:
if($input=~/^[Yy]es\b/)
{ print "Let us play!\n" }
else
{ print "Okay. Thanks anyway.\n" }
[Yy] matches either Y or y, so the whole search matches Yes or yes.
You could also use /i to ignore case, so /^yes\b/i would match Yes, yes, YEs, yES, yeS, and YES.
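A quick way to try the /i modifier is to filter a list of answers, as in this sketch. The sample answers are illustrative only.

```perl
use strict;
use warnings;

my @answers = ("Yes", "yES", "YES", "yes, please", "yesterday");

# Keep only the answers that start with the word "yes" in any letter case.
my @matched = grep { /^yes\b/i } @answers;

print join(", ", @matched), "\n";
```

"yesterday" is filtered out because \b requires a word boundary right after "yes".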
- The backslash (\) character is used to create special escape sequences for matching some nonalphanumeric characters and classes of characters.
- The period (.) matches any character (except \n). To match a period itself, use \. or [.].
- Alternate matches can be specified using | to separate them.
- Within a pattern, you can specify subpatterns for later reference by enclosing them in parentheses. You can refer to those subpatterns later by using \1, \2, and so on, where the number refers back to the nth subpattern. These are called back-references.
- You can repeat a pattern several times by following a character, class, or parenthesized expression with one of the repeat quantifiers listed below.
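The period, alternation, and back-reference rules in the list above can be sketched as follows. All sample strings and variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Alternation: either word may appear.
my $alt = ("I said no" =~ /yes|no/) ? 1 : 0;

# Period: any single character except a newline.
my $dot      = ("cat"  =~ /c.t/)   ? 1 : 0;
my $dot_nl   = ("c\nt" =~ /c.t/)   ? 1 : 0;
my $real_dot = ("3.14" =~ /3\.14/) ? 1 : 0;

# Back-reference: \1 must repeat whatever the first group captured.
my $doubled = ("it is is here" =~ /\b(\w+) \1\b/) ? $1 : "";

print "alt=$alt dot=$dot dot_nl=$dot_nl real_dot=$real_dot doubled=$doubled\n";
# prints "alt=1 dot=1 dot_nl=0 real_dot=1 doubled=is"
```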
Perl Repeat Quantifiers
Here is a list of the repeat quantifiers:
Quantifier | Meaning |
? | zero times or one time |
* | zero or more times |
+ | one or more times |
{x} | exactly x times |
{x,y} | x to y times |
{x,} | x or more times |
For example,
/(foo )?fi fum/    # matches "foo fi fum" or "fi fum"
/^\s+/             # matches a line with leading whitespace
/(\bif\b\s*){2,}/  # matches a line with repeated ifs, such as "if if"
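For a hands-on check of each quantifier in the table, the sketch below runs one pattern per quantifier against a sample input. The patterns and inputs are illustrative and are anchored with ^ and $ so that partial matches do not hide the effect of the quantifier.

```perl
use strict;
use warnings;

# Each entry: [pattern, input].
my @cases = (
    [ qr/^(foo )?fi fum$/, "fi fum"    ],  # ?     : "foo " is optional
    [ qr/^ab*c$/,          "ac"        ],  # *     : zero b's is fine
    [ qr/^ab+c$/,          "ac"        ],  # +     : needs at least one b
    [ qr/^\d{3}$/,         "123"       ],  # {x}   : exactly three digits
    [ qr/^\d{2,4}$/,       "12345"     ],  # {x,y} : five digits is too many
    [ qr/^\s{2,}/,         "   indent" ],  # {x,}  : two or more whitespace characters
);

# 1 for a match, 0 for no match, in the same order as @cases.
my @results = map { $_->[1] =~ $_->[0] ? 1 : 0 } @cases;

print "@results\n";   # prints "1 1 0 1 0 1"
```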
All of these quantifiers are greedy by default. That is, they match as many characters as possible while still allowing the whole expression to match. You can make any of them nongreedy by placing a ? immediately after the quantifier. For example,
/^(.*)\s.*/
This will put all of the characters up to the last whitespace character in the parenthesized subexpression.
/^(.*?)\s.*/
This will put all of the characters up to the first whitespace character in the parenthesized subexpression.
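The difference between the two patterns is easiest to see by capturing from the same string. This is a minimal sketch, and the sample line and variable names are made up.

```perl
use strict;
use warnings;

my $line = "one two three";

# In list context, a match returns the captured subexpressions.
my ($greedy) = $line =~ /^(.*)\s.*/;    # greedy: stops at the last whitespace
my ($lazy)   = $line =~ /^(.*?)\s.*/;   # nongreedy: stops at the first whitespace

print "greedy: '$greedy'\n";   # prints "greedy: 'one two'"
print "lazy:   '$lazy'\n";     # prints "lazy:   'one'"
```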
The concept of greediness will become more important when you learn about the substitution operator later in this module.
Using repeat quantifiers
Now that we have more tools, let us apply repeat quantifiers to our yes/no if structure:
if($input=~/^[Yy](es)?\b/)
{ print "Let us play!\n" }
else
{ print "Okay. Thanks anyway.\n" }
In this example, (es)? matches "es" in an expression 0 or 1 times. So the whole expression would match Y, y, Yes, or yes.
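Here is a short sketch of that pattern applied to several inputs. The sub name wants_to_play and the inputs are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the pattern from the if structure above.
sub wants_to_play {
    my ($input) = @_;
    return $input =~ /^[Yy](es)?\b/ ? 1 : 0;
}

for my $input ("Y", "yes", "Yes!", "yes please", "yep", "no") {
    printf "%-14s %s\n", "'$input'", wants_to_play($input) ? "Let us play!" : "Okay. Thanks anyway.";
}
```

Because the pattern ends in \b rather than $, trailing text after a complete Y or Yes is allowed, so "Yes!" and "yes please" match. "yep" does not match, since there is no word boundary after the y.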
Notice how much cleaner this is than the "or" structure you looked at in the alternative matching example:
if($input=~/^[Yy](es)?\b/)
versus
if($input=~/^([Yy]|[Yy]es)\b/)
Perl Spell Check Name - Exercise