Regular Expressions
Lesson 4: Pattern-matching operator
Objective: Explore the use of pattern matching in scalar and list contexts.

Pattern-matching Operator

In the next several lessons, we will cover the rest of the details about the pattern-matching m// operator that we introduced in the previous module.

Scalar Context

In a scalar context, the m// operator returns a logical value. That is, if the match is successful, then the return value will be 1 (true); if it is unsuccessful, then it will return the null string ("", false).
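The scalar-context behavior can be seen with a minimal sketch. The input string and the variable names ($text, $hit, $miss) are assumptions chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical input string, chosen only for illustration.
my $text = "hello world";

# Assigning the result to a scalar puts m// in scalar context.
my $hit  = $text =~ /world/;   # the pattern matches
my $miss = $text =~ /mars/;    # the pattern does not match

print "hit:  '$hit'\n";    # prints hit:  '1'
print "miss: '$miss'\n";   # prints miss: ''
```

In practice the result is usually tested directly in an if or unless condition rather than stored.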

List Context

In a list (or array) context, the m// operator returns a list of the substrings captured by the parentheses in the pattern. Let us look at a couple of examples.

Pattern-matching Examples

Let us look at a pattern-matching example:
[Figure: Pattern-matching example]

This will put the first three words of the string in $a, $b, and $c, respectively. (It will also set the special variables, $1, $2, and $3.)
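A minimal sketch of a list-context match of the kind described above, assuming a hypothetical input string stored in $_ and lexical variable names chosen for illustration (the lesson's own example uses $a, $b, and $c):

```perl
use strict;
use warnings;

# Hypothetical input; m// searches $_ when no binding operator is used.
$_ = "the quick brown fox";

# The list assignment puts m// in list context, so it returns the
# three captured words instead of a true/false value.
my ($first, $second, $third) = /(\w+)\s+(\w+)\s+(\w+)/;

print "$first, $second, $third\n";   # prints the, quick, brown
print "$1, $2, $3\n";                # the capture variables hold the same values
```

Lexical names are used here because $a and $b are also the special variables used by sort, which is worth keeping in mind when copying the lesson's variable names into larger programs.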
The m// operator naturally binds to the special $_ variable (which is used by default for many purposes), unless another variable is provided with one of the binding operators, =~ or !~.
So, if your intention is to search a particular variable, you must provide it with =~ or !~; for instance:

if($string =~ /^y(es)?$/i)
 { print "Yes!\n"} 
else
 { print "Nope.\n" }

In fact, it need not be a variable at all. The left-hand side of =~ can be any expression, including a function or subroutine call, as in this example:
sub yesno{ 
 return shift =~ /^y(es)?$/i 
}

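The example takes the value returned by the shift function and binds it to the m// operator. The result of the match (a logical value when the subroutine is called in scalar context) is returned via the return operator. Inside a subroutine, shift with no argument removes and returns the first element of @_, the subroutine's argument list, so it is often used to retrieve the values passed to a function. You will learn more about this in the next module.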
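As a usage sketch, the subroutine above can be called in a boolean test. The sample answers below are assumptions chosen to exercise both outcomes.

```perl
use strict;
use warnings;

# The subroutine from the lesson: matches "y" or "yes", ignoring case.
sub yesno {
    return shift =~ /^y(es)?$/i;
}

# Hypothetical inputs, chosen only for illustration.
for my $answer ("y", "YES", "no", "yesterday") {
    # The ternary test calls yesno in scalar (boolean) context.
    my $verdict = yesno($answer) ? "yes" : "no";
    printf "%-10s => %s\n", $answer, $verdict;
}
```

Because return passes the caller's context through to m//, calling yesno in list context would return the captured group instead of a logical value, so it is best used in conditions as shown.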


Which of the following patterns will correctly capture all hex numbers delimited by at least one whitespace character in an input text?
  1. \s*0[xX][0-9a-fA-F]+\s*
  2. (\s|\b)0[xX][0-9a-fA-F]+(\s|\b)
  3. \s*0[xX][0-9a-fA-F]*\s*
  4. 0[xX][0-9a-fA-F]+\s
  5. \s0[xX][0-9a-fA-F]?\s
  6. \s0[xX][0-9a-fA-F]+\s
  7. \s?0[xX][0-9a-fA-F]+\s
  8. (\s|^)0[xX][0-9a-fA-F]+(\s|$)


Answer: 8
Explanation:
  1. The * in \s* means zero or more white spaces, which allows it to capture 0x1 in asdf0x1asdf, which does not satisfy the delimiting condition.
  2. \b means word boundary. By using (\s|\b), we are saying that the number can either be preceded by whitespace or start at a word boundary. A word boundary does not actually match (i.e. consume) any character, so in the text "aaa 0x22 0x33 bbb 0x44", 0x33 and 0x44 are also matched. But this will also match 0x1+0x2 or 0x1@0x2 because of \b. A word boundary occurs between a word character, i.e. one of [a-zA-Z_0-9], and a non-word character (or the start or end of the input).
  3. * means 0 or more. Thus, \s* will not satisfy the delimiting condition, for the same reason as in option 1. In addition, [0-9a-fA-F]* allows zero hex digits, so a bare 0x would also match.
  4. The requirement is that it should be delimited by a space, but this will capture something that does not start with a space. For example, it will capture 0x1 in "asdf0x1 madf".
  5. ? means 0 or 1, so this will only capture numbers with at most one hex digit after the 0x prefix (and will also accept a bare 0x).
  6. This will fail to match the numbers which have only one space between them. For example, aaa 0x11 0x22 nnn. Here, 0x22 will not be found because the expression will "eat up" the white space at the beginning and at the end of 0x11 and so there will be no white space left at the beginning of 0x22.
  7. This will also match a number that does not start with a white space. For example, in text "aaa0x14 bbb", 0x14 will be matched because of \s? at the beginning. \s? at the beginning implies that the number may or may not start with a blank space.
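  8. ^ means the beginning of the input and $ means the end of the input.
    By using ^ and $, we ensure that hex numbers at the very beginning and end of the input are also matched, even though there is no whitespace before the first or after the last. For example, if your input string is: 0x22 0x33 0x44, without ^, 0x22 will not be captured and without $, 0x44 will not be captured. Note that, as in option 6, each match consumes its delimiting whitespace, so 0x33 in this input is still skipped. Even so, this is the only listed option that enforces the delimiter on both sides while also accepting numbers at the start and end of the input.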
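The following Java program tests pattern 8 against a sample input:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestClass {
  public static void main(String[] args) {
    // Pattern 8: a hex literal delimited by whitespace or by the start/end of the input.
    Pattern pattern = Pattern.compile("(\\s|^)0[xX][0-9a-fA-F]+(\\s|$)");
    Matcher matcher = pattern.matcher("asdf      0x14  jjhgjhg  0x22 0x22");
    boolean found = false;
    while (matcher.find()) {
      // group() includes the delimiting whitespace consumed by the match.
      System.out.println("Found the text " + matcher.group() + " starting at index "
          + matcher.start() + " and ending at index " + matcher.end());
      found = true;
    }
    // The final 0x22 is not reported: the single space before it was
    // already consumed by the previous match.
    if (!found) {
      System.out.println("No match found.");
    }
  }
}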
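The same check can be sketched in Perl using m// in list context with the /g modifier. The second pattern below is not one of the quiz options: it is an assumed alternative that uses lookaround assertions, which test for the delimiter without consuming it, so numbers separated by a single space are all found. The input string is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical input: three hex numbers separated by single spaces.
my $input = "0x22 0x33 0x44";

# Pattern 8 with the delimiters left uncaptured: each match consumes
# the whitespace on both sides of the number.
my @consuming = $input =~ /(?:\s|^)(0[xX][0-9a-fA-F]+)(?:\s|$)/g;

# Assumed alternative: "not preceded by a non-space" and "not followed
# by a non-space" are asserted without consuming any characters.
my @lookaround = $input =~ /(?<!\S)(0[xX][0-9a-fA-F]+)(?!\S)/g;

print "consuming:  @consuming\n";    # prints consuming:  0x22 0x44
print "lookaround: @lookaround\n";   # prints lookaround: 0x22 0x33 0x44
```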
In the next lesson, we will look at the modifiers you can use with the m// operator.