
Regular Expressions
Lesson 1

Introduction to Perl Regular Expressions

Perl regex
Regular expressions (sometimes called regexes or regexps) are expressions used for matching patterns of text. They are used for everything from simple searches to complex search-and-replace procedures. If you have used wildcards or the advanced mode on any of the major Internet search engines, then you have already worked with a simpler relative of regular expressions.
The purpose of this module is to familiarize you with the major features of Perl's regular expressions so that you can use this feature as effectively and seamlessly as possible. This is not intended to be an exhaustive regular expression tutorial; regular expressions can be a deep subject.
Many people, upon finding that regular expressions look cryptic, decide that they are too difficult and not worth the trouble.
Once you get past that first impression, you will use them with growing confidence and find that your programs become smaller and more efficient.

I recommend Jeffrey Friedl's book, Mastering Regular Expressions, for a detailed look into regular expressions.
Most software is written to work with and modify data in one format or another. Perl was originally designed as a system for processing logs and for summarizing and reporting on the information they contain. Because of this focus, a large proportion of the functions built into Perl are dedicated to the extraction and recombination of information. For example, Perl's split function breaks a line apart on a set of delimiters, and its join function can recombine the pieces later using a different delimiter. When the built-in functions cannot do what you want on their own, Perl also provides regular expressions.
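As a minimal sketch of the split-and-recombine idea described above, the following snippet breaks a line on several delimiters and joins the pieces with a different one. The sample line and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical record whose fields are separated by a mix of delimiters.
my $line = 'alice:42,/home/alice;bash';

# split takes a pattern; the character class matches any one of : , ;
my @fields = split /[:,;]/, $line;

# join recombines the fields using a single, different delimiter.
my $rejoined = join '|', @fields;

print "$rejoined\n";   # prints alice|42|/home/alice|bash
```

Note that the first argument to split is itself a regular expression, so even this built-in function relies on the pattern syntax covered in this module.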
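We can use a regular expression to extract information or as an advanced search-and-replace tool. Perl also offers a closely related transliteration operator, tr///, for converting or stripping individual characters from a string; it uses similar syntax but works on character lists rather than on patterns.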
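The three uses mentioned above can be sketched roughly as follows. The log entry is invented for illustration, and the /r modifier used here assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical log entry used only for this example.
my $entry = '2024-01-15 ERROR disk /dev/sda1 is 91% full';

# 1. Extraction: capture groups pull out the date and the severity level.
my ($date, $level) = $entry =~ /^(\d{4}-\d{2}-\d{2})\s+(\w+)/;

# 2. Search and replace: s/// with /r returns a modified copy
#    and leaves $entry itself unchanged.
my $masked = $entry =~ s{/dev/\w+}{<device>}r;

# 3. Transliteration: tr/// maps characters one for one ...
my $lower = $level =~ tr/A-Z/a-z/r;

#    ... and with /d it deletes listed characters that have no replacement.
my $digits = $date =~ tr/-//dr;

print "$date $lower $digits\n";   # prints 2024-01-15 error 20240115
print "$masked\n";                # prints 2024-01-15 ERROR disk <device> is 91% full
```

Without /r, both s/// and tr/// modify the variable they are bound to, which is the more traditional style you will see in older Perl code.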
In this module, we are going to concentrate on the data-manipulation features built into Perl, from numerical calculations through to basic string handling. We will also look at the regular expression mechanism, how it works, and how it integrates into the Perl language. Finally, we will take the opportunity to look at the Unicode character system. Unicode is a standard for representing text that covers not only the ASCII characters, each of which fits in a single byte, but also characters that need more than one byte when encoded, including those with accents.
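To make the single-byte versus multibyte distinction concrete, here is a small sketch that compares the number of characters in a string with the number of bytes in its UTF-8 encoding. The sample word is an assumption chosen for illustration; Encode is a core Perl module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The word "cafe" with an accented final e. The accent is written as an
# escape so the example does not depend on the source file's encoding.
my $word = "caf\x{e9}";

my $chars = length($word);                    # counts characters
my $bytes = length(encode('UTF-8', $word));   # counts bytes once encoded

print "$chars characters, $bytes bytes\n";    # prints 4 characters, 5 bytes
```

The three ASCII letters take one byte each in UTF-8, while the accented letter takes two.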

Regular expressions are a language used for parsing and manipulating text. They are often used to perform complex search-and-replace operations and to validate that text data is well-formed. Today, regular expressions are included in most programming and scripting languages (such as JavaScript). Furthermore, regular expressions are incorporated into editors, applications, databases, and command-line tools. Perl's pattern-matching operators make it practical to process log files and directory listings on any platform where Perl runs.
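As one hedged example of validating that text is well-formed, the sketch below checks whether a string has the shape of a YYYY-MM-DD date. The rule and the helper name looks_like_date are hypothetical; the pattern checks only the shape of the text, not whether the date exists on the calendar.

```perl
use strict;
use warnings;

# \A and \z anchor the pattern to the whole string, so extra text
# before or after the date causes the match to fail.
my $date_re = qr/\A\d{4}-\d{2}-\d{2}\z/;

# Hypothetical helper: returns 1 if the input has the expected shape, else 0.
sub looks_like_date {
    my ($input) = @_;
    return $input =~ $date_re ? 1 : 0;
}

for my $input ('2024-01-15', '15/01/2024', '2024-01-15 extra') {
    my $verdict = looks_like_date($input) ? 'well-formed' : 'rejected';
    print "$input: $verdict\n";
}
```

Only the first of the three sample inputs is reported as well-formed.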