Pattern Matching: Simple Patterns


Introduction

• Programmers often need to scan a file, directory, etc. for a specific substring.
  – Find all files that begin with “A”.
  – Find all files that end in “txt”.

• This capability is provided by a variety of tools.
  – e.g. egrep, grep, awk

• It is useful to include this functionality in a programming language.

Perl’s Pattern Matcher

• Perl has a built-in pattern matcher.
  – Motivation: system administrators frequently use regular expressions. They also use Perl.

• Syntax is borrowed from the grep utility in Unix.

• Based on regular expressions from computer science.

Perl’s Pattern Matcher (cont.)

• Operates over a single string.

• Contexts:
  – Scalar: Returns true or false.
  – List: Matching substrings are returned in a list.

• The syntax is:
  m dl pattern dl [modifiers]
  where dl is the delimiter character.

• The slash (/) is the most common delimiter.
  – With slashes, the m operator may be omitted.

• Other delimiters can be used:
  m~pattern~
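
The two contexts and the delimiter rule can be sketched as follows. This is a minimal illustration, not part of the original slides: the sample string and variable names are made up, and the g modifier is assumed so that list context returns every matching substring.

```perl
use strict;
use warnings;

my $text = "grep, egrep and awk";

# Scalar context: the match simply succeeds (true) or fails (false).
my $found = ($text =~ /awk/) ? 1 : 0;

# List context with the g modifier: each matching substring is returned.
my @words = ($text =~ /\w+/g);

# A non-slash delimiter needs the explicit m operator.
my $tilde = ($text =~ m~egrep~) ? 1 : 0;

print "found=$found tilde=$tilde words=@words\n";
```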

Simple Patterns

• Simple patterns – match individual characters or character classes.

• A pattern is an abstract representation of a set of strings.

• A pattern “matches” when the string it’s compared with is in the set.

• Matching is done from left to right.

Three Categories of Characters

• Normal characters:
  – Match themselves.
  – Include escape characters, e.g. \t, \cC

• Metacharacters:
  – Have special meanings in patterns.
  – \ | ( ) [ ] { } ^ $ * + ?

• Period:
  – Matches any character except newline.
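
A short sketch of how the three categories behave in practice. The test strings and variable names are illustrative assumptions, not taken from the slides.

```perl
use strict;
use warnings;

# The period stands for any single character except a newline.
my $dot_any     = ("cat"  =~ /c.t/) ? 1 : 0;    # 1: the period matches "a"
my $dot_newline = ("c\nt" =~ /c.t/) ? 1 : 0;    # 0: the period will not match "\n"

# A backslash turns a metacharacter into a normal character.
my $literal_dot = ("3x14" =~ /3\.14/) ? 1 : 0;  # 0: only a real "." is accepted
my $any_dot     = ("3x14" =~ /3.14/)  ? 1 : 0;  # 1: the period matches "x"

# Escape characters such as \t are normal characters that match themselves.
my $tab = ("a\tb" =~ /a\tb/) ? 1 : 0;           # 1
```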

An Example

$_ = "It's snowing today.";

if (/snow/) {
    print "There was snow somewhere in $_\n";
} else {
    print "$_ was snowless\n";
}

Character Classes

• Character classes specify collections of characters in patterns.

• Defined by placing the set in [ ]
  – e.g. /[<>=]/

• Dashes are used to specify ranges of characters:
  – /[A-Za-z]/
  – /[0-7]/
  – /[0-3-]/ (a dash at the end of the class is a literal dash)

Exclusion From a Class

• Characters can be excluded from a class by placing (^) as the first character inside the brackets.

• The class then matches any character except those specified.

• For example:
  – /[^A-Za-z]/
  – /[^01]/
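
The classes shown on the last two slides can be tried out as below. This is a sketch only: the sample strings and variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

# A class matches exactly one character drawn from the set.
my $has_op   = ("a >= b" =~ /[<>=]/) ? 1 : 0;   # 1: ">" is in the set
my $is_octal = ("9"      =~ /[0-7]/) ? 1 : 0;   # 0: 9 is outside the range 0-7

# A dash placed last in the class is an ordinary character.
my $dash     = ("-"      =~ /[0-3-]/) ? 1 : 0;  # 1

# A leading ^ inverts the class.
my $all_letters = ("abc"  =~ /[^A-Za-z]/) ? 1 : 0;  # 0: no non-letter present
my $has_other   = ("abc!" =~ /[^A-Za-z]/) ? 1 : 0;  # 1: "!" is not a letter
```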

Useful Abbreviations

Abbreviation   Pattern          Matches
\d             [0-9]            A digit
\D             [^0-9]           A nondigit
\w             [A-Za-z0-9_]     A word character
\W             [^A-Za-z0-9_]    A nonword character
\s             [ \r\t\n\f]      A white-space character
\S             [^ \r\t\n\f]     A non-white-space character

Some Examples

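• /[A-Z]"\s/: an uppercase letter, a double quote, then a white-space character

• /[\dA-Fa-f]/: a single hexadecimal digit

• /\w\w:\d\d/: two word characters, a colon, then two digits

• /0x\d/: the characters 0x followed by one decimal digit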
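
The abbreviations can be checked against a few strings, as in this sketch. The sample inputs and variable names are made up for illustration, and the g modifier is assumed so that list context collects every match.

```perl
use strict;
use warnings;

# Digits count as word characters, so "10:45" fits \w\w:\d\d.
my $time = ("at 10:45" =~ /\w\w:\d\d/) ? 1 : 0;   # 1

# \d accepts decimal digits only; a class is needed for hex digits.
my $dec_after_0x = ("0xF" =~ /0x\d/)          ? 1 : 0;  # 0
my $hex_after_0x = ("0xF" =~ /0x[\dA-Fa-f]/)  ? 1 : 0;  # 1

# List context with g: collect every digit and every nonword character.
my @digits  = ("room 42b" =~ /\d/g);   # "4", "2"
my @nonword = ("a-b c"    =~ /\W/g);   # "-", " "
```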

Variables in Patterns

• A variable in a pattern is interpolated.

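• For example,

# Inside double quotes every backslash must be doubled,
# including the one in \d, to reach the pattern intact.
$hexpat = "\\s[\\dA-Fa-f]\\s";

if (/$hexpat/) {
    print "$_ has a hex digit.\n";
}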
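
As a variation on the slide's example, a single-quoted string avoids the doubled backslashes, because single quotes leave \s and \d untouched. The sketch below assumes made-up test strings and variable names.

```perl
use strict;
use warnings;

# Single quotes keep each backslash, so no doubling is needed.
my $hexpat = '\s[\dA-Fa-f]\s';

# The variable is interpolated into the pattern before matching.
my $yes = ("value F here" =~ /$hexpat/) ? 1 : 0;   # 1: " F " matches
my $no  = ("value G here" =~ /$hexpat/) ? 1 : 0;   # 0: G is not a hex digit

# The double-quoted form with doubled backslashes yields the same text.
my $same = ("\\s[\\dA-Fa-f]\\s" eq $hexpat) ? 1 : 0;  # 1
```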

Quantifiers

• Quantifiers can make a pattern more powerful.

• They allow a pattern to be repeated a specified number of times.

• Perl has four kinds of quantifier: *, +, ?, and {m,n}

• A quantifier immediately follows the pattern it quantifies.

{m,n}

• {n} – exactly n repetitions.

• {m,} – at least m repetitions.

• {m,n} – at least m, but not more than n repetitions.

{m,n} Examples

• /a{1,3}b/ matches ab, aab, aaab

• /ab{3}c/ matches abbbc

• /ab{2,}c/ matches abbc, abbbc, abbbbc, …

• /c{3} z{5}/ matches ccc zzzzz

• /[abc]{1,2}/ matches a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc
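
The repetition counts can be verified with a small filter, sketched below. The candidate strings are made up, and the ^ and $ anchors (listed among the metacharacters but not otherwise covered here) are assumed in order to force the pattern to cover the whole string.

```perl
use strict;
use warnings;

# {1,3}: between one and three a's before the b.
my @a_then_b = grep { /^a{1,3}b$/ } ("b", "ab", "aab", "aaab", "aaaab");

# {3}: exactly three b's.
my @exact = grep { /^ab{3}c$/ } ("abbc", "abbbc", "abbbbc");

# {2,}: two or more b's.
my @at_least = grep { /^ab{2,}c$/ } ("abc", "abbc", "abbbbc");

# A quantified class: one or two characters, each of them a, b, or c.
my @one_or_two = grep { /^[abc]{1,2}$/ } ("a", "cb", "cc", "abc", "d");

print "@a_then_b | @exact | @at_least | @one_or_two\n";
```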

Asterisk (*)

• (*) means zero or more repetitions.

• Equivalent to {0,}

• For example,
  – /0\d\d*/
  – /\w\w*/
  – /bob.*cat/
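
A sketch of the asterisk examples in use. The input strings and variable names are illustrative assumptions, and the g modifier is assumed for the list-context match.

```perl
use strict;
use warnings;

# .* may match nothing at all, so "bobcat" qualifies.
my $adjacent = ("bobcat"      =~ /bob.*cat/) ? 1 : 0;   # 1
# .* may also span any run of characters between the two words.
my $apart    = ("bob the cat" =~ /bob.*cat/) ? 1 : 0;   # 1

# 0, one required digit, then zero or more further digits.
my @zero_led = ("07 0 012" =~ /0\d\d*/g);   # "07", "012"
```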

Plus (+)

• (+) means one or more repetitions.

• Equivalent to {1,}

• For example,
  – /\w+/
  – /[A-Za-z][A-Za-z\d_]+/
  – /\d+\.\d+/
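
The plus examples can be exercised as below. This is a sketch with made-up input strings; the g modifier is assumed so that list context returns every match.

```perl
use strict;
use warnings;

# Digits, a literal period, then more digits: "42" has no period, so it is skipped.
my @decimals = ("pi 3.14, e 2.718, n 42" =~ /\d+\.\d+/g);

# A letter followed by at least one letter, digit, or underscore.
my @names = ("x1 = y_2 + 7" =~ /[A-Za-z][A-Za-z\d_]+/g);

print "@decimals | @names\n";
```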

Question Mark (?)

• (?) means either zero or one.

• Equivalent to {0,1}.

• For example,
  – /\d+\.?/
  – /\$?\d+\.\d\d/
  – /"?\w+"?/
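
A sketch of the optional dollar sign from the second example. The price string is made up, and it is single-quoted so that Perl does not try to interpolate $4 as a variable.

```perl
use strict;
use warnings;

# \$? allows an optional leading dollar sign; two decimals are required.
my @prices = ('$4.99 and 12.50 and 7' =~ /\$?\d+\.\d\d/g);

# The trailing period is optional, so both forms match.
my $with_dot    = ("42." =~ /\d+\.?/) ? 1 : 0;   # 1
my $without_dot = ("42"  =~ /\d+\.?/) ? 1 : 0;   # 1
```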

Subpatterns

• A quantifier modifies only the item immediately before it.
  – e.g. in /ball*/ the * applies only to the final l

• () can be used to group parts of patterns.

• The quantifier modifies the group.

• For example,
  – /(ball)*/
  – /(boo! ){3}/
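
The difference between quantifying one character and quantifying a group can be seen in this sketch. The candidate strings are made up, and the ^ and $ anchors are assumed so that each pattern must cover the whole string.

```perl
use strict;
use warnings;

# Without parentheses the * repeats only the last l.
my @last_char = grep { /^ball*$/ } ("bal", "ball", "balll", "ballball");

# With parentheses the * repeats the whole word.
my @grouped = grep { /^(ball)*$/ } ("ball", "balll", "ballball");

# {3} applied to a group demands three full copies, trailing space included.
my $three = ("boo! boo! boo! " =~ /(boo! ){3}/) ? 1 : 0;   # 1
my $two   = ("boo! boo! "      =~ /(boo! ){3}/) ? 1 : 0;   # 0

print "@last_char | @grouped\n";
```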

Alternation

• (|) is the logical OR operator in a pattern.

• /a|e|i|o|u/ is equivalent to /[aeiou]/

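• For example,
  – /(Bob|Tom|Pussy|Scaredy)cat/
  – /t(oo?|wo)/

• Be careful! Alternatives are tried from left to right.
  – In /Tom|Tommie/ the Tom alternative always succeeds first, so Tommie is never matched in full.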
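
The ordering pitfall can be demonstrated with a short sketch. The inputs and variable names are assumptions made for illustration; the g modifier is assumed for the list-context matches, and the ^ and $ anchors in the last line force a whole-string match.

```perl
use strict;
use warnings;

# Single-character alternation and a character class find the same vowels.
my @by_alternation = ("audio" =~ /a|e|i|o|u/g);
my @by_class       = ("audio" =~ /[aeiou]/g);

# The leftmost alternative that succeeds wins, even if a later one is longer.
my @short_first = ("Tommie" =~ /Tom|Tommie/g);   # only "Tom" is returned
my @long_first  = ("Tommie" =~ /Tommie|Tom/g);   # the full "Tommie"

# Grouping limits the alternation to part of the pattern.
my @t_words = grep { /^t(oo?|wo)$/ } ("to", "too", "two", "tw");
```

Listing the longer alternative first, as in the second pattern, is one way to avoid the problem.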

Precedence

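• The precedence of the operators, from highest to lowest, is:
  – Parentheses
  – Quantifiers
  – Character sequence
  – Alternation

• For example,
  – /#|-+/: a single #, or one or more dashes
  – /(#|-)+/: one or more characters, each a # or a dash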
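
The two precedence examples can be compared side by side, as sketched below. The sample strings are made up. The ^ and $ anchors are assumed to force a whole-string match, and the first pattern wraps the slide's /#|-+/ in an extra pair of parentheses so that the anchors apply to both alternatives.

```perl
use strict;
use warnings;

my @samples = ("#", "---", "#-#", "##");

# The quantifier binds tighter than alternation: "#" alone, or a run of dashes.
my @either = grep { /^(#|-+)$/ } @samples;

# Parentheses bind tightest: any mix of "#" and "-" characters.
my @mixed = grep { /^(#|-)+$/ } @samples;

print "either: @either\n";
print "mixed:  @mixed\n";
```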