Pattern Matching II
Greedy Matching
• When dealing with quantifiers, Perl’s pattern matcher is greedy by default.
• For example,
– $_ = "Bob sat next to the Bobcat and listened to the Bobolink";
  /.*Bob/
– $_ = "Freddie's hot dogs";
  /Fred+/
– $_ = "Freddie's hot dogs are really hot!";
  /.*hot/
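To see what each greedy pattern above actually consumes, a minimal sketch can print $& (the text matched by the most recent successful match). The variable names are illustrative and not part of the slides.

```perl
use strict;
use warnings;

# $& holds the text matched by the most recent successful match.
my $bob = "Bob sat next to the Bobcat and listened to the Bobolink";
$bob =~ /.*Bob/;
my $greedy_bob = $&;     # runs through the last "Bob" (inside "Bobolink")

my $fred = "Freddie's hot dogs";
$fred =~ /Fred+/;
my $greedy_fred = $&;    # d+ takes both d's: "Fredd"

my $hot = "Freddie's hot dogs are really hot!";
$hot =~ /.*hot/;
my $greedy_hot = $&;     # everything through the final "hot"

print "$greedy_bob\n$greedy_fred\n$greedy_hot\n";
```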
Minimal Matching
• Minimal (non-greedy) mode is specified by placing (?) after the quantifier.
• For example,
– $_ = "Freddie's hot dogs";
  /Fred+?/
– $_ = "Freddie's hot dogs are really hot!";
  /.*?hot/
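The same strings can be run through the minimal patterns to compare the matched text. This is only a sketch, with illustrative variable names, and it uses $& to read back what matched.

```perl
use strict;
use warnings;

my $fred = "Freddie's hot dogs";
$fred =~ /Fred+?/;
my $minimal_fred = $&;   # d+? stops after one d: "Fred"

my $hot = "Freddie's hot dogs are really hot!";
$hot =~ /.*?hot/;
my $minimal_hot = $&;    # stops at the first "hot"

print "$minimal_fred\n$minimal_hot\n";
```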
Multiple Quantifiers
• The leftmost quantifier is the greediest.
• For example,
– $_ = "Bob sat next to the Bobcat and listened to the Bobolink";
  /Bob.*Bob.*link/
• The first .* matches:
– " sat next to the Bobcat and listened to the "
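One way to check which text each .* takes is to wrap both in parentheses and capture them. The parentheses and variable names below are additions for illustration only.

```perl
use strict;
use warnings;

my $text = "Bob sat next to the Bobcat and listened to the Bobolink";

# Capture what each .* consumed; the leftmost one gets first pick.
my ($first, $second) = $text =~ /Bob(.*)Bob(.*)link/;

print "first:  [$first]\n";
print "second: [$second]\n";
```

The second .* is left with only the "o" between the last "Bob" and "link".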
Anchors
• More complicated patterns can be created with anchors.
• An anchor requires a pattern to match at specific places in a string.
• This lets a particular position in the pattern align with a particular position in the string.
(^) Anchor
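• (^) requires the pattern to match at the beginning of the string.
• For example,
– /^Shelley/
  matches "Shelley has red hair"
  does not match "What color is Shelley's hair?"
– /^[^!]^/
• The meaning of (^) depends on the context: outside brackets it is an anchor, while as the first character inside brackets it negates the character class.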
($) Anchor
• ($) requires the pattern to match at the end of the string.
• For example,
– /hair$/
  matches "Shelley has red hair"
  does not match "What color is Shelley's hair?"
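The two sample sentences can be tested against the (^) and ($) anchors with grep. A minimal sketch; the array names are illustrative.

```perl
use strict;
use warnings;

my @lines = (
    "Shelley has red hair",
    "What color is Shelley's hair?",
);

# Keep only the lines where the anchored pattern matches.
my @starts_with_shelley = grep { /^Shelley/ } @lines;
my @ends_with_hair      = grep { /hair$/ }    @lines;

print "starts: $_\n" for @starts_with_shelley;
print "ends:   $_\n" for @ends_with_hair;
```

Only the first sentence passes both tests: the second one does not begin with "Shelley", and it ends with "?" rather than "hair".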
(\b) Anchor
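• (\b) matches the position between a word character and a non-word character, or between a word character and the start or end of the string.
• For example,
– /\bwear\b/
  matches "I wear shoes"
  does not match "Swimwear for sale."
  does not match "Molly wears green sweaters."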
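A short sketch that filters the three sample sentences with the word-boundary pattern; the array names are illustrative.

```perl
use strict;
use warnings;

my @samples = (
    "I wear shoes",
    "Swimwear for sale.",
    "Molly wears green sweaters.",
);

# \b on both sides means "wear" must stand alone as a word.
my @hits = grep { /\bwear\b/ } @samples;

print "$_\n" for @hits;
```

"Swimwear" fails because there is no boundary before "wear", and "wears" fails because there is none after it.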
Binding Operators
• A pattern can be matched against any string with the binding operators (=~) and (!~).
• The left operand must evaluate to a string, and the return value is a Boolean.
• For example,
– $string =~ /[,;:]/
– $string !~ /[,;:]/
– if (<STDIN> =~ /^[Yy]/) { … }
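The binding operators can be tried out as follows. The sample strings are hypothetical, and $answer stands in for a line that would normally be read from <STDIN>.

```perl
use strict;
use warnings;

# Hypothetical sample strings.
my $with    = "red, green; blue";
my $without = "red green blue";

my $has_punct   = ($with    =~ /[,;:]/) ? 1 : 0;  # =~ is true on a match
my $lacks_punct = ($without !~ /[,;:]/) ? 1 : 0;  # !~ is true on no match

# Stand-in for a line read from <STDIN>.
my $answer   = "Yes\n";
my $said_yes = ($answer =~ /^[Yy]/) ? 1 : 0;

print "$has_punct $lacks_punct $said_yes\n";
```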
Pattern Modifiers
• A pattern can be followed by a modifier.
• The modifier changes how:
– The pattern is interpreted.
– The pattern matcher works while using the pattern.
• The most common modifiers are:
– i, m, s, o, x
(i) Modifier
• The (i) modifier tells the pattern matcher to ignore case.
• For example, /apples/i matches
– "apples"
– "Apples"
– "APPLES"
– "ApPlEs"
(m) and (s) Modifiers
• (m) treats a string as multiple lines:
– (^) also matches just after any newline.
– ($) also matches just before any newline.
• (s) treats a string as a single line:
– (.) will also match newline characters.
• If both (m) and (s) are specified:
– (.) matches any character.
– (^) and ($) match positions after and before a newline.
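The effect of (m) and (s) is easiest to see on a string with embedded newlines. This is a minimal sketch using a hypothetical two-line string; each flag variable is 1 when the match succeeds and 0 when it fails.

```perl
use strict;
use warnings;

# Hypothetical two-line sample string.
my $text = "first line\nsecond line\n";

# Without /m, ^ anchors only at the start of the string.
my $caret_plain = ($text =~ /^second/)  ? 1 : 0;
my $caret_multi = ($text =~ /^second/m) ? 1 : 0;

# Without /m, $ anchors only at the end (or before a final newline).
my $dollar_plain = ($text =~ /first line$/)  ? 1 : 0;
my $dollar_multi = ($text =~ /first line$/m) ? 1 : 0;

# Without /s, . never matches a newline.
my $dot_plain  = ($text =~ /first.*second/)  ? 1 : 0;
my $dot_single = ($text =~ /first.*second/s) ? 1 : 0;

# With both, . crosses the newline and ^ matches after it.
my $both = ($text =~ /^first.*^second/ms) ? 1 : 0;

print "$caret_plain$caret_multi $dollar_plain$dollar_multi ",
      "$dot_plain$dot_single $both\n";
```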
(o) Modifier
• Patterns can include scalar variables:
– The variables are interpolated.
• Patterns containing variables are recompiled every time they are used.
• This provides dynamic patterns, but it is very expensive.
• Include the (o) modifier if the variable never changes.
– Tells Perl not to recompile the pattern.
(x) Modifier
• (x) tells the pattern matcher to ignore whitespace in the pattern.
• For example, /\d+ \. \d+/x is equivalent to /\d+\.\d+/
• Allows comments to be included in patterns.
  /\d+   # digits before the decimal.
   \.    # The decimal point.
   \d+   # digits after the point.
  /x
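To confirm that the commented (x) form behaves like the compact one, both can be applied to the same string. The sample string and the added capturing parentheses are assumptions made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical sample string.
my $price = "Total: 19.95 dollars";

# Compact form.
my ($compact) = $price =~ /(\d+\.\d+)/;

# Same pattern spread over several lines with comments, using /x.
my ($spaced) = $price =~ /(
    \d+   # digits before the decimal point
    \.    # the decimal point
    \d+   # digits after the point
)/x;

print "$compact $spaced\n";
```

Each comment must sit on its own line, since under (x) a # comments out the rest of that line of the pattern.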
Remembering Matches
• Sometimes a pattern needs to reference a part of a string it matched earlier.
• Done by parenthesizing parts of interest.
• Referenced by implicitly defined variables
– e.g. \1, \2, \3, …
• For example,
– /(\w+).*\1/ matches "jo likes joanne."
– /(.)\1/
– /(['"])(.*?)\1/
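The three patterns can be exercised as follows. Apart from "jo likes joanne.", the sample strings are hypothetical, chosen so that each pattern has something to find.

```perl
use strict;
use warnings;

# \1 must repeat whatever the first group matched.
my ($word) = "jo likes joanne." =~ /(\w+).*\1/;

# Any character immediately followed by itself.
my ($doubled) = "balloon" =~ /(.)\1/;

# A quoted string whose closing quote matches its opening quote.
my ($quote, $body) = q{say "hi" or 'bye'} =~ /(['"])(.*?)\1/;

print "$word $doubled $quote$body$quote\n";
```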
References Outside a Pattern
• Parts of the matched string are sometimes needed outside the pattern.
• Can be referenced by implicit variables:
– e.g. $1, $2, $3, …
• For example,
  "VY ran for 267 yards Saturday" =~ /(\d+) (\w+) (\w+)/;
  print "$1 $2 $3 \n";
Nested Parentheses
• Patterns can have nested parentheses.
• Groups relate to variables by counting each ( starting from the left.
• For example:
  $_ = "31 Oct 2005";
  /((\d+) (\w+) (\d+))/;
  print "$1 \n $2 $3 $4 \n";
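The numbering of nested groups can be checked by assigning the captures to named variables. In list context a match returns $1, $2, and so on in order; the variable names are illustrative.

```perl
use strict;
use warnings;

my $date = "31 Oct 2005";

# Groups are numbered by the position of their opening parenthesis:
# the outer group is $1, the three inner groups are $2, $3, $4.
my ($whole, $day, $month, $year) = $date =~ /((\d+) (\w+) (\d+))/;

print "$whole\n$day $month $year\n";
```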
Backreferences
• \n and $n are called backreferences.
– They refer to the result of the previous match.
• Perl also includes 3 implicit variables.
– $` – part before the match.
– $& – part that matched.
– $' – part after the match.
• It is costly for the matcher to save these for every match.
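A small sketch of the three implicit variables, reusing the "VY ran for 267 yards Saturday" string from the earlier slide. The variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "VY ran for 267 yards Saturday";
$text =~ /\d+/;

# Copy the three implicit variables right after the match.
my ($before, $matched, $after) = ($`, $&, $');

print "[$before][$matched][$after]\n";
```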
RegEx Extensions
• Perl includes several extensions to previous versions of its regular expression syntax.
• The general form is:
  (?xPattern)
• x is a one or two character code.
Look Ahead
• Sometimes a pattern should match only if it is (or is not) followed by a subpattern, without the subpattern becoming part of the match.
• (?=) and (?!) provide this look ahead behavior.
• For example,
– /\d+(?=\.)/
– /\d+(?!\.)/
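The two look ahead patterns can be compared on one string. The sample string and the capturing parentheses around \d+ are assumptions added for this sketch.

```perl
use strict;
use warnings;

# Hypothetical sample string.
my $text = "12 and 3.5";

# Digits that are followed by a period; the period is not captured.
my ($followed) = $text =~ /(\d+)(?=\.)/;

# Digits that are not followed by a period.
my ($not_followed) = $text =~ /(\d+)(?!\.)/;

print "$followed $not_followed\n";
```

Here the positive form finds "3" and the negative form finds "12".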
Look Behind
• Perl also allows look behinds.
• (?<=) and (?<!) provide this behavior.
• For example,
– /(?<=\.)\d+/
– /(?<!\.)\d+/
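A matching sketch for look behind, with \d+ for the digits. The sample string and the capturing parentheses are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample string.
my $text = "12 and 3.57";

# Digits that are preceded by a period; the period is not captured.
my ($after_point) = $text =~ /(?<=\.)(\d+)/;

# Digits that are not preceded by a period.
my ($no_point) = $text =~ /(?<!\.)(\d+)/;

print "$after_point $no_point\n";
```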
Substitution
• It is often necessary to find a substring and replace it with another.
• Perl has a substitution operator for this.
• The general form is:
– s dl Pattern dl New_string dl Modifiers
  (dl is the delimiter character)
• The common form is:
– s/Pattern/New_string/
• The return value is the number of substitutions made.
Examples
• Example 1:
  $_ = "No more apples!";
  s/apples/applets/;
• Example 2:
  $_ = "Who are Jack and Jill?";
  s/(\w+) and (\w+)/$2 & $1/;
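Both examples can be run with named variables to inspect the resulting strings and the return value. The variable names and the extra non-matching substitution are additions for illustration.

```perl
use strict;
use warnings;

my $fruit = "No more apples!";
my $count = ($fruit =~ s/apples/applets/);   # number of substitutions made

# A substitution that finds nothing returns a false value.
my $none = ($fruit =~ s/pears/plums/);

my $pair = "Who are Jack and Jill?";
$pair =~ s/(\w+) and (\w+)/$2 & $1/;          # swap the two captured names

print "$fruit ($count)\n$pair\n";
```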
Substitution with Modifiers
• Modifiers can be used with the substitution operator.
• i, o, m, s, and x have the same effect.
• There are two common modifiers for substitutions:
– g: perform the substitution everywhere it applies.
– e: the substitution part is treated as a Perl expression.
(g) and (e) Examples
• Example 1:
  $_ = "12034005";
  s/0//g;
• Example 2:
  $_ = "Molly and Mary were cold.";
  s/(\w+)/"$1"/g;
• Example 3:
  $_ = "Is it Sum, SUM, sum, or suM?";
  s/sum/sum/ig;
• Example 4:
  s/(\w+)/uc($1)/e;
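The four examples can be run as below. The input string for Example 4 is hypothetical, since the slide does not give one, and the replacement in Example 2 is written with $1.

```perl
use strict;
use warnings;

# Example 1: remove every zero; the return value counts the substitutions.
my $digits  = "12034005";
my $removed = ($digits =~ s/0//g);

# Example 2: put double quotes around every word.
my $quoted = "Molly and Mary were cold.";
$quoted =~ s/(\w+)/"$1"/g;

# Example 3: normalize every spelling of "sum" to lower case.
my $sums = "Is it Sum, SUM, sum, or suM?";
$sums =~ s/sum/sum/ig;

# Example 4: /e evaluates the replacement as code (hypothetical input).
my $shout = "molly and mary";
$shout =~ s/(\w+)/uc($1)/e;    # no /g, so only the first word changes

print "$digits ($removed)\n$quoted\n$sums\n$shout\n";
```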