Regular Expressions
• What is this line all about?
while (!($search =~ /^\s*$/)) {
• It’s a string search just like before, but with a huge twist – regular expression search
• ^\s*$ is a regular expression that says “look for a line with nothing but white space” (an empty line also qualifies, since \s* allows zero characters)
– Whitespace: space ( ), tab (\t), formfeed (\f), newline (\n), carriage return (\r)
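The whitespace test can be tried on its own. This is a minimal sketch; the helper name is_blank is hypothetical and not part of the original slides.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when the string holds nothing but
# whitespace (or nothing at all), 0 otherwise.
sub is_blank {
    my ($text) = @_;
    return $text =~ /^\s*$/ ? 1 : 0;
}

# Labels describe each sample so the output stays readable.
my @samples = (
    [ 'empty string',    ''        ],
    [ 'spaces only',     '   '     ],
    [ 'tab and newline', "\t\n"    ],
    [ 'plain word',      'abc'     ],
    [ 'padded word',     '  abc  ' ],
);

for my $sample (@samples) {
    my ($label, $text) = @$sample;
    printf "%-16s %s\n", $label, is_blank($text) ? 'blank' : 'not blank';
}
```

The while loop on the slide keeps running for as long as this test fails, that is, while $search still contains something other than whitespace.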
Regular Expressions
• A “convenient” way to describe patterns of characters
– Characters include “printable” and “meta” characters
• Three primary concepts:
– Concatenation: adjacent characters in the search string must be adjacent in the data string
– Alternation: specify a choice of characters that match in a specified position
– Repetition: specify how many of a given character must match
Concatenation
if ($data =~ /abcdef/) {
    …
}
• The pattern “abcdef” must appear, in that order, somewhere within the variable $data
Alternation
if ($data =~ /a(b|c|d|e)f/) {
    …
}
• The pattern “a(b|c|d|e)f” matches an ‘a’ followed by one of ‘b’, ‘c’, ‘d’, ‘e’, followed by an ‘f’, somewhere within the variable $data
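A short sketch of alternation in action. The helper name matched_choice is an assumption made for illustration; it relies on the fact that the parentheses also capture the chosen alternative into $1.

```perl
use strict;
use warnings;

# Hypothetical helper: returns which of b, c, d, e sat between
# the 'a' and the 'f', or undef when the pattern does not match.
sub matched_choice {
    my ($data) = @_;
    return $data =~ /a(b|c|d|e)f/ ? $1 : undef;
}

for my $data ('xacfx', 'aef', 'agf', 'af') {
    my $choice = matched_choice($data);
    printf "%-6s %s\n", $data,
        defined $choice ? "matched with '$choice'" : 'no match';
}
```

The first two samples match; 'agf' fails because ‘g’ is not one of the choices, and 'af' fails because exactly one of the choices is required.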
Repetition
if ($data =~ /ab*f/) {
    …
}
• The pattern “ab*f” matches an ‘a’ followed by zero or more ‘b’, followed by an ‘f’, somewhere within the variable $data
• * – zero or more instances of the previous character
• + – one or more instances of the previous character
• {n} – exactly n instances of the previous character
• {m,n} – at least m and at most n instances of the previous character
• {n,} – n or more instances of the previous character
• ? – zero or one instance of the previous character
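The repetition operators can be compared side by side. In this sketch the sample strings and the helper accepted are made up for illustration, and every pattern is anchored with ^ and $ so that the whole string has to fit.

```perl
use strict;
use warnings;

# Sample strings with zero, one, two and three 'b' characters.
my @samples = ('af', 'abf', 'abbf', 'abbbf');

# Label plus compiled pattern for each repetition operator.
my @quantifiers = (
    [ 'ab*f',     qr/^ab*f$/     ],
    [ 'ab+f',     qr/^ab+f$/     ],
    [ 'ab?f',     qr/^ab?f$/     ],
    [ 'ab{2}f',   qr/^ab{2}f$/   ],
    [ 'ab{1,2}f', qr/^ab{1,2}f$/ ],
    [ 'ab{2,}f',  qr/^ab{2,}f$/  ],
);

# Hypothetical helper: returns the samples that a pattern accepts.
sub accepted {
    my ($regex) = @_;
    return grep { $_ =~ $regex } @samples;
}

for my $entry (@quantifiers) {
    my ($label, $regex) = @$entry;
    printf "%-10s %s\n", $label, join(' ', accepted($regex));
}
```

Without the anchors, a pattern such as /ab{2}f/ would still reject 'abbbf', because the ‘a’ and ‘f’ pin down both ends of the run of ‘b’ characters.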
Meta-characters
• Anything following a \
• Alternation (choice) |
• Grouping within ( and )
• Character classes within [ and ]
– e.g. [A-Za-z] all upper and lower case letters
– e.g. [abc] a or b or c – same as (a|b|c)
– e.g. [^0-9] anything that is not a digit 0 thru 9
• Match any
– . (the dot) matches any single character (except a newline, by default), e.g. .* matches zero or more of any character
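A small sketch contrasting the bare dot, an escaped dot, and grouping with alternation. The helper name hit and the sample strings are illustrative only.

```perl
use strict;
use warnings;

# Illustrative helper (not from the slides): 1 on match, 0 otherwise.
sub hit {
    my ($data, $regex) = @_;
    return $data =~ $regex ? 1 : 0;
}

my @checks = (
    [ 'abc',     qr/^a.c$/,           'a.c'         ],  # dot stands for any one character
    [ 'a.c',     qr/^a.c$/,           'a.c'         ],  # including a literal dot
    [ 'abc',     qr/^a\.c$/,          'a\.c'        ],  # escaped dot: literal '.' only
    [ 'a.c',     qr/^a\.c$/,          'a\.c'        ],
    [ 'a--x--c', qr/^a.*c$/,          'a.*c'        ],  # .* spans any run of characters
    [ 'gray',    qr/^gr(a|e)y$/,      'gr(a|e)y'    ],  # grouping with alternation
    [ 'groy',    qr/^gr(a|e)y$/,      'gr(a|e)y'    ],
    [ 'x7',      qr/^[A-Za-z][0-9]$/, '[A-Za-z][0-9]' ],  # one letter, then one digit
);

for my $check (@checks) {
    my ($data, $regex, $label) = @$check;
    printf "%-8s =~ /^%s\$/ : %s\n", $data, $label,
        hit($data, $regex) ? 'match' : 'no match';
}
```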
Meta-characters
• Beginning and end of a string
– ^ what follows must start the string
– $ what precedes must end the string
– \^ matches a literal ^
– \$ matches a literal $
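The anchors and their escaped, literal forms can be checked with a few made-up strings. The helper name hit and the samples below are assumptions for illustration.

```perl
use strict;
use warnings;

# Illustrative helper (not from the slides): 1 on match, 0 otherwise.
sub hit {
    my ($data, $regex) = @_;
    return $data =~ $regex ? 1 : 0;
}

my @checks = (
    [ 'catalog',  qr/^cat/,  '^cat'  ],  # 'cat' must start the string
    [ 'bobcat',   qr/^cat/,  '^cat'  ],  # fails: 'cat' is at the end
    [ 'bobcat',   qr/cat$/,  'cat$'  ],  # 'cat' must end the string
    [ 'cat',      qr/^cat$/, '^cat$' ],  # the string is exactly 'cat'
    [ 'cost: $5', qr/\$5/,   '\$5'   ],  # escaped $ is a literal dollar sign
    [ 'x^2',      qr/x\^2/,  'x\^2'  ],  # escaped ^ is a literal caret
);

for my $check (@checks) {
    my ($data, $regex, $label) = @$check;
    printf "%-10s =~ /%s/ : %s\n", $data, $label,
        hit($data, $regex) ? 'match' : 'no match';
}
```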
Character Classes
• Use square brackets to denote classes (sets) of characters to be matched
– [A-Z] match any single uppercase letter
– [a-z] match any single lower case letter
– [0-9] match any single digit
– [A-Za-z0-9] match any single letter or digit
– [^0-9] match any single character that is NOT a digit
• Note that there are no spaces in the classes (unless you want to match a space)
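Two small uses of character classes, sketched with hypothetical helpers: classify sorts a single character into one of the classes above, and digits_only uses the negated class [^0-9] to strip everything that is not a digit.

```perl
use strict;
use warnings;

# Hypothetical helper: names the class a single character falls into.
sub classify {
    my ($char) = @_;
    return 'uppercase' if $char =~ /^[A-Z]$/;
    return 'lowercase' if $char =~ /^[a-z]$/;
    return 'digit'     if $char =~ /^[0-9]$/;
    return 'other';
}

# Hypothetical helper: deletes every character that is not a digit.
sub digits_only {
    my ($text) = @_;
    (my $copy = $text) =~ s/[^0-9]//g;
    return $copy;
}

for my $char ('Q', 'q', '7', ' ') {
    printf "[%s] is %s\n", $char, classify($char);
}

printf "%s\n", digits_only('(555) 123-4567');
```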
Matching
• String matching is greedy by default: it takes the longest possible string that satisfies the pattern
– e.g. “hear ye hear ye” =~ /hear.*ye/ matches the entire string
• If you want the minimal string, add a ? after the repetition operator
– e.g. “hear ye hear ye” =~ /hear.*?ye/ matches only the first “hear ye”
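The difference is easy to see by printing what each pattern actually matched. This is a minimal sketch; the helper name matched_text is hypothetical, and it reads Perl's built-in $& variable, which holds the text of the last successful match.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the exact text the pattern matched,
# or undef when there is no match.
sub matched_text {
    my ($text, $regex) = @_;
    return $text =~ $regex ? $& : undef;
}

my $text = 'hear ye hear ye';

# .* grabs as much as it can; .*? grabs as little as it can.
printf "longest match:  <%s>\n", matched_text($text, qr/hear.*ye/);
printf "shortest match: <%s>\n", matched_text($text, qr/hear.*?ye/);
```

The first line prints the whole string and the second prints only the first “hear ye”.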