What Is It?
Regular expressions are patterns used to match character combinations in strings [1].
History
1956: Stephen Cole Kleene, regular languages
1968: Ken Thompson, pattern matching in a text editor
1970: Bell Labs, regular expressions in Unix
1980s: Henry Spencer's regex library, later adopted by Perl
1992: POSIX.2 standard (Unix shell and utilities); many languages follow [2]
Human Brain vs. Text Processing
Hi,
I called Jon on Tuesday, March 25th at 7pm
and expressed a concern about my slow times
accessing www.cnn.com. He said he would fix
it, but I never heard back. Can someone contact
me at [email protected] ASAP? What does
Ctrl-F5 mean, by the way?
Thanks
Kellie
Patterns?
(Hi|Hello),[\s\S]+(Regards|Thanks)
I\s(verb|auxiliary).*   (verb and auxiliary stand for lists of words)
March\s\d{1,2}(st|nd|rd|th)
www\.\w{1,}\.(com|net|edu|…)
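A minimal Python sketch of how a program, unlike a human reader, applies three of these patterns to Kellie's message. It assumes Python 3's standard `re` module; the obfuscated e-mail address is left out, the open-ended `(com|net|edu|…)` list is cut down to three domains for illustration, and the `verb|auxiliary` line is skipped because it uses placeholders rather than real words.

```python
import re

# Sample text from the "Human Brain vs. Text Processing" slide (shortened).
EMAIL = """Hi,
I called Jon on Tuesday, March 25th at 7pm
and expressed a concern about my slow times
accessing www.cnn.com. He said he would fix
it, but I never heard back.
Thanks
Kellie"""

# Greeting, then anything (the class [\s\S] also crosses line breaks), then a closing.
letter = re.search(r"(Hi|Hello),[\s\S]+(Regards|Thanks)", EMAIL)

# Month name, one or two digits, ordinal suffix.
date = re.search(r"March\s\d{1,2}(st|nd|rd|th)", EMAIL)

# Host name with a few example top-level domains (illustrative subset).
site = re.search(r"www\.\w{1,}\.(com|net|edu)", EMAIL)

print(letter.group(1), letter.group(2))  # Hi Thanks
print(date.group(0))                     # March 25th
print(site.group(0))                     # www.cnn.com
```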
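Regex Syntax
/pattern/options
The pattern sits between slashes and the options (flags) follow the closing slash, e.g. /march/i in JavaScript or Perl.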
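Not every language has a /pattern/options literal. As a sketch, assuming Python 3's `re` module, the pattern is an ordinary string and the options are passed as flags or written inline at the start of the pattern:

```python
import re

text = "I called Jon on Tuesday, March 25th at 7pm"

# /march/ : no options, so the match is case-sensitive and fails
plain = re.search(r"march", text)
# /march/i : the i option becomes the re.IGNORECASE flag argument
flagged = re.search(r"march", text, re.IGNORECASE)
# the same option written as an inline flag at the start of the pattern
inline = re.search(r"(?i)march", text)

print(plain)             # None
print(flagged.group(0))  # March
print(inline.group(0))   # March
```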
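Literal Characters (metacharacters)
^ $ . | { } [ ] ( ) * + ? \
These characters have a special meaning; escape one with a backslash (for example \. or \$) to match it literally.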
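A short Python sketch (assuming Python 3's `re` module) of why escaping matters: an unescaped `.` matches any character, so a host-name pattern written without backslashes also accepts look-alike strings. `re.escape()` adds the backslashes automatically.

```python
import re

host = "www.cnn.com"

# Unescaped "." is a metacharacter: it matches ANY character.
loose = re.compile(r"www.cnn.com")
# Escaped "\." matches only a literal dot.
strict = re.compile(r"www\.cnn\.com")
# re.escape() escapes the metacharacters in a literal string for you.
auto = re.compile(re.escape(host))

print(bool(loose.fullmatch("wwwXcnnYcom")))   # True
print(bool(strict.fullmatch("wwwXcnnYcom")))  # False
print(bool(auto.fullmatch(host)))             # True
```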
Square Brackets
Square brackets provide a list of potential matching characters at one position in the search text:
7[Pp][Mm]
[123456789][aApP][Mm]
[1-9][aApP][Mm]   (a range; same as the line above)
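A small Python sketch of the bracket patterns above, run against a made-up line of text (the sample string is hypothetical, for illustration only). Each bracket pair consumes exactly one character, so "7 pm" with a space does not match.

```python
import re

text = "Call at 7pm or 7PM, not 7 pm; meetings at 9am and 5Pm"

exact_seven = re.findall(r"7[Pp][Mm]", text)
listed = re.findall(r"[123456789][aApP][Mm]", text)
ranged = re.findall(r"[1-9][aApP][Mm]", text)  # same set of digits as a range

print(exact_seven)  # ['7pm', '7PM']
print(ranged)       # ['7pm', '7PM', '9am', '5Pm']
```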
Non-Printable Characters
Escape sequences provide characters that cannot be typed directly into a regex:
\n - Matches a line feed (new line); Windows line endings are \r\n.
\t - Matches a tab character.
\b - Matches a backspace, but only inside square brackets ([\b]); outside them \b is a word boundary.
\a - Matches the bell character.
\r - Matches a carriage return.
\f - Matches a form feed.
\v - Matches a vertical tab.
Euro € - \u20AC
British pound £ - \u00A3
Yen ¥ - \u00A5
Dollar sign $ - \$ or \u0024 or \x24
\cX - Matches an ASCII control character; for example, \cC is Ctrl-C.
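A Python sketch using some of these escapes on a hypothetical tab-separated price list with Windows line endings. It assumes Python 3's `re` module, which accepts \t, \r, \n, \uXXXX and \xXX in patterns (it does not support \cX).

```python
import re

# Hypothetical sample data: tab-separated columns, Windows (\r\n) line endings.
row = "item\tprice\r\nbook\t\u20AC12\r\ntea\t\u00A33"

# \r\n splits the Windows lines
lines = re.split(r"\r\n", row)
# \t splits the columns of each line
cells = [re.split(r"\t", line) for line in lines]
# \u20AC (euro) or \u00A3 (pound) followed by digits
prices = re.findall(r"[\u20AC\u00A3]\d+", row)
# \x24 is a literal dollar sign, the same as \$
dollar = re.search(r"\x24\d+", "costs $5")

print(cells[1])         # ['book', '€12']
print(prices)           # ['€12', '£3']
print(dollar.group(0))  # $5
```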
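Negation
A caret (^) right after the opening bracket excludes the listed characters:
[^0-9A-F] : any character except a digit or an uppercase letter A to F
[^a-zA-Z0-9_] : the negation of \w, the same as \W (for ASCII text)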
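A Python sketch of negated classes, assuming Python 3's `re` module and made-up sample strings. It also shows a caveat: in Python, `\W` equals `[^a-zA-Z0-9_]` only with the `re.ASCII` flag, because by default `\w` also covers non-ASCII letters such as é.

```python
import re

# Keep only uppercase hex digits by deleting everything else.
hex_only = re.sub(r"[^0-9A-F]", "", "color: #FF00AA;")

text = "user_name-42 é"
explicit = re.findall(r"[^a-zA-Z0-9_]", text)
ascii_w = re.findall(r"\W", text, re.ASCII)   # \W restricted to ASCII
unicode_w = re.findall(r"\W", text)           # default: é counts as a word character

print(hex_only)   # FF00AA
print(explicit)   # ['-', ' ', 'é']
print(unicode_w)  # ['-', ' ']
```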
Curly Brackets
Curly brackets set how many times the preceding character or group repeats:
{n} : exactly n times
{n,} : at least n times, with no upper limit
{n,m} : between n and m times (inclusive)
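A Python sketch of the three forms on a made-up list of numbers, assuming Python 3's `re` module. The `\b` word boundaries stop a match from starting or ending in the middle of a longer number.

```python
import re

text = "7 42 123 2024"

exactly_three = re.findall(r"\b\d{3}\b", text)   # {n}
two_or_more = re.findall(r"\b\d{2,}\b", text)    # {n,}
one_to_three = re.findall(r"\b\d{1,3}\b", text)  # {n,m}

print(exactly_three)  # ['123']
print(two_or_more)    # ['42', '123', '2024']
print(one_to_three)   # ['7', '42', '123']
```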
Quantifier Symbols
These shorthand quantifiers also repeat the preceding character or group:
Quantifier   Matches               Same as
?            zero or one time      {0,1}
*            zero or more times    {0,}
+            one or more times     {1,}
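A Python sketch (assuming Python 3's `re` module and made-up inputs) showing `?` making a character optional, and checking that `*` and `+` behave exactly like their curly-bracket equivalents.

```python
import re

# ? : the "u" may appear once or not at all
optional = re.findall(r"colou?r", "color colour colouur")

samples = ["a", "ab", "abbb"]
star = [bool(re.fullmatch(r"ab*", s)) for s in samples]
star_braces = [bool(re.fullmatch(r"ab{0,}", s)) for s in samples]
plus = [bool(re.fullmatch(r"ab+", s)) for s in samples]
plus_braces = [bool(re.fullmatch(r"ab{1,}", s)) for s in samples]

print(optional)  # ['color', 'colour']
print(star)      # [True, True, True]
print(plus)      # [False, True, True]
```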
Starting and Ending Pattern
Anchors define the string boundaries:
^ : start of string (only outside [ ]; as the first character inside [ ] it means negation)
$ : end of string
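A Python sketch of anchors, assuming Python 3's `re` module and made-up inputs. With `re.MULTILINE`, `^` and `$` also match at each line break, which is what the "Start of line / End of line" wording in regex cheat sheets refers to. A `^` that is not the first character inside brackets is just a literal caret.

```python
import re

whole = bool(re.search(r"^\d+$", "2024"))         # only digits: True
partial = bool(re.search(r"^\d+$", "year 2024"))  # text before digits: False

# Per-line anchors: lines made of a single word
lines = re.findall(r"^\w+$", "Thanks\nKellie\nsee you", re.MULTILINE)

# Not first in the brackets, so ^ is a literal caret here
caret = re.findall(r"[0-9^]", "2^3")

print(lines)  # ['Thanks', 'Kellie']
print(caret)  # ['2', '^', '3']
```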
Alternation
The pipe (|) provides alternatives:
(x|y|z)
(www|ftp)
www\.\w{1,}\.(net|com|org|edu)
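A Python sketch of the URL pattern, assuming Python 3's `re` module and a made-up sentence. One practical trap: `re.findall` returns only the captured group when the pattern contains one, so the alternation is rewritten as a non-capturing group `(?:...)` to get the whole host name.

```python
import re

text = "Visit www.cnn.com or ftp.gnu.org, not www.example.io"

# With a capturing group, findall returns just the group's text.
groups_only = re.findall(r"www\.\w{1,}\.(net|com|org|edu)", text)

# Non-capturing groups make findall return the whole match.
hosts = re.findall(r"(?:www|ftp)\.\w{1,}\.(?:net|com|org|edu)", text)

print(groups_only)  # ['com']
print(hosts)        # ['www.cnn.com', 'ftp.gnu.org']
```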
Alternation: (x|y|z) vs [xyz]
(x|y|z) : each alternative can be a whole string
[xyz], [a-zA-Z0-9] : matches exactly one character from the list
(Regex|ReGex) is equivalent to Re[gG]ex
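A Python sketch of the difference, assuming Python 3's `re` module and made-up inputs: the two spellings of "Regex" find the same matches, while a bracket list matches single characters and alternation matches whole words.

```python
import re

words = "Regex ReGex REGEX regex"
alt = re.findall(r"(?:Regex|ReGex)", words)
cls = re.findall(r"Re[gG]ex", words)

# A class matches ONE character from the list ...
letters = re.findall(r"[cat]", "cat dog")
# ... while alternation matches whole strings.
animals = re.findall(r"(?:cat|dog)", "cat dog")

print(alt)      # ['Regex', 'ReGex']
print(letters)  # ['c', 'a', 't']
print(animals)  # ['cat', 'dog']
```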
. : Any single character (except a line break, by default)
[abc] : A single character: a, b, or c
[^abc] : Any single character except a, b, or c
[a-z] : Any single character in the range a-z
[a-zA-Z] : Any single character in the range a-z or A-Z
^ : Start of line
$ : End of line
\A : Start of string
\z : End of string
\s : Any whitespace character
\S : Any non-whitespace character
\d : Any digit
\D : Any non-digit
\w : Any word character (letter, number, underscore)
\W : Any non-word character
\b : A word boundary (zero-width: it matches a position, not a character)
(...) : Capture everything enclosed
(a|b) : a or b
i : Case-insensitive option
x : Ignore whitespace in the regex
(?<name><pattern>) : Named capturing group
(?:<pattern>) : Non-capturing group
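A Python sketch combining the last rows of the list: a named group, a non-capturing group, and the i and x options. It assumes Python 3's `re` module, which spells a named group `(?P<name>...)` (engines such as .NET and PCRE use `(?<name>...)`); the pattern name `TIME` is hypothetical.

```python
import re

# x option (re.VERBOSE): whitespace and # comments in the pattern are ignored.
# i option (re.IGNORECASE): am/pm match in any case.
TIME = re.compile(r"""
    (?P<hour>[1-9])   # named capturing group "hour"
    (?:am|pm)         # non-capturing group: matched but not returned
""", re.VERBOSE | re.IGNORECASE)

m = TIME.search("call me at 7PM")
print(m.group("hour"))  # 7
print(m.groups())       # ('7',)
```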
Look Ahead
Checks whether the pattern is followed by another pattern, without including it in the match:
(?=<pattern>) : positive lookahead
(?!<pattern>) : negative lookahead
(?<city>\w+)[, ]+(?=NJ|PA|DE)
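A Python sketch of both lookaheads on a made-up list of cities, assuming Python 3's `re` module (named group written `(?P<city>...)`). The state codes are checked but not consumed, so each match ends at the comma and space.

```python
import re

text = "Trenton, NJ; Philadelphia, PA; Wilmington, DE; Boston, MA"

# Positive lookahead: cities followed by NJ, PA or DE
NEAR = re.compile(r"(?P<city>\w+)[, ]+(?=NJ|PA|DE)")
cities = [m.group("city") for m in NEAR.finditer(text)]

# Negative lookahead: cities NOT followed by NJ, PA or DE
OTHER = re.compile(r"(?P<city>\w+), (?!NJ|PA|DE)\w+")
others = [m.group("city") for m in OTHER.finditer(text)]

print(cities)  # ['Trenton', 'Philadelphia', 'Wilmington']
print(others)  # ['Boston']
```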
Look Behind
Checks whether the pattern is preceded by another pattern, without including it in the match:
(?<=<pattern>) : positive lookbehind
(?<!<pattern>) : negative lookbehind
(?<=\"state\":)[ ].*(?<state>PA|Pennsylvania)
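A Python sketch of the lookbehind example on hypothetical JSON-like records, assuming Python 3's `re` module. Python requires a lookbehind to have a fixed width; `"state":` qualifies. The named group is written `(?P<state>...)`, and a negative lookbehind is added for contrast.

```python
import re

# Hypothetical sample records.
records = ['{"city": "Erie", "state": "PA"}',
           '{"city": "Trenton", "state": "NJ"}',
           '{"city": "Scranton", "state": "Pennsylvania"}']

# Positive lookbehind: value that comes after "state":
STATE = re.compile(r'(?<="state":)[ ].*(?P<state>PA|Pennsylvania)')
found = []
for record in records:
    m = STATE.search(record)
    if m:
        found.append(m.group("state"))

# Negative lookbehind: numbers NOT preceded by a dollar sign
not_dollar = re.findall(r"(?<!\$)\b\d+\b", "pay $30 for 2 items")

print(found)       # ['PA', 'Pennsylvania']
print(not_dollar)  # ['2']
```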
EXAMPLES
^(?!.*(?:<|>|&|'|"|%|;|-|\+|\(|\)|\s)).{6,20}$
Password rule: 6 to 20 characters long, and must not include any of the following:
< > & ' " % ; - + ( ) or whitespace
Let's Dig into the Pattern
English Rule                                          Regex Pattern
BEGINNING of the string                               ^
Start of NEGATIVE LOOKAHEAD                           (?!
Any character except newline, with QUANTIFIER         .*
Start of NON-CAPTURING group                          (?:
Single CHARACTER with ALTERNATION                     <|
More single CHARACTERS with ALTERNATION               >|&|'|"|%|;|-|\+|\(|\)|\s
End of NON-CAPTURING group and NEGATIVE LOOKAHEAD     ))
Any character, repeated 6 to 20 times                 .{6,20}
END of the string                                     $
Full pattern: ^(?!.*(?:<|>|&|'|"|%|;|-|\+|\(|\)|\s)).{6,20}$
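A Python sketch of the password rule as a validator, assuming Python 3's `re` module; `is_valid_password` is a hypothetical helper name. The negative lookahead scans the whole input for a forbidden character before `.{6,20}$` checks the length.

```python
import re

PASSWORD = re.compile(r"""^(?!.*(?:<|>|&|'|"|%|;|-|\+|\(|\)|\s)).{6,20}$""")

def is_valid_password(pw: str) -> bool:
    """Hypothetical helper: True if pw satisfies the slide's password rule."""
    return PASSWORD.match(pw) is not None

print(is_valid_password("secret123"))   # True
print(is_valid_password("abc12"))       # False: too short
print(is_valid_password("pass word1"))  # False: contains whitespace
print(is_valid_password("a+b=c1234"))   # False: contains +
```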
Unix Shell
Find all "GET" requests to "nike" in all .csv files:
ack '(?<=\"GET\")[,]\"\/nike.*'
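The same lookbehind can be tried in Python, assuming Python 3's `re` module, on the three complete sample rows from the grep output in the shell session (the fourth row is truncated in the source and left out). It shows that the two commands count different things: the ack pattern needs the path to start with /nike right after "GET", while the grep pattern accepts /nike anywhere in the line.

```python
import re

# Three complete rows copied from the sample grep output.
rows = [
    '"GET","/arama/nike",7,0,140,665,101,3797,196168,0.09',
    '"GET","/kampanya/arama/nike",8,0,270,678,229,2641,164205,0.11',
    '"GET","/nike/295/morhipo-ozel",2,0,81,88,81,95,121609,0.03',
]

# ack pattern: preceded by "GET", then ,"/nike (the \" and \/ escapes are not needed here)
ACK = re.compile(r'(?<="GET")[,]"/nike.*')
# grep pattern: /nike anywhere in the line
GREP = re.compile(r"/nike.*")

ack_hits = [r for r in rows if ACK.search(r)]
grep_hits = [r for r in rows if GREP.search(r)]

print(len(ack_hits), len(grep_hits))  # 1 3
```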
~/Downloads ls *.csv | wc -l
109
~/Downloads ack '(?<=\"GET\")[,]\"\/nike.*' | wc -l
88
~/Downloads cat web_3000:25.csv | grep '\/nike.*'
"GET","/arama/nike",7,0,140,665,101,3797,196168,0.09
"GET","/kampanya/arama/nike",8,0,270,678,229,2641,164205,0.11
"GET","/nike/295/morhipo-ozel",2,0,81,88,81,95,121609,0.03
"GET","/nike/markalar/503/32026/marka?fh=discount_rate_catalog01]
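BDD - Cucumber
A path pattern that accepts the same page name in English (Questions), Turkish (Sorular) or Persian (پرسش):
^/(Questions|Sorular|پرسش)*/$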
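A Python sketch checking this pattern against a few hypothetical paths, assuming Python 3's `re` module (its `str` patterns match Unicode text such as پرسش directly). Because of the `*` after the group, the group may repeat or be absent, so `//` and `/QuestionsSorular/` also match; a `?` would allow at most one name.

```python
import re

QUESTIONS = re.compile(r"^/(Questions|Sorular|پرسش)*/$")

# Hypothetical paths for illustration.
paths = ["/Questions/", "/Sorular/", "/پرسش/", "/Answers/", "//", "/QuestionsSorular/"]
matched = [p for p in paths if QUESTIONS.match(p)]

print(matched)
```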
Thanks
References:
[1] MDN Web Docs, Regular expressions (JavaScript Guide), https://developer.mozilla.org/en/docs/Web/JavaScript/Guide/Regular_Expressions
[2] Wikipedia, Regular expression, https://en.wikipedia.org/wiki/Regular_expression
[3] Joe Booth, Regular Expressions Succinctly, Syncfusion
[4] Regex cards (PowerPoint format), SlideShare, http://www.slideshare.net/adamlowe/regex-cards-powerpoint-format
[5] regex101, https://regex101.com
Mesut Güneş
www.testrisk.com