Regular Expression (Regex) Fundamentals

Page 1: Regular Expression (Regex) Fundamentals

REGEX: Regular Expression

Mesut Güneş

www.testrisk.com

Page 2: Regular Expression (Regex) Fundamentals

Regular expressions are patterns used to match character combinations in strings [1].

What is it?

Page 3: Regular Expression (Regex) Fundamentals

1956: Stephen Cole Kleene, regular languages

1968: Ken Thompson, pattern matching in a text editor

1970s: Bell Labs, in Unix

1980s: Henry Spencer's regex library, Perl

1992: POSIX.2 (Unix shell), many languages [2]

History

Page 4: Regular Expression (Regex) Fundamentals

Hi,

I called Jon on Tuesday, March 25th at 7pm and expressed a concern about my slow times accessing www.cnn.com. He said he would fix it, but I never heard back. Can someone contact me at [email protected] ASAP? What does Ctrl-F5 mean, by the way?

Thanks

Kellie

Human Brain vs. Text Processing

Page 5: Regular Expression (Regex) Fundamentals

Hi, …., thanks

I called …

March 27th

www.blabla.com

Patterns?

Page 6: Regular Expression (Regex) Fundamentals

(Hi|Hello),.{1,}(Regards|Thanks)

I\s(verb|auxiliary).*

March\s\d{1,2}(st|nd|rd|th)

www\.\w{1,}\.(com|net|edu|…)

Patterns?

Page 7: Regular Expression (Regex) Fundamentals

/pattern/options

Regex syntax
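The /pattern/options form is the literal notation of languages such as JavaScript, Perl, and Ruby. The slides do not name a language, so Python is an assumption here. Python has no regex literal, and a rough equivalent passes the pattern as a string and the options as flags:

```python
import re

# Roughly /thanks/i: the pattern is a raw string, the "i" option is a flag.
pattern = re.compile(r"thanks", re.IGNORECASE)

text = "He said he would fix it. Thanks"
match = pattern.search(text)
print(match.group(0) if match else None)
```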

Page 8: Regular Expression (Regex) Fundamentals

^ $ . | { } [ ] ( ) * + ? \

Metacharacters (escape with \ to match them as literal characters)
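A minimal Python sketch of what escaping changes, using a made-up sample sentence. Unescaped, "+" acts as a quantifier; escaped, it is an ordinary character. `re.escape` is the standard-library helper that escapes a whole string for you.

```python
import re

text = "Is 1+1=2 (really)?"

# Unescaped: "1+1" means "one or more 1s, then a 1", so it needs "11".
unescaped = re.search(r"1+1", text)

# Escaped: "\+" is a literal plus sign.
escaped = re.search(r"1\+1", text)

# re.escape() adds the backslashes for every metacharacter in the string.
auto = re.search(re.escape("(really)?"), text)

print(unescaped, escaped.group(0), auto.group(0))
```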

Page 9: Regular Expression (Regex) Fundamentals

provide a list of potential matching characters at a position in the search text

Square Brackets

7[Pp][Mm]

Page 10: Regular Expression (Regex) Fundamentals

more examples

Square Brackets

7[Pp][Mm]

[123456789][aApP][Mm]

[1-9][aApP][Mm]
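To see the bracket patterns in action, here is a small Python sketch (Python and the sample strings are assumptions, not part of the slides) that tests the last pattern against a few candidate times:

```python
import re

# One digit 1-9, then a/A/p/P, then m/M.
time_pattern = re.compile(r"[1-9][aApP][Mm]")

samples = ["7pm", "7PM", "9Am", "0pm", "7xm"]
matched = [s for s in samples if time_pattern.fullmatch(s)]
print(matched)
```

"0pm" fails because 0 is outside the range 1-9, and "7xm" fails because x is not in the list aApP.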

Page 11: Regular Expression (Regex) Fundamentals

represent characters that cannot be typed directly into a regex

Non-Printable Characters

\n - Matches a newline (a Windows line ending is \r\n).

\t - Matches a tab character.

\b - Matches a backspace (only inside square brackets; elsewhere \b is a word boundary).

\a - Matches the bell character.

\r - Matches a carriage return.

\f - Matches a form feed.

\v - Matches a vertical tab.

Euro € - \u20AC

British pound £ - \u00A3

Yen ¥ - \u00A5

Dollar sign $ - \$ or \u0024 or \x24

\cX - Matches an ASCII control character; for example, \cC is Ctrl-C.
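A short Python sketch of a few of these escapes against a made-up sample string. The \cX form is left out because Python's re module does not accept that escape; it belongs to other flavors such as JavaScript and Perl.

```python
import re

text = "price:\t20 \u20ac or $24\r\n"

tab = re.search(r"\t", text)        # tab character
euro = re.search(r"\u20AC", text)   # euro sign by code point
line_end = re.search(r"\r\n", text) # Windows line ending

# Three spellings of the same literal dollar sign.
dollar_forms = [re.search(p, text).group(0) for p in (r"\$", r"\u0024", r"\x24")]

print(tab.start(), euro.group(0), dollar_forms, line_end is not None)
```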

Page 12: Regular Expression (Regex) Fundamentals

provides a list of excluded characters

Negation

[^0-9A-F] : any character that is not 0-9 or A-F

[^a-zA-Z0-9_] : negation of \w (same as \W)
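A quick Python check of the two negated classes, with sample strings invented for illustration. One caveat: in Python, \w is Unicode-aware by default, so \W and [^a-zA-Z0-9_] agree only on ASCII input like this.

```python
import re

text = "user_name-42!"

# Explicit negated class versus the \W shorthand.
non_word_explicit = re.findall(r"[^a-zA-Z0-9_]", text)
non_word_short = re.findall(r"\W", text)

# Everything that is not an uppercase hexadecimal digit.
non_hex = re.findall(r"[^0-9A-F]", "3FZ9g")

print(non_word_explicit, non_word_short, non_hex)
```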

Page 13: Regular Expression (Regex) Fundamentals

repetition of characters

Curly Brackets

{n} : exactly “n” times.

{n,} : at least “n” times, with no upper limit.

{n,m} : between “n” and “m” times (inclusive).
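The three forms can be compared side by side. A minimal Python sketch, with the letter "a" and the counts 2 and 3 chosen arbitrarily:

```python
import re

samples = ["a", "aa", "aaa", "aaaa"]

# fullmatch() requires the whole string to fit the pattern.
exactly_two = [s for s in samples if re.fullmatch(r"a{2}", s)]
at_least_two = [s for s in samples if re.fullmatch(r"a{2,}", s)]
two_to_three = [s for s in samples if re.fullmatch(r"a{2,3}", s)]

print(exactly_two, at_least_two, two_to_three)
```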

Page 14: Regular Expression (Regex) Fundamentals

repetition of characters

Quantifier Symbols

Quantifier   Matches                    Same as
?            Match zero or one time     {0,1}
*            Match zero or more times   {0,}
+            Match one or more times    {1,}
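The "Same as" column can be checked directly. This Python sketch (patterns and sample words are illustrative assumptions) runs each symbol and its curly-bracket equivalent over the same inputs and compares the results:

```python
import re

pairs = [
    (r"colou?r", r"colou{0,1}r"),  # ? versus {0,1}
    (r"ab*c", r"ab{0,}c"),         # * versus {0,}
    (r"ab+c", r"ab{1,}c"),         # + versus {1,}
]
samples = ["color", "colour", "ac", "abc", "abbbc"]

results = {}
for short, long_form in pairs:
    short_hits = [s for s in samples if re.fullmatch(short, s)]
    long_hits = [s for s in samples if re.fullmatch(long_form, s)]
    results[short] = (short_hits, long_hits)

print(results)
```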

Page 15: Regular Expression (Regex) Fundamentals

define the string boundaries

Starting and Ending Pattern

^ : start of string (inside [] it negates the character list instead)

$ : end of string
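A small Python sketch of both anchors, plus the other meaning of ^ inside square brackets. The sample strings are made up for illustration.

```python
import re

# ^ anchors the match to the start of the string.
starts_with_hi = bool(re.search(r"^Hi", "Hi, I called Jon"))
not_at_start = bool(re.search(r"^Hi", "Oh, Hi"))

# $ anchors the match to the end of the string.
ends_with_thanks = bool(re.search(r"Thanks$", "He never called back. Thanks"))

# Inside [] the caret means "not": here, every non-digit character.
non_digits = re.findall(r"[^0-9]", "a1b2")

print(starts_with_hi, not_at_start, ends_with_thanks, non_digits)
```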

Page 16: Regular Expression (Regex) Fundamentals

provides alternatives

Alternation

(x|y|z)

(www|ftp)

www\.\w{1,}\.(net|com|org|edu)
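As a rough illustration, the URL pattern can be run in Python against a sentence taken from the sample email earlier in the deck. The group in parentheses captures whichever alternative matched.

```python
import re

url_pattern = re.compile(r"www\.\w{1,}\.(net|com|org|edu)")

text = "my slow times accessing www.cnn.com. He said he would fix it"
match = url_pattern.search(text)

host = match.group(0)  # the whole match
tld = match.group(1)   # the alternative that matched
print(host, tld)
```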

Page 17: Regular Expression (Regex) Fundamentals

(x|y|z) vs [xyz]

Alternation

(x|y|z) : alternatives can be whole strings

[xyz], [a-zA-Z0-9] : a single character from a list or range of characters

(Regex|ReGex) is equivalent to Re[gG]ex
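The equivalence of the two spellings can be checked with a short Python sketch. "Regax" is an invented non-matching sample.

```python
import re

samples = ["Regex", "ReGex", "Regax"]

# Alternation between two whole strings.
with_alternation = [s for s in samples if re.fullmatch(r"(Regex|ReGex)", s)]

# The same set, written with a character list for the one differing letter.
with_class = [s for s in samples if re.fullmatch(r"Re[gG]ex", s)]

print(with_alternation, with_class)
```

The character list is shorter when the alternatives differ by a single character. Alternation is needed when they differ by more than that.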

Page 18: Regular Expression (Regex) Fundamentals

. : Any single character

Page 19: Regular Expression (Regex) Fundamentals

[abc] : A single character: a, b, or c

Page 20: Regular Expression (Regex) Fundamentals

[^abc] : Any single character but a, b, or c

Page 21: Regular Expression (Regex) Fundamentals

[a-z] : Any single character in the range a-z

Page 22: Regular Expression (Regex) Fundamentals

[a-zA-Z] : Any single character in the range a-z or A-Z

Page 23: Regular Expression (Regex) Fundamentals

^ : Start of line

Page 24: Regular Expression (Regex) Fundamentals

$ : End of line

Page 25: Regular Expression (Regex) Fundamentals

\A : Start of string

Page 26: Regular Expression (Regex) Fundamentals

\z : End of string

Page 27: Regular Expression (Regex) Fundamentals

\s : Any whitespace character

Page 28: Regular Expression (Regex) Fundamentals

\S : Any non-whitespace character

Page 29: Regular Expression (Regex) Fundamentals

\d : Any digit

Page 30: Regular Expression (Regex) Fundamentals

\D : Any non-digit

Page 31: Regular Expression (Regex) Fundamentals

\w : Any word character (letter, number, underscore)

Page 32: Regular Expression (Regex) Fundamentals

\W : Any non-word character

Page 33: Regular Expression (Regex) Fundamentals

\b : Word boundary (a position between a word character and a non-word character, not a character itself)
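The shorthand classes and the word boundary are easiest to see on a concrete string. A minimal Python sketch, with the sentence adapted from the sample email:

```python
import re

text = "Call Jon at 7pm on March 25th"

digits = re.findall(r"\d+", text)        # runs of digits
words = re.findall(r"\w+", text)         # runs of word characters
space_count = len(re.findall(r"\s", text))

# \b keeps "on" from matching inside "Jon".
whole_word = re.findall(r"\bon\b", text)
without_boundary = re.findall(r"on", text)

print(digits, words, space_count, whole_word, without_boundary)
```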

Page 34: Regular Expression (Regex) Fundamentals

(...) : Capture everything enclosed

Page 35: Regular Expression (Regex) Fundamentals

(a|b) : a or b

Page 36: Regular Expression (Regex) Fundamentals

i : Case-insensitive option.

Page 37: Regular Expression (Regex) Fundamentals

x : Ignore whitespace in the regex (extended mode)

Page 38: Regular Expression (Regex) Fundamentals

(?<name><pattern>) : Named capturing group

Page 39: Regular Expression (Regex) Fundamentals

(?:<pattern>) : Non-capturing group
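A hedged Python sketch of named and non-capturing groups working together. Note that Python spells a named group (?P<name>...), which differs from the (?<name>...) form used in .NET, Perl, and JavaScript. The group names month and day are hypothetical.

```python
import re

# Two named groups capture the month and the day; the ordinal suffix is
# grouped only for alternation, so (?:...) keeps it out of the results.
pattern = re.compile(r"(?P<month>March)\s(?P<day>\d{1,2})(?:st|nd|rd|th)")

match = pattern.search("I called Jon on Tuesday, March 25th at 7pm")
print(match.group("month"), match.group("day"), match.groups())
```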

Page 40: Regular Expression (Regex) Fundamentals

checks whether the pattern is followed by another

Look Ahead

(?=<pattern>) : positive look ahead

(?!<pattern>) : negative look ahead

(?<city>\w+)[, ]+(?=NJ|PA|DE)
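A Python sketch of the positive look ahead, under two assumptions: the named group uses Python's (?P<city>...) spelling, and the look ahead is written as (?=NJ|PA|DE). The address string is invented. The state must follow the city for a match, but the state itself is not consumed.

```python
import re

pattern = re.compile(r"(?P<city>\w+)[, ]+(?=NJ|PA|DE)")

text = "Trenton, NJ; Albany, NY; Dover, DE"
cities = [m.group("city") for m in pattern.finditer(text)]

# The look ahead only checks what follows; it is not part of the match.
first_match_text = pattern.search(text).group(0)
print(cities, repr(first_match_text))
```

Albany is skipped because NY is not one of the three listed states.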

Page 41: Regular Expression (Regex) Fundamentals

checks whether the pattern is preceded by another

Look Behind

(?<=<pattern>) : positive look behind

(?<!<pattern>) : negative look behind

(?<=\"state\":)[ ].*(?<state>PA|Pennsylvania)
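A Python sketch of the positive look behind, assuming the bracket in the slide's pattern holds a single space and using Python's (?P<state>...) spelling for the named group. Both JSON-like strings are made up. Python requires a look behind to have a fixed width, which the literal "state": satisfies.

```python
import re

pattern = re.compile(r'(?<="state":)[ ].*(?P<state>PA|Pennsylvania)')

with_state = pattern.search('{"city": "Pittsburgh", "state": "PA"}')
without_state = pattern.search('{"country": "PA"}')

# The match starts after "state": because a look behind consumes nothing.
print(with_state.group("state"), repr(with_state.group(0)), without_state)
```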

Page 42: Regular Expression (Regex) Fundamentals

EXAMPLES

Page 43: Regular Expression (Regex) Fundamentals

^(?!.*(?:<|>|&|'|"|%|;|-|\+|\(|\)|\s)).{6,20}$

password

should be 6 to 20 characters long

and

must not include any of the following (or whitespace):

< > & ' " % ; - + ( )

Page 44: Regular Expression (Regex) Fundamentals

Let’s Dig into the Pattern

English Rule                                        Regex Pattern
BEGINNING of the string                             ^
Start of NEGATIVE LOOKAHEAD                         (?!
Any character except newline, with QUANTIFIER       .*
Start of NON-CAPTURING group                        (?:
Single CHARACTER with ALTERNATION                   <|
More single CHARACTERs with ALTERNATION             >|&|'|"|%|;|-|\+|\(|\)|\s
End of NON-CAPTURING group and NEGATIVE LOOKAHEAD   ))
Any character except newline                        .
Repetition with boundaries                          {6,20}
END of the string                                   $

^(?!.*(?:<|>|&|'|"|%|;|-|\+|\(|\)|\s)).{6,20}$
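To try the password pattern, here is a minimal Python sketch. The helper name is_valid and every candidate password are hypothetical, chosen to hit the length limits and a few forbidden characters.

```python
import re

# Triple-quoted raw string so both quote characters can appear unescaped.
password_pattern = re.compile(
    r"""^(?!.*(?:<|>|&|'|"|%|;|-|\+|\(|\)|\s)).{6,20}$"""
)

def is_valid(password: str) -> bool:
    """Return True when the password fits the length and character rules."""
    return password_pattern.search(password) is not None

candidates = ["secret1", "Passw0rd!", "abc", "has space1", "semi;colon", "a" * 21]
results = {c: is_valid(c) for c in candidates}
print(results)
```

The negative look ahead scans the whole string for a forbidden character first, and only then does .{6,20} check the length.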

Page 45: Regular Expression (Regex) Fundamentals

ack '(?<=\"GET\")[,]\"\/nike.*'

unix shell

Find all “GET” requests to “nike” in all .csv files:

~/Downloads ls *.csv | wc -l

109

~/Downloads ack '(?<=\"GET\")[,]\"\/nike.*' | wc -l

88

~/Downloads cat web_3000:25.csv | grep '\/nike.*'

"GET","/arama/nike",7,0,140,665,101,3797,196168,0.09

"GET","/kampanya/arama/nike",8,0,270,678,229,2641,164205,0.11

"GET","/nike/295/morhipo-ozel",2,0,81,88,81,95,121609,0.03

"GET","/nike/markalar/503/32026/marka?fh=discount_rate_catalog01]

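For comparison, a rough Python equivalent of the two shell searches, run over the three complete sample rows from the grep output. The POST row is hypothetical and added only to show the look behind rejecting it. The look-behind pattern keeps only rows whose path starts with /nike right after "GET", so it is stricter than the plain grep.

```python
import re

log_lines = [
    '"GET","/arama/nike",7,0,140,665,101,3797,196168,0.09',
    '"GET","/kampanya/arama/nike",8,0,270,678,229,2641,164205,0.11',
    '"GET","/nike/295/morhipo-ozel",2,0,81,88,81,95,121609,0.03',
    '"POST","/nike/sepet",1,0,10,10,10,10,100,0.01',  # hypothetical row
]

# Same idea as the ack pattern: "GET" must precede ,"/nike
lookbehind_pattern = re.compile(r'(?<="GET")[,]"/nike.*')
strict_hits = [line for line in log_lines if lookbehind_pattern.search(line)]

# Same idea as the grep pattern: /nike anywhere in the row.
loose_hits = [line for line in log_lines if re.search(r"/nike.*", line)]

print(len(strict_hits), len(loose_hits))
```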
Page 46: Regular Expression (Regex) Fundamentals

BDD - Cucumber

Page 47: Regular Expression (Regex) Fundamentals

^/(Questions|Sorular|پرسش)*/$

Thanks

References:

[1] https://developer.mozilla.org/en/docs/Web/JavaScript/Guide/Regular_Expressions

[2] https://en.wikipedia.org/wiki/Regular_expression

[3] Regular Expressions Succinctly, Joe Booth, Syncfusion

[4] http://www.slideshare.net/adamlowe/regex-cards-powerpoint-format

[5] https://regex101.com

Mesut Güneş

www.testrisk.com