www.edureka.co/mastering-perl-scripting Mastering REGEX in Perl

Mastering Regex in Perl

Slide 2 www.edureka.co/mastering-perl-scripting

Objectives

At the end of this module, you will be able to understand:

What is Perl

Benefits of Perl

Advantages of using Perl scripting

Starting Perl by writing the first script

Uses of Regular Expression

grep functions

Slide 3 www.edureka.co/mastering-perl-scripting

Meet Mr. Jose

Hi there! My name is Jose. I’m a computer consultant, techie and trainer. Students usually come to me and ask which computer language they should use in their project and why. I’m here to help.

Slide 4 www.edureka.co/mastering-perl-scripting

Meet Mr. Han

Hi there! My name is Han. I’m a Quality Analyst and my manager asked me to automate my tasks. I’m confused about which language to use, as I have tight deadlines and want to make the automation generic. I am here to meet Mr. Jose and find out which language I should use for automation.

Slide 5 www.edureka.co/mastering-perl-scripting

Han is Confused!

Han: Hi Jose, I work for an investment bank. My manager asked me to automate all my tasks. On a daily basis I interact with millions of shares. I’m confused about which language I should use.

Jose: Hi Han, it seems you need to work with data, and whenever huge data processing is involved, Perl is a very suitable computer language.

Slide 6 www.edureka.co/mastering-perl-scripting

What is Perl?

Perl is one of the most popular open source interpreted programming languages, with a huge number of programmers, libraries and resources

Perl has very powerful inbuilt regular expressions, which is often the main reason people choose Perl for bulk text processing

Perl is platform independent and is also used to generate HTML pages

It is similar to Python and PHP, with very powerful and flexible features

Inbuilt regular expressions provide data filtering and data transformation

Perl is nicknamed "the Swiss Army chainsaw of scripting languages" due to its flexibility and power

Slide 7 www.edureka.co/mastering-perl-scripting

What are the Benefits of using Perl?

Easy-to-learn

» Perl has relatively few keywords, simple structure, and a clearly defined syntax

Portable

» Perl can run on a wide variety of hardware platforms and has the same interface on all platforms

Databases

» Perl provides interfaces to all major commercial databases

» CPAN, an archive of Perl libraries, consists of more than 20K modules

Standard Library

» One of Perl's greatest strengths is that the bulk of the library is very portable and cross-platform compatible on UNIX, Windows and Mac OS

Memory Management

» Automatic memory management

» Automatic garbage collection

Other Benefits

» High-level data types and operations

» Object-oriented programming

» Easy debugging techniques

» Scalability

Slide 8 www.edureka.co/mastering-perl-scripting

About Perl

Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier. Since then, it has undergone many changes and revisions

Perl is not an official acronym, but people say it is derived from Practical Extraction and Report Language

It is said that frustrations with Unix shell programming led directly to the creation of Perl

It is an open source and interpreted language

Considered a scripting language, but is much more than that

Scalable, Object Oriented and Functional

Used by many Fortune 500 organizations

Simply, there is nothing which Perl cannot do

Slide 9 www.edureka.co/mastering-perl-scripting

Why Perl?

Fewer Restrictions

Developer Productivity

Program Portability

Support Libraries

Component Integration

Enjoyment

Fewer Restrictions

» Perl has relatively few keywords, and "there are many ways to do the same thing" is a philosophy of Perl

Slide 10 www.edureka.co/mastering-perl-scripting

Why Perl?

Developer Productivity

» Perl code is typically one-third to one-fifth the size of equivalent C++ or Java code. That means there is less to type, less to debug, and less to maintain

Slide 11 www.edureka.co/mastering-perl-scripting

Why Perl?

Program Portability

» Perl programs run unchanged on all major computer platforms, for example Windows, Linux, Mac OS etc.

Slide 12 www.edureka.co/mastering-perl-scripting

Why Perl?

Support Libraries

» Perl comes with a large collection of prebuilt and portable functionality, known as the standard modules. These modules support an array of application-level programming tasks, from text pattern matching to network scripting

Slide 13 www.edureka.co/mastering-perl-scripting

Why Perl?

Component Integration

» Perl scripts can easily communicate with other parts of an application, using a variety of integration mechanisms

Slide 14 www.edureka.co/mastering-perl-scripting

Why Perl?

Enjoyment

» Because of its ease of use and built-in toolset, Perl makes programming more pleasurable

Slide 15 www.edureka.co/mastering-perl-scripting

Users and Perl Projects

» Yahoo uses Perl in much of its website development and data processing

» SpamAssassin is a well-known spam filter. It is part of the Apache Software Foundation

» CiderWebmail is an open source product written in Perl and AJAX

Slide 16 www.edureka.co/mastering-perl-scripting

Users and Perl Projects (Contd.)

» Twiki is one of the best-known wiki software packages, oriented towards supporting companies. It is built primarily by the company of the same name, which also provides a cloud-based hosted Twiki service

» Bugzilla is a well-known bug-tracking system developed by and for Mozilla. It is used in quite a lot of companies

Slide 17 www.edureka.co/mastering-perl-scripting

Traditional Uses of Perl

Internet Scripting

System Utilities

Web Scraping

Database Programming

Ad Targeting

Text Processing

Slide 18 www.edureka.co/mastering-perl-scripting

Traditional Uses of Perl (Contd.)

ETL Processing (figure: CRM, LOB and ERP sources, ETL, Data Warehouse)

Network Programming (figure: Request and Result exchanged over a Network)

Slide 19 www.edureka.co/mastering-perl-scripting

Write a First Program

We can use any text editor to create scripts, for example the vi editor on Linux

Scripts conventionally use the .pl extension

Perl executable statements end with semicolons (;)

Perl is case-sensitive

Free form: whitespace between tokens is mostly ignored

Comments begin with # (pound sign) and may start anywhere on a line, not just at the beginning

Perl also supports multiline comments through POD (Plain Old Documentation)

Using POD we can add documentation to scripts; these statements are not treated as executable statements

__END__ is a special literal that marks the logical end of the program
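The points above can be put together in a minimal first script. This is only a sketch: the variable name and the message are assumptions made for illustration.

```perl
#!/usr/bin/perl
# A first script. The variable name and message are illustrative assumptions.
use strict;
use warnings;

# A comment can start anywhere on a line.
my $greeting = "Hello, World!";   # trailing comment after a statement

=pod

Everything between =pod and =cut is documentation (POD).
Perl skips it when the script runs, so it also works as a
multiline comment.

=cut

print "$greeting\n";   # each statement ends with a semicolon
```

If a line containing only __END__ were added at the bottom, Perl would ignore anything written after it.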

Slide 20 www.edureka.co/mastering-perl-scripting

Write a First Program (Contd.)

To execute the script, invoke it using perl <script name>

Linux users can make a script self-executable by adding the shebang line (the path of the interpreter, on the very first line of the script) and giving the file execute permission

Example

D:\Edureka > perl helloWorldDemo.pl

D:\Edureka > helloWorldDemo.pl

Slide 21 www.edureka.co/mastering-perl-scripting

What is Regular Expression?

A regular expression is a set of characters that together form a search pattern

The main use of regular expressions is to match patterns in strings of any form

The other use of regular expressions is the find and replace feature

A regular expression forms a generic pattern for string matching with the help of pre-defined wildcard characters

Many languages provide regular expression capabilities; some have them inbuilt and others provide regular expression libraries

A regular expression is also known as a regex or regexp

In Perl, regex support is inbuilt, hence it performs well

Slide 22 www.edureka.co/mastering-perl-scripting

Real World – Regular Expression

I wish I had software that could filter all the phone calls starting with +140
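One possible way to grant that wish is Perl's built-in grep function combined with a regex. The sketch below assumes the calls are available as a list of strings; the phone numbers are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical call log; the numbers are invented for this example.
my @calls = ('+14055550101', '+919812345678', '+14075550123', '+442071234567');

# grep returns the elements for which the pattern matches.
# ^ anchors the match at the start, and \+ is a literal plus sign.
my @flagged = grep { /^\+140/ } @calls;

print "$_\n" for @flagged;   # prints the two numbers starting with +140
```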

Slide 23 www.edureka.co/mastering-perl-scripting

Match Operator

The match operator checks whether a regex matches somewhere in a string

=~ (the binding operator) applies a regex match to the string on its left and is true when the regex matches

!~ (the negated binding operator) is true when the regex does not match

Regex matching is case sensitive by default, and the m in m/.../ is optional when / is used as the delimiter

Slide 24 www.edureka.co/mastering-perl-scripting

A Word Match – Example
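The original example for this slide is not available in the transcript, so the following is only a sketch of a simple word match. The sample sentence and variable names are assumptions.

```perl
use strict;
use warnings;

my $text = "Perl is good at text processing";   # sample sentence (assumed)

my $has_perl  = $text =~ /Perl/  ? 1 : 0;   # 1: "Perl" occurs in $text
my $has_lower = $text =~ /perl/  ? 1 : 0;   # 0: matching is case sensitive
my $with_m    = $text =~ m/text/ ? 1 : 0;   # 1: m/.../ behaves like /.../
my $no_java   = $text !~ /Java/  ? 1 : 0;   # 1: !~ is true when nothing matches

print "$has_perl $has_lower $with_m $no_java\n";   # prints "1 0 1 1"
```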


Slide 25 www.edureka.co/mastering-perl-scripting

The First Wildcard

Wildcards (also called metacharacters; the ones that control repetition are called quantifiers) are operator symbols that have a specific meaning inside a regular expression

For example: . (dot or period) matches any single character, such as a letter, digit or symbol, except the newline character (\n).
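A small sketch of the dot wildcard, using an assumed list of test strings:

```perl
use strict;
use warnings;

# /c.t/ means: c, then any one character except newline, then t.
my @words   = ("cat", "cot", "c9t", "c t", "ct", "c\nt");
my @matched = grep { /c.t/ } @words;

# "ct" has no character between c and t, and "c\nt" has a newline there.
print scalar(@matched), " of ", scalar(@words), " strings match\n";   # prints "4 of 6 strings match"
```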

Slide 26 www.edureka.co/mastering-perl-scripting

Match Operator itself

In many cases, a user may want to match the operator symbol itself in the regular expression. We can suppress the special meaning of wildcards and special characters with a backslash (\)
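As a sketch of the difference an escape makes (the file name below is an assumption chosen so the two patterns disagree):

```perl
use strict;
use warnings;

my $file = "report_pl";   # note: there is no dot in this name

my $unescaped = $file =~ /report.pl/  ? 1 : 0;   # 1: the dot matches "_"
my $escaped   = $file =~ /report\.pl/ ? 1 : 0;   # 0: \. matches only a literal dot

print "$unescaped $escaped\n";   # prints "1 0"
```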


Slide 27 www.edureka.co/mastering-perl-scripting

Capturing and Grouping

A Perl regex remembers the text matched by each group enclosed in parentheses in the regular expression

Inside the regex, these groups are referred to by back references: \1, \2, \3 and so on

Outside the regex, these groups are referred to by the special variables $1, $2, $3 and so on

These groups can also be fetched by assigning the match to variables in list context, which is called capturing
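Both ways of reading the groups can be sketched as follows. The date string is a sample value chosen for illustration, and \d+ (one or more digits) is explained in later slides.

```perl
use strict;
use warnings;

my $date = "2015-08-21";   # sample value (assumed)

# Each pair of parentheses captures the text its sub-pattern matched.
if ($date =~ /(\d+)-(\d+)-(\d+)/) {
    print "year=$1 month=$2 day=$3\n";   # prints "year=2015 month=08 day=21"
}

# In list context the match returns the captured groups directly.
my ($year, $month, $day) = $date =~ /(\d+)-(\d+)-(\d+)/;
print "$day/$month/$year\n";             # prints "21/08/2015"
```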

Slide 28 www.edureka.co/mastering-perl-scripting

Back References Example
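The original example is not present in the transcript. A common use of back references, shown here only as a sketch with an assumed input line, is finding a word that is accidentally repeated:

```perl
use strict;
use warnings;

my $line = "This is is a test";   # sample input (assumed)

# (\w+) captures a word; \1 demands the same text again after whitespace.
my $repeated;
if ($line =~ /\b(\w+)\s+\1\b/) {
    $repeated = $1;
    print "Repeated word: $repeated\n";   # prints "Repeated word: is"
}
```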


Slide 29 www.edureka.co/mastering-perl-scripting

Substitution

Substitution is another Perl operator that uses regular expressions; it provides the find and replace feature

Regexes are greedy, meaning they try to match as much as they can, so a substitution may replace more text than expected
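A minimal sketch of the substitution operator s/PATTERN/REPLACEMENT/, with an assumed sample sentence:

```perl
use strict;
use warnings;

my $text = "I like Java. Java is fun.";   # sample sentence (assumed)

# s/// edits $text in place and returns the number of substitutions made.
my $changed = ($text =~ s/Java/Perl/);

print "$changed: $text\n";   # prints "1: I like Perl. Java is fun."
```

Only the first occurrence is replaced here; replacing every occurrence needs the g modifier.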

Slide 30 www.edureka.co/mastering-perl-scripting

Modifier ‘i’ and ‘g’

The ‘i’ modifier makes the regex case insensitive

The ‘g’ modifier performs a global search, finding or replacing every match rather than only the first
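The effect of the two modifiers can be sketched like this (the sample string is an assumption):

```perl
use strict;
use warnings;

my $text = "Perl, perl and PERL";

# In list context, a /g match returns every matched substring.
my @exact = $text =~ /perl/g;    # case sensitive: ("perl")
my @any   = $text =~ /perl/gi;   # case insensitive: ("Perl", "perl", "PERL")

# Copy the string, then normalise every spelling in the copy.
(my $normalised = $text) =~ s/perl/Perl/gi;

print scalar(@exact), " ", scalar(@any), "\n";   # prints "1 3"
print "$normalised\n";                           # prints "Perl, Perl and Perl"
```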

Slide 31 www.edureka.co/mastering-perl-scripting

Modifier ‘s’ and ‘m’

The ‘m’ modifier lets ^ and $ match at the start and end of every line inside a string, not just once at the start and end of the whole string

The ‘s’ modifier makes . match \n as well
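A short sketch of both modifiers on an assumed three-line string:

```perl
use strict;
use warnings;

my $log = "line one\nline two\nline three";   # sample multi-line string

my @starts_default = $log =~ /^line/g;    # ^ matches only at the start of the string
my @starts_multi   = $log =~ /^line/mg;   # with m, ^ also matches after each newline

my $dot_default = $log =~ /one.line/  ? 1 : 0;   # 0: . does not match "\n"
my $dot_single  = $log =~ /one.line/s ? 1 : 0;   # 1: with s, . matches "\n" too

print scalar(@starts_default), " ", scalar(@starts_multi), "\n";   # prints "1 3"
print "$dot_default $dot_single\n";                                # prints "0 1"
```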

Slide 32 www.edureka.co/mastering-perl-scripting

Modifier ‘x’

With the ‘x’ modifier, white space in the regex is ignored. This modifier is used to write the pattern in a cleaner, more readable layout
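Besides ignoring white space, the x modifier also allows # comments inside the pattern. The sketch below uses a made-up phone number to show a pattern spread over several commented lines.

```perl
use strict;
use warnings;

my $phone = "+1-405-555-0101";   # made-up number for illustration

my ($country, $area) = $phone =~ /
    ^ \+ (\d+)     # a leading plus sign, then the country code
    -    (\d{3})   # a dash, then a three-digit area code
/x;

print "$country $area\n";   # prints "1 405"
```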

Slide 33 www.edureka.co/mastering-perl-scripting

Greedy Property of REGEX Wildcards

Whenever a Perl regex sees '*', '+', '?' or {a,b}, it matches as much as it can

This is the greedy property of regex wildcards

Sometimes this is an issue, because a substitution replaces the whole matched string
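The sketch below shows the greedy behaviour on an assumed HTML fragment. It also uses the non-greedy form (a ? placed after the quantifier), a standard Perl feature that these slides do not cover.

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";   # sample fragment (assumed)

my ($greedy) = $html =~ /(<.+>)/;    # .+ grabs as much as possible
my ($lazy)   = $html =~ /(<.+?>)/;   # .+? stops at the first possible ">"

print "$greedy\n";   # prints "<b>bold</b> and <i>italic</i>"
print "$lazy\n";     # prints "<b>"

# With the non-greedy form, a substitution removes only the tags.
(my $stripped = $html) =~ s/<.+?>//g;
print "$stripped\n"; # prints "bold and italic"
```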

Slide 34 www.edureka.co/mastering-perl-scripting

Other Wildcards

These wildcard characters do not match themselves unless they are suppressed by a backslash

Following are the other wildcards:

Wildcard Meaning

* matches zero or more occurrences of the previous character or group

+ matches one or more occurrences of the previous character or group

? matches zero or one occurrence of the previous character or group

Slide 35 www.edureka.co/mastering-perl-scripting

Wildcards Examples

REGEX Matches

AbC* It matches A followed by b followed by zero or more occurrences of C, i.e. Ab, AbC, AbCCCC, AbCCCCCCCCCCCC

AbC+ It matches A followed by b followed by one or more occurrences of C, i.e. AbC, AbCCCCCCCC, AbCCC

AbC? It matches A followed by b followed by zero or one occurrence of C, i.e. Ab, AbC

Ab(cd)* It matches A followed by b followed by zero or more occurrences of cd, i.e. Ab, Abcd, Abcdcd

Ab(cd)+ It matches A followed by b followed by one or more occurrences of cd, i.e. Abcd, Abcdcd

Ab(cd)? It matches A followed by b followed by zero or one occurrence of cd, i.e. Abcd, Ab
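The table can be checked with a short script. This is a sketch: the patterns are anchored with ^ and $ (covered in a later slide) so that the whole string has to fit, because an unanchored pattern would also match a longer string that merely contains a fitting part.

```perl
use strict;
use warnings;

my @samples = ("Ab", "AbC", "AbCCC", "Abcd", "Abcdcd");

my @star  = grep { /^AbC*$/    } @samples;   # zero or more C
my @plus  = grep { /^AbC+$/    } @samples;   # one or more C
my @opt   = grep { /^AbC?$/    } @samples;   # zero or one C
my @group = grep { /^Ab(cd)+$/ } @samples;   # one or more "cd"

print "@star\n";    # prints "Ab AbC AbCCC"
print "@plus\n";    # prints "AbC AbCCC"
print "@opt\n";     # prints "Ab AbC"
print "@group\n";   # prints "Abcd Abcdcd"
```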

Slide 36 www.edureka.co/mastering-perl-scripting

Combine Multiple Wildcards

REGEX Matches

Ab+C* It matches A followed by one or more occurrences of b followed by zero or more occurrences of C, i.e. Ab, AbC, AbbCCC, AbbCCCCCCCCCC

A.C+ It matches A followed by any character followed by one or more occurrences of C, i.e. AZC, AzCCC, AECCCCCCCCCCC

..C? It matches any two characters followed by zero or one occurrence of C, i.e. Ab, AbC

<.*> It matches anything inside angle brackets <>, i.e. <HTML>, <TAGS>

\(.+\) It matches at least one character inside parentheses, i.e. (Abcd), (a)

ab+c? It matches a followed by one or more b followed by zero or one c, i.e. "abbbbc" or "abc", but not "ac"

Slide 37 www.edureka.co/mastering-perl-scripting

Character Class in Regex

A character class is a set of characters, such as letters, digits or symbols

When a character class is used in a regex, it stands for any single character from the set

In a character class we put the list of characters in the set inside square brackets, like:

REGEX Matches

[abc] It matches any string that contains ‘a’, ‘b’ or ‘c’

[abcdefghijklmnopqrstuvwxyz] It matches any string that contains ‘a’, ‘b’, ‘c’ and so on up to ‘z’

[a-z] It matches any string that contains ‘a’, ‘b’, ‘c’ and so on up to ‘z’

[0-9] It matches any string that contains 0, 1, 2, 3 and so on up to 9

[a-zA-Z0-9] It matches any string that contains a character from a-z, A-Z or 0-9

[a-z_] It matches any string that contains a character from a-z or _ (underscore)
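A brief sketch of character classes in use, with an assumed list of identifiers. The last pattern combines a class with + and anchors to require that every character comes from the set.

```perl
use strict;
use warnings;

my @ids = ("abc", "A1", "user_1", "42", "___");   # sample strings (assumed)

my @has_lower = grep { /[a-z]/     } @ids;   # contains a lowercase letter
my @has_digit = grep { /[0-9]/     } @ids;   # contains a digit
my @only_set  = grep { /^[a-z_]+$/ } @ids;   # made up only of a-z and _

print "@has_lower\n";   # prints "abc user_1"
print "@has_digit\n";   # prints "A1 user_1 42"
print "@only_set\n";    # prints "abc ___"
```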

Slide 38 www.edureka.co/mastering-perl-scripting

Negate the Character Class

The ^ (caret) symbol at the start of a character class is used to negate the character class in a regex

If we put the caret first within the character class, it stands for any single character that is not in the set

Here are a few examples:

REGEX Matches

[^abc] It matches any string that contains a character other than ‘a’, ‘b’ or ‘c’

[^abcdefghijklmnopqrstuvwxyz] It matches any string that contains a character other than ‘a’, ‘b’, ‘c’ and so on up to ‘z’

[^a-z] It matches any string that contains a character other than ‘a’ to ‘z’

[^aeiou] It matches any string that contains a character other than a lowercase vowel

[lL][^abc] It matches a string in which ‘l’ or ‘L’ is followed by a character other than ‘a’, ‘b’ or ‘c’

[^a-z_] It matches any string that contains a character other than a-z or _ (underscore)
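A negated class still has to match one character, which is easy to misread as "the string contains none of these". The sketch below, with an assumed word list, contrasts the two meanings.

```perl
use strict;
use warnings;

my @words = ("sky", "aeiou", "tea");   # sample words (assumed)

# [^aeiou] needs ONE character somewhere that is not a lowercase vowel.
my @has_non_vowel = grep { /[^aeiou]/    } @words;

# To demand that no vowel occurs at all, apply the class to the whole string.
my @no_vowels     = grep { /^[^aeiou]*$/ } @words;

print "@has_non_vowel\n";   # prints "sky tea"
print "@no_vowels\n";       # prints "sky"
```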

Slide 39 www.edureka.co/mastering-perl-scripting

Combine Character Class with Wildcards

REGEX Matches

[aA][0-9]+ It matches any string which has ‘a’ or ‘A’ followed by one or more digits

A+.[.?] It matches any string which has ‘A’ one or more times followed by any character followed by either ‘.’ or ‘?’

a[bc] It matches any string which has ‘a’ followed by either ‘b’ or ‘c’

A[abc]? It matches any string which has ‘A’ followed by zero or one occurrence of ‘a’, ‘b’ or ‘c’

[a-z_.]\@ It matches any string which has a character from ‘a’ to ‘z’, ‘_’ or ‘.’ followed by ‘@’

Slide 40 www.edureka.co/mastering-perl-scripting

Character Class - Shortcuts

Character classes can also be represented by shortcuts

Following are the examples (the bracketed classes are approximate equivalents for plain ASCII text):

Shortcut Meaning Equivalent class

\s Any whitespace character such as space, tab or newline [ \t\n]

\S Any character other than space, tab or newline [^ \t\n]

\d Any digit [0-9]

\D Any character other than a digit [^0-9]

\w Letters, digits or _ (underscore) [a-zA-Z0-9_]

\W Any character other than a letter, digit or _ [^a-zA-Z0-9_]

Slide 41 www.edureka.co/mastering-perl-scripting

Shortcuts with Wildcards

Shortcuts can also be used with wildcards

Following are the examples:

Shortcut Meaning Equivalent class

\s+ One or more space, tab or newline characters [ \t\n]+

\S+ One or more characters other than space, tab or newline [^ \t\n]+

\d+ One or more digits [0-9]+

\D+ One or more characters other than digits [^0-9]+

\w+ One or more letters, digits or _ (underscore) [a-zA-Z0-9_]+

\W+ One or more characters other than letters, digits or _ [^a-zA-Z0-9_]+
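As a sketch of the shortcuts at work, the following pulls numbers and words out of an assumed record string and splits it on runs of whitespace with Perl's built-in split function.

```perl
use strict;
use warnings;

my $record = "id:  42\tname: Han_01";   # sample record (assumed)

my @numbers = $record =~ /\d+/g;      # runs of digits
my @words   = $record =~ /\w+/g;      # runs of letters, digits and _
my @fields  = split /\s+/, $record;   # split on runs of spaces and tabs

print join(",", @numbers), "\n";   # prints "42,01"
print join(",", @words), "\n";     # prints "id,42,name,Han_01"
print join(",", @fields), "\n";    # prints "id:,42,name:,Han_01"
```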

Slide 42 www.edureka.co/mastering-perl-scripting

Meta Characters

Metacharacter Example Meaning

^ ^ful The string should start with the pattern, i.e. matches ‘ful’ but not ‘wonderful’

$ ful$ The string should end with the pattern, i.e. matches ‘wonderful’ but not the word ‘Fultron’

{a,b} Abc{1,2} It matches ‘A’ followed by ‘b’ followed by a minimum of one and a maximum of two occurrences of ‘c’; Abc{1} requires exactly one ‘c’

(…) (\w+) Grouping (see the Capturing and Grouping slide)

\ \? Backslash: suppresses the special meaning of metacharacters

| Black|white Black or white in the string
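A short sketch exercising the anchors, the {a,b} counter and alternation on an assumed word list:

```perl
use strict;
use warnings;

my @words = ("ful", "fulfil", "wonderful", "Black", "white", "Abc", "Abccc");

my @starts  = grep { /^ful/         } @words;   # begins with "ful"
my @ends    = grep { /ful$/         } @words;   # ends with "ful"
my @colour  = grep { /Black|white/  } @words;   # contains either alternative
my @counted = grep { /^Abc{1,2}$/   } @words;   # "Ab" plus one or two "c"

print "@starts\n";    # prints "ful fulfil"
print "@ends\n";      # prints "ful wonderful"
print "@colour\n";    # prints "Black white"
print "@counted\n";   # prints "Abc"
```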

Slide 43 www.edureka.co/mastering-perl-scripting

Meta Symbols

Meta symbol Example Meaning

\A \Aful The string should start with the pattern, i.e. matches ‘ful’ but not ‘wonderful’

\Z ful\Z The string should end with the pattern, i.e. matches ‘wonderful’ but not the word ‘Fultron’

\cA \cA Matches Control-A; \cB matches Control-B

\Q and \E \QWhat is your name?\E Quotes the metacharacters up to \E (the ? here is a literal question mark, not the quantifier)

\b \bful\b Looks for the exact word ful

\B \Bful\B Opposite of \b: matches ful only where it is not at a word boundary

\n and \t \t Match a newline (\n) and a tab character (\t)
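Some of these meta symbols can be sketched on an assumed sentence:

```perl
use strict;
use warnings;

my $text = "A wonderful day, full of useful tips. Is it ful?";   # sample text

my @whole = $text =~ /\bful\b/g;   # only the standalone word "ful"
my @any   = $text =~ /ful/g;       # wonderful, full, useful and ful

# \Q...\E quotes metacharacters, so the ? is a literal question mark.
my $quoted = $text =~ /\Qful?\E/ ? 1 : 0;

my $start = $text =~ /\AA wonderful/ ? 1 : 0;   # \A anchors at the very start
my $end   = $text =~ /ful\?\Z/       ? 1 : 0;   # \Z anchors at the end

print scalar(@whole), " ", scalar(@any), "\n";   # prints "1 4"
print "$quoted $start $end\n";                   # prints "1 1 1"
```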

Slide 44 www.edureka.co/mastering-perl-scripting

All in One Example

REGEX Matches

/ful/ Matches ‘full’ and ‘Wonderful’; it matches ‘Fultron’ only with the i modifier

/Ao+/ Matches Ao, Aoo, Aoooo

/A(oh)*/ Matches A, Aoh, Aohoh

/Yahoo{1,3}/ Matches Yaho, Yahoo, Yahooo ({1,3} applies only to the final o)

/Edurekas?/ Matches Edureka, Edurekas

/Check\s+mates/ Matches Check followed by one or more whitespace characters followed by mates

/\$10/ Matches $10, $100, $101
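Several rows of the table can be verified with a small loop. This is a sketch: the test strings are assumptions, and the patterns are stored with qr//, Perl's operator for precompiled regexes, which the slides do not cover.

```perl
use strict;
use warnings;

# Each entry pairs a sample string with the regex expected to match it.
my @checks = (
    [ 'Aoooo',        qr/Ao+/           ],
    [ 'Aohoh',        qr/A(oh)*/        ],
    [ 'Yahooo',       qr/Yahoo{1,3}/    ],
    [ 'Edurekas',     qr/Edurekas?/     ],
    [ 'Check  mates', qr/Check\s+mates/ ],
    [ '$100',         qr/\$10/          ],
);

my $passed = 0;
for my $check (@checks) {
    my ($string, $regex) = @$check;
    my $ok = $string =~ $regex ? 1 : 0;
    $passed += $ok;
    printf "%-14s %s\n", $string, $ok ? "matches" : "does not match";
}
print "$passed of ", scalar(@checks), " matched\n";   # prints "6 of 6 matched"
```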


Slide 46

Survey

Your feedback is important to us, be it a compliment, a suggestion or a complaint. It helps us to make the course better!

Please spare a few minutes to take the survey after the webinar.

www.edureka.co/mastering-perl-scripting

Slide 47 Course Url