7 Searching and Regular Expressions (Regex) Mauro Jaskelioff

Introduction

• Shell metacharacters
  – What are they?
  – Why they are not the same as regular expressions!
• More about regular expressions
  – Searching file contents using:
    • grep
    • egrep
    • fgrep

Shell Metacharacters

• Special characters are characters that have a special meaning to the shell
• Also known as metacharacters
• They are interpreted by the shell for expansion unless they are quoted or escaped (more on this later)
• E.g.:
  $ file ../*
  (gives the file type for all non-hidden files in the directory one level up)

Filename Expansion

• The * metacharacter can match multiple files. It means any string of zero or more characters, except that it does not match a leading dot (hidden files). E.g.:
  – *.txt matches any filename ending in .txt
  – myfile.* matches all files with a prefix of myfile. and any suffix
  – *.* matches files with any prefix and suffix separated by a dot
  – * matches all non-hidden files
  – UST/* matches all non-hidden files in the UST directory
  – .* matches all hidden files (in some shells also the special entries . and ..)
  – *ology matches all filenames with ology at the end (or a filename of just ology ☺)

Filename Expansion (2)

• The previous example:
  $ file ../*
  1. The shell expands the metacharacters in the command line
     $ file ../file1 ../file2 ../file3
  2. The command is executed.
• Commands don’t interpret shell metacharacters
• The interpretation is done by the shell
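The two steps above can be observed with echo, which simply prints whatever arguments it receives. This is a minimal sketch, assuming a POSIX-like shell with mktemp available; the scratch directory and the file names file1, file2 and file3 are only for illustration.

```shell
# Scratch directory so the sketch does not touch real files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch file1 file2 file3

# Unquoted: the shell replaces * with the matching names before echo runs.
expanded=$(echo *)

# Quoted: echo receives the single character * and prints it unchanged.
literal=$(echo '*')

echo "expanded: $expanded"
echo "literal:  $literal"
```

echo never sees the * in the first case; it only sees the three file names the shell put in its place.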

Other Filename Metacharacters

• ? matches any single character
• [abc…] matches any one of the enclosed characters. A hyphen can be used to specify a range, e.g. [a-z]
• [!abc…] matches any one character not enclosed
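A short sketch of these three metacharacters, again using echo to show what the shell expands each pattern to. The file names a1, a2, b1 and ab1 are hypothetical and live in a scratch directory.

```shell
# Scratch directory with a few illustrative files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a1 a2 b1 ab1

any_single=$(echo ?1)       # exactly one character, then 1
from_set=$(echo [ab]2)      # a or b, then 2
from_range=$(echo [a-b]1)   # one character in the range a to b, then 1
not_a=$(echo [!a]1)         # one character that is not a, then 1

echo "?1     -> $any_single"
echo "[ab]2  -> $from_set"
echo "[a-b]1 -> $from_range"
echo "[!a]1  -> $not_a"
```

Note that ab1 is never matched here: each of ?, [ ] and [! ] stands for exactly one character.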

Command substitution

• The shell also supports substituting the output of a command
  $ ls -l `cat filenames`
• The command should be enclosed in backquotes (`)

[zlizmj@unnc-cslinux ~]$ cat filenames
temp
temp2
[zlizmj@unnc-cslinux ~]$ ls -l `cat filenames`
-rw-r--r-- 1 zlizmj Domain U 6 Mar 21 03:00 temp
-rw-r--r-- 1 zlizmj Domain U 567 Mar 30 11:14 temp2
[zlizmj@unnc-cslinux ~]$ ls -l temp temp2
-rw-r--r-- 1 zlizmj Domain U 6 Mar 21 03:00 temp
-rw-r--r-- 1 zlizmj Domain U 567 Mar 30 11:14 temp2
[zlizmj@unnc-cslinux ~]$
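The session above can be reproduced in a scratch directory. The sketch below also uses the $( ) form, which POSIX shells accept as an alternative to backquotes and which is easier to nest; the slides themselves only use backquotes, so treat the second form as an addition.

```shell
# Scratch directory recreating the files from the session above.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf 'temp\ntemp2\n' > filenames
touch temp temp2

# Backquote form, as on the slide: cat runs first, its output becomes arguments to ls.
with_backquotes=$(ls `cat filenames`)

# Equivalent $( ) form.
with_dollar_paren=$(ls $(cat filenames))

echo "$with_backquotes"
echo "$with_dollar_paren"
```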

Avoiding Shell Expansion

• What happens if we actually want to pass a metacharacter to the command? (i.e. we don’t want the shell to interpret it as a metacharacter)
• For example, we may have a file named temp*
• The character needs to be quoted or escaped
  – We can quote an argument with single quotes (') or with double quotes (")
  – We escape characters with the backslash character (\)
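A sketch of the temp* case, assuming a scratch directory that contains a file literally named temp* next to the hypothetical files temp1 and temp2. Unquoted, the pattern expands to all three names; quoted or escaped, it reaches ls as the single literal name.

```shell
# Scratch directory with a file whose name contains a metacharacter.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch 'temp*' temp1 temp2

# Unquoted: the shell expands temp* to every matching name.
set -- temp*
unquoted_count=$#

# Quoted or escaped: ls receives the literal name temp*.
quoted=$(ls 'temp*')
escaped=$(ls temp\*)

echo "unquoted pattern expands to $unquoted_count names"
echo "quoted:  $quoted"
echo "escaped: $escaped"
```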

Single or Double Quotes?

• " "
  – everything between " and " is taken literally, except for:
    • $ - variable substitution will occur
    • ` - command substitution will occur
    • \ - escapes a following $, `, " or \
    • " - marks the end of the double quote
  – ' doesn’t have special meaning inside " "
• ' '
  – everything between ' and ' is taken literally, including \. The only exception is another ', which ends the quoted string.
  – You cannot embed another ' within such a quoted string, not even with a backslash. Instead, close the quote, add an escaped \', and open a new quote.
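The difference between the two kinds of quotes can be sketched with a variable. The variable name is hypothetical; the last line shows the usual workaround for putting a single quote next to single-quoted text.

```shell
name=Mauro

# Double quotes: $name is substituted.
double=$(echo "Hello $name")

# Single quotes: $name is kept literally.
single=$(echo 'Hello $name')

# A single quote cannot appear inside ' ', so close the quote,
# add an escaped \', and open a new quoted string.
embedded=$(echo 'I'\''m Mauro')

echo "$double"
echo "$single"
echo "$embedded"
```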

Escaping a Character

• The character following a backslash \ is taken literally.
  $ echo I\'m Mauro
  I'm Mauro
  $
• Use \ within " " to escape ", $, ` and \ when necessary. Within ' ' the backslash has no special meaning.
• How to escape \?
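One possible answer to the question above, as a sketch: a backslash is escaped with another backslash, or placed inside single quotes where it is already literal. printf '%s\n' is used instead of echo because some versions of echo interpret backslashes themselves.

```shell
# Each command prints exactly one backslash.
unquoted=$(printf '%s\n' \\)       # \\ outside quotes is one literal backslash
in_double=$(printf '%s\n' "\\")    # inside " ", \\ is also one literal backslash
in_single=$(printf '%s\n' '\')     # inside ' ', a backslash is already literal

printf '%s %s %s\n' "$unquoted" "$in_double" "$in_single"
```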

Regular Expressions

• Also called regex
• For describing a set of strings using a pattern
  – Follows a set of rules
  – Used for finding occurrences of strings in files

• Contain normal characters mixed with special characters (called metacharacters)

• These metacharacters are NOT the same as shell metacharacters which are used for filename expansion!

Regular Expressions

• Regular Expressions must be put inside quotes otherwise the shell will interpret metacharacters for filename expansion

• E.g.:
  – grep '[Ff]red' myfile.txt
  – Searches the file myfile.txt for lines containing either Fred or fred
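The sketch below runs the example on a small made-up myfile.txt, and then shows what can go wrong without quotes: if a file named fred happens to exist in the current directory, the shell expands [Ff]red to fred before grep ever sees it. The file contents are hypothetical.

```shell
# Scratch directory with sample contents.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf 'Fred was here\nalfred too\nnobody\n' > myfile.txt

# Quoted: grep receives the regex [Ff]red and matches both spellings.
quoted=$(grep '[Ff]red' myfile.txt)

# Unquoted, with a file called fred present: the shell rewrites the
# command to "grep fred myfile.txt", so the line with Fred is missed.
touch fred
unquoted=$(grep [Ff]red myfile.txt)

echo "quoted:"
echo "$quoted"
echo "unquoted:"
echo "$unquoted"
```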

Fixed Patterns vs. Regular Expressions

• To search a file for the word computer:
  – grep computer myfile.txt
  – Will only match lines containing the exact string computer
  – A fixed pattern, not a regular expression
• Supposing we want to find occurrences (including potential misspellings) of:
  – computer, computor, Computer, Computor, Computers, and so on…
  – grep '[cC]omput[eo]rs*' myfile.txt
  – Uses a regular expression
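To compare the two commands, the sketch below counts matching lines with grep -c in a made-up myfile.txt containing three spellings and one unrelated word.

```shell
# Scratch directory with sample contents.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf '%s\n' computer computor Computers calculator > myfile.txt

# Fixed pattern: only the line containing exactly "computer" matches.
fixed_count=$(grep -c computer myfile.txt)

# Regular expression: all three spellings match, calculator does not.
regex_count=$(grep -c '[cC]omput[eo]rs*' myfile.txt)

echo "fixed pattern: $fixed_count line(s)"
echo "regex:         $regex_count line(s)"
```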

Three versions of grep

• grep: supports the most common metacharacters.
• egrep: (extended grep) supports an extended set of metacharacters. It’s more expressive but may be slower. Equivalent to grep -E.
• fgrep: (fast grep) doesn’t support metacharacters. It’s less expressive but faster. Equivalent to grep -F.

Regex Metacharacters

• . matches any single character except newline
  – c.t matches cat, cbt, cct …
• [ ] matches one character between [ and ]
  – [abc] matches a, b or c
• - indicates a range inside [ ]
  – [a-z] matches any one character from a to z
• * matches zero or more occurrences of the preceding character
  – 12* matches 1, 12, 122, 1222 …
• + matches one or more occurrences of the preceding character. NOTE: for use with egrep
  – 12+ matches 12, 122, 1222 …
• ? matches zero or one occurrence of the preceding character. NOTE: for use with egrep
  – 12? matches 1 and 12
• \ treats the next character literally
  – \* will match the character * and NOT the metacharacter *
• ^ matches the start of the line
  – ^Fred will match only lines that have the word Fred at the start of the line
• $ matches the end of the line
  – Fred$ will match only lines that have the word Fred at the end of the line
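A few of these metacharacters can be checked by counting matches with grep -c. This sketch uses grep -E in place of egrep, on the assumption that the local grep supports the standard -E option; the ^ and $ anchors make each pattern match the whole line, so the counts correspond to the examples above. The file numbers.txt is made up.

```shell
# Scratch directory with one candidate string per line.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf '%s\n' 1 12 122 3 > numbers.txt

star=$(grep -c '^12*$' numbers.txt)         # 1, 12 and 122
plus=$(grep -E -c '^12+$' numbers.txt)      # 12 and 122
optional=$(grep -E -c '^12?$' numbers.txt)  # 1 and 12

# . stands for exactly one character, so ct does not match c.t
dot=$(printf '%s\n' cat cbt ct | grep -c 'c.t')

echo "12*: $star  12+: $plus  12?: $optional  c.t: $dot"
```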

grep Revisited

• Used to search a file for a pattern
• (remember STDIN, STDOUT, etc. are also treated as files in UNIX)
• cat myfile.txt | grep "chocolate"
• who | grep zlizmj
• grep 'pingu' penguinNames.txt
• grep '[Ww]ib*le' wobble.txt
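Because grep reads standard input when no file is named, the last pattern can be tried without creating wobble.txt. This is a sketch with made-up input words fed through a pipe.

```shell
# b* allows zero or more b, so wile, Wible and wibble all match; wobble does not.
count=$(printf '%s\n' wibble Wible wile wobble | grep -c '[Ww]ib*le')

echo "matching lines: $count"
```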

egrep

• Extended grep.
• Slower but greater functionality
• Includes additional metacharacters, e.g.:
  – + matches one or more of its preceding character.
    • E.g. abc+ means abc, abcc, abccc, …
  – ? matches zero or one of its preceding character.
    • E.g. abc? means ab or abc
  – | an alternative.
    • E.g. A|B means A or B

egrep Example

• egrep '(bio|geo)logy' subjects.txt
  – will search the file subjects.txt for all lines that contain the words biology or geology
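The same search as a runnable sketch, written with grep -E (the equivalent of egrep) and a made-up subjects.txt.

```shell
# Scratch directory with sample contents.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf '%s\n' biology geology chemistry zoology > subjects.txt

# The parentheses group the alternatives, so the pattern is bio OR geo, followed by logy.
matches=$(grep -E '(bio|geo)logy' subjects.txt)

echo "$matches"
```

zoology is not printed: it ends in logy, but the part before it is neither bio nor geo.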

fgrep

• Fast grep
• Does not use regular expressions

– Used for matching an exact string, not a pattern

– $, *, [, ^, |, (, ), and \ are interpreted literally

– (but still have special meaning to the shell)

– Enclose entire string in quotes
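A sketch of the difference, using grep -F (the equivalent of fgrep) on a made-up prices.txt. The search string 5* is quoted in both commands so that the shell leaves it alone; only grep's interpretation changes.

```shell
# Scratch directory with sample contents.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf '%s\n' 'price is $5*' 'price is 555' > prices.txt

# Fixed string: only the line containing the two characters 5* matches.
literal_count=$(grep -F -c '5*' prices.txt)

# Regular expression: 5* means zero or more 5s, which matches every line.
regex_count=$(grep -c '5*' prices.txt)

echo "fixed string: $literal_count line(s)"
echo "regex:        $regex_count line(s)"
```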

Summary

• The shell performs filename expansion and command substitution.

• Shell metacharacters are not the same as regular expressions!

• Regular expressions allow us to search for a pattern in a file

• Commands used for searching:
  – grep
  – egrep
  – fgrep (does not use regular expressions)