The awk Utility CS465 - Unix

Page 1: The  awk  Utility

The awk Utility

CS465 - Unix

Page 2: The  awk  Utility

Background

• awk was developed by

– Aho, Weinberger, and Kernighan (of K & R)

– Was further extended at Bell Labs

• Handles simple data-reformatting jobs easily with just a few lines of code.

• Versions

– awk - original version

– nawk - new awk - improved awk

– gawk - GNU awk - improved nawk

Page 3: The  awk  Utility

How awk works

• awk commands include patterns and actions

– Scans the input line by line, searching for lines that match a certain pattern (or regular expression)

– Performs a selected action on the matching lines

• awk can be used:

– at the command line for simple operations

– in programs or scripts for larger applications

Page 4: The  awk  Utility

Running awk

• From the Command Line:

$ awk '/pattern/{action}' file

• OR From an awk script file:

$ cat awkscript
# This is a comment
/pattern/ {action}
$ awk -f awkscript file

Page 5: The  awk  Utility

awk’s Format using Input from a File

$ awk '/pattern/' filename
– awk will act like grep

$ awk '{action}' filename
– awk will apply the action to every line in the file

$ awk '/pattern/ {action}' filename
– awk will apply the action to every line in the file that matches the pattern

Page 6: The  awk  Utility

Example 1

Input:
$ cat pingfile
PING dt033n32.san.rr.com (24.30.138.50): 56 data bytes
64 bytes from 24.30.138.50: icmp_seq=0 ttl=48 time=49 ms
64 bytes from 24.30.138.50: icmp_seq=1 ttl=48 time=94 ms
64 bytes from 24.30.138.50: icmp_seq=2 ttl=48 time=50 ms
64 bytes from 24.30.138.50: icmp_seq=3 ttl=48 time=41 ms
----dt033n32.san.rr.com PING Statistics----
128 packets transmitted, 127 packets received, 0% packet loss
round-trip (ms) min/avg/max = 37/73/495 ms
$

awk command:
$ awk '/icmp/' pingfile

Output:
64 bytes from 24.30.138.50: icmp_seq=0 ttl=48 time=49 ms
64 bytes from 24.30.138.50: icmp_seq=1 ttl=48 time=94 ms
64 bytes from 24.30.138.50: icmp_seq=2 ttl=48 time=50 ms
64 bytes from 24.30.138.50: icmp_seq=3 ttl=48 time=41 ms

Page 7: The  awk  Utility

Example 1

Input:
$ cat pingfile
PING dt033n32.san.rr.com (24.30.138.50): 56 data bytes
64 bytes from 24.30.138.50: icmp_seq=0 ttl=48 time=49 ms
64 bytes from 24.30.138.50: icmp_seq=1 ttl=48 time=94 ms
64 bytes from 24.30.138.50: icmp_seq=2 ttl=48 time=50 ms
64 bytes from 24.30.138.50: icmp_seq=3 ttl=48 time=41 ms
----dt033n32.san.rr.com PING Statistics----
128 packets transmitted, 127 packets received, 0% packet loss
round-trip (ms) min/avg/max = 37/73/495 ms
$

awk command:
$ awk '{print $1}' pingfile

Output:
PING
64
64
64
64
----dt033n32.san.rr.com
128
round-trip

Page 8: The  awk  Utility

Example 1

Input:
$ cat pingfile
PING dt033n32.san.rr.com (24.30.138.50): 56 data bytes
64 bytes from 24.30.138.50: icmp_seq=0 ttl=48 time=49 ms
64 bytes from 24.30.138.50: icmp_seq=1 ttl=48 time=94 ms
64 bytes from 24.30.138.50: icmp_seq=2 ttl=48 time=50 ms
64 bytes from 24.30.138.50: icmp_seq=3 ttl=48 time=41 ms
…
----dt033n32.san.rr.com PING Statistics----
128 packets transmitted, 127 packets received, 0% packet loss
round-trip (ms) min/avg/max = 37/73/495 ms
$

awk command:
$ awk '/icmp/ {print $5}' pingfile

Output:
icmp_seq=0
icmp_seq=1
icmp_seq=2
icmp_seq=3

Page 9: The  awk  Utility

Records and Fields

• awk divides the input into records and fields
– Each line is a record (by default)

            field-1  field-2  field-3
               |        |        |
               v        v        v
record 1 -> George   Jones    Admin
record 2 -> Anthony  Smith    Accounting

– Each record is split into fields, delimited by a special character (whitespace by default)
  • Can change the delimiter with -F or FS

Page 10: The  awk  Utility

awk field variables

• awk creates variables $1, $2, $3… that correspond to the resulting fields (just like a shell script).

– $1 is the first field, $2 is the second…

– $0 is a special field which is the entire line

– NF is always set to the number of fields in the current line (no dollar sign to access)
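The field variables above can be tried on a single record. This is a minimal sketch: the sample record is borrowed from the Records and Fields slide, and $NF (the field whose number is NF, i.e. the last field) is an addition not covered on the slide.

```shell
# Sample record, used purely for illustration.
line='George Jones Admin'

# NF is the field count, $1 the first field, $NF the last field,
# and $0 the whole record.
out=$(printf '%s\n' "$line" | awk '{ print NF; print $1; print $NF; print $0 }')
printf '%s\n' "$out"
```

Note that NF (no dollar sign) is the count, while $NF uses that count as a field number.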

Page 11: The  awk  Utility

Example #1
$ cat students
Bill White 7777771 1980/01/01 Science
Jill Blue 1111117 1978/03/20 Arts
Ben Teal 7171717 1985/02/26 CompSci
Sue Beige 1717171 1963/09/12 Science
$
$ awk '/Science/{print $1, $2}' students
Bill White
Sue Beige
$

Commas indicate that we want the output fields to be delimited by spaces (otherwise they are concatenated):
$ awk '/Science/{print $1 $2}' students
BillWhite
SueBeige

Page 12: The  awk  Utility

Example #2

- No pattern given, so matches ALL lines- Text strings to print are placed in double quotes

$ cat phonelist

Joe Smith 774-0888

Mary Jones 772-2345

Hank Knight 494-8888

$
$ awk '{print "Name: ", $1, $2, \
" Telephone:", $3}' phonelist
Name:  Joe Smith  Telephone: 774-0888
Name:  Mary Jones  Telephone: 772-2345
Name:  Hank Knight  Telephone: 494-8888

$

Page 13: The  awk  Utility

Example #3

Given a username, display the person's real name:

$ grep small /etc/passwd
small000:x:1164:102:Faculty - Pam Smallwood:/export/home/small000:/bin/ksh
$
$ awk -F: '/small000/{print $5}' /etc/passwd
Faculty - Pam Smallwood
$

Page 14: The  awk  Utility

awk using Input from Commands

• You can run awk in a pipeline, using input from another command:

$ command | awk '/pattern/ {action}'

– Takes the output from the command and pipes it into awk which will then perform the action on all lines that match the pattern

Page 15: The  awk  Utility

Piped awk Input Example

$ w
 1:04pm up 25 day(s), 5:37, 6 users, load average: 0.00, 0.00, 0.01
User tty login@ idle JCPU PCPU what
pugli766 pts/8 Tue10pm 3days -ksh
lin318 pts/17 10:58am 1:45 vi choosesort
small000 pts/18 12:43pm w
mcdev712 pts/10 11:52am 14 1 vi adddata
gibbo201 pts/12 12:15pm 18 -ksh
nelso828 pts/16 7:17pm 17:43 -ksh
$
$ w | awk '/ksh/{print $1}'
pugli766
gibbo201
nelso828
$

Page 16: The  awk  Utility

Relational Operators

• awk can use relational operators ( <, >, <=, >=, ==, != ) to compare a field to a value, and ! to negate a condition

– If the outcome of the comparison is true then the action is performed

• Examples:

– To print every record in the log.txt file in which the second field is larger than 10

$ awk '$2 > 10' log.txt

– To print every record in the log.txt file which does NOT contain ‘Win32’

$ awk '!/Win32/' log.txt
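The two examples above can also be combined into one condition. The sketch below assumes made-up log lines (a name followed by a count) and joins the tests with the && operator.

```shell
# Hypothetical log lines: a name followed by a count.
data='alpha 5
beta 12
Win32 40'

# Print the first field of records whose second field exceeds 10
# and that do not contain the text Win32.
out=$(printf '%s\n' "$data" | awk '$2 > 10 && !/Win32/ { print $1 }')
printf '%s\n' "$out"
```

Only the second line satisfies both tests: the first fails the numeric comparison and the third is excluded by the negated pattern.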

Page 17: The  awk  Utility

Relational Operator Example

$ who
pugli766 pts/8 Jun 3 22:24 (da1-229-38-103.den.pcisys.net)
lin318 pts/17 Jun 6 10:58 (12-254-120-56.client.attbi.com)
small000 pts/18 Jun 6 13:16 (mackey.rbe36-213.den.pcisys.net)
mcdev712 pts/10 Jun 6 11:52 (ip68-104-41-121.lv.lv.cox.net)
gibbo201 pts/12 Jun 6 12:15 (12-219-115-107.client.mchsi.com)
nelso828 pts/16 Jun 5 19:17 (65.100.138.177)
$
$ who | awk '$4 < 6 {print $1, $3, $4, $5}'
pugli766 Jun 3 22:24
nelso828 Jun 5 19:17
$

Page 18: The  awk  Utility

Piping awk output

$ who
pugli766 pts/8 Jun 3 22:24 (da1-229-38-103.den.pcisys.net)
lin318 pts/17 Jun 6 10:58 (12-254-120-56.client.attbi.com)
small000 pts/18 Jun 6 13:16 (mackey.rbe36-213.den.pcisys.net)
mcdev712 pts/10 Jun 6 11:52 (ip68-104-41-121.lv.lv.cox.net)
gibbo201 pts/12 Jun 6 12:15 (12-219-115-107.client.mchsi.com)
nelso828 pts/16 Jun 5 19:17 (65.100.138.177)
$
$ who | awk '$4 == 6 {print $1}' | sort
gibbo201
lin318
mcdev712
small000
$

Page 19: The  awk  Utility

awk Programming

• awk programming is done by building a list
– The list is a list of rules
– Each rule is applied sequentially to each line (record)

• Example:

/pattern1/ { action1 }

/pattern2/ { action2 }

/pattern3/ { action3 }
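A small sketch of how a rule list behaves, using invented input lines: every rule is tried against each record in order, so a record that matches two patterns triggers both actions before awk moves on to the next record.

```shell
# Invented sample input for illustration.
data='apple pie
banana split
apple banana'

out=$(printf '%s\n' "$data" | awk '
/apple/  { print "rule 1:", $0 }
/banana/ { print "rule 2:", $0 }
')
printf '%s\n' "$out"
```

The third input line matches both patterns, so it is printed twice, once by each rule.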

Page 20: The  awk  Utility

awk - pattern matching

• Before processing, lines can be matched with a pattern.

/pattern/ { action } execute if line matches pattern

The pattern is a regular expression.

• Examples:

/^$/ { print "This line is blank" }

/num/ { print "Line includes num" }

/[0-9]+$/ { print "Integer at end:", $0 }

/[A-Za-z]+/ { print "String:", $0 }

/^[A-Z]/ { print "Starts w/uppercase letter" }
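Some of the patterns above can be checked against a few test lines. This is a sketch with invented input; setting LC_ALL=C is an assumption added here so that the bracket range [A-Z] is interpreted as plain ASCII uppercase letters regardless of locale.

```shell
# Three invented lines: one ending in a number, one empty, one lowercase.
data='Total 42

no digits here'

out=$(printf '%s\n' "$data" | LC_ALL=C awk '
/^$/      { print "blank" }
/[0-9]+$/ { print "ends with integer: " $0 }
/^[A-Z]/  { print "starts uppercase: " $0 }
')
printf '%s\n' "$out"
```

The first line matches two patterns, the empty line matches only /^$/, and the last line matches none, so it produces no output.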

Page 21: The  awk  Utility

awk program from a file

• The awk commands (program) can be placed into a file

• The -f (lowercase f) indicates that the commands come from a file whose name follows the -f

$ awk -f awkfile datafile

The contents of the file called awkfile will be used as the commands for awk

Page 22: The  awk  Utility

Example 1
$ cat students
Bill White 333333 1980/01/01 Science
Jill Blue 333444 1978/03/20 Arts
Bill Teal 555555 1985/02/26 CompSci
Sue Beige 555777 1963/09/12 Science
$ cat awkprog
/5?5/ {print $1, $2}
/3*4/ {print $5}
$
$ awk -f awkprog students
Arts
Bill Teal
Sue Beige
$

**NOTE: All patterns are applied to each line before moving to the next line

Page 23: The  awk  Utility

Example 2
$ cat students
Bill White 333333 1980/01/01 Science
Jill Blue 333444 1978/03/20 Arts
Bill Teal 555555 1985/02/26 CompSci
Sue Beige 555777 1963/09/12 Science
$ cat awkprog
/Science/ {print "Science stu:", $1, $2}
/CompSci/ {print "Computing stu:", $1, $2}
$
$ awk -f awkprog students
Science stu: Bill White
Computing stu: Bill Teal
Science stu: Sue Beige
$

Page 24: The  awk  Utility

More about Patterns

• Patterns can be:

– Empty: will match everything

– Regular expressions:

/reg-expression/

– Boolean Expressions:

$2=="foo" && $7=="bar"

– Ranges:

/jones/,/smith/

Page 25: The  awk  Utility

Example - Boolean Expressions
$ cat students
Bill White 333333 1980/01/01 Science
Jill Blue 333444 1978/03/20 Arts
Bill Teal 555555 1985/02/26 CompSci
Sue Beige 555777 1963/09/12 Science
$ cat awkprog
$3 <= 444444 {print "Not counted"}
$3 > 444444 {print $2 ",", $1}
$
$ awk -f awkprog students
Not counted
Not counted
Teal, Bill
Beige, Sue
$

Page 26: The  awk  Utility

Example - Ranges
$ cat students
Bill White 333333 1980/01/01 Science
Jill Blue 333444 1978/03/20 Arts
Bill Teal 555555 1985/02/26 CompSci
Sue Beige 555777 1963/09/12 Science
$
$ awk '/333333/,/555555/' students
Bill White 333333 1980/01/01 Science
Jill Blue 333444 1978/03/20 Arts
Bill Teal 555555 1985/02/26 CompSci
$

Page 27: The  awk  Utility

More Built-In awk Variables

• Two types: Informative and Configuration

• Informative:

NR = Current Record Number (start at 1)

– Counts ALL records, not just those that match

NF = Number of Fields in the Current Record

FILENAME = Current Input Data File

– Undefined in the BEGIN block

Page 28: The  awk  Utility

Example using NF
$ cat names
Pam Sue Laurie
Bob Joe Bill Dave
Joan Jill

$
$ awk '{print NF}' names
3
4
2
0
$

NOTE: The file ends with an empty line, which has 0 fields

Page 29: The  awk  Utility

Example using a Boolean, NF, and NR

$ cat names

Pam Sue Laurie

Bob Joe Bill Dave

Joan Jill

$
$ awk 'NF > 2 {print NR ":", NF, "fields"}' names

1: 3 fields

2: 4 fields

$

Page 30: The  awk  Utility

Built-in awk functions

log(expr) natural logarithm

index(s1,s2) position of string s2 in string s1

length(s) string length

substr(s,m,n) n-char substring of s starting at m

tolower(s) converts string to lowercase

printf() print formatted - like C printf
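A short sketch exercising the functions listed above inside a BEGIN block, so no input file is needed. The string "Unix Utility" is an arbitrary test value chosen for illustration.

```shell
out=$(awk 'BEGIN {
    s = "Unix Utility"
    print length(s)          # number of characters in s
    print index(s, "Util")   # 1-based position of "Util" within s
    print substr(s, 6, 4)    # 4 characters starting at position 6
    print tolower(s)         # lowercase copy of s
    print log(1)             # natural logarithm of 1
}')
printf '%s\n' "$out"
```

Positions in index() and substr() are counted from 1, not 0, which is why position 6 is the "U" of "Utility".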

Page 31: The  awk  Utility

Example 2

Input:
PING dt033n32.san.rr.com (24.30.138.50): 56 data bytes
64 bytes from 24.30.138.50: icmp_seq=0 ttl=48 time=49 ms
64 bytes from 24.30.138.50: icmp_seq=1 ttl=48 time=94 ms
64 bytes from 24.30.138.50: icmp_seq=2 ttl=48 time=50 ms
64 bytes from 24.30.138.50: icmp_seq=3 ttl=48 time=41 ms
…

Program:
/PING/ { print tolower($1) }
/icmp/ {
    time = substr($7,6,2)
    print time
}

Output:
ping

49

94

50

41

Page 32: The  awk  Utility

print & printf

• Use print in an awk statement to output specific field(s)

• printf is more versatile

– works like printf in the C language

– May contain a format specifier and a modifier

Page 33: The  awk  Utility

Format Specification

• A format specification consists of a percent symbol, a modifier, width and precision values, and a conversion character

• To display the third field as a floating point number with two decimal places:

awk '{printf("%.2f\n", $3)}' file

• You can include additional text in the printf statement

'{printf ("3rd value: %.2f\n", $3)}'

Page 34: The  awk  Utility

Specifiers, Width, Precision, & Modifiers

• Type Specifiers:
%c Single character
%d Integer (decimal)
%f Floating point
%s String

• Between the % and the specifier you can place the width and precision
%6.2f means a floating point number in a field of width 6 in which there are two decimal places

• Modifiers control details of appearance:
- minus sign is the left justification modifier (default is right justification)
+ plus sign forces the appearance of a sign (+,-) for numeric output
0 zero pads a right justified number with zeros
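The specifiers and modifiers above can be compared side by side. In this sketch the square brackets are only there to make the field width visible, and the values are arbitrary.

```shell
out=$(awk 'BEGIN {
    printf("[%6.2f]\n", 3.14159)      # width 6, two decimals, right justified
    printf("[%-6d]\n", 42)            # minus sign: left justified in width 6
    printf("[%+d]\n", 42)             # plus sign: always show the sign
    printf("[%06.2f]\n", 3.14159)     # zero: pad on the left with zeros
    printf("[%5s][%c]\n", "ab", "x")  # string in width 5, single character
}')
printf '%s\n' "$out"
```

Unlike print, printf does not add a newline on its own, so each format string ends with \n.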

Page 35: The  awk  Utility

awk Variables

• Variables

– No need for declaration

• Implicitly set to 0 AND the Empty String

– Variable type is a combination of a floating-point and string

– Variable is converted as needed, based on its use

title = "Number of students"

no = 100

weight = 13.4
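The implicit initialization and automatic conversion described above can be observed directly. A minimal sketch: count and name are hypothetical variables that are deliberately never assigned.

```shell
out=$(awk 'BEGIN {
    print count + 0        # unset variable used as a number
    print "[" name "]"     # unset variable used as a string
    no = "100"             # a string that looks like a number
    print no + 1           # converted to a number for arithmetic
    weight = 13.4
    print weight " kg"     # converted to a string for concatenation
}')
printf '%s\n' "$out"
```

The same variable can therefore act as a number in one expression and as a string in the next, depending on the operator applied to it.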

Page 36: The  awk  Utility

Example 2

Input:
PING dt033n32.san.rr.com (24.30.138.50): 56 data bytes
64 bytes from 24.30.138.50: icmp_seq=0 ttl=48 time=49 ms
64 bytes from 24.30.138.50: icmp_seq=1 ttl=48 time=94 ms
64 bytes from 24.30.138.50: icmp_seq=2 ttl=48 time=50 ms
64 bytes from 24.30.138.50: icmp_seq=3 ttl=48 time=41 ms
…

Program:
/icmp/ {

time = substr($7,6,2)

printf( "%1.1f ms\n", time );

}

Output:
49.0 ms

94.0 ms

50.0 ms

41.0 ms

Page 37: The  awk  Utility

awk program execution

BEGIN { … }           Executes only once, before reading input data
{ … }                 Executes for each input line
specification { … }   Executes for each input line that matches the specified /pattern/ or Boolean expression
END { … }             Executes once at the end, after all lines have been processed

Page 38: The  awk  Utility

Example #1: Count # lines in file

- Set total to 0 before processing any lines
- For every row in the file, execute {total = total + 1}
- Print total after all lines processed.

$ cat awkprog
BEGIN {total = 0}
{total = total + 1}
END {print total " lines"}
$ cat testfile
Hello There
Goodbye!
$
$ awk -f awkprog testfile
2 lines
$

Page 39: The  awk  Utility

Ex #2: Count lines containing a pattern
$ cat Simpsons
Marge 34
Homer 32
Lisa 10
Bart 11
Maggie 01
$ cat countthem
BEGIN {totalMa = 0; totalar = 0}
/Ma/ { totalMa++ }
/ar/ { totalar++ }
END { print totalMa " Ma's"
      print totalar " ar's" }
$

Each counting action ({totalMa++} or {totalar++}) executes only for input lines that contain its pattern.

$ awk -f countthem Simpsons
2 Ma's
2 ar's
$

Page 40: The  awk  Utility

Example #3: Add line numbers

$ cat numawk

BEGIN { print "Line numbers by awk" }

{ print NR ":", $0 }

END { print "Done processing " FILENAME }

$ cat testfile

Hello There

Goodbye!

$
$ awk -f numawk testfile

Line numbers by awk

1: Hello There

2: Goodbye!

Done processing testfile

$

Page 41: The  awk  Utility

More Built-In awk Variables

• Two types: Informative and Configuration

• Configuration

FS = Input field separator

OFS = Output field separator

(default for both is space " ")

RS = Input record separator

ORS = Output record separator

(default for both is newline "\n")
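A sketch of the configuration variables in use, set in a BEGIN block so they take effect before the first record is read. The input lines are invented, and the " | " and ";" separators are arbitrary choices for illustration.

```shell
# FS splits the input on commas; OFS is placed between the
# comma-separated arguments of print.
out1=$(printf 'Bill Jones,3333,M\nPam Smith,5555,F\n' |
    awk 'BEGIN { FS = ","; OFS = " | " } { print $2, $1 }')
printf '%s\n' "$out1"

# ORS replaces the newline that print normally appends to each record.
out2=$(printf 'a\nb\nc\n' | awk 'BEGIN { ORS = ";" } { print }')
printf '%s\n' "$out2"
```

OFS is only inserted where print arguments are separated by commas; fields concatenated without commas are unaffected.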

Page 42: The  awk  Utility

Example #1: Reverse 2 columns

$ cat switch
BEGIN {FS="\t"}
{print $2 "\t" $1}
$ awk -f switch Simpsons
34 Marge
32 Homer
10 Lisa
11 Bart
01 Maggie
$

• Alternatively you could do the following:
$ awk -F'\t' '{print $2 "\t" $1}' Simpsons

NOTE: Columns separated by tabs

Page 43: The  awk  Utility

Example #2: Sum a column$ cat awksum2

BEGIN { FS="\t"

sum = 0 }

{sum = sum + $2}

END { print "Done"

print "Total sum is " sum }

$

$ awk -f awksum2 Simpsons

Done

Total sum is 88

$

Page 44: The  awk  Utility

Example #3: Comma delimited file$ cat names

Bill Jones,3333,M

Pam Smith,5555,F

Sue Smith,4444,F

$
$ awk -F, '{print $2}' names
3333
5555
4444
$

Page 45: The  awk  Utility

Longer awk program
$ cat awkprog
BEGIN { print "Processing..." }

# print number of fields in first line
NR == 1 { print $0, NF, "fields"}

/^Unix/ { print "Line starts with Unix: ", $0 }
/Unix$/ { print "Line ends with Unix: " $0 }

# finishing it up
END {print NR " lines checked"}
$

Page 46: The  awk  Utility

awk program execution
$ cat datfile
First Line
Unix is great!
What else is better?
This is Unix
Yes it is Unix
Goodbye!
$
$ awk -f awkprog datfile
Processing...
First Line 2 fields
Line starts with Unix:  Unix is great!
Line ends with Unix: This is Unix
Line ends with Unix: Yes it is Unix
6 lines checked
$

Page 47: The  awk  Utility

awk programming language syntax

if ( found == 1 )        # if (expr)
    print "Found";       #     {action1}
else                     # else
    print "Not found";   #     {action2}

while ( i <= 100 )       # while (cond)
{   i = i + 1;           # { actions...
    print i }            # }

Page 48: The  awk  Utility

awk programming language syntax

for ( i=1; i < 10; i++ )             # for (set; test; incr)
{                                    # {
    sqr = i * i;                     #    actions
    print i " squared is " sqr
}                                    # }

do                                   # do
{   i = i + 1;                       # { actions ...
    print i }                        # }
while ( i < 100 );                   # while (cond);
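The control structures above can be run without any input by placing them in a BEGIN block. This is a sketch with small loop bounds chosen for illustration; note that the do loop runs its body once even though its condition is already false.

```shell
out=$(awk 'BEGIN {
    for (i = 1; i <= 3; i++) {
        sqr = i * i
        print i " squared is " sqr
    }

    n = 0
    while (n < 2) {          # condition tested before each pass
        n = n + 1
        print "while " n
    }

    do {                     # body runs first, condition tested afterwards
        n = n + 1
        print "do " n
    } while (n < 2)

    if (n > 2)
        print "n is " n
    else
        print "n is small"
}')
printf '%s\n' "$out"
```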

Page 49: The  awk  Utility

awk – longer example

• Write an awk program that prints out the contents of a directory in the following format:

   BYTES   FILE
   24576   copyfile
     736   copyfile.c
     740   copyfile.c~
   24576   dirlist
     989   dirlist.c
     977   dirlist.c%
   24576   envadv
     185   envadv.c
   <dir>   tmp
     740   x.c

Total: 73684 bytes in 9 regular files

Page 50: The  awk  Utility

awk example - code$ cat awkprog

BEGIN {print " BYTES \t FILE";

sum=0; filenum=0

}

# test for lines starting with -

/^-/ { sum += $5

++filenum

printf ("%10d \t%s\n", $5, $9) }

# test for directories - line starts with d

/^d/ { print " <dir> \t", $9 }

# conclusion

END { print "\n Total: " sum " bytes in"

print " " filenum " regular files"

}

$

Page 51: The  awk  Utility

awk example - output
$ ls -l

total 84

drwx------ 2 small000 faculty 512 Jun 2 13:44 sub2

-rwx------ 1 small000 faculty 224 Jun 3 10:35 sumnums

-rw------- 1 small000 faculty 2 Jun 3 21:08 tab

-rw------- 1 small000 faculty 187 Jun 8 11:15 tbook

$
$ ls -l | awk -f awkprog
 BYTES           FILE
 <dir>           sub2
       224      sumnums
         2      tab
       187      tbook

 Total: 413 bytes in
 3 regular files
$

Page 52: The  awk  Utility

awk Handout

• Review awk examples on handout