The awk Utility
CS465 - Unix
Background
• awk was developed by
– Aho, Weinberger, and Kernighan (of K & R)
– Was further extended at Bell Labs
• Handles simple data-reformatting jobs easily with just a few lines of code.
• Versions
– awk - original version
– nawk - new awk - improved awk
– gawk - gnu awk - improved nawk
How awk works
• awk commands include patterns and actions
– Scans the input line by line, searching for lines that match a certain pattern (or regular expression)
– Performs a selected action on the matching lines
• awk can be used:
– at the command line for simple operations
– in programs or scripts for larger applications
Running awk
• From the Command Line:
$ awk '/pattern/{action}' file
• OR From an awk script file:
$ cat awkscript
# This is a comment
/pattern/ {action}
$ awk -f awkscript file
awk’s Format using Input from a File
$ awk '/pattern/' filename
– awk will act like grep
$ awk '{action}' filename
– awk will apply the action to every line in the file
$ awk '/pattern/ {action}' filename
– awk will apply the action to every line in the file that matches the pattern
Example 1
Input
$ cat pingfile
PING dt033n32.san.rr.com (24.30.138.50): 56 data bytes
64 bytes from 24.30.138.50: icmp_seq=0 ttl=48 time=49 ms
64 bytes from 24.30.138.50: icmp_seq=1 ttl=48 time=94 ms
64 bytes from 24.30.138.50: icmp_seq=2 ttl=48 time=50 ms
64 bytes from 24.30.138.50: icmp_seq=3 ttl=48 time=41 ms
----dt033n32.san.rr.com PING Statistics----
128 packets transmitted, 127 packets received, 0% packet loss
round-trip (ms) min/avg/max = 37/73/495 ms
$
Command
$ awk '/icmp/' pingfile
Output
64 bytes from 24.30.138.50: icmp_seq=0 ttl=48 time=49 ms
64 bytes from 24.30.138.50: icmp_seq=1 ttl=48 time=94 ms
64 bytes from 24.30.138.50: icmp_seq=2 ttl=48 time=50 ms
64 bytes from 24.30.138.50: icmp_seq=3 ttl=48 time=41 ms
Example 1 (print the first field of every line)
Input: the same pingfile as above
Command
$ awk '{print $1}' pingfile
Output
PING
64
64
64
64
----dt033n32.san.rr.com
128
round-trip
Example 1 (print the fifth field of matching lines)
Input: the same pingfile as above
Command
$ awk '/icmp/ {print $5}' pingfile
Output
icmp_seq=0
icmp_seq=1
icmp_seq=2
icmp_seq=3
Records and Fields
• awk divides the input into records and fields
– Each line is a record (by default)

             field-1   field-2   field-3
                |         |         |
                v         v         v
record 1 ->  George    Jones     Admin
record 2 ->  Anthony   Smith     Accounting

– Each record is split into fields, delimited by a special character (whitespace by default)
• Can change the delimiter with -F or FS
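The two ways of changing the delimiter can be compared side by side. This is a minimal sketch: the colon-separated record is made-up sample data, and the variable names rec, a, and b are only for illustration.

```shell
# Hypothetical colon-delimited record (not taken from a real system file).
rec='small000:x:1164:102'

# Option 1: set the field separator on the command line with -F.
a=$(printf '%s\n' "$rec" | awk -F: '{ print $3 }')

# Option 2: set the FS variable in a BEGIN block, before any input is read.
b=$(printf '%s\n' "$rec" | awk 'BEGIN { FS = ":" } { print $3 }')

printf '%s %s\n' "$a" "$b"
```

Both commands should select the same third field, since -F is just a shorthand for assigning FS before input is read.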
awk field variables
• awk creates variables $1, $2, $3… that correspond to the resulting fields (just like a shell script).
– $1 is the first field, $2 is the second…
– $0 is a special field which is the entire line
– NF is always set to the number of fields in the current line (no dollar sign to access)
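A quick way to see these variables in action is to print them for a single record. The sketch below reuses the sample record from the Records and Fields slide; the shell variable names line and out are only for illustration.

```shell
# One sample record with three whitespace-separated fields.
line='George Jones Admin'

# $1 is the first field, $NF is the last field (field number NF),
# NF is the field count, and $0 is the whole record.
out=$(printf '%s\n' "$line" | awk '{ print $1; print $NF; print NF; print $0 }')
printf '%s\n' "$out"
```

Note the difference between NF (the number of fields) and $NF (the contents of the last field).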
Example #1
$ cat students
Bill White 7777771 1980/01/01 Science
Jill Blue 1111117 1978/03/20 Arts
Ben Teal 7171717 1985/02/26 CompSci
Sue Beige 1717171 1963/09/12 Science
$ awk '/Science/{print $1, $2}' students
Bill White
Sue Beige
$
Commas indicate that the output fields should be separated by spaces; without them, the fields are concatenated:
$ awk '/Science/{print $1 $2}' students
BillWhite
SueBeige
Example #2
- No pattern given, so matches ALL lines
- Text strings to print are placed in double quotes
$ cat phonelist
Joe Smith 774-0888
Mary Jones 772-2345
Hank Knight 494-8888
$ awk '{print "Name:", $1, $2, "Telephone:", $3}' phonelist
Name: Joe Smith Telephone: 774-0888
Name: Mary Jones Telephone: 772-2345
Name: Hank Knight Telephone: 494-8888
$
Example #3
Given a username, display the person’s real name:
$ grep small /etc/passwd
small000:x:1164:102:Faculty - Pam Smallwood:/export/home/small000:/bin/ksh
$ awk -F: '/small000/{print $5}' /etc/passwd
Faculty - Pam Smallwood
$
awk using Input from Commands
• You can run awk in a pipeline, using input from another command:
$ command | awk '/pattern/ {action}'
– Takes the output from the command and pipes it into awk which will then perform the action on all lines that match the pattern
Piped awk Input Example
$ w
 1:04pm up 25 day(s), 5:37, 6 users, load average: 0.00, 0.00, 0.01
User      tty     login@   idle  JCPU  PCPU  what
pugli766  pts/8   Tue10pm  3days             -ksh
lin318    pts/17  10:58am  1:45              vi choosesort
small000  pts/18  12:43pm                    w
mcdev712  pts/10  11:52am  14    1           vi adddata
gibbo201  pts/12  12:15pm  18                -ksh
nelso828  pts/16  7:17pm   17:43             -ksh
$ w | awk '/ksh/{print $1}'
pugli766
gibbo201
nelso828
$
Relational Operators
• awk can use relational operators ( <, >, <=, >=, ==, != ) to compare a field to a value; ! negates a condition
– If the outcome of the comparison is true then the action is performed
• Examples:
– To print every record in the log.txt file in which the second field is larger than 10
$ awk '$2 > 10' log.txt
– To print every record in the log.txt file which does NOT contain ‘Win32’
$ awk '!/Win32/' log.txt
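The same idea can be tried on a tiny input without a log file. This sketch uses made-up data; it also uses the standard awk match operators ~ and !~ (not listed on the slide), which test a single field against a regular expression instead of the whole line.

```shell
# Hypothetical two-column data: a name and a number.
log='alpha 5
beta 12
gamma 30'

# Numeric comparison on the second field.
big=$(printf '%s\n' "$log" | awk '$2 > 10 { print $1 }')

# Negated regular-expression match on the first field only.
notb=$(printf '%s\n' "$log" | awk '$1 !~ /^b/ { print $1 }')

printf '%s\n' "$big" "$notb"
```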
Relational Operator Example
$ who
pugli766 pts/8  Jun 3 22:24 (da1-229-38-103.den.pcisys.net)
lin318   pts/17 Jun 6 10:58 (12-254-120-56.client.attbi.com)
small000 pts/18 Jun 6 13:16 (mackey.rbe36-213.den.pcisys.net)
mcdev712 pts/10 Jun 6 11:52 (ip68-104-41-121.lv.lv.cox.net)
gibbo201 pts/12 Jun 6 12:15 (12-219-115-107.client.mchsi.com)
nelso828 pts/16 Jun 5 19:17 (65.100.138.177)
$ who | awk '$4 < 6 {print $1, $3, $4, $5}'
pugli766 Jun 3 22:24
nelso828 Jun 5 19:17
$
Piping awk output
(same who output as in the previous example)
$ who | awk '$4 == 6 {print $1}' | sort
gibbo201
lin318
mcdev712
small000
$
awk Programming
• awk programming is done by building a list
– The list is a list of rules
– Each rule is applied sequentially to each line (record)
• Example:
/pattern1/ { action1 }
/pattern2/ { action2 }
/pattern3/ { action3 }
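Because every rule is tried against every record, one line can trigger more than one action. A minimal sketch with made-up input words shows the ordering:

```shell
# Two rules; "banana" matches both, "apple" matches only the first.
out=$(printf 'apple\nbanana\n' | awk '
/a/  { print "has a:", $0 }
/^b/ { print "starts with b:", $0 }')
printf '%s\n' "$out"
```

All rules run on the first record before awk reads the second one, so the two lines for banana appear together at the end.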
awk - pattern matching
• Before processing, lines can be matched with a pattern.
/pattern/ { action } execute if line matches pattern
The pattern is a regular expression.
• Examples:
/^$/ { print "This line is blank" }
/num/ { print "Line includes num" }
/[0-9]+$/ { print "Integer at end:", $0 }
/[A-Za-z]+/ { print "String:", $0 }
/^[A-Z]/ { print "Starts w/uppercase letter" }
awk program from a file
• The awk commands (program) can be placed into a file
• The -f (lowercase f) option indicates that the commands come from the file whose name follows it
$ awk -f awkfile datafile
The contents of the file called awkfile will be used as the commands for awk
Example 1
$ cat students
Bill White 333333 1980/01/01 Science
Jill Blue 333444 1978/03/20 Arts
Bill Teal 555555 1985/02/26 CompSci
Sue Beige 555777 1963/09/12 Science
$ cat awkprog
/5?5/ {print $1, $2}
/3*4/ {print $5}
$
$ awk -f awkprog students
Arts
Bill Teal
Sue Beige
$
**NOTE: All patterns applied to each line before moving to next line
Example 2
$ cat students
Bill White 333333 1980/01/01 Science
Jill Blue 333444 1978/03/20 Arts
Bill Teal 555555 1985/02/26 CompSci
Sue Beige 555777 1963/09/12 Science
$ cat awkprog
/Science/ {print "Science stu:", $1, $2}
/CompSci/ {print "Computing stu:", $1, $2}
$
$ awk -f awkprog students
Science stu: Bill White
Computing stu: Bill Teal
Science stu: Sue Beige
$
More about Patterns
• Patterns can be:
– Empty: will match everything
– Regular expressions:
/reg-expression/
– Boolean Expressions:
$2=="foo" && $7=="bar"
– Ranges:
/jones/,/smith/
Example - Boolean Expressions
$ cat students
Bill White 333333 1980/01/01 Science
Jill Blue 333444 1978/03/20 Arts
Bill Teal 555555 1985/02/26 CompSci
Sue Beige 555777 1963/09/12 Science
$ cat awkprog
$3 <= 444444 {print "Not counted"}
$3 > 444444 {print $2 ",", $1}
$
$ awk -f awkprog students
Not counted
Not counted
Teal, Bill
Beige, Sue
$
Example - Ranges
$ cat students
Bill White 333333 1980/01/01 Science
Jill Blue 333444 1978/03/20 Arts
Bill Teal 555555 1985/02/26 CompSci
Sue Beige 555777 1963/09/12 Science
$
$ awk '/333333/,/555555/' students
Bill White 333333 1980/01/01 Science
Jill Blue 333444 1978/03/20 Arts
Bill Teal 555555 1985/02/26 CompSci
$
More Built-In awk Variables
• Two types: Informative and Configuration
• Informative:
NR = Current Record Number (start at 1)
– Counts ALL records, not just those that match
NF = Number of Fields in the Current Record
FILENAME = Current Input Data File
– Undefined in the BEGIN block
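The three informative variables can be printed together for each record. This is a sketch only: the scratch file and its contents are made up, and it assumes the mktemp command is available.

```shell
# Hypothetical scratch file created only for this sketch.
tmp=$(mktemp)
printf 'a b c\nd e\n' > "$tmp"

# Print file name, record number, and field count for every record.
out=$(awk '{ print FILENAME ": record " NR " has " NF " fields" }' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

FILENAME only has a value here because awk is reading a named file; when input arrives through a pipe there is no file name to report.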
Example using NF
$ cat names
Pam Sue Laurie
Bob Joe Bill Dave
Joan Jill

$ awk '{print NF}' names
(the file ends with an empty line, which has 0 fields)
3
4
2
0
$
Example using a Boolean expression, NF, and NR
$ cat names
Pam Sue Laurie
Bob Joe Bill Dave
Joan Jill
$ awk 'NF > 2 {print NR ":", NF, "fields"}' names
1: 3 fields
2: 4 fields
$
Built-in awk functions
log(expr) natural logarithm
index(s1,s2) position of string s2 in string s1
length(s) string length
substr(s,m,n) n-char substring of s starting at m
tolower(s) converts string to lowercase
printf() print formatted - like C printf
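A few of these functions can be tried directly in a BEGIN block, which needs no input file. This is a small sketch; the string icmp_seq=3 is borrowed from the ping examples and the variable s is only for illustration.

```shell
out=$(awk 'BEGIN {
    s = "icmp_seq=3"
    print length(s)          # number of characters in s
    print index(s, "=")      # position of the first "=" in s
    print substr(s, 1, 4)    # 4 characters starting at position 1
    print tolower("PING")    # lowercase copy of the string
}')
printf '%s\n' "$out"
```

String positions in awk start at 1, not 0, which matters for both index and substr.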
Example 2
Input
PING dt033n32.san.rr.com (24.30.138.50): 56 data bytes
64 bytes from 24.30.138.50: icmp_seq=0 ttl=48 time=49 ms
64 bytes from 24.30.138.50: icmp_seq=1 ttl=48 time=94 ms
64 bytes from 24.30.138.50: icmp_seq=2 ttl=48 time=50 ms
64 bytes from 24.30.138.50: icmp_seq=3 ttl=48 time=41 ms
…
Program
/PING/ { print tolower($1) }
/icmp/ {
    time = substr($7,6,2)
    print time
}
Output
ping
49
94
50
41
…
print & printf
• Use print in an awk statement to output specific field(s)
• printf is more versatile
– works like printf in the C language
– May contain a format specifier and a modifier
Format Specification
• A format specification consists of a percent symbol, a modifier, width and precision values, and a conversion character
• To display the third field as a floating point number with two decimal places:
awk '{printf("%.2f\n", $3)}' file
• You can include additional text in the printf statement
'{printf ("3rd value: %.2f\n", $3)}'
Specifiers, Width, Precision, & Modifiers
• Type Specifiers:
%c Single character
%d Integer (decimal)
%f Floating point
%s String
• Between the % and the specifier you can place the width and precision
%6.2f means a floating point number in a field of width 6 in which there are two decimal places
• Modifiers control details of appearance:
- minus sign is the left justification modifier (the default is right justification)
+ plus sign forces the appearance of a sign (+,-) for numeric output
0 zero pads a right justified number with zeros
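The effect of width, precision, and each modifier is easiest to see with the output wrapped in brackets. The values below are arbitrary sample numbers chosen for this sketch.

```shell
out=$(awk 'BEGIN {
    printf("[%6.2f]\n", 3.14159)    # width 6, two decimals, right justified
    printf("[%-6d]\n", 42)          # minus: left justified in width 6
    printf("[%+d]\n", 42)           # plus: always show the sign
    printf("[%06.2f]\n", 3.14159)   # zero: pad on the left with zeros
    printf("[%5s]\n", "ab")         # strings are right justified too
}')
printf '%s\n' "$out"
```

Unlike print, printf does not add a newline by itself, so each format string ends with \n.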
awk Variables
• Variables
– No need for declaration
• Implicitly set to 0 AND the Empty String
– Variable type is a combination of a floating-point and string
– Variable is converted as needed, based on its use
title = "Number of students"
no = 100
weight = 13.4
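The implicit initialization and the automatic conversion between numbers and strings can be sketched as follows. The variable names count, name, n, and s are hypothetical and chosen only for this illustration.

```shell
out=$(awk 'BEGIN {
    print count + 1          # count was never set, so it acts as 0
    print "[" name "]"       # name was never set, so it acts as ""
    n = "3" + 4              # arithmetic converts the string "3" to a number
    s = 3 4                  # concatenation converts the numbers to strings
    print n, s
}')
printf '%s\n' "$out"
```

Placing two values next to each other with no operator is how awk concatenates strings, which is why 3 4 becomes 34.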
Example 2
Input
PING dt033n32.san.rr.com (24.30.138.50): 56 data bytes
64 bytes from 24.30.138.50: icmp_seq=0 ttl=48 time=49 ms
64 bytes from 24.30.138.50: icmp_seq=1 ttl=48 time=94 ms
64 bytes from 24.30.138.50: icmp_seq=2 ttl=48 time=50 ms
64 bytes from 24.30.138.50: icmp_seq=3 ttl=48 time=41 ms
…
Program
/icmp/ {
    time = substr($7,6,2)
    printf( "%1.1f ms\n", time );
}
Output
49.0 ms
94.0 ms
50.0 ms
41.0 ms
…
awk program execution
BEGIN { … }            Executes only once, before reading input data
{ … }                  Executes for each input line
specification { … }    Executes for each input line that matches the specified /pattern/ or Boolean expression
END { … }              Executes once at the end, after all lines have been processed
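The three phases can be combined into a small averaging program. This sketch feeds in the four ping times from the earlier examples as made-up standalone input; the variable sum is only for illustration.

```shell
# BEGIN runs before any input, the middle rule runs once per line,
# and END runs after the last line (NR then holds the total line count).
out=$(printf '49\n94\n50\n41\n' | awk '
BEGIN { sum = 0 }
      { sum += $1 }
END   { printf("%d values, average %.1f\n", NR, sum / NR) }')
printf '%s\n' "$out"
```

The BEGIN block is optional here, since an unset sum already behaves as 0, but it makes the starting state explicit.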
Example #1: Count # lines in file
- Set total to 0 before processing any lines
- For every row in the file, execute {total = total + 1}
- Print total after all lines are processed.
$ cat awkprog
BEGIN {total = 0}
{total = total + 1}
END {print total " lines"}
$ cat testfile
Hello There
Goodbye!
$ awk -f awkprog testfile
2 lines
$
Ex #2: Count lines containing a pattern
$ cat Simpsons
Marge 34
Homer 32
Lisa 10
Bart 11
Maggie 01
$ cat countthem
BEGIN {totalMa = 0; totalar = 0}
/Ma/ { totalMa++ }
/ar/ { totalar++ }
END { print totalMa " Ma's"
      print totalar " ar's"}
$
{totalpattern++} only executes if the pattern appears in the current input line.
$ awk -f countthem Simpsons
2 Ma's
2 ar's
$
Example #3: Add line numbers
$ cat numawk
BEGIN { print "Line numbers by awk" }
{ print NR ":", $0 }
END { print "Done processing " FILENAME }
$ cat testfile
Hello There
Goodbye!
$ awk -f numawk testfile
Line numbers by awk
1: Hello There
2: Goodbye!
Done processing testfile
$
More Built-In awk Variables
• Two types: Informative and Configuration
• Configuration
FS = Input field separator
OFS = Output field separator
(default for both is space " "; for FS, a single space means any run of spaces and tabs)
RS = Input record separator
ORS = Output record separator
(default for both is newline "\n")
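Input and output separators are independent, so a program can read one format and write another. A minimal sketch, using two comma-delimited sample records and an arbitrary " | " output separator chosen for illustration:

```shell
# Read comma-separated fields, write them back separated by " | ".
out=$(printf 'Bill Jones,3333,M\nPam Smith,5555,F\n' |
      awk 'BEGIN { FS = ","; OFS = " | " } { print $2, $1 }')
printf '%s\n' "$out"
```

OFS is inserted only where print arguments are separated by commas; concatenated values are not affected by it.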
Example #1: Reverse 2 columns
NOTE: Columns separated by tabs
$ cat switch
BEGIN {FS="\t"}
{print $2 "\t" $1}
$ awk -f switch Simpsons
34 Marge
32 Homer
10 Lisa
11 Bart
01 Maggie
$
• Alternatively you could do the following (quote the \t so the shell does not strip the backslash):
$ awk -F'\t' '{print $2 "\t" $1}' Simpsons
Example #2: Sum a column
$ cat awksum2
BEGIN { FS="\t"
sum = 0 }
{sum = sum + $2}
END { print "Done"
print "Total sum is " sum }
$
$ awk -f awksum2 Simpsons
Done
Total sum is 88
$
Example #3: Comma delimited file
$ cat names
Bill Jones,3333,M
Pam Smith,5555,F
Sue Smith,4444,F
$ awk -F, '{print $2}' names
3333
5555
4444
$
Longer awk program
$ cat awkprog
BEGIN { print "Processing..." }
# print number of fields in first line
NR == 1 { print $0, NF, "fields"}
/^Unix/ { print "Line starts with Unix: ", $0 }
/Unix$/ { print "Line ends with Unix: " $0 }
# finishing it up
END {print NR " lines checked"}
$
awk program execution
$ cat datfile
First Line
Unix is great!
What else is better?
This is Unix
Yes it is Unix
Goodbye!
$ awk -f awkprog datfile
Processing...
First Line 2 fields
Line starts with Unix:  Unix is great!
Line ends with Unix: This is Unix
Line ends with Unix: Yes it is Unix
6 lines checked
$
awk programming language syntax
if ( found )                   # if (expr)
    print "Found";             #     {action1}
else                           # else
    print "Not found";         #     {action2}

while ( i <= 100 )             # while (cond)
{   i = i + 1;                 # { actions ...
    print i }                  # }

awk programming language syntax (continued)
for ( i=1; i < 10; i++ )       # for (set; test; incr)
{                              # {
    sqr = i * i;               #     actions
    print i " squared is " sqr
}                              # }

do                             # do
{   i = i + 1;                 # { actions ...
    print i }                  # }
while ( i < 100 );             # while (cond);
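The fragments above can be assembled into a complete program inside a BEGIN block, which runs without any input file. This is a sketch with small loop bounds chosen only to keep the output short.

```shell
out=$(awk 'BEGIN {
    for (i = 1; i <= 3; i++) {
        if (i % 2 == 0)
            print i, "is even"
        else
            print i, "is odd"
    }
    while (i <= 5) {          # i is 4 once the for loop has finished
        print "i =", i
        i++
    }
}')
printf '%s\n' "$out"
```

awk has no true or false keywords; a condition is true when it evaluates to a nonzero number or a non-empty string.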
awk – longer example
• Write an awk program that prints out the content of a directory in the following format:
     BYTES   FILE
     24576   copyfile
       736   copyfile.c
       740   copyfile.c~
     24576   dirlist
       989   dirlist.c
       977   dirlist.c%
     24576   envadv
       185   envadv.c
     <dir>   tmp
       740   x.c

Total: 73684 bytes in 9 regular files
awk example - code
$ cat awkprog
BEGIN {print " BYTES \t FILE";
sum=0; filenum=0
}
# test for lines starting with -
/^-/ { sum += $5
++filenum
printf ("%10d \t%s\n", $5, $9) }
# test for directories - line starts with d
/^d/ { print " <dir> \t", $9 }
# conclusion
END { printf("\n Total: %d bytes in %d regular files\n", sum, filenum)
}
$
awk example - output
$ ls -l
total 84
drwx------ 2 small000 faculty 512 Jun 2 13:44 sub2
-rwx------ 1 small000 faculty 224 Jun 3 10:35 sumnums
-rw------- 1 small000 faculty 2 Jun 3 21:08 tab
-rw------- 1 small000 faculty 187 Jun 8 11:15 tbook
$ ls -l | awk -f awkprog
     BYTES   FILE
     <dir>   sub2
       224   sumnums
         2   tab
       187   tbook

Total: 413 bytes in 3 regular files
$
awk Handout
• Review awk examples on handout