
1 Unix Talk #2 AWK overview Patterns and actions Records and fields Print vs. printf


Unix Talk #2

AWK overview
Patterns and actions
Records and fields
Print vs. printf


Introduction

Students' grades in a text file:
    John 22 56 38 70 85 80
    Alex 90 89 79 98 35
How can I calculate John's current average within this file?
GREP?
  – Search for John with grep? Gives me the line.
  – Now I can use my calculator to figure it out.
SED?
  – sed will allow me to print, change, delete, etc.
I really want to automatically manipulate the values within this line.
This is where awk comes in. (awk me amadeus)
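As a rough sketch of where this is heading, the average can be computed in one awk command. The file name grades.txt is an assumption made for this illustration; the two data lines are the ones from the slide.

```shell
# Sample data from the slide, saved under a hypothetical file name.
cat > grades.txt <<'EOF'
John 22 56 38 70 85 80
Alex 90 89 79 98 35
EOF

# On the line whose first field is John, add up fields 2..NF
# and divide by the number of grades (NF - 1).
awk '$1 == "John" {
    sum = 0
    for (i = 2; i <= NF; i++) sum += $i
    print $1, sum / (NF - 1)
}' grades.txt
```

With the data above this prints "John 58.5" (351 divided by 6 grades).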


awk

Named for the first initials of the last names of its authors: Aho, Weinberger, and Kernighan.
Which awk are we tawking about?
  – awk
  – nawk – new awk (on CS machines)
  – gawk – GNU awk (bart)


AWK syntax

    awk '/pattern/' file
    awk '{action}' file
    awk '/pattern/ {action;}' file
    cat file | awk '{action}'

Awk automatically reads in the file for you, line by line.
  – No need to open/close the file (as in C or Java).
  – The pattern section FINDS LINES that match the pattern.
  – The action section performs the actions you defined on the lines it found.
  – The original file does not change.
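The four invocation forms above can be tried side by side. This is only a sketch: grades.txt and its shortened contents are made up for the illustration.

```shell
# Hypothetical two-line input file.
cat > grades.txt <<'EOF'
John 22 56 38
Alex 90 89 79
EOF

awk '/John/' grades.txt                 # pattern only: prints the matching line
awk '{ print $1 }' grades.txt           # action only: runs on every line
awk '/Alex/ { print $2 }' grades.txt    # pattern and action: prints 90
cat grades.txt | awk '{ print $1 }'     # same action, input arriving on a pipe
```

After all four commands, grades.txt is unchanged; awk only reads it.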


Simple example

    awk '{ print }' fruit_prices

Note: Here the pattern is missing. In this case, the awk command print runs on every line, so each line that is read gets printed.


Simple example

    awk '
    /\$[0-9]*\.[0-9][0-9]*/ { print }
    ' fruit_prices
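The slides never show what fruit_prices contains, so the file below is invented purely to make the pattern testable. Under that assumption, only lines holding a dollar amount such as $0.89 are printed.

```shell
# Hypothetical contents for fruit_prices (not from the slides).
cat > fruit_prices <<'EOF'
Fruit Price/lb
Banana $0.89
Kiwi $1.50
Apple N/A
EOF

# A literal $, optional digits, a literal dot, then at least one digit.
awk '/\$[0-9]*\.[0-9][0-9]*/ { print }' fruit_prices
```

The header line and the "N/A" line contain no dollar amount, so only the Banana and Kiwi lines come out.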


Action

Actions are specified by the programmer and are not limited to print, delete, etc. (p/d/s from sed). That is why it is so awesome!
Actions consist of
  – variable assignments,
  – arithmetic and logic operators,
  – decision structures,
  – looping structures.
For example: print, if, while, and for.
    awk '{print}' filename


Execution types

format 1: awk 'script'
  – where INPUT must come from a pipe or STDIN
  – command | awk 'script'
format 2: awk 'script' input1 input2 ... inputn
  – where we supply input FILES as input1, input2, etc.
format 3: awk -f script_file input1 ...
  (# in "script..." starts a comment)
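A small sketch of the three formats, using throwaway files whose names (in1.txt, in2.txt, count.awk) are made up for this example:

```shell
# Two hypothetical input files and one script file.
printf 'a\nb\n' > in1.txt
printf 'c\n'    > in2.txt
cat > count.awk <<'EOF'
# A line starting with # is a comment inside a script file.
{ n++ }
END { print n, "lines" }
EOF

cat in1.txt | awk '{ n++ } END { print n, "lines" }'       # format 1: input from a pipe
awk '{ n++ } END { print n, "lines" }' in1.txt in2.txt     # format 2: input files as arguments
awk -f count.awk in1.txt in2.txt                           # format 3: script read from a file
```

Format 1 reports 2 lines; formats 2 and 3 read both files in turn and report 3 lines.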


Pattern

Types
  – Regular expressions
  – BEGIN
      Does all its stuff BEFORE reading any input
  – END
      Does all its stuff AFTER reading ALL input
Pattern is optional
  If no pattern is specified, the "action" will occur for EVERY LINE, one at a time.
    awk '{Action}' filename
    awk '{print;}' names          # prints all lines
    awk 'BEGIN {print "The average grades"}'


Awk Regular Expression Metacharacters

Supports
  – ^, $, ., *, +, ?, [ABC], [^ABC],
  – [A-Z], A|B, (AB)+, \
  – & (special only in the replacement text of sub/gsub, where it stands for the matched text)
Not supported
  – Backreferencing, \( \)
  – Repetition, \{ \} (the unescaped form {n,m} is part of POSIX awk and works in current gawk, but older awk/nawk versions may not accept it)
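A few of the supported metacharacters in action. The word list is invented for this sketch; each comment names the lines the pattern is meant to select.

```shell
# Hypothetical word list.
printf 'color\ncolour\nabab\ngray\ngrey\n' > words.txt

awk '/colou?r/' words.txt     # ? makes the u optional: color, colour
awk '/^(ab)+$/' words.txt     # one or more repeats of the group: abab
awk '/gr(a|e)y/' words.txt    # alternation inside a group: gray, grey
awk '/^[^a-f]/' words.txt     # first character NOT in a-f: gray, grey
```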


    awk '
    BEGIN { actions ; }
    /pattern/ { actions ; }
    /pattern/ { actions ; }
    END { actions ; }
    ' files

Execution steps:
1) If a BEGIN pattern is present, executes its actions.
2) Reads an input line and parses it into fields.
3) Compares each of the specified patterns against the input line; on a match, executes the corresponding actions. This step is repeated for all patterns.
4) Repeats steps 2 and 3 while input lines are present.
5) After the script has read all the input lines, if the END pattern is present, executes its actions.
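The execution steps can be traced with a script that announces each rule as it fires. The input file fruits.txt is an assumption for this sketch.

```shell
# Hypothetical two-line input.
printf 'apple\nbanana\n' > fruits.txt

awk '
BEGIN { print "begin" }
/a/   { print "has a:", $0 }
/^b/  { print "starts with b:", $0 }
END   { print "end after", NR, "lines" }
' fruits.txt
```

The output is "begin", then "has a: apple", then both "has a: banana" and "starts with b: banana" (every pattern is tried against every line), and finally "end after 2 lines".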


Try This!

Place the following in the file tryawk1.awk:
    BEGIN { print "Starting to read input";
            nLines = 0; }
    /^.*$/ { nLines++; }
    END { print "DONE: Total lines = " nLines; }

  – Run the command: cat tryawk1.awk | awk -f tryawk1.awk
  – Counts the # of lines in the input
      nLines is a variable … note NO declaration, just use
      The print command prints a line of text and adds a newline to the end of the line


Records and fields

awk has RECORDS (lines) and FIELDS
$0 represents the entire line of input
$1 represents the first field
print works just like echo
  – print $1 $2     # $1 concatenated with $2
  – print $1, $2    # $1 OFS $2

    cat fruit_prices
    awk '{print;}' fruit_prices       # prints all lines
    awk '{print $0;}' fruit_prices    # prints each entire line
    awk '{print $1;}' fruit_prices    # prints first field in each line
    awk '{print $2;}' fruit_prices    # prints second field in each line
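The difference between a space and a comma in a print statement is easy to miss, so here is a minimal sketch on a single made-up input line:

```shell
echo "Banana 0.89" | awk '{ print $1 $2 }'     # no comma: fields are concatenated
echo "Banana 0.89" | awk '{ print $1, $2 }'    # comma: fields are joined by OFS (a space by default)
echo "Banana 0.89" | awk 'BEGIN { OFS = "-" } { print $1, $2 }'   # same comma, different OFS
```

These print "Banana0.89", "Banana 0.89", and "Banana-0.89" respectively.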


Examples

    cat phones.data
    John Robinson 234-3456
    Yin Pan 123-4567

    awk '{ print $1, $2, $3 }' phones.data
    John Robinson 234-3456
    Yin Pan 123-4567

    awk '{ print $2 ", " $1, $3 }' phones.data
    Robinson, John 234-3456
    Pan, Yin 123-4567

    awk '/^$/ { print x += 1 }' phones.data
    awk '/Mary/ { print $0 }' phones.data


Examples (cont'd)

    ls -l | awk '
    $6 == "Oct" { sum += $5 ; }
    END { print sum ; }
    '

    ls -l | awk -f block_use.awk

    cat block_use.awk
    $6 == "Oct" { sum += $5 ; }
    END { print sum ; }


Taking Pattern-specific Actions

    #!/bin/sh
    awk '
    /\$[1-9][0-9]*\.[0-9][0-9]*/ { print $0,"*";}
    /\$0\.[0-9][0-9]*/ { print ;}
    ' fruit_prices
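To see the two patterns take different actions, the script can be run on a small file. The contents of fruit_prices below are invented for this sketch, since the slides do not show the real file.

```shell
# Hypothetical contents for fruit_prices.
cat > fruit_prices <<'EOF'
Fruit Price/lb
Banana $0.89
Kiwi $1.50
Apple N/A
EOF

awk '
/\$[1-9][0-9]*\.[0-9][0-9]*/ { print $0, "*" }   # a dollar or more: flag with a star
/\$0\.[0-9][0-9]*/           { print }           # under a dollar: print unchanged
' fruit_prices
```

With this input the output is "Banana $0.89" followed by "Kiwi $1.50 *"; lines without a price match neither pattern and are dropped.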


Intrinsic variables

awk defines RECORDS (lines) and FIELDS
  – FS, input field separator (default = space/tab)
  – OFS, output field separator (default = space)
  – ORS, output record separator (default = newline)
  – RS, input record separator (default = newline)
  – NR, number of the current record being processed
  – NF, number of fields within the current record
  – FILENAME, awk sets this variable to the name of the file that it is currently reading. (If you have more than one input file, awk resets this variable as it reads each file in turn.)
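A quick way to watch NR, NF, and FILENAME change is to print them for every record. The file names one.txt and two.txt are assumptions for this sketch.

```shell
# Two hypothetical input files.
printf 'a b c\nd e\n' > one.txt
printf 'f\n'          > two.txt

# For every record: which file it came from, its record number, its field count.
awk '{ print FILENAME, NR, NF }' one.txt two.txt
```

This prints "one.txt 1 3", "one.txt 2 2", and "two.txt 3 1". Note that NR keeps counting across files while FILENAME switches.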


How does awk work

    awk '{print $1, $3}' names
  – Reads a line of input into $0, as delimited by RS.
  – Breaks the line into fields based on FS and stores them in numbered variables, starting with $1.
  – Prints the requested fields with print (or another output statement), using OFS to separate the fields.
  – After awk displays its output, it goes to the next line and repeats. The output lines are separated by ORS.
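The roles of OFS and ORS become visible once they are changed from their defaults. A minimal sketch, with the input supplied inline rather than from the names file:

```shell
# Fields are split on the default FS (whitespace); output uses the custom OFS and ORS.
printf 'a b c\nd e f\n' |
awk 'BEGIN { OFS = "-"; ORS = ";\n" } { print $1, $3 }'
```

Each output record is "field1-field3" followed by a semicolon and a newline, so the two lines printed are "a-c;" and "d-f;".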


Changing the Input Field Separator

Manually resetting FS in a BEGIN pattern
  – Forces you to hard code the value of the field separator
  – BEGIN { FS=":" ; }
  – Example:
        $ awk 'BEGIN { FS=":" ; } { print $1, $6 ; }' /etc/passwd
Specifying the -F option to awk
  – awk -F: ' { … } '
  – Enables using a shell variable to specify the field separator dynamically
  – Example:
        $ sep=':'
        $ awk -F"$sep" ' { print $1, $6 ; }' /etc/passwd
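Both approaches can be compared on a small stand-in for /etc/passwd. The file passwd.sample and its two entries are made up for this sketch so the output is predictable on any machine.

```shell
# Hypothetical passwd-style file: name:password:uid:gid:comment:home:shell
cat > passwd.sample <<'EOF'
root:x:0:0:root:/root:/bin/sh
alice:x:1000:1000:Alice:/home/alice:/bin/sh
EOF

# Separator hard coded in a BEGIN rule.
awk 'BEGIN { FS = ":" } { print $1, $6 }' passwd.sample

# Separator taken from a shell variable; the quotes protect unusual separator characters.
sep=':'
awk -F"$sep" '{ print $1, $6 }' passwd.sample
```

Both commands print "root /root" and "alice /home/alice".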


Example

Each person's data is spread over three lines, each with its own separator:
    FirstName;LastName;Address;City;State;Zip;Phone
    SSN:DOB:NumberOfDependents
    HospitalizationCode,DentalCode,LifeCode
Convert this file format to:
    SSN,LastName,FirstName,Address,….


    awk 'BEGIN { OFS = ","; FS = ";" }
    NR % 3 == 1 { F = $1; L = $2; A = $3; …; FS = ":" }    # prepare FS for the next line
    NR % 3 == 2 { SSN = $1; DOB = $2; …; FS = "," }
    NR % 3 == 0 { …; print SSN, L, F, A, …; FS = ";" }
    ' filename

Note: in POSIX awk a new value of FS takes effect when the NEXT line is read, so each rule sets the separator for the line that follows it.
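One way to sidestep the timing of FS changes is to leave FS alone and split each line explicitly with split(). The following is only a sketch under assumptions: the file name people.txt and the sample record are invented, and the order of the output columns after Address is a guess, since the slide elides it.

```shell
# Hypothetical sample: one person spread over three lines.
cat > people.txt <<'EOF'
John;Robinson;1 Main St;Rochester;NY;14623;234-3456
123-45-6789:1970-01-01:2
H1,D2,L3
EOF

awk 'BEGIN { OFS = "," }
NR % 3 == 1 { split($0, p, ";") }    # FirstName;LastName;Address;City;State;Zip;Phone
NR % 3 == 2 { split($0, q, ":") }    # SSN:DOB:NumberOfDependents
NR % 3 == 0 { split($0, r, ",")      # the three plan codes, then emit the joined record
              print q[1], p[2], p[1], p[3], p[4], p[5], p[6], p[7], q[2], q[3], r[1], r[2], r[3] }
' people.txt
```

For the sample record this prints a single comma-separated line starting with the SSN, then the last name, first name, and address.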


Print vs. printf

printf
  – 1st argument is a string … the 'format'
  – Prints each character of the format
      Upon reaching a %, the next few characters are a format specifier
      The next argument is printed according to the specifier
  – Does not append a newline
  – More control over appearance of output
  – Consider
        awk 'BEGIN { printf "%5.2f\n", 2/3; }'
      Prints " 0.67" (one leading space, then 0.67)
      %5.2f means print a fractional number (the 'f') in a field 5 characters wide, with 2 digits to the right of the decimal point.


Why printf

printf - for formatting the output of your "print"
We have the print function, so why printf?
  – printf allows us to FORMAT stuff.
  – can FORCE printing of strings
  – decimals
  – whole numbers
  – how many digits fall on either side of the decimal point
  – scientific notation
  – make things line up nicely


printf

printf (format, what to print)
    printf ( "%s", x )
  – %s is a PLACEHOLDER for some OUTPUT.
  – s is a specific type of output (string)
  – ONE item (%s) must have ONE thing to print in the "what to print"
  – The format goes inside quotes, followed by a comma, followed by the variables to print, outside the quotes.
    printf ( " s = %s ", x )
  – " s = " is a LITERAL string


Printf format

s = a character string
f = a floating point number
d or i = the integer part of a decimal number
e = scientific notation of a floating point number; g = whichever of e or f is shorter
c = an ASCII character
If x=65 and I use this print statement
    printf ( " s = %c ", x )
the output is " s = A "
    awk 'BEGIN{x=65; printf("char: %c\n", x)}'
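The specifiers above can be compared in a single printf call. The values are arbitrary picks for this sketch; the vertical bars just mark where each item starts and ends.

```shell
awk 'BEGIN {
    # string | integer part of 3.99 | 2/3 to two decimals | scientific notation | character code 65
    printf "%s|%d|%.2f|%e|%c\n", "abc", 3.99, 2/3, 1234.5, 65
}'
```

This prints "abc|3|0.67|1.234500e+03|A": %d drops the fraction rather than rounding, and %c turns the number 65 into the letter A.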


Printf

More control:
  – %wd
      Prints an integer in a field of width w
      If the number is narrower than w characters, prints leading spaces
      Try: awk 'BEGIN { printf "%10d\n", 10; }' /dev/null
  – Try adding a '-' immediately after the %
      Left justifies the value in the field


Printf

%ws
  – Prints a string in a field of width w
  – Supplies leading spaces as necessary
Place a '-' immediately after the % to get left justification


Printf

%w.df
  – Prints the value in a field of width w
  – Places the decimal point d places from the right end (d digits after the point)
  – Place a '-' immediately after the % to get left justification
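The width and justification rules for %wd, %ws, and %w.df are easiest to see with visible field edges. In this sketch the square brackets are literal characters added only to show where each field begins and ends.

```shell
awk 'BEGIN {
    printf "[%5d]\n",    42        # right justified in 5 columns
    printf "[%-5d]\n",   42        # left justified in 5 columns
    printf "[%8s]\n",    "awk"     # string, right justified in 8 columns
    printf "[%-8s]\n",   "awk"     # string, left justified in 8 columns
    printf "[%8.3f]\n",  3.14159   # 8 columns wide, 3 digits after the point
    printf "[%-8.3f]\n", 3.14159   # same, left justified
}'
```

The six lines come out as "[   42]", "[42   ]", "[     awk]", "[awk     ]", "[   3.142]", and "[3.142   ]".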


Printf examples

Apple 10 20 25
<---10----><-5-><-5-><-5->

    awk '{printf (" %10s %5d %5d %d ", $1, $2, $3, $4 )}' file
    awk '{printf (" %-10s %5d %5d %d ", $1, $2, $3, $4 )}' file
The minus sign designates that this field will be LEFT JUSTIFIED
    awk '{printf (" %-10s %-5d %-5d %d ", $1, $2, $3, $4 )}' file
    awk '{printf ("|%-15s|\n", $1)}'


Printf examples

Let's put an average in there...
    printf (" %-10s %-5d %-5d %-5d %f ", $1, $2, $3, $4, average )
  Will print the RAW number (%f defaults to 6 digits to the RIGHT of the decimal point)
    printf (" %-10s %-5d %-5d %-5d %.2f ", $1, $2, $3, $4, average )
  %.2f says use TWO digits to the RIGHT of the decimal point
printf doesn't provide the newline automatically....
    printf (" %-10s %-5d %-5d %-5d %.2f \n ", $1, $2, $3, $4, average )
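The slide uses a variable named average without showing where it comes from. The sketch below assumes it is the mean of the three numeric fields and reuses the sample line "Apple 10 20 25" from the previous slide.

```shell
echo "Apple 10 20 25" | awk '{
    average = ($2 + $3 + $4) / 3    # assumed definition of average
    printf("%-10s %-5d %-5d %-5d %f\n",   $1, $2, $3, $4, average)   # raw: 6 decimals
    printf("%-10s %-5d %-5d %-5d %.2f\n", $1, $2, $3, $4, average)   # trimmed to 2 decimals
}'
```

The first line ends in 18.333333 and the second in 18.33; the explicit \n in each format supplies the newline that printf leaves out.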


The OFMT variable
(stands for Output Formatting for numbers)

A special awk variable
Controls the printing of numbers when using the print function
    awk 'BEGIN{print 1.243434534;}'
    awk 'BEGIN{OFMT="%.2f"; print 1.23344455;}'
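Running the two commands shows the effect of OFMT. The third command is an addition for this sketch, included to show that whole numbers are printed as integers regardless of OFMT.

```shell
awk 'BEGIN { print 1.243434534 }'                  # default OFMT is %.6g
awk 'BEGIN { OFMT = "%.2f"; print 1.23344455 }'    # two digits after the decimal point
awk 'BEGIN { OFMT = "%.2f"; print 17 }'            # integer values ignore OFMT
```

These print 1.24343, 1.23, and 17.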