Unix Awk Sed

1. File spacing.

1. Double-space a file.

    sed G

This sed one-liner uses the ‘G’ command. If you grabbed the cheat-sheet you’d see that the command ‘G’ appends to the pattern space. It appends a newline followed by the contents of hold buffer. In this example the hold buffer is empty all the time (only three commands ‘h’, ‘H’ and ‘x’ modify hold buffer), so we end up simply appending a newline to the pattern space. Once all the commands have been processed (in this case just the ‘G’ command), sed puts the contents of pattern space to output stream followed by a newline. There we have it, two newlines — one added by the ‘G’ command and the other by output stream. File has been double spaced.

2. Double-space a file which already has blank lines in it. Do it so that the output contains no more than one blank line between two lines of text.

    sed '/^$/d;G'

Sed lets you restrict commands to certain lines. Here the ‘d’ command applies only to lines that match the regular expression /^$/. Which are those? Those are the empty lines. Note that before doing the regular expression match sed reads the input line into the pattern space and strips the trailing newline character. An empty line contains just the newline character, so once it has been read the pattern space is empty. The regular expression /^$/ matches an empty pattern space and sed applies the ‘d’ command to it, which deletes the current pattern space, skips the remaining commands, reads in the next line and starts execution from the beginning of the script. The lines that are not empty get a newline character appended by the ‘G’ command, just like in one-liner #1.

In general sed lets you restrict operations to a single line (5th, 27th, etc.), to a range of lines (lines 10-20), to lines matching a pattern (lines containing the word “catonmat”), and to lines between two patterns (lines between “catonmat” and “coders”).
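The four kinds of restriction can be tried on a small input. This is only a sketch: the helper name `sample` and its five-line input are made up for illustration, and ‘-n’ together with ‘p’ is used just so that nothing but the selected lines gets printed.

```shell
# Hypothetical five-line input shared by all four examples.
sample() { printf 'one\ntwo\nthree\nfour\nfive\n'; }

sample | sed -n '2p'             # a single line: prints "two"
sample | sed -n '2,4p'           # a range of lines: "two", "three", "four"
sample | sed -n '/t/p'           # lines matching a pattern: "two", "three"
sample | sed -n '/two/,/four/p'  # lines between two patterns: "two", "three", "four"
```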

3. Triple-space a file.

    sed 'G;G'

Several sed commands can be combined by separating them with ‘;’. Such commands get executed one after another. This one-liner does twice what one-liner #1 does: it appends two newlines (via two ‘G’ commands) to the pattern space, so each line is followed by two blank lines in the output.

4. Undo double-spacing.

    sed 'n;d'

This one-liner assumes that even-numbered lines are always blank. It uses two new commands, ‘n’ and ‘d’. The ‘n’ command prints out the current pattern space (unless the ‘-n’ flag has been specified), empties the current pattern space and reads in the next line of input. Since even-numbered lines are assumed to be blank, ‘n’ prints the first, third, fifth, etc. line and reads in the following line, which is always an empty line. Now the ‘d’ command gets executed. The ‘d’ command deletes the current pattern space, reads in the next line, puts the new line into the pattern space, and starts execution again from the first sed command. Now the ‘n’ command gets executed again, then ‘d’, then ‘n’, etc.

To make it shorter - ‘n’ prints out the current line, and ‘d’ deletes the empty line, thus undoing the double-spacing.
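A quick round trip shows one-liners #1 and #4 cancelling each other out. The function names below are hypothetical wrappers, nothing more.

```shell
# Hypothetical wrappers around one-liners #1 and #4.
double_space() { sed G; }
undo_double_space() { sed 'n;d'; }

printf 'a\nb\nc\n' | double_space                      # a, blank, b, blank, c, blank
printf 'a\nb\nc\n' | double_space | undo_double_space  # a, b, c again
```

Keep in mind that the second step only works because every even-numbered line really is blank here.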

5. Insert a blank line above every line that matches “regex”.

    sed '/regex/{x;p;x;}'

This one-liner uses the restriction operation together with two new commands, ‘x’ and ‘p’. The ‘x’ command exchanges the hold buffer with the pattern space. The ‘p’ command prints out the entire pattern space. This one-liner works the following way: a matching line is read into the pattern space, then the ‘x’ command exchanges it with the empty hold buffer. Next the ‘p’ command prints out emptiness followed by a newline, so we get an empty line printed before the actual line. Then ‘x’ exchanges the hold buffer (which now contains the line) with the pattern space again. There are no more commands so sed prints out the pattern space. We have printed a newline followed by the line, or in other words, inserted a blank line above every matching line.

Also notice the { … }. This is command grouping. It says, execute all the commands in “…” on the line that matches the restriction operation.

6. Insert a blank line below every line that matches “regex”.

    sed '/regex/G'

This one-liner combines the restriction operation with the ‘G’ command, described in one-liner #1. For every line that matches /regex/, sed appends a newline to the pattern space. All the other lines that do not match /regex/ just get printed out without modification.

7. Insert a blank line above and below every line that matches “regex”.

    sed '/regex/{x;p;x;G;}'

This one-liner combines one-liners #5, #6 and #1. Lines matching /regex/ get a blank line printed before them (x;p;x from #5). Then they are followed by another newline from the ‘G’ command (one-liner #6 or #1).
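Here is a small side-by-side run of #5, #6 and #7. The helper `input` and the pattern /foo/ are made up for the example; only the middle line matches.

```shell
# Hypothetical three-line input; only the middle line matches /foo/.
input() { printf 'a\nfoo\nb\n'; }

input | sed '/foo/{x;p;x;}'    # one-liner #5: blank line above "foo"
input | sed '/foo/G'           # one-liner #6: blank line below "foo"
input | sed '/foo/{x;p;x;G;}'  # one-liner #7: blank line above and below
```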

2. Numbering.

8. Number each line of a file (named filename). Left align the number.

    sed = filename | sed 'N;s/\n/\t/'

One-liners get trickier and trickier. This one-liner is actually two separate one-liners. The first sed one-liner uses a new command called ‘=’. This command operates directly on the output stream and prints the current line number. There is no way to capture the current line number to pattern space. That’s why the second one-liner gets called. The output of the first one-liner gets piped to the input of the second. The second one-liner uses another new command ‘N’. The ‘N’ command appends a newline and the next line to the current pattern space. Then the famous ‘s///’ command gets executed, which replaces the newline character just appended with a tab. After these operations the line gets printed out.

To make it clear what ‘=’ does, take a look at this example file:

    line one
    line two
    line three

Running the first one-liner ‘sed = filename’ produces output:

    1
    line one
    2
    line two
    3
    line three

Now, the ‘N’ command of the second one-liner joins these lines with a newline character:

    1\nline one
    2\nline two
    3\nline three

The ‘s/\n/\t/’ replaces the newline chars with tabs, so we end up with:

    1       line one
    2       line two
    3       line three

The example is a little inaccurate as line joining with a newline char happens line after line, not on all lines at once.
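To try it without a file, the input can be piped in. In this sketch the tab is built with printf instead of writing ‘\t’ in the replacement; that is my own substitution for portability, since not every sed understands ‘\t’ there. The names `tab` and `number_lines` are hypothetical.

```shell
# A literal tab character, so the example does not depend on sed
# interpreting \t in the replacement (an assumption made for portability).
tab=$(printf '\t')

# Hypothetical wrapper around one-liner #8, reading from standard input.
number_lines() { sed = | sed "N;s/\n/$tab/"; }

printf 'line one\nline two\nline three\n' | number_lines
```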

9. Number each line of a file (named filename). Right align the number.

    sed = filename | sed 'N; s/^/     /; s/ *\(.\{6,\}\)\n/\1  /'

This one-liner is also actually two one-liners. The first one-liner numbers the lines, just like #8. The second one-liner uses the ‘N’ command to join the line containing the line number with the actual line. Then it uses two substitute commands to right align the number. The first ‘s’ command ‘s/^/     /’ prepends five spaces to the line. The second ‘s’ command ‘s/ *\(.\{6,\}\)\n/\1  /’ captures at least six characters up to a newline and replaces the capture and newline with the back-reference ‘\1’ and two more spaces to separate the line number from the contents of the line.

I think it’s hard to understand the last part of this sed expression by just reading. Let’s look at an example. For clarity I replaced the ‘\n’ newline char with a ‘@’ and whitespace with ‘-’.

    $ echo "-----12@contents" | sed 's/-*\(.\{6,\}\)@/\1--/'
    ----12--contents

The regular expression ‘-*\(.\{6,\}\)@’ (or just ‘-*(.{6,})@’) tells sed to match some ‘-’ characters followed by at least 6 other characters, followed by a ‘@’ symbol. Sed captures the six or more characters (remembers them) in \1.

In this example sed matches the first ‘-’ (the ‘-*’ part of regex), then the following six characters “----12” and ‘@’ (the ‘\(.\{6,\}\)@’ part of regex). Now it replaces the matched part of the string “-----12@” with the contents of the captured group, which is “----12”, plus two extra whitespace characters. The final result is that “-----12@” gets replaced with “----12--”.

10. Number each non-empty line of a file (called filename).

    sed '/./=' filename | sed '/./N; s/\n/ /'

This one-liner is again two one-liners. The output of the first one-liner gets piped to the input of the second. The first one-liner prints line numbers only for lines with at least one character in them. The regular expression ‘/./’ says: match lines with at least one char in them. When the empty lines (containing just a newline) get sent to the pattern space, the newline character gets removed, so the empty lines do not get matched. The second one-liner does the same as one-liner #8 did (joining with a space instead of a tab), except that only numbered lines get joined and printed out. Command ‘/./N’ makes sure that empty lines are left as-is.
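A short run makes one detail visible: blank lines are not numbered, but they still count. The wrapper name below is hypothetical.

```shell
# Hypothetical wrapper around one-liner #10, reading from standard input.
number_nonempty() { sed '/./=' | sed '/./N; s/\n/ /'; }

# The blank second line is passed through unnumbered, but it is still
# line 2 of the input, so "b" gets the number 3.
printf 'a\n\nb\n' | number_nonempty
```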

11. Count the number of lines in a file (emulates “wc -l”).

    sed -n '$='

This one-liner uses a command line switch “-n” to modify sed’s behavior. The “-n” switch tells sed not to send the line to output after it has been processed in the pattern space. The only way to make sed output anything with the “-n” switch being on is to use a command that writes to the output directly (these commands are ‘=’, ‘a’, ‘c’, ‘i’, ‘l’, ‘p’, ‘P’, ‘r’ and ‘w’). In this one-liner what seems to be the command “$=” is actually a restriction pattern “$” together with the “=” command. The restriction pattern “$” applies the “=” command to the last line only. The “=” command outputs the current line number to standard output. As it is applied to the last line only, this one-liner outputs the number of lines in the file.

3. Text Conversion and Substitution.

12. Convert DOS/Windows newlines (CRLF) to Unix newlines (LF).

    sed 's/.$//'

This one-liner assumes that all lines end with CR+LF (carriage return + line feed) and we are in a Unix environment. Once the line gets read into pattern space, the newline gets thrown away, so we are left with lines ending in CR. The 's/.$//' command erases the last character by matching the last character of the line (regex '.$') and substituting it with nothing. Now when the pattern space gets output, a newline gets appended to it and we are left with lines ending with LF.

The assumption about being in a Unix environment is necessary because the newline that gets appended when the pattern space gets copied to output stream is the newline of that environment.

13. Another way to convert DOS/Windows newlines (CRLF) to Unix newlines (LF).

    sed 's/^M$//'

This one-liner again assumes that we are in a Unix environment. It erases the carriage return control character ^M. You can usually enter the ^M control char literally by first pressing Ctrl-V (it’s control key + v key) and then Ctrl-M.

14. Yet another way to convert DOS/Windows newlines to Unix newlines.

    sed 's/\x0D$//'

This one-liner assumes that we are on a Unix machine. It also assumes that we use a version of sed that supports hex escape codes, such as GNU sed. The hex value for CR is 0x0D (13 decimal). This one-liner erases this character.
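The difference between #12 and the CR-specific variants shows up on input that is already in Unix format. In this sketch the carriage return is built with printf and spliced into the expression, which is my own stand-in for typing ^M or using ‘\x0D’; the function names are hypothetical.

```shell
# A literal carriage return, so no sed escape extension is needed
# (an assumption made here for portability).
cr=$(printf '\r')

strip_last_char() { sed 's/.$//'; }   # one-liner #12
strip_cr() { sed "s/$cr\$//"; }       # same idea as #13 and #14

printf 'ab\r\n' | strip_last_char   # CRLF input: gives "ab"
printf 'ab\r\n' | strip_cr          # CRLF input: gives "ab"
printf 'ab\n' | strip_last_char     # LF input: #12 eats the "b" and gives "a"
printf 'ab\n' | strip_cr            # LF input: left alone, gives "ab"
```

So #12 really does depend on the assumption that every line ends with CR+LF, while the other two are harmless on lines without a CR.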

15-17. Convert Unix newlines (LF) to DOS/Windows newlines (CRLF).

    sed "s/$/`echo -e \\\r`/"

This one-liner also assumes that we are in a Unix environment. It calls shell for help. The 'echo -e \\\r' command inserts a literal carriage return character in the sed expression. The sed “s/$/char/” command appends a character to the end of current pattern space.

18. Another way to convert Unix newlines (LF) to DOS/Windows newlines (CRLF).

    sed 's/$/\r/'

This one-liner assumes that we use GNU sed. GNU sed is smarter than other seds and can take escape characters in the replace part of s/// command.

19. Convert Unix newlines (LF) to DOS/Windows newlines (CRLF) from DOS/Windows.

    sed "s/$//"

This one-liner works from DOS/Windows. It’s basically a no-op one-liner. It replaces nothing with nothing and then sends out the line to output stream where it gets CRLF appended.

20. Another way to convert Unix newlines (LF) to DOS/Windows newlines (CRLF) from DOS/Windows.

    sed -n p

This is also a no-op one-liner, just like #19. The shortest one-liner which does the same is:

    sed ''

21. Convert DOS/Windows newlines (CRLF) to Unix format (LF) from DOS/Windows.

    sed "s/\r//"

Eric says that this one-liner works only with UnxUtils sed v4.0.7 or higher. I don’t know anything about this version of sed, so let’s just trust him. This one-liner strips carriage return (CR) chars from lines. Then when they get output, CRLF gets appended by magic.

Eric mentions that otherwise the only way to convert CRLF to LF on a DOS machine is to use tr, which here deletes every carriage return:

    tr -d \r <infile >outfile

22. Delete leading whitespace (tabs and spaces) from each line.

    sed 's/^[ \t]*//'

Pretty simple, it matches zero-or-more spaces and tabs at the beginning of the line and replaces them with nothing, i.e. erases them.

23. Delete trailing whitespace (tabs and spaces) from each line.

    sed 's/[ \t]*$//'

This one-liner is very similar to #22. It does the same substitution, just matching zero-or-more spaces and tabs at the end of the line, and then erases them.

24. Delete both leading and trailing whitespace from each line.

    sed 's/^[ \t]*//;s/[ \t]*$//'

This one-liner combines #22 and #23. First it does what #22 does, erasing the leading whitespace, and then it does the same as #23, erasing the trailing whitespace.
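Here is a quick check of the combined one-liner. As a precaution the sketch puts a literal tab into the bracket expression instead of ‘\t’, because some seds read ‘[ \t]’ as “space, backslash or t”; that swap and the names `tab` and `trim` are my own assumptions, not part of the one-liner.

```shell
# A literal tab for the bracket expression (assumption: not every sed
# treats \t inside [...] as a tab).
tab=$(printf '\t')

# Hypothetical wrapper around one-liner #24.
trim() { sed "s/^[ $tab]*//;s/[ $tab]*\$//"; }

printf '  \thello world \t\n' | trim   # prints "hello world"
```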

25. Insert five blank spaces at the beginning of each line.

    sed 's/^/     /'

It does it by matching the null-string at the beginning of line (^) and replaces it with five spaces “     ”.

26. Align lines right on a 79-column width.

    sed -e :a -e 's/^.\{1,78\}$/ &/;ta'

This one-liner uses a new command line option and two new commands. The new command line option is ‘-e’. It allows you to write a sed program in several parts. For example, a sed program with two substitution rules could be written as “sed -e 's/one/two/' -e 's/three/four/'” instead of “sed 's/one/two/;s/three/four/'”. It makes it more readable. In this one-liner the first “-e” creates a label called “a”. The ‘:’ command followed by a name creates a named label. The second “-e” uses a new command “t”. The “t” command branches to a named label if a substitute command has modified the pattern space since the last input line was read. This branching technique can be used to create loops in sed. In this one-liner the substitute command left-pads the string (right aligns it) a single whitespace at a time, until the total length of the string reaches 79 chars. Empty lines are left alone because the regex requires at least one character. The “&” in the substitution command means the matched string.

Translating it into a modern language, it would look like this:

    while (str.length() >= 1 && str.length() <= 78) {
        str = " " + str
    }
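The loop is easier to watch with a narrower column. This sketch turns the width into a parameter; the function name `right_align` and its argument are hypothetical, and a width of 79 gives the same expression as the one-liner.

```shell
# Hypothetical helper: right-align standard input to the width given as $1.
# With width 79 the expression is the same as in one-liner #26.
right_align() {
    max=$(( $1 - 1 ))
    sed -e :a -e "s/^.\{1,$max\}\$/ &/;ta"
}

printf 'abc\n' | right_align 10   # seven spaces followed by "abc"
```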

27. Center all text in the middle of 79-column width.

    sed -e :a -e 's/^.\{1,77\}$/ & /;ta'

This one-liner is very similar to #26, but instead of left padding the line one whitespace character at a time it pads it on both sides, as long as the line is at most 77 chars long. Each iteration adds two whitespace characters, so the line ends up 78 or 79 chars long.

Another way to do the same is

    sed -e :a -e 's/^.\{1,77\}$/ &/;ta' -e 's/\( *\)\1/\1/'

This one-liner left pads the string one whitespace char at a time until it has reached length of 78 characters. Then the additional “s/\( *\)\1/\1/” command gets executed which divides the leading whitespace “in half”. This effectively centers the string. Unlike the previous one-liner this one-liner does not add trailing whitespace. It just adds enough leading whitespace to center the string.

28. Substitute (find and replace) the first occurrence of “foo” with “bar” on each line.

    sed 's/foo/bar/'

This is the simplest sed one-liner possible. It uses the substitute command and applies it once on each line. It substitutes string “foo” with “bar”.

29. Substitute (find and replace) the fourth occurrence of “foo” with “bar” on each line.

    sed 's/foo/bar/4'

This one-liner uses a flag for the substitute command. With no flags the first occurrence of pattern is changed. With a numeric flag like “/1”, “/2”, etc. only that occurrence is substituted. This one-liner uses numeric flag “/4” which makes it change the fourth occurrence on each line.

30. Substitute (find and replace) all occurrences of “foo” with “bar” on each line.

    sed 's/foo/bar/g'

This one-liner uses another flag, the “/g” flag, which stands for global. With the global flag set, the substitute command does as many substitutions as possible, i.e., all.
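The three variants (no flag, numeric flag, global flag) are easiest to compare on one line with several matches. The sample line is made up for the example.

```shell
# Hypothetical input with five occurrences of "foo".
line='foo foo foo foo foo'

echo "$line" | sed 's/foo/bar/'    # bar foo foo foo foo
echo "$line" | sed 's/foo/bar/4'   # foo foo foo bar foo
echo "$line" | sed 's/foo/bar/g'   # bar bar bar bar bar
```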

31. Substitute (find and replace) the first occurrence of a repeated occurrence of “foo” with “bar”.

    sed 's/\(.*\)foo\(.*foo\)/\1bar\2/'

Let’s understand this one-liner with an example:

    $ echo "this is foo and another foo quux" | sed 's/\(.*\)foo\(.*foo\)/\1bar\2/'
    this is bar and another foo quux

As you can see, this one-liner replaced the first “foo” with “bar”.

It did it by using two capturing groups. The first capturing group caught everything before the first “foo”. In this example it was the text “this is ”. The second group caught everything after the first “foo”, including the second “foo”. In this example it was “ and another foo”. The matched text was then replaced with the contents of the first group “this is ” followed by “bar” and the contents of the second group “ and another foo”. Since “ quux” was not part of the match it was left unchanged. Joining these parts the resulting string is “this is bar and another foo quux”, which is exactly what we got from running the one-liner. Note that the first group is greedy, so on a line with more than two occurrences of “foo” it is the next-to-last occurrence that gets replaced.

32. Substitute (find and replace) only the last occurrence of “foo” with “bar”.

    sed 's/\(.*\)foo/\1bar/'

This one-liner uses a capturing group that captures everything up to “foo”. It replaces the captured group and “foo” with captured group itself (the \1 back-reference) and “bar”. It results in the last occurrence of “foo” getting replaced with “bar”.
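Both #31 and #32 rely on the greedy “.*”, and a line with three occurrences shows which one each of them picks. The sample line is made up; with it, #31 replaces the middle (next-to-last) “foo” and #32 the last one.

```shell
# Hypothetical input with three occurrences of "foo".
line='foo 1 foo 2 foo 3'

echo "$line" | sed 's/\(.*\)foo\(.*foo\)/\1bar\2/'   # foo 1 bar 2 foo 3
echo "$line" | sed 's/\(.*\)foo/\1bar/'              # foo 1 foo 2 bar 3
```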

33. Substitute all occurrences of “foo” with “bar” on all lines that contain “baz”.

    sed '/baz/s/foo/bar/g'

This one-liner uses a regular expression to restrict the substitution to lines matching “baz”. The lines that do not match “baz” get simply printed out, but those that do match “baz” get the substitution applied.

34. Substitute all occurrences of “foo” with “bar” on all lines that DO NOT contain “baz”.

    sed '/baz/!s/foo/bar/g'

Sed commands can be inverted and applied on lines that DO NOT match a certain pattern. The exclamation “!” before a sed command does it. In this one-liner the substitution command is applied to the lines that DO NOT match “baz”.
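Running #33 and #34 on the same two lines shows that they are mirror images. The helper `input` and its contents are made up for the example.

```shell
# Hypothetical two-line input: only the first line contains "baz".
input() { printf 'foo baz\nfoo qux\n'; }

input | sed '/baz/s/foo/bar/g'    # "bar baz" then "foo qux"
input | sed '/baz/!s/foo/bar/g'   # "foo baz" then "bar qux"
```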

35. Change text “scarlet”, “ruby” or “puce” to “red”.

    sed 's/scarlet/red/g;s/ruby/red/g;s/puce/red/g'

This one-liner just uses three consecutive substitution commands. The first replaces “scarlet” with “red”, the second replaces “ruby” with “red” and the last one replaces “puce” with “red”.

If you are using GNU sed, then you can do it simpler:

    gsed 's/scarlet\|ruby\|puce/red/g'

GNU sed provides more advanced regular expressions which support alternation. This one-liner uses alternation and the substitute command reads “replace ‘scarlet’ OR ‘ruby’ OR ‘puce’ with ‘red’”.

36. Reverse order of lines (emulate “tac” Unix command).

    sed '1!G;h;$!d'

This one-liner acts as the “tac” Unix utility. It’s tricky to explain. The easiest way to explain it is by using an example.

Let’s use a file with just 3 lines:

    $ cat file
    foo
    bar
    baz

Running this one-liner on this file produces the file in reverse order:

    $ sed '1!G;h;$!d' file
    baz
    bar
    foo

The one-liner’s first command “1!G” gets applied to all the lines which are not the first line. The second command “h” gets applied to all lines. The third command “$!d” gets applied to all lines except the last one.

Let’s go through the execution line by line.

Line 1: The “G” command is skipped for the first line, so “h” is the first command applied to “foo”. It copies this line to hold buffer. Hold buffer now contains “foo”. Nothing gets output as the “d” command gets applied.

Line 2: The “G” command gets applied. It appends the contents of hold buffer to pattern space. The pattern space now contains “bar\nfoo”. The “h” command gets applied, it copies “bar\nfoo” to hold buffer. It now contains “bar\nfoo”. Nothing gets output, again because of the “d” command.

Line 3: The “G” command gets applied. It appends hold buffer to the third line. The pattern space now contains “baz\nbar\nfoo”. As this was the last line, “d” does not get applied and the contents of pattern space gets printed. It’s “baz\nbar\nfoo”. File got reversed.

If we had had more lines, each would simply have been placed in front of the previous contents of the hold buffer, so the buffer always holds the lines read so far in reverse order.

Here is another way to do the same:

    sed -n '1!G;h;$p'

It silences the output with “-n” switch and forces the output with “p” command only at the last line.

These two one-liners actually use a lot of memory because they keep the whole file in hold buffer in reverse order before printing it out. Avoid these one-liners for large files.

37. Reverse a line (emulates “rev” Unix command).

    sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'

This is a very complicated one-liner. I had trouble understanding it the first time I saw it and ended up asking on comp.unix.shell for help.

Let’s re-format this sed one-liner:

    sed '
      /\n/ !G
      s/\(.\)\(.*\n\)/&\2\1/
      //D
      s/.//
    '

The first line “/\n/ !G” appends a newline to the end of the pattern space if there was none.

The second line “s/\(.\)\(.*\n\)/&\2\1/” is a simple s/// expression which groups the first character as \1 and all the others as \2. Then it replaces the whole matched string with “&\2\1”, where “&” is the whole matched text (“\1\2”). For example, if the input string is “1234” then after the s/// expression, it becomes “1234\n234\n1”.

The third line is “//D”. This statement is the key in this one-liner. An empty pattern // reuses the last regular expression that was used, so it’s exactly the same as: /\(.\)\(.*\n\)/D. The “D” command deletes from the start of the pattern space till the first newline and then resumes editing with the first command in the script. It creates a loop. As long as /\(.\)\(.*\n\)/ is satisfied, sed will resume all previous operations. After several loops, the text in the pattern space becomes “\n4321”. Then /\(.\)\(.*\n\)/ fails and sed goes to the next command.

The fourth line “s/.//” removes the first character in the pattern space which is the newline char. The contents in pattern space becomes “4321”, the reverse of “1234”.

There you have it, a line has been reversed.
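To see it run, here is the one-liner wrapped in a function (the name `reverse_line` is hypothetical). Each input line is reversed on its own.

```shell
# Hypothetical wrapper around one-liner #37.
reverse_line() { sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'; }

echo '1234' | reverse_line            # 4321
printf 'abc\nxyz\n' | reverse_line    # cba, then zyx
```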

38. Join pairs of lines side-by-side (emulates “paste” Unix command).

    sed '$!N;s/\n/ /'

This one-liner joins two consecutive lines with the “N” command. They get joined with a “\n” character between them. The substitute command replaces this newline with a space, thus joining every pair of lines with a whitespace.

39. Append a line to the next if it ends with a backslash “\”.

    sed -e :a -e '/\\$/N; s/\\\n//; ta'

The first expression ‘:a’ creates a named label “a”. The second expression looks to see if the current line ends with a backslash “\”. If it does, it joins it with the line following it using the “N” command. Then the backslash and the newline between joined lines get erased with the “s/\\\n//” command. If the substitution was successful we branch to the beginning of expression and do the same again, in hope that we might have another backslash. If the substitution was not successful, the line did not end with a backslash and we print it out.

Here is an example of running this one-liner:

    $ cat filename
    line one \
    line two
    line three
    $ sed -e :a -e '/\\$/N; s/\\\n//; ta' filename
    line one line two
    line three

Lines one and two got joined because the first line ended with backslash.

40. Append a line to the previous if it starts with an equal sign “=”.

    sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D'

This one-liner also starts with creating a named label “a”. Then it tests to see if it is not the last line and appends the next line to the current one with the “N” command. If the just appended line starts with a “=”, the substitution replaces the newline that came from joining with “N”, together with the “=”, with a single space, and the one-liner branches to the label “a” to see if there are more lines starting with “=”. If the substitution fails, the one-liner prints out the pattern space up to the newline character with the “P” command, deletes the contents of pattern space up to the newline character with the “D” command, and repeats the process.

Here is an example of running it:

    $ cat filename
    line one
    =line two
    =line three
    line four
    $ sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D' filename
    line one line two line three
    line four

Lines one, two and three got joined, because lines two and three started with ‘=’. Line four got printed as-is.

41. Digit group (commify) a numeric string.

    sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'

This one-liner turns a string of digits, such as “1234567”, into “1,234,567”. This is called commifying or digit grouping.

First the one-liner creates a named label “a”. Then it captures two groups of digits. The first group is all the digits up to the last three digits. The last three digits get captured in the second group. Then the two matching groups get separated by a comma. Then the same rules get applied to the line again and again until all the numbers have been grouped in groups of three.

The replacement “\1,\2” separates the contents of group one with a comma from the contents of group two.

Here is an example to understand the grouping happening here better. Suppose you have a numeric string “1234567”. The first group captures all the numbers until the last three, “1234”. The second group captures the last three numbers, “567”. They get joined by a comma. Now the string is “1234,567”. The same stuff is applied to the string again. Number “1” gets captured in the first group and the numbers “234” in the second. The number string is “1,234,567”. Trying to apply the same rules again fails because there is just one digit at the beginning of the string, so the string gets printed out and sed moves on to the next line.
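The walk-through can be checked directly. The wrapper name `commify` is hypothetical; the expression inside is the portable one-liner.

```shell
# Hypothetical wrapper around one-liner #41.
commify() { sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'; }

echo 1234567 | commify   # 1,234,567
echo 1234 | commify      # 1,234
echo 123 | commify       # 123 (fewer than four digits, left alone)
```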

If you have GNU sed, you can use a simpler one-liner:

    gsed ':a;s/\B[0-9]\{3\}\>/,&/;ta'

This one-liner starts with creating a named label “a” and then loops over the string the same way as the previous one-liner did. The only difference is how groups of three digits get matched. GNU sed has some additional patterns. There are two patterns that make this one-liner work. The first is “\B”, which matches anywhere except at a word boundary. It’s needed so that we do not put a comma in front of a number that starts at a word boundary. Look at this example:

    $ echo "12345 1234 123" | sed 's/[0-9]\{3\}\>/,&/g'
    12,345 1,234 ,123

It’s clearly wrong. The last 123 got a comma added. Adding the “\B” makes sure the match does not start at a word boundary, so a group of three digits only gets a comma when more digits precede it:

    $ echo "12345 1234 123" | sed 's/\B[0-9]\{3\}\>/,&/g'
    12,345 1,234 123

The second is “\>”. It matches the null string at the end of a word. It’s necessary because we need to match the right-most three digits. If we did not have it, the expression would match after the first digit.

42. Add commas to numbers with decimal points and minus signs.

    gsed -r ':a;s/(^|[^0-9.])([0-9]+)([0-9]{3})/\1\2,\3/g;ta'

This one-liner works in GNU sed only. It turns on extended regular expression support with the “-r” switch. Then it loops over a line matching three groups and separates the first two from the third with a comma.

The first group makes sure we ignore a leading non-digit character, such as + or -. If there is no leading non-digit character, then it just anchors at the beginning of the string which always matches.

The second group matches a bunch of digits. The third group makes sure the second group does not match too many. It matches three consecutive digits at the end of the run of digits.

Once the groups have been captured, the “\1\2,\3” substitution is done and the expression is looped again, until the whole string has been commified.

43. Add a blank line after every five lines.

    sed 'n;n;n;n;G;'

The “n” command is called four times in this one-liner. Each time it’s called it prints out the current pattern space, empties it and reads in the next line of input. After calling it four times, the fifth line is read into the pattern space and then the “G” command gets called. The “G” command appends a newline to the fifth line. Then the next round of four “n” commands is done. Next time the first “n” command is called it prints out the newlined fifth line, thus inserting a blank line after every 5 lines.

The same can be achieved with GNU sed’s step extension:

    gsed '0~5G'

GNU sed’s step extensions can be generalized as “first~step”. It matches every “step”th line starting with line “first”. In this one-liner it matches every 5th line starting with line 0.
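A ten-line input is enough to see two groups of five. The sketch below runs only the portable form; the helper `ten_lines` is made up for the example.

```shell
# Hypothetical ten-line input: the numbers 1 through 10, one per line.
ten_lines() { printf '%s\n' 1 2 3 4 5 6 7 8 9 10; }

ten_lines | sed 'n;n;n;n;G;'   # a blank line follows 5 and another follows 10
```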

44. Print the first 10 lines of a file (emulates “head -10”).

    sed 10q

This one-liner restricts the “q” (quit) command to line “10″. It means that this command gets executed only when sed reads the 10th line. For all the other lines there is no command specified. When there is no command specified, the default action is to print the line as-is. This one-liner prints lines 1-9 unmodified and at 10th line quits. Notice something strange? It was supposed to print first 10 lines of a file, but it seems that it just printed only the first 9… Worry not! The quit command is sneaky in its nature. Upon quitting with “q” command, sed actually prints the contents of pattern space and only then quits. As a result lines 1-10 get printed!

Please see the first part of the article for explanation of “pattern space”.

45. Print the first line of a file (emulates “head -1”).

sed q

The explanation of this one-liner is almost the same as of the previous. Sed quits and prints the first line.

A more detailed explanation - after the first line has been placed in the pattern space, sed executes the “q” command. This command forces sed to quit; but due to strange nature of the “q” command, sed also prints the contents of pattern space. As a result, only the first line gets printed.

46. Print the last 10 lines of a file (emulates “tail -10”).

sed -e :a -e '$q;N;11,$D;ba'

This one-liner is tricky to explain. It always keeps the last 10 lines in pattern space and at the very last line of input it quits and prints them.

I’ll try to explain it. The first “-e :a” creates a label called “a”. The second “-e” does the following: “$q” - if it is the last line, quit and print the pattern space. If it is not the last line, execute three commands “N”, “11,$D” and “ba”. The “N” command reads the next line of input and appends it to the pattern space. The line gets separated from the rest of the pattern space by a newline character. The “11,$D” command executes the “D” command if the current line number is greater than or equal to 11 (“11,$” means from 11th line to end of file). The “D” command deletes the portion of pattern space up to the first newline character and, as the pattern space is not empty afterwards, restarts the script from the top without reading a new line. The last command “ba” branches to a label named “a” (beginning of script). It is reached only while fewer than 11 lines have been read, because from line 11 on “D” restarts the script first. This guarantees that the pattern space never contains more than 10 lines, because as line 11 gets appended to pattern space, line 1 gets deleted, as line 12 gets appended line 2 gets deleted, etc.
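A quick way to convince yourself of the sliding window is to feed the one-liner numbered lines. This is only a sketch: the function name `last_10` is hypothetical and `seq` is assumed to be available.

```shell
# Hypothetical helper wrapping the one-liner; reads stdin.
last_10() {
    sed -e :a -e '$q;N;11,$D;ba'
}

seq 1 25 | last_10   # prints lines 16 through 25
seq 1 3  | last_10   # fewer than 10 lines: all of them are printed
```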

47. Print the last 2 lines of a file (emulates “tail -2”).

sed '$!N;$!D'

This one-liner is also tricky. First of all, the “$!” address restricts commands “N” and “D” to all the lines except the last line.

Notice how the addresses can be negated. If “$<command>” restricts a command to the last line, then “$!<command>” restricts the command to all but the last line. This can be applied to all restriction operations.

In this one-liner the “N” command reads the next line from input and appends it to pattern space. The “D” command deletes everything in pattern space up to the first “\n” symbol. These two commands always keep only the most recently read line in pattern space. When processing the second-to-last line, “N” gets executed and appends the last line to the pattern space. The “D” does not get executed as “N” consumed the last line. At this moment sed quits and prints out the last two lines of the file.
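The same kind of check works here. The sketch below uses a made-up function name `last_2` and assumes `seq` for test input.

```shell
# Hypothetical helper wrapping the one-liner; reads stdin.
last_2() {
    sed '$!N;$!D'
}

seq 1 5 | last_2    # prints 4 and 5
echo only | last_2  # a one-line file is printed as-is
```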

48. Print the last line of a file (emulates “tail -1”).

sed '$!d'

This one-liner discards all the lines except the last one. The “d” command deletes the current pattern space, reads in the next line, and restarts the execution of commands from the first. In this case it just loops over itself like “dddd…ddd” until it hits the last line. At the last line no command is executed (“$!d” restricted execution of “d” to all the lines but last) and the pattern space gets printed.

Another way to do the same:

sed -n '$p'

The “-n” parameter suppresses automatic printing of pattern space. It means that without an explicit “p” command (or other commands that act directly on the output stream), sed is dead silent. The “p” command stands for “print” and it prints the pattern space. This one-liner calls the “p” command at the very last line of input. All the other lines are silently discarded.

49. Print next-to-the-last line of a file.

Eric gives three different one-liners to do this. The first one prints a blank line if the file contains just 1 line:

sed -e '$!{h;d;}' -e x

This one-liner executes the “h;d” commands for all the lines except the last one (“$!” restricts “h;d” commands to all lines except last). The “h” command puts the current line in hold buffer and “d” deletes the current line, and starts execution at the first sed command (“h;d” gets executed again, and again, …). At every single line, that line gets copied to hold buffer. At the very last line “h;d” does not get executed. At this moment “x” gets a chance to execute. The “x” command exchanges the contents of hold buffer with pattern space. Remember that the previous line is still in the hold buffer. The “x” command puts it back in pattern space, and sed prints it! There you go, the next-to-last line was printed!

In case there is just 1 line in the file, only the “x” command gets executed. As the hold buffer initially is empty, “x” puts emptiness in pattern space (I use word “put” here but it actually exchanges the pattern space with hold space). Now sed prints the contents of pattern space, but it’s empty, so sed prints out just a blank line.

The second prints the first line if the file contains just 1 line:

sed -e '1{$q;}' -e '$!{h;d;}' -e x

This sed-one liner is divided in two parts. The first part “1{$q;}” handles the case when the file contains just a single line. The second part “$!{h;d;} x” is exactly the same as in the previous one-liner! Thus, I need to explain just the first part.

The first part says - if it is the first line “1”, then execute “$q”. The “$q” command means - if it is the last line, then quit. What it effectively does is it quits if the first line is the last line (i.e. file contains just one line). Remember from one-liner #44 that before quitting sed prints the contents of pattern space. As a result, if the file contains just one line, sed prints it.

The third prints nothing for 1 line files:

sed -e '1{$d;}' -e '$!{h;d;}' -e x

This one-liner is again divided in two parts. The first part is “1{$d;}” and the second is exactly the same as in the previous two one-liners. I will explain just the first part.

The first part says - if it is the first line “1”, then execute “$d”. The “$d” command means - if it is the last line, then delete the pattern space and start all over again. In case the first line is the last (only one line in file), there is nothing more to be done and sed quits, printing nothing.
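The three variants differ only in what they do with a one-line file, which is easy to check side by side. The function names below are made up for this sketch. `wc -l` is used to tell a blank line (one line of output) from no output at all.

```shell
# Hypothetical helpers, one per variant; each reads stdin.
penultimate_or_blank()   { sed -e '$!{h;d;}' -e x; }
penultimate_or_first()   { sed -e '1{$q;}' -e '$!{h;d;}' -e x; }
penultimate_or_nothing() { sed -e '1{$d;}' -e '$!{h;d;}' -e x; }

# On a multi-line file all three print the next-to-last line.
seq 1 5 | penultimate_or_blank

# On a one-line file they differ:
echo only | penultimate_or_blank   | wc -l   # one (blank) line of output
echo only | penultimate_or_first             # prints "only"
echo only | penultimate_or_nothing | wc -l   # no output at all
```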

50. Print only the lines that match a regular expression (emulates “grep”).

sed -n '/regexp/p'

This one-liner suppresses automatic printing of pattern space with the “-n” switch and makes use of “p” command to print only the lines that match “/regexp/”. The lines that do not match this regex get silently discarded. The ones that match get printed. That’s it.

Another one-liner that does the same:

sed '/regexp/!d'

This one-liner deletes all the lines that do not match “/regexp/”. The other lines get printed by default. The “!” before “d” command inverts the line matching.

51. Print only the lines that do not match a regular expression (emulates “grep -v”).

sed -n '/regexp/!p'

This one-liner is the inverse of the previous.

The “-n” prevents automatic printing of pattern space. The “/regexp/” address selects the lines that match “/regexp/”, and the “!” after it inverts that selection. What happens is “p” acts on all lines that do not match “/regexp/”, and they get “p”rinted.

sed '/regexp/d'

This one-liner is the inverse of the previous (#50).

This one-liner executes the “d” (delete) command on all lines that match “/regexp/”, thus leaving only the lines that do not match. They get printed automatically.

52. Print the line immediately before regexp, but not the line containing the regexp.

sed -n '/regexp/{g;1!p;};h'

This one-liner saves each line in hold buffer with “h” command. If a line matches the regexp, the hold buffer (containing the previous line) gets copied to pattern space with “g” command and the pattern space gets printed out with “p” command. The “1!” restricts “p” not to print on the first line (as there are no lines before the first).

53. Print the line immediately after regexp, but not the line containing the regexp.

sed -n '/regexp/{n;p;}'

First of all, this one-liner disables automatic printing of pattern space with “-n” command line argument. Then, for all the lines that match “/regexp/”, this one-liner executes “n” and “p” commands. The behavior of the “n” command depends on the “-n” flag. If “-n” is specified it will empty the current pattern space and read in the next line of input. If “-n” is not specified, it will print out the current pattern space before emptying it. As in this one-liner “-n” is specified, the “n” command empties the pattern space, reads in the next line and then the “p” command prints that line out.
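One-liners #52 and #53 are easiest to compare on the same three-line input. In this sketch the regexp is fixed to the literal “MATCH” and the function names are invented for illustration.

```shell
# Hypothetical helpers with the regexp fixed to the literal "MATCH".
line_before_match() { sed -n '/MATCH/{g;1!p;};h'; }
line_after_match()  { sed -n '/MATCH/{n;p;}'; }

printf 'a\nMATCH\nb\n' | line_before_match   # prints "a"
printf 'a\nMATCH\nb\n' | line_after_match    # prints "b"
```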

54. Print one line before and after regexp. Also print the line matching regexp and its line number. (emulates “grep -A1 -B1”).

sed -n -e '/regexp/{=;x;1!p;g;$!N;p;D;}' -e h

First let’s look at “h” command at the end of script. It gets executed on every line and stores the current line in pattern space in hold buffer. The idea of storing the current line in hold buffer is that if the next line matches “/regexp/” then the previous line is available in hold buffer.

Now let’s look at the complicated “/regexp/{=;x;1!p;g;$!N;p;D;}” command. It gets executed only if the line matches “/regexp/”. The first thing it does is it prints the current line number with “=” command. Then, it exchanges the hold buffer with pattern space by using the “x” command. As I explained, the “h” command at the end of the script makes sure that the hold buffer always contains the previous line. Now we have put it in the pattern space with “x” command. Next, if it’s not the first line, “1!p” prints the pattern space, effectively printing the previous line. Now the “g” command gets executed. It copies the original line that was just exchanged with hold buffer back to pattern space. Now the “$!N” executes. If it is not the last line, “N” appends the next line to the current pattern space (and separates them with “\n” char). Pattern space now contains the line that matched “/regexp/” and the next line. The “p” command prints that. “D” deletes the current line (line that matched “/regexp/”) from pattern space and restarts the script with the next line still in pattern space, without reading new input. Unless that line matches “/regexp/” itself, execution falls through to “h”, which puts the contents of pattern space into hold buffer. As “D” deleted the current line, the next line was put in hold buffer.
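Tracing this one by hand is tedious, so here is a sketch that runs it on four lines. The regexp is fixed to the literal “MATCH” and the function name is hypothetical.

```shell
# Hypothetical helper with the regexp fixed to the literal "MATCH".
context_of_match() {
    sed -n -e '/MATCH/{=;x;1!p;g;$!N;p;D;}' -e h
}

# Prints the line number (2), then "a", "MATCH" and "b"; "c" is not printed.
printf 'a\nMATCH\nb\nc\n' | context_of_match
```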

55. Grep for “AAA” and “BBB” and “CCC” in any order.

sed '/AAA/!d; /BBB/!d; /CCC/!d'

This one-liner runs the inverted “d” command three times: a line gets deleted if it does not contain “AAA”, or does not contain “BBB”, or does not contain “CCC”. As soon as one of them is missing, the line gets deleted and sed proceeds to the next line. Only if all three of the patterns are present does sed print the line.

56. Grep for “AAA” and “BBB” and “CCC” in that order.

sed '/AAA.*BBB.*CCC/!d'

This one-liner deletes lines that do not match regexp “/AAA.*BBB.*CCC/”. For example, a line “AAAfooBBBbarCCC” will get printed but “AAAfooCCCbarBBB” will not.

It can also be written as:

sed -n '/AAA.*BBB.*CCC/p'

This one-liner prints lines that contain AAA…BBB…CCC in that order.
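The difference between #55 and #56 shows up as soon as the patterns appear out of order. A small sketch, with made-up function names:

```shell
# Hypothetical helpers for one-liners #55 and #56; each reads stdin.
all_three_any_order() { sed '/AAA/!d; /BBB/!d; /CCC/!d'; }
all_three_in_order()  { sed '/AAA.*BBB.*CCC/!d'; }

# Keeps the first two lines, drops "AAA only".
printf 'CCC BBB AAA\nAAA BBB CCC\nAAA only\n' | all_three_any_order

# Keeps only "AAA BBB CCC".
printf 'CCC BBB AAA\nAAA BBB CCC\nAAA only\n' | all_three_in_order
```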

57. Grep for “AAA” or “BBB”, or “CCC”.

sed -e '/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d

This one-liner uses the “b” command to branch to the end of the script if the line matches “AAA” or “BBB” or “CCC”. At the end of the script the line gets implicitly printed. If the line does not match “AAA” or “BBB” or “CCC”, the script reaches the “d” command that deletes the line.

gsed '/AAA\|BBB\|CCC/!d'

This one-liner works with GNU sed. GNU sed allows the alternation operator “\|” to be used to match separate things. It’s a more compact way of saying match “AAA” or “BBB”, or “CCC”.

If you are using GNU sed, then there is actually no need to escape the pipes. You may specify the “-r” command line option to use extended regular expressions. This way this one-liner becomes:

gsed -r '/AAA|BBB|CCC/!d'

or

gsed -rn '/AAA|BBB|CCC/p'

58. Print a paragraph that contains “AAA”. (Paragraphs are separated by blank lines).

sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;'

First notice that this one-liner is divided in two parts for clarity. The first part is “/./{H;$!d;}” and the second part is “x;/AAA/!d”.

The first part has an interesting pattern match “/./”. What do you think it does? Well, a line separating paragraphs would be a blank line, meaning it would not have any characters in it. This pattern matches only the lines that are not separating paragraphs. These lines get appended to hold buffer with “H” command. They also get prevented from printing with “d” command (except for the last line, when “d” does not get executed, as “$!” restricts “d” to all but the last line). Once sed sees a blank line, the “/./” pattern no longer matches and the second part of one-liner gets executed.

The second part exchanges the hold buffer with pattern space by using the “x” command. The pattern space now contains the whole paragraph of text. Next sed tests if the paragraph contains “AAA”. If it does, sed does nothing which results in printing the paragraph. If the paragraph does not contain “AAA”, sed executes the “d” command that deletes it without printing and restarts execution at first command.
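Here is a sketch of the one-liner at work on two paragraphs. The function name is invented. Note that the printed paragraph starts with an empty line, because “H” puts a newline in front of every line it appends to the hold buffer.

```shell
# Hypothetical helper wrapping the one-liner; reads stdin.
paragraphs_with_AAA() {
    sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;'
}

# Two paragraphs; only the first contains "AAA" and gets printed.
printf 'one\nAAA here\n\ntwo\nother\n' | paragraphs_with_AAA
```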

59. Print a paragraph if it contains “AAA” and “BBB” and “CCC” in any order.

sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;/BBB/!d;/CCC/!d'

This one-liner is also split in two parts for clarity. The first part is exactly the same as the first part of previous one-liner. The second part is very similar to one-liner #55 and also the previous.

The “x” command in the 2nd part does exactly the same as in previous one-liner, it exchanges the hold buffer, that contains the paragraph with pattern space. Next sed does three tests - it tests if the paragraph contains “AAA”, “BBB” and “CCC”. If the paragraph does not contain even one of them, the “d” command gets executed that purges the paragraph. If it contains all three patterns, sed happily prints the paragraph.

60. Print a paragraph if it contains “AAA” or “BBB” or “CCC”.

sed -e '/./{H;$!d;}' -e 'x;/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d

The first part is exactly the same as in previous two one-liners and does not require explanation. The second part that happens to be “-e 'x;/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d” is almost exactly the same as in one-liner #57.

The “x” command exchanges the paragraph stored in hold buffer with the pattern space. Then it tests if the pattern space (paragraph) contains “AAA”, if it does, sed branches to end of script with “b” command, that happily makes sed print the paragraph. If “AAA” did not match, sed does exactly the same testing for pattern “BBB”. If it again did not match, it tests for “CCC”. If none of these patterns were found, sed executes the “d” command that deletes everything and restarts this one-liner.

Here is another way to do the same with GNU sed:

gsed '/./{H;$!d;};x;/AAA\|BBB\|CCC/b;d'

This one-liner is exactly the same as previous one. It just compresses the three tests for “AAA”, “BBB” or “CCC” into one “/AAA\|BBB\|CCC/” as explained in one-liner #57.

61. Print only the lines that are 65 characters in length or more.

sed -n '/^.\{65\}/p'

This one-liner prints lines that are 65 characters in length or more. It does it by using a regular expression “^.\{65\}” that matches any 65 characters at the beginning of line. If there are fewer than 65 characters, the regex does not match and the line does not get printed (as automatic printing was disabled with “-n” command line option).

62. Print only the lines that are less than 65 chars.

sed -n '/^.\{65\}/!p'

This one-liner inverts the previous one. If the line matches 65 characters, then it is not printed “!p”. If it does not match, it gets printed.

Another way to do the same:

sed '/^.\{65\}/d'

This one-liner deletes all lines that match 65 characters. All others implicitly get printed.

63. Print section of a file from a regex to end of file.

sed -n '/regexp/,$p'

This one-liner uses a tricky range match “/regexp/,$”. It matches lines starting from the first line that matches “/regexp/” to the end of file “$”. The “p” command prints these lines. All other lines get silently discarded.

64. Print lines 8-12 (inclusive) of a file.

sed -n '8,12p'

This is another type of range match. This range matches a section of lines between two line numbers (inclusive). In this case it’s lines [8 to 12].

sed '8,12!d'

This is the same one-liner, just written differently. It deletes lines that are outside of range [8, 12] and prints those in this range.

65. Print line number 52.

sed -n '52p'

This one-liner restricts the “p” command to line “52”. Only this line gets “p”rinted.

sed '52!d'

This one-liner deletes all lines except line 52. Line 52 gets printed.

sed '52q;d'

This one is the smartest. It quits at line 52 with “q” command. The previous two one-liners would loop over all the remaining lines and do nothing. Remember from one-liner #44 that quit command prints the pattern space with it. The “d” command makes sure that no other line gets printed while sed gets to line 52.
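All three variants give the same output; they differ only in how much input they read. A quick sketch with invented function names, assuming `seq` for test input:

```shell
# Hypothetical helpers, one per variant of one-liner #65.
line_52_p() { sed -n '52p'; }
line_52_d() { sed '52!d'; }
line_52_q() { sed '52q;d'; }   # stops reading input after line 52

seq 1 100 | line_52_p
seq 1 100 | line_52_d
seq 1 100 | line_52_q
```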

66. Beginning at line 3, print every 7th line.

gsed -n '3~7p'

This one-liner uses an address extension of GNU sed. An address in format “first~step” matches every step’th line starting from first. In this one-liner it’s “3~7”, meaning match every 7th line starting from 3rd. The “-n” flag prevents printing any other lines, and “p” in “3~7p” prints the matched line.

For everyone else, this one-liner works:

sed -n '3,${p;n;n;n;n;n;n;}'

This one-liner executes commands “p;n;n;n;n;n;n” for lines starting the 3rd line. The “3,$” is a line range match that restricts commands by line numbers. The “$” means end of file and “3” means 3rd line.

The “p;n;n;n;n;n;n” command prints the line and then reads in the next 6 lines one after another without printing them (automatic printing is disabled with “-n”). The next cycle starts on the 7th line after the printed one, which gets printed in turn, and so on. As it starts executing at line 3, the effect is - print line 3, skip 6, print line 10, skip 6, print line 17, …. That is, print every 7th line beginning at 3rd.
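The sequence 3, 10, 17 is easy to confirm on numbered input. A sketch with a made-up function name, using the portable variant:

```shell
# Hypothetical helper wrapping the portable variant; reads stdin.
every_7th_from_3() {
    sed -n '3,${p;n;n;n;n;n;n;}'
}

seq 1 20 | every_7th_from_3   # prints 3, 10 and 17
```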

67. Print section of lines between two regular expressions (inclusive).

sed -n '/Iowa/,/Montana/p'

This one-liner prints all the lines between the first line that matches a regular expression “Iowa” and the first line that matches a regular expression “Montana”.

It uses a range match “/start/,/finish/” that matches all lines starting from a line that matches “start” and ending with the first line that matches “finish”.

Line Spacing

1. Double-space a file.

awk '1; { print "" }'

So how does it work? A one-liner is an Awk program and every Awk program consists of a sequence of pattern-action statements “pattern { action statements }”. In this case there are two statements "1" and "{ print "" }". In a pattern-action statement either the pattern or the action may be missing. If the pattern is missing, the action is applied to every single line of input. A missing action is equivalent to '{ print }'. Thus, this one-liner translates to:

awk '1 { print } { print "" }'

An action is applied only if the pattern matches, i.e., pattern is true. Since '1' is always true, this one-liner translates further into two print statements:

awk '{ print } { print "" }'

Every print statement in Awk is silently followed by an ORS - Output Record Separator variable, which is a newline by default. The first print statement with no arguments is equivalent to "print $0", where $0 is a variable holding the entire line. The second print statement prints nothing, but knowing that each print statement is followed by ORS, it actually prints a newline. So there we have it, each line gets double-spaced.
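To check that the short form and its expansion really are the same program, here is a small sketch. The function names are made up for illustration.

```shell
# Hypothetical helpers: the one-liner and its fully expanded form.
double_space()          { awk '1; { print "" }'; }
double_space_expanded() { awk '{ print } { print "" }'; }

# Both print "x", an empty line, "y" and another empty line.
printf 'x\ny\n' | double_space
printf 'x\ny\n' | double_space_expanded
```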

2. Another way to double-space a file.

awk 'BEGIN { ORS="\n\n" }; 1'

BEGIN is a special kind of pattern which is not tested against the input. It is executed before any input is read. This one-liner double-spaces the file by setting the ORS variable to two newlines. As I mentioned previously, statement "1" gets translated to "{ print }", and every print statement gets terminated with the value of ORS variable.

3. Double-space a file so that no more than one blank line appears between lines of text.

awk 'NF { print $0 "\n" }'

The one-liner uses another special variable called NF - Number of Fields. It contains the number of fields the current line was split into. For example, a line “this is a test” splits in four pieces and NF gets set to 4. The empty line “” does not split into any pieces and NF gets set to 0. Using NF as a pattern can effectively filter out empty lines. This one liner says: “If there are any number of fields, print the whole line followed by newline.”

4. Triple-space a file.

awk '1; { print "\n" }'

This one-liner is very similar to previous ones. '1' gets translated into '{ print }' and the resulting Awk program is:

awk '{ print; print "\n" }'

It prints the line, then prints a newline followed by terminating ORS, which is newline by default.

2. Numbering and Calculations

5. Number lines in each file separately.

awk '{ print FNR "\t" $0 }'

This Awk program prepends the FNR - File Line Number predefined variable and a tab (\t) to each line. FNR variable contains the current line number for each file separately. For example, if this one-liner was called on two files, one containing 10 lines, and the other 12, it would number lines in the first file from 1 to 10, and then resume numbering from one for the second file and number lines in this file from 1 to 12. FNR gets reset from file to file.

6. Number lines for all files together.

awk '{ print NR "\t" $0 }'

This one works the same as #5 except that it uses NR - Line Number variable, which does not get reset from file to file. It counts the input lines seen so far. For example, if it was called on the same two files with 10 and 12 lines, it would number the lines from 1 to 22 (10 + 12).
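The difference between FNR and NR only shows with more than one input file. The sketch below creates two throwaway files with `mktemp` (file and function names are invented for illustration) and numbers them both ways.

```shell
# Hypothetical helpers; each takes file names as arguments.
number_per_file() { awk '{ print FNR "\t" $0 }' "$@"; }
number_overall()  { awk '{ print NR "\t" $0 }' "$@"; }

dir=$(mktemp -d)
printf 'a\nb\n'    > "$dir/one.txt"
printf 'c\nd\ne\n' > "$dir/two.txt"

number_per_file "$dir/one.txt" "$dir/two.txt"   # second file restarts at 1
number_overall  "$dir/one.txt" "$dir/two.txt"   # numbering runs from 1 to 5

rm -rf "$dir"
```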

7. Number lines in a fancy manner.

awk '{ printf("%5d : %s\n", NR, $0) }'

This one-liner uses printf() function to number lines in a custom format. It takes format parameter just like a regular printf() function. Note that ORS does not get appended at the end of printf(), so we have to print the newline (\n) character explicitly. This one right-aligns line numbers, followed by a space and a colon, and the line.

8. Number only non-blank lines in files.

awk 'NF { $0=++a " :" $0 }; { print }'

Awk variables are dynamic; they come into existence when they are first used. This one-liner pre-increments variable ‘a’ each time the line is non-empty, then it prepends the value of this variable to the line and prints it out.

9. Count lines in files (emulates wc -l).

awk 'END { print NR }'

END is another special kind of pattern which is not tested against the input. It is executed when all the input has been exhausted. This one-liner outputs the value of NR special variable after all the input has been consumed. NR contains total number of lines seen (= number of lines in the file).

10. Print the sum of fields in every line.

awk '{ s = 0; for (i = 1; i <= NF; i++) s = s+$i; print s }'

Awk has some features of C language, like the for (;;) { … } loop. This one-liner loops over all fields in a line (there are NF fields in a line), and adds the result in variable ‘s’. Then it prints the result out and proceeds to the next line.

11. Print the sum of fields in all lines.

awk '{ for (i = 1; i <= NF; i++) s = s+$i }; END { print s+0 }'

This one-liner is basically the same as #10, except that it prints the sum of all fields. Notice how it did not initialize variable ‘s’ to 0. It was not necessary as variables come into existence dynamically. Also notice how it calls “print s+0” and not just “print s”. It is necessary if there are no fields. If there are no fields, “s” never comes into existence and is undefined. Printing an undefined value does not print anything (i.e. prints just the ORS). Adding a 0 does a mathematical operation and undef+0 = 0, so it prints “0”.
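The role of “s+0” is easiest to see on empty input. A sketch with made-up function names:

```shell
# Hypothetical helpers for one-liners #10 and #11; each reads stdin.
sum_each_line() { awk '{ s = 0; for (i = 1; i <= NF; i++) s = s+$i; print s }'; }
sum_all()       { awk '{ for (i = 1; i <= NF; i++) s = s+$i }; END { print s+0 }'; }

printf '1 2 3\n4 5\n' | sum_each_line   # prints 6 and 9
printf '1 2 3\n4 5\n' | sum_all         # prints 15
sum_all < /dev/null                     # no input at all: still prints 0
```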

12. Replace every field by its absolute value.

awk '{ for (i = 1; i <= NF; i++) if ($i < 0) $i = -$i; print }'

This one-liner uses two other features of C language, namely the if (…) { … } statement and omission of curly braces. It loops over all fields in a line and checks each field to see if it is less than 0. If a field is less than 0, then it just negates the field to make it positive. Fields can be addressed indirectly by a variable. For example, i = 5; $i = "hello", sets field number 5 to string "hello".

Here is the same one-liner rewritten with curly braces for clarity. The 'print' statement gets executed after all the fields in the line have been replaced by their absolute values.

awk '{ for (i = 1; i <= NF; i++) { if ($i < 0) { $i = -$i; } } print }'

13. Count the total number of fields (words) in a file.

awk '{ total = total + NF }; END { print total+0 }'

This one-liner matches all the lines and keeps adding the number of fields in each line. The number of fields seen so far is kept in a variable named ‘total’. Once the input has been processed, special pattern 'END { … }' is executed, which prints the total number of fields. See 11th one-liner for explanation of why we “print total+0″ in the END block.

14. Print the total number of lines containing word “Beth”.

awk '/Beth/ { n++ }; END { print n+0 }'

This one-liner has two pattern-action statements. The first one is '/Beth/ { n++ }'. A pattern between two slashes is a regular expression. It matches all lines containing pattern “Beth” (not necessarily the word “Beth”, it could as well be “Bethe” or “theBeth333″). When a line matches, variable ‘n’ gets incremented by one. The second pattern-action statement is 'END { print n+0 }'. It is executed when the file has been processed. Note the '+0' in 'print n+0' statement. It forces '0' to be printed in case there were no matches (’n’ was undefined). Had we not put '+0' there, an empty line would have been printed.

15. Find the line containing the largest (numeric) first field.

awk '$1 > max { max=$1; maxline=$0 }; END { print max, maxline }'

This one-liner keeps track of the largest number in the first field (in variable ‘max’) and the corresponding line (in variable ‘maxline’). Once it has looped over all lines, it prints them out. Warning: this one-liner does not work if all the values are negative.

Here is the fix:

awk 'NR == 1 { max = $1; maxline = $0; next; } $1 > max { max=$1; maxline=$0 }; END { print max, maxline }'
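The all-negative case is easy to reproduce. In this sketch the function names are invented; `max_naive` is the original one-liner and `max_fixed` is the fix.

```shell
# Hypothetical helpers: the original one-liner and the fixed version.
max_naive() { awk '$1 > max { max=$1; maxline=$0 }; END { print max, maxline }'; }
max_fixed() { awk 'NR == 1 { max = $1; maxline = $0; next } $1 > max { max=$1; maxline=$0 }; END { print max, maxline }'; }

# With only negative first fields nothing is ever greater than the unset
# "max" (treated as 0), so the naive version finds no line at all.
printf '%s\n' '-5 a' '-2 b' '-9 c' | max_naive
printf '%s\n' '-5 a' '-2 b' '-9 c' | max_fixed   # prints "-2 -2 b"
```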

16. Print the number of fields in each line, followed by the line.

awk '{ print NF ":" $0 } '

This one-liner just prints out the predefined variable NF - Number of Fields, which contains the number of fields in the line, followed by a colon and the line itself.

17. Print the last field of each line.

awk '{ print $NF }'

Fields in Awk need not be referenced by constants. For example, code like 'f = 3; print $f' would print out the 3rd field. This one-liner prints the field with the value of NF. $NF is last field in the line.

18. Print the last field of the last line.

awk '{ field = $NF }; END { print field }'

This one-liner keeps track of the last field in variable ‘field’. Once it has looped all the lines, variable ‘field’ contains the last field of the last line, and it just prints it out.

Here is a better version of the same one-liner. It’s more common, idiomatic and efficient:

awk 'END { print $NF }'

19. Print every line with more than 4 fields.

awk 'NF > 4'

This one-liner omits the action statement. As I noted in one-liner #1, a missing action statement is equivalent to '{ print }'.

20. Print every line where the value of the last field is greater than 4.

awk '$NF > 4'

This one-liner is similar to #17. It references the last field by NF variable. If the last field is greater than 4, the whole line gets printed.

Text Conversion and Substitution

21. Convert Windows/DOS newlines (CRLF) to Unix newlines (LF) from Unix.

awk '{ sub(/\r$/,""); print }'

This one-liner uses the sub(regex, repl, [string]) function. This function substitutes the first instance of regular expression “regex” in string “string” with the string “repl”. If “string” is omitted, variable $0 is used. Variable $0, as I explained in the first part of the article, contains the entire line.

The one-liner replaces ‘\r’ (CR) character at the end of the line with nothing, i.e., erases CR at the end. Print statement prints out the line and appends ORS variable, which is ‘\n’ by default. Thus, a line ending with CRLF has been converted to a line ending with LF.

22. Convert Unix newlines (LF) to Windows/DOS newlines (CRLF) from Unix.

awk '{ sub(/$/,"\r"); print }'

This one-liner also uses the sub() function. This time it replaces the zero-width anchor ‘$’ at the end of the line with a ‘\r’ (CR char). This substitution actually adds a CR character to the end of the line. After doing that Awk prints out the line and appends the ORS, making the line terminate with CRLF.
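Since CR characters are invisible in a terminal, it helps to look at the bytes. The sketch below wraps one-liners #21 and #22 in functions (the names are made up) and pipes the result through `od -c`, which is assumed to be available.

```shell
# Hypothetical helpers for one-liners #21 and #22; each reads stdin.
crlf_to_lf() { awk '{ sub(/\r$/,""); print }'; }
lf_to_crlf() { awk '{ sub(/$/,"\r"); print }'; }

printf 'a\r\nb\r\n' | crlf_to_lf | od -c   # no \r left in the dump
printf 'a\nb\n'     | lf_to_crlf | od -c   # every \n is now preceded by \r
```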

23. Convert Unix newlines (LF) to Windows/DOS newlines (CRLF) from Windows/DOS.

awk 1

This one-liner may work, or it may not. It depends on the implementation. If the implementation catches the Unix newlines in the file, then it will read the file line by line correctly and output the lines terminated with CRLF. If it does not understand Unix LF’s in the file, then it will print the whole file and terminate it with CRLF (single windows newline at the end of the whole file).

P.S. Statement '1' (or anything that evaluates to true) in Awk is syntactic sugar for '{ print }'.

24. Convert Windows/DOS newlines (CRLF) to Unix newlines (LF) from Windows/DOS.

gawk -v BINMODE="w" '1'

Theoretically this one-liner should convert CRLFs to LFs on DOS. There is a note in GNU Awk documentation that says: “Under DOS, gawk (and many other text programs) silently translates end-of-line “\r\n” to “\n” on input and “\n” to “\r\n” on output. A special “BINMODE” variable allows control over these translations and is interpreted as follows: … If “BINMODE” is “w”, then binary mode is set on write (i.e., no translations on writes).”

My tests revealed that no translation was done, so you can’t rely on this BINMODE hack.

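Eric suggests using the “tr” utility instead to convert CRLFs to LFs on Windows:

tr -d \r

The ‘tr’ program is used for translating one set of characters to another. Specifying the -d option makes it delete the listed characters instead of translating them. In this case it’s the ‘\r’ (CR) character that gets erased from the input. Thus, CRLFs become just LFs. Note that in a Unix shell the argument has to be quoted, as in tr -d '\r', because the shell removes an unquoted backslash and tr would then delete the letter “r”.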

25. Delete leading whitespace (spaces and tabs) from the beginning of each line (ltrim).

awk '{ sub(/^[ \t]+/, ""); print }'

This one-liner also uses sub() function. What it does is replace regular expression “^[ \t]+” with nothing “”. The regular expression “^[ \t]+” means - match one or more space “ ” or a tab “\t” at the beginning “^” of the string.

26. Delete trailing whitespace (spaces and tabs) from the end of each line (rtrim).

awk '{ sub(/[ \t]+$/, ""); print }'

This one-liner is very similar to the previous one. It replaces regular expression “[ \t]+$” with nothing. The regular expression “[ \t]+$” means - match one or more space “ ” or a tab “\t” at the end “$” of the string. The “+” means “one or more”.

27. Delete both leading and trailing whitespaces from each line (trim).

awk '{ gsub(/^[ \t]+|[ \t]+$/, ""); print }'

This one-liner uses a new function called “gsub”. Gsub() does the same as sub(), except it performs as many substitutions as possible (that is, it’s a global sub()). For example, given a variable f = “foo”, sub("o", "x", f) would replace just one “o” in variable f with “x”, making f be “fxo”; but gsub("o", "x", f) would replace both “o”s in “foo” resulting “fxx”.

The one-liner combines both previous one-liners - it replaces leading whitespace “^[ \t]+” and trailing whitespace “[ \t]+$” with nothing, thus trimming the string.

To remove whitespace between fields you may use this one-liner:

awk '{ $1=$1; print }'

This is a pretty tricky one-liner. It seems to do nothing, right? Assign $1 to $1. But no, when you change a field, Awk rebuilds the $0 variable. It takes all the fields and concats them, separated by OFS (single space by default). All the whitespace between fields is gone.
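A short sketch showing the two one-liners from #27 side by side. The function names `trim` and `squeeze` are invented for illustration.

```shell
# Hypothetical helpers for the two one-liners in #27; each reads stdin.
trim()    { awk '{ gsub(/^[ \t]+|[ \t]+$/, ""); print }'; }
squeeze() { awk '{ $1=$1; print }'; }

printf '  a b  \n'   | trim      # prints "a b"
printf '  a   b  \n' | squeeze   # prints "a b": $0 is rebuilt with single spaces
```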

28. Insert 5 blank spaces at beginning of each line.

awk '{ sub(/^/, "     "); print }'

This one-liner substitutes the zero-length beginning-of-line anchor “^” with five spaces. As the anchor is zero-length and matches the beginning of the line, the five spaces get prepended to the line.

29. Align all text flush right on a 79-column width.

awk '{ printf "%79s\n", $0 }'

This one-liner asks printf() to print the string in $0 variable and left pad it with spaces until the total length is 79 chars.

Please see the documentation of printf function for more information and examples.

30. Center all text on a 79-character width.

awk '{ l=length(); s=int((79-l)/2); printf "%"(s+l)"s\n", $0 }'

First this one-liner calculates the length() of the line and puts the result in variable “l”. The length(var) function returns the string length of var. If the variable is not specified, it returns the length of the entire line (variable $0). Next it calculates how many whitespace characters to pad the line with and stores the result in variable “s”. Finally it printf()s the line with the appropriate number of whitespace characters.

For example, when printing a string “foo”, it first calculates the length of “foo”, which is 3. Next it calculates the number of spaces that should precede “foo”, which is (79-3)/2 = 38. Finally it calls printf("%41s\n", "foo"), since 38 + 3 = 41. The printf() function right-aligns “foo” in a 41-character field, so it outputs 38 spaces and then “foo”, making the string centered (38*2 + 3 = 79).
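
The arithmetic for “foo” can be verified from the shell. This is only a sketch; the variable names centered and padding are introduced here and are not part of the one-liner.

```shell
centered=$(printf 'foo\n' | awk '{ l=length(); s=int((79-l)/2); printf "%"(s+l)"s\n", $0 }')
# Total width of the printed field: 38 spaces plus the 3 characters of "foo".
echo "total width: ${#centered}"
# Strip the trailing "foo" to count only the leading spaces.
padding=${centered%foo}
echo "leading spaces: ${#padding}"
```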

31. Substitute (find and replace) “foo” with “bar” on each line.

awk '{ sub(/foo/,"bar"); print }'

This one-liner is very similar to the others we have seen before. It uses the sub() function to replace “foo” with “bar”. Please note that it replaces just the first match. To replace all “foo”s with “bar”s use the gsub() function:

awk '{ gsub(/foo/,"bar"); print }'

Another way is to use the gensub() function:

gawk '{ $0 = gensub(/foo/,"bar",4); print }'

This one-liner replaces only the 4th match of “foo” with “bar”. It uses the gensub() function, which we have not seen before. The prototype of this function is gensub(regex, s, h[, t]). It searches the string “t” for “regex” and replaces the “h”-th match with “s”. If “t” is not given, $0 is assumed. Unlike sub() and gsub(), which modify the string in place, gensub() leaves “t” untouched and returns the modified string.

Gensub() is a non-standard function and requires GNU Awk or the Awk included in NetBSD.

In this one-liner regex = “/foo/”, s = “bar”, h = 4, and t = $0. It replaces the 4th instance of “foo” with “bar” and assigns the new string back to the whole line $0.

32. Substitute “foo” with “bar” only on lines that contain “baz”.

awk '/baz/ { gsub(/foo/, "bar") }; { print }'

As I explained in the first one-liner in the first part of the article, every Awk program consists of a sequence of pattern-action statements “pattern { action statements }”. Action statements are applied only to lines that match pattern.

In this one-liner the pattern is a regular expression /baz/. If line contains “baz”, the action statement gsub(/foo/, "bar") is executed. And as we have learned, it substitutes all instances of “foo” with “bar”. If you want to substitute just one, use the sub() function!

33. Substitute “foo” with “bar” only on lines that do not contain “baz”.

awk '!/baz/ { gsub(/foo/, "bar") }; { print }'

This one-liner negates the pattern /baz/. It works exactly the same way as the previous one, except that it operates on lines that do not match this pattern.

34. Change “scarlet” or “ruby” or “puce” to “red”.

awk '{ gsub(/scarlet|ruby|puce/, "red"); print}'

This one-liner makes use of extended regular expression alternation operator | (pipe). The regular expression /scarlet|ruby|puce/ says: match “scarlet” or “ruby” or “puce”. If the line matches, gsub() replaces all the matches with “red”.

35. Reverse order of lines (emulate “tac”).

awk '{ a[i++] = $0 } END { for (j=i-1; j>=0;) print a[j--] }'

This is the trickiest one-liner today. It starts by recording all the lines in the array “a”. For example, if the input to this program was three lines “foo”, “bar”, and “baz”, then the array “a” would contain the following values: a[0] = “foo”, a[1] = “bar”, and a[2] = “baz”.

When the program has finished processing all lines, Awk executes the END { } block. The END block loops over the elements in the array “a” and prints the recorded lines. In our example with “foo”, “bar”, “baz” the END block does the following:

for (j = 2; j >= 0; ) print a[j--]

First it prints out a[2], then a[1] and then a[0]. The output is three separate lines “baz”, “bar” and “foo”. As you can see the input was reversed.
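
The walk-through above can be reproduced with the same three sample lines. The shell variable name reversed is introduced here purely for illustration.

```shell
reversed=$(printf 'foo\nbar\nbaz\n' |
    awk '{ a[i++] = $0 } END { for (j=i-1; j>=0;) print a[j--] }')
# Prints the three input lines in reverse order.
printf '%s\n' "$reversed"
```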

36. Join a line ending with a backslash with the next line.

awk '/\\$/ { sub(/\\$/,""); getline t; print $0 t; next }; 1'

This one-liner uses the regular expression “/\\$/” to look for lines ending with a backslash. If the line ends with a backslash, the backslash gets removed by the sub(/\\$/,"") call. Then “getline t” is executed. It reads the next line from the input and stores it in variable t. The “print $0 t” statement prints the original line (with the trailing backslash removed) followed by the newly read line (which was stored in variable t). The “next” statement then makes Awk continue with the next line. If the line does not end with a backslash, Awk just prints it out with “1”.

Unfortunately this one liner fails to join more than 2 lines (this is left as an exercise to the reader to come up with a one-liner that joins arbitrary number of lines that end with backslash :)).
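
One possible answer to that exercise, offered as a sketch rather than the definitive solution: keep stripping the trailing backslash and appending the next line for as long as the joined line still ends with a backslash. The sample input and the variable name joined are made up for illustration.

```shell
# printf turns '\\' into one backslash, so the first two input lines end with "\".
joined=$(printf 'a \\\nb \\\nc\nd\n' |
    awk '{
        # sub() returns 1 while the line still ends with a backslash;
        # getline t returns a positive value while there is another line to read.
        while (sub(/\\$/, "") && (getline t) > 0)
            $0 = $0 t
        print
    }')
printf '%s\n' "$joined"
```

With this input the first three lines are joined into one and the fourth is printed unchanged.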

37. Print and sort the login names of all users.

awk -F ":" '{ print $1 | "sort" }' /etc/passwd

This is the first time we see the -F argument passed to Awk. This argument specifies a character, a string or a regular expression that will be used to split the line into fields ($1, $2, …). For example, if the line is “foo-bar-baz” and -F is “-”, then the line will be split into three fields: $1 = “foo”, $2 = “bar” and $3 = “baz”. If -F is not set to anything, Awk splits on whitespace by default, so this line, which contains no whitespace, will have just one field $1 = “foo-bar-baz”.

Specifying -F is the same as setting the FS (Field Separator) variable in the BEGIN block of the Awk program:

awk -F ":"

# is the same as

awk 'BEGIN { FS=":" }'

/etc/passwd is a text file, that contains a list of the system’s accounts, along with some useful information like login name, user ID, group ID, home directory, shell, etc. The entries in the file are separated by a colon “:”.

Here is an example of a line from the /etc/passwd file:

pkrumins:x:1000:100:Peteris Krumins:/home/pkrumins:/bin/bash

If we split this line on “:”, the first field is the username (pkrumins in this example). The one-liner does just that - it splits the line on “:”, then forks the “sort” program and feeds it all the usernames, one by one. After Awk has finished processing the input, sort program sorts the usernames and outputs them.
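
To try this without touching the real /etc/passwd, the sample line above can be fed in directly. The two extra accounts (“zed” and “ann”) and the shell variable names are invented for this sketch.

```shell
line='pkrumins:x:1000:100:Peteris Krumins:/home/pkrumins:/bin/bash'
# -F ":" and BEGIN { FS=":" } should pick out the same first field.
with_flag=$(printf '%s\n' "$line" | awk -F ":" '{ print $1 }')
with_begin=$(printf '%s\n' "$line" | awk 'BEGIN { FS=":" } { print $1 }')
# Two made-up accounts, deliberately out of order, piped through sort by Awk.
sorted=$(printf 'zed:x:1001:100::/home/zed:/bin/sh\nann:x:1002:100::/home/ann:/bin/sh\n' |
    awk -F ":" '{ print $1 | "sort" }')
echo "$with_flag $with_begin"
printf '%s\n' "$sorted"
```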

38. Print the first two fields in reverse order on each line.

awk '{ print $2, $1 }' file

This one liner is obvious. It reverses the order of fields $1 and $2. For example, if the input line is “foo bar”, then after running this program the output will be “bar foo”.

39. Swap first field with second on every line.

awk '{ temp = $1; $1 = $2; $2 = temp; print }'

This one-liner uses a temporary variable called “temp”. It assigns the first field $1 to “temp”, then it assigns the second field to the first field and finally it assigns “temp” to $2. This procedure swaps the first two fields on every line. For example, if the input is “foo bar baz”, then the output will be “bar foo baz”.

Ps. This one-liner was incorrect in Eric’s awk1line.txt file. “Print” was missing.

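40. Delete the second field on each line.

awk '{ $2 = ""; print }'

This one-liner just assigns an empty string to the second field. The field’s contents are gone, but the field itself still exists: when Awk rebuilds $0, the separators on both sides of the empty field remain, so “foo bar baz” becomes “foo  baz” with two spaces between the remaining fields.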

41. Print the fields in reverse order on every line.

awk '{ for (i=NF; i>0; i--) printf("%s ", $i); printf ("\n") }'

We saw the “NF” variable that stands for Number of Fields in part one of this article. After reading each line, Awk sets the NF variable to the number of fields found on that line.

This one-liner loops in reverse order starting from NF to 1 and outputs the fields one by one. It starts with field $NF, then $(NF-1), …, $1. After that it prints a newline character.

42. Remove duplicate, consecutive lines (emulate “uniq”).

awk 'a !~ $0; { a = $0 }'

Variables in Awk don’t need to be initialized or declared before they are used. They come into existence the first time they are used. This one-liner uses variable “a” to keep the last line seen “{ a = $0 }”. Upon reading the next line, it checks whether the previous line (in variable “a”) is different from the current one “a !~ $0”. If it is different, the expression evaluates to 1 (true), and as I explained earlier, any true expression is the same as “{ print }”, so the line gets printed out. Then the program saves the current line in variable “a” again and the same process continues over and over again.

This one-liner is actually incorrect. It uses the regular expression matching operator “!~”, which treats the current line as a regular expression. If the previous line was something like “fooz” and the new one is “foo”, then the new line won’t get output, because “fooz” matches the regular expression “foo”, even though they are not duplicate lines.

Here is the correct, fixed, one-liner:

awk 'a != $0; { a = $0 }'

It compares the lines as strings and not as regular expressions.
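
The “fooz”/“foo” case can be run against both versions to see the difference. The variable names buggy and fixed are labels chosen for this sketch.

```shell
# Two different lines, where the first contains the second as a substring.
buggy=$(printf 'fooz\nfoo\n' | awk 'a !~ $0; { a = $0 }')
fixed=$(printf 'fooz\nfoo\n' | awk 'a != $0; { a = $0 }')
echo "with !~: $buggy"
echo "with !=:"
printf '%s\n' "$fixed"
```

The regex version drops “foo”, while the string comparison keeps both lines.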

43. Remove duplicate, nonconsecutive lines.

awk '!a[$0]++'

This one-liner is very idiomatic. It registers the lines seen in the associative-array “a” (arrays are always associative in Awk) and at the same time tests if it had seen the line before. If it had seen the line before, then a[line] > 0 and !a[line] == 0. Any expression that evaluates to false is a no-op, and any expression that evals to true is equal to “{ print }”.

For example, suppose the input is:

foo
bar
foo
baz

When Awk sees the first “foo”, it evaluates the expression “!a["foo"]++”. “a["foo"]” is false, but “!a["foo"]” is true - Awk prints out “foo”. Then it increments “a["foo"]” by one with the “++” post-increment operator. Array “a” now contains one value “a["foo"] == 1”.

Next Awk sees “bar”, it does exactly the same as it did to “foo” and prints out “bar”. Array “a” now contains two values “a["foo"] == 1” and “a["bar"] == 1”.

Now Awk sees the second “foo”. This time “a["foo"]” is true, “!a["foo"]” is false and Awk does not print anything! Array “a” still contains two values “a["foo"] == 2” and “a["bar"] == 1”.

Finally Awk sees “baz” and prints it out because “!a["baz"]” is true. Array “a” now contains three values “a["foo"] == 2”, “a["bar"] == 1” and “a["baz"] == 1”.

The output:

foo
bar
baz

Here is another one-liner to do the same. Eric in his one-liners says it’s the most efficient way to do it.

awk '!($0 in a) { a[$0]; print }'

It’s basically the same as previous one, except that it uses the ‘in’ operator. Given an array “a”, an expression “foo in a” tests if variable “foo” is in “a”.

Note that an empty statement “a[$0]” creates an element in the array.
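
Both variants can be compared on the foo/bar/foo/baz input from the example. The third command is a hypothetical extra, not one of Eric’s one-liners; it only prints the counters that end up in the array.

```shell
input='foo
bar
foo
baz'
with_counter=$(printf '%s\n' "$input" | awk '!a[$0]++')
with_in=$(printf '%s\n' "$input" | awk '!($0 in a) { a[$0]; print }')
# Illustration only: show how often each distinct line was counted.
counts=$(printf '%s\n' "$input" |
    awk '{ a[$0]++ } END { print "foo=" a["foo"], "bar=" a["bar"], "baz=" a["baz"] }')
printf '%s\n' "$with_counter"
echo "$counts"
```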

44. Concatenate every 5 lines of input with a comma.

awk 'ORS=NR%5?",":"\n"'

We saw the ORS variable in part one of the article. This variable gets appended after every line that gets output. In this one-liner it gets changed on every 5th line from a comma to a newline. For lines 1, 2, 3, 4 it’s a comma, for line 5 it’s a newline, for lines 6, 7, 8, 9 it’s a comma, for line 10 a newline, etc. The assignment itself serves as the pattern: its value is the non-empty string just assigned to ORS, which counts as true, so every line gets printed.
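
A short run on ten numbered lines shows the grouping. The input and the variable name grouped are made up for this sketch.

```shell
grouped=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 | awk 'ORS=NR%5?",":"\n"')
# Two output lines of five comma-separated values each.
printf '%s\n' "$grouped"
```

If the number of input lines is not a multiple of five, the last output line ends with a comma instead of a newline.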

Selective Printing of Certain Lines

45. Print the first 10 lines of a file (emulates “head -10”).

awk 'NR < 11'

Awk has a special variable called “NR” that stands for “Number of Records”, that is, the number of lines read so far across all input files (the per-file counterpart is FNR). After reading each line, Awk increments this variable by one. So for the first line it’s 1, for the second line 2, …, etc. As I explained in the very first one-liner, every Awk program consists of a sequence of pattern-action statements “pattern { action statements }”. The “action statements” part gets executed only on those lines that match “pattern” (pattern evaluates to true). In this one-liner the pattern is “NR < 11” and there are no “action statements”. The default action in case of missing “action statements” is to print the line as-is (it’s equivalent to “{ print $0 }”). The pattern in this one-liner is an expression that tests if the current line number is less than 11. If the line number is less than 11, Awk prints the line. As soon as the line number is 11 or more, the pattern evaluates to false and Awk skips the line.

A much better way to do the same is to quit after seeing the first 10 lines (otherwise we are looping over the lines after the 10th and doing nothing):

awk '1; NR == 10 { exit }'

The “NR == 10 { exit }” part guarantees that as soon as line number 10 is reached, Awk quits. For each line up to and including the 10th, Awk first evaluates “1”, which is always true. And as we just learned, a true pattern without the “action statements” part is equal to “{ print $0 }”, so exactly the first ten lines get printed!

46. Print the first line of a file (emulates “head -1”).

awk 'NR > 1 { exit }; 1'

This one-liner is very similar to the previous one. The “NR > 1” pattern is true only for line numbers greater than one, so its action does not get executed on the first line. On the first line only the “1”, the true statement, gets executed. It makes Awk print the line and read the next line. Now the “NR” variable is 2, and “NR > 1” is true. At this moment “{ exit }” gets executed and Awk quits. That’s it. Awk printed just the first line of the file.

47. Print the last 2 lines of a file (emulates “tail -2”).

awk '{ y=x "\n" $0; x=$0 }; END { print y }'

Okay, so what does this one do? First of all, notice that the “{ y=x "\n" $0; x=$0 }” action statement group is missing the pattern. When the pattern is missing, Awk executes the statement group for all lines. For the first line, it sets variable “y” to “\nline1” (because x is not yet defined). For the second line it sets variable “y” to “line1\nline2”. For the third line it sets variable “y” to “line2\nline3”. As you can see, for line N it sets the variable “y” to “lineN-1\nlineN”. Finally, when it reaches EOF, variable “y” contains the last two lines and they get printed via the “print y” statement.
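
Running the one-liner on four sample lines confirms the trace. The input and the variable name last_two are invented for this sketch.

```shell
last_two=$(printf 'line1\nline2\nline3\nline4\n' |
    awk '{ y=x "\n" $0; x=$0 }; END { print y }')
# y holds the previous line, a newline, and the current line at EOF.
printf '%s\n' "$last_two"
```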

Thinking about this one-liner for a second, one concludes that it is very inefficient - it reads the whole file line by line just to print out the last two lines! Unfortunately there is no seek() statement in Awk, so you can’t seek to the end-2 lines in the file (that’s what tail does). It’s recommended to use “tail -2” to print the last 2 lines of a file.

48. Print the last line of a file (emulates “tail -1”).

awk 'END { print }'

This one-liner may or may not work. It relies on the assumption that the “$0” variable, which contains the entire line, does not get reset after the input has been exhausted. The special “END” pattern gets executed after the input has been exhausted (or “exit” has been called). In this one-liner the “print” statement is supposed to print “$0” at EOF, which may or may not have been reset.

Whether it works depends on your awk program’s version and implementation. It works with GNU Awk, for example, but doesn’t seem to work with nawk or xpg4/bin/awk.

The most compatible way to print the last line is:

awk '{ rec=$0 } END{ print rec }'

Just like the previous one-liner, it’s computationally expensive to print the last line of the file this way, and “tail -1” should be the preferred way.

49. Print only the lines that match a regular expression “/regex/” (emulates “grep”).

awk '/regex/'

This one-liner uses a regular expression “/regex/” as a pattern. If the current line matches the regex, it evaluates to true, and Awk prints the line (remember that missing action statement is equal to “{ print }” that prints the whole line).

50. Print only the lines that do not match a regular expression “/regex/” (emulates “grep -v”).

awk '!/regex/'

Pattern matching expressions can be negated by putting “!” in front of them. If they were to evaluate to true, putting “!” in front makes them evaluate to false, and the other way around. This one-liner inverts the regex match of the previous (#49) one-liner and prints all the lines that do not match the regular expression “/regex/”.

51. Print the line immediately before a line that matches “/regex/” (but not the line that matches itself).

awk '/regex/ { print x }; { x=$0 }'

This one-liner always saves the current line in the variable “x”. When it reads in the next line, the previous line is still available in the “x” variable. If the new line matches “/regex/”, it prints out the variable x, and as a result, the previous line gets printed.

It does not work if the first line of the file matches “/regex/”, because there is no previous line. In that case we might want to print “match on line 1”, for example:

awk '/regex/ { print (x=="" ? "match on line 1" : x) }; { x=$0 }'

This one-liner tests if variable “x” contains something. Variable “x” is empty at the very first line (and also whenever the previous line was blank). In that case “match on line 1” gets printed. Otherwise variable “x” gets printed (which, as we found out, contains the previous line). Notice that this one-liner uses the ternary operator “foo?bar:baz”, which is short for “if foo, then bar, else baz”.

52. Print the line immediately after a line that matches “/regex/” (but not the line that matches itself).

awk '/regex/ { getline; print }'

This one-liner calls the “getline” function on all the lines that match “/regex/”. This function sets $0 to the next line (and also updates NF, NR, FNR variables). The “print” statement then prints this next line. As a result, only the line after a line matching “/regex/” gets printed.

If it is the last line that matches “/regex/”, then there is no next line to read: “getline” returns 0 (end of input) and does not set $0. In this case the last line itself gets printed.
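
Both cases can be seen side by side. The third command is a hypothetical guarded variant, not part of the original one-liner; it prints only when getline actually read a line. The sample inputs and variable names are assumptions made for this sketch.

```shell
# A match in the middle of the input: the following line is printed.
middle=$(printf 'a\nregex\nb\nc\n' | awk '/regex/ { getline; print }')
# A match on the last line: getline finds nothing, so the matching line is printed.
last=$(printf 'a\nregex\n' | awk '/regex/ { getline; print }')
# Guarded variant: getline returns a positive value only if a line was read.
guarded=$(printf 'a\nregex\n' | awk '/regex/ { if ((getline) > 0) print }')
echo "middle=[$middle] last=[$last] guarded=[$guarded]"
```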

53. Print lines that match any of “AAA” or “BBB”, or “CCC”.

awk '/AAA|BBB|CCC/'

This one-liner uses a feature of extended regular expressions that support the | or alternation meta-character. This meta-character separates “AAA” from “BBB”, and from “CCC”, and tries to match them separately on each line. Only the lines that contain one (or more) of them get matched and printed.

54. Print lines that contain “AAA” and “BBB”, and “CCC” in this order.

awk '/AAA.*BBB.*CCC/'

This one-liner uses a regular expression “AAA.*BBB.*CCC” to print lines. This regular expression says, “match lines containing AAA followed by any text, followed by BBB, followed by any text, followed by CCC in this order!” If a line matches, it gets printed.

55. Print only the lines that are 65 characters in length or longer.

awk 'length > 64'

This one-liner uses the “length” function. This function is defined as “length([str])” - it returns the length of the string “str”. If none is given, it returns the length of the string in variable $0. For historical reasons, the parentheses () at the end of “length” can be omitted. This one-liner tests if the current line is longer than 64 chars. If it is, “length > 64” evaluates to true and the line gets printed.

56. Print only the lines that are less than 64 characters in length.

awk 'length < 64'

This one-liner is almost byte-by-byte equivalent to the previous one. Here it tests if the length of the line is less than 64 characters. If it is, Awk prints it out. Otherwise nothing gets printed.

57. Print a section of file from regular expression to end of file.

awk '/regex/,0'

This one-liner uses a pattern match of the form ‘pattern1, pattern2’ that is called a “range pattern”. The 3rd Awk Tip from the article “10 Awk Tips, Tricks and Pitfalls” explains this match very carefully. It matches all the lines starting with a line that matches “pattern1” and continuing until a line matches “pattern2” (inclusive). In this one-liner “pattern1” is the regular expression “/regex/” and “pattern2” is just 0 (false). So this one-liner prints all lines starting from a line that matches “/regex/” and continuing to end-of-file (because 0 is always false, and “pattern2” never matches).

58. Print lines 8 to 12 (inclusive).

awk 'NR==8,NR==12'

This one-liner also uses a range pattern in the format “pattern1, pattern2”. The “pattern1” here is “NR==8” and “pattern2” is “NR==12”. The first pattern means “the current line is the 8th” and the second pattern means “the current line is the 12th”. This one-liner prints the lines from the one that matches the first pattern through the one that matches the second, inclusive.

59. Print line number 52.

awk 'NR==52'

This one-liner tests to see if the current line is number 52. If it is, “NR==52” evaluates to true and the line gets implicitly printed out (patterns without statements print the line unmodified).

The correct way, though, is to quit after line 52:

awk 'NR==52 { print; exit }'

This one-liner forces Awk to quit after line number 52 is printed. It is the correct way to print line 52 because there is nothing else to be done, so why loop over the rest of the file doing nothing.

60. Print section of a file between two regular expressions (inclusive).

awk '/Iowa/,/Montana/'

I explained what a range pattern such as “pattern1,pattern2″ does in general in one-liner #57. In this one-liner “pattern1″ is “/Iowa/” and “pattern2″ is “/Montana/”. Both of these patterns are regular expressions. This one-liner prints all the lines starting with a line that matches “Iowa” and ending with a line that matches “Montana” (inclusive).
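
The three range patterns seen so far can be tried on a small list. The five state names and the shell variable names are sample data invented for this sketch.

```shell
states='Idaho
Iowa
Kansas
Montana
Nevada'
# From the line matching Iowa through the line matching Montana, inclusive.
between=$(printf '%s\n' "$states" | awk '/Iowa/,/Montana/')
# The end pattern 0 never matches, so the range runs to end-of-file.
to_end=$(printf '%s\n' "$states" | awk '/Kansas/,0')
# Ranges work with line numbers as well.
by_number=$(printf '%s\n' "$states" | awk 'NR==2,NR==3')
printf '%s\n' "$between"
```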

5. Selective Deletion of Certain Lines

There is just one one-liner in this section.

61. Delete all blank lines from a file.

awk NF

This one-liner uses the special NF variable that contains the number of fields on the line. For empty lines NF is 0, which evaluates to false, and false statements do not get the line printed.

Another way to do almost the same is:

awk '/./'

This one-liner uses a regular-expression match “.” that matches any character. Empty lines do not have any characters, so it does not match. The two differ on lines that contain only spaces or tabs: such lines have no fields, so “awk NF” deletes them, whereas “/./” matches the whitespace and keeps them.
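
The two commands behave differently on a line that holds only whitespace, which a small experiment shows. The four-line input and the variable names are assumptions of this sketch; the second awk in each pipeline merely counts the surviving lines.

```shell
# Four lines: "a", an empty line, a line holding two spaces, and "b".
by_nf=$(printf 'a\n\n  \nb\n' | awk NF | awk 'END { print NR }')
by_dot=$(printf 'a\n\n  \nb\n' | awk '/./' | awk 'END { print NR }')
echo "awk NF keeps $by_nf lines, awk '/./' keeps $by_dot lines"
```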

String Creation

1. Create a string of a specific length (generate a string of x’s of length 513).

awk 'BEGIN { while (a++<513) s=s "x"; print s }'

This one-liner uses the “BEGIN { }” special block that gets executed before anything else in an Awk program. In this block a while loop appends character “x” to variable “s” 513 times. After it has looped, the “s” variable gets printed out. As this Awk program does not have a body, it quits after executing the BEGIN block.

This one-liner printed the 513 x’s out, but you could have used it for anything you wish in BEGIN, main program or END blocks.

Unfortunately this is not the most efficient way to do it. It’s a linear time solution. My friend waldner (who, by the way, wrote a guest post on 10 Awk Tips, Tricks and Pitfalls) showed me a solution that’s logarithmic time (based on the idea of recursive squaring):

function rep(str, num, remain, result) {
    if (num < 2) {
        remain = (num == 1)
    } else {
        remain = (num % 2 == 1)
        result = rep(str, (num - remain) / 2)
    }
    return result result (remain ? str : "")
}

This function can be used as follows (with the function definition placed in the same Awk program):

awk 'BEGIN { s = rep("x", 513) }'
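
Put together as one runnable program, the function and its call might look like the sketch below. Printing length(s) instead of the string itself is a choice made here to keep the output short; the shell variable name len is likewise made up.

```shell
len=$(awk '
# remain and result are extra parameters that serve as local variables.
function rep(str, num,    remain, result) {
    if (num < 2) {
        remain = (num == 1)
    } else {
        remain = (num % 2 == 1)
        result = rep(str, (num - remain) / 2)   # build one half recursively
    }
    # Double the half and add one more copy if num was odd.
    return result result (remain ? str : "")
}
BEGIN { s = rep("x", 513); print length(s) }')
echo "$len"
```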

2. Insert a string of specific length at a certain character position (insert 49 x’s after 6th char).

gawk --re-interval 'BEGIN{ while(a++<49) s=s "x" }; { sub(/^.{6}/,"&" s) }; 1'

This one-liner works only with Gnu Awk, because it uses the interval expression “.{6}” in the Awk program’s body. Interval expressions were not traditionally available in awk, that’s why you have to use “--re-interval” option to enable them.

For those that do not know what interval expressions are, they are regular expressions that match a certain number of characters. For example, “.{6}” matches any six characters (the any char is specified by the dot “.”). An interval expression “b{2,4}” matches at least two, but not more than four “b” characters. To match words, you have to give them higher precedence - “(foo){4}” matches “foo” repeated four times - “foofoofoofoo”.

The one-liner starts the same way as the previous one - it creates a 49 character string “s” in the BEGIN block. Next, for each line of the input, it calls the sub() function, which replaces the first 6 characters with themselves and “s” appended. The “&” in the sub() function means the matched part of the regular expression. The ‘"&" s’ means the matched part of the regex followed by the contents of variable “s”. The “1” at the end of the whole Awk one-liner prints out the modified line (it’s syntactic sugar for just “print”, which itself is syntactic sugar for “print $0”).

The same can be achieved with normal standard Awk:

awk 'BEGIN{ while(a++<49) s=s "x" }; { sub(/^....../,"&" s) }; 1'

Here we just match six chars “......” at the beginning of line, and replace them with themselves + contents of variable “s”.
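
A shortened run makes the effect easy to read. Three x’s are inserted instead of 49 so the result fits on one line; the sample input “abcdefghij” and the variable name inserted are assumptions of this sketch.

```shell
inserted=$(printf 'abcdefghij\n' |
    awk 'BEGIN{ while(a++<3) s=s "x" }; { sub(/^....../,"&" s) }; 1')
# The first six characters stay in place and the x's follow them.
echo "$inserted"
```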

It may get troublesome to insert a string at the 29th position, for example… You’d have to go tapping “.” twenty-nine times: “.............................”. Better use GNU Awk then and write “.{29}”.

Once again, my friend waldner corrected me and pointed to the Awk Feature Comparison chart. The chart suggests that the original one-liner with “.{6}” would also work with POSIX awk, Busybox awk, and Solaris awk.

Array Creation

3. Create an array from string.

split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", month, " ")

This is not a one-liner per se but a technique to create an array from a string. The split(Str, Arr, Regex) function is used to do that. It splits string Str into fields by regular expression Regex and puts the fields in array Arr. The fields are placed in Arr[1], Arr[2], …, Arr[N]. The split() function itself returns the number of fields the string was split into.

In this piece of code the Regex is simply the space character “ ”, the array is month and the string is “Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec”. After the split, month[1] is “Jan”, month[2] is “Feb”, …, month[12] is “Dec”.

4. Create an array named “mdigit”, indexed by strings.

for (i=1; i<=12; i++) mdigit[month[i]] = i

This is another array creation technique and not a real one-liner. This technique creates a reverse lookup array. Remember from the previous “one-liner” that month[1] was “Jan”, …, month[12] was “Dec”. Now we want to do the reverse lookup and find the number for each month. To do that we create a reverse lookup array “mdigit”, such that mdigit["Jan"] = 1, …, mdigit["Dec"] = 12.

It’s really trivial: we loop over month[1], month[2], …, month[12] and set mdigit[month[i]] to i. This way mdigit["Jan"] = 1, etc.
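
Both techniques can be combined in a single BEGIN block to check the result. Using the return value of split() as the loop bound and printing three sample values are choices made for this sketch; the shell variable name lookup is made up.

```shell
lookup=$(awk 'BEGIN {
    # split() returns the number of elements it stored in month.
    n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", month, " ")
    # Build the reverse lookup from month name to month number.
    for (i = 1; i <= n; i++) mdigit[month[i]] = i
    print n, month[3], mdigit["Sep"]
}')
echo "$lookup"
```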

Selective Printing of Certain Lines

5. Print all lines where 5th field is equal to “abc123”.

awk '$5 == "abc123"'

This one-liner uses idiomatic Awk - if the given expression is true, Awk prints out the line. The fifth field is referenced by “$5” and it’s checked to be equal to “abc123”. If it is, the expression is true and the line gets printed.

Unwinding this idiom, this one-liner is really equal to:

awk '{ if ($5 == "abc123") { print $0 } }'

6. Print any line where field #5 is not equal to “abc123”.

awk '$5 != "abc123"'

This is exactly the same as the previous one-liner, except that it negates the comparison. If the fifth field “$5” is not equal to “abc123”, then the line gets printed.

Unwinding it, it’s equal to:

awk '{ if ($5 != "abc123") { print $0 } }'

Another way is to literally negate the whole previous one-liner:

awk '!($5 == "abc123")'

7. Print all lines whose 7th field matches a regular expression.

awk '$7 ~ /^[a-f]/'

This is also idiomatic Awk. It uses the “~” operator to test if the seventh field “$7” matches the regular expression “^[a-f]”. This regular expression matches fields that start with a lower-case letter a, b, c, d, e, or f.

awk '$7 !~ /^[a-f]/'

This one-liner negates the previous one and prints all lines whose seventh field does not start with a lower-case letter a, b, c, d, e, or f.

Another way to write almost the same is:

awk '$7 ~ /^[^a-f]/'

The two are not quite identical, though: when the seventh field is empty or missing, “!~ /^[a-f]/” still matches, while “~ /^[^a-f]/” does not, because it requires at least one character.