Unix Filters Text processing utilities. Filters Filter commands – Unix commands that serve dual purposes: –standalone –used with other commands and pipes

Unix Filters

Text processing utilities

Filters

• Filter commands – Unix commands that serve dual purposes:

– standalone

– used with other commands and pipes

• Why are filters important?

– The output from a command may be overwhelming. Recall, for example, the output of:

# command to list all of the files in the /home directory

ls -l /home

Filters can reduce the output to only the pertinent data.

– We may wish to organize the data presented.

– We may wish to reformat the data presented.

Filter Commands

• Filter commands include:

– head

– tail

– more

– sort

– uniq

– wc

– diff

– cut

– paste

– cmp

– comm

– tr

– cat

– grep

– sed

– awk

head

• head outputs the first part of files

• Format: head [options] filename

• Selected options:

-n : print the first n lines, where n is a number (the default is 10)

head -5 abc   displays the first 5 lines of file abc (the equivalent current form is head -n 5 abc)

head abc   displays the first 10 lines of file abc

tail

• tail outputs the last part of files

• Format: tail [options] filename

• Selected options:

-n : print the last n lines, where n is a number (the default is 10)

• head abc | tail -5   prints lines 6 through 10 of file abc (assuming abc has at least 10 lines)
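As a minimal sketch of head and tail working together, the script below builds a 12-line sample file and pulls out the first lines and a middle range. The file contents and the variable names are made up for illustration.

```shell
# Build a 12-line sample file (hypothetical contents: "line 1" ... "line 12").
sample=$(mktemp)
i=1
while [ "$i" -le 12 ]; do
  echo "line $i"
  i=$((i + 1))
done > "$sample"

# First three lines of the file.
first3=$(head -n 3 "$sample")

# Lines 6 through 10: keep the first 10 lines, then the last 5 of those.
middle=$(head -n 10 "$sample" | tail -n 5)

echo "$first3"
echo "$middle"
rm -f "$sample"
```

The pipe is what makes head a filter here: tail never sees lines 11 and 12 because head has already dropped them.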

more

• more displays files on a page-by-page basis

• Format: more [options] filename

• Options (selected):

– -n : number of lines to show at a time, where n is a number

– -d : prompt with help messages instead of ringing the bell when an invalid key is pressed

– +/pattern : begin at pattern

• Commands to use while in more:

– space : advance a page at a time

– return : advance a line at a time

– q : quit more

more Examples

more abc lists file abc one page/line at a time

ls -l | more   lists directory contents one page/line at a time

sort

• sort sorts the lines of text files

• usually (but not always) used with redirection

• Format: sort [options] inputfiles

• Options (selected):

– -b : ignore leading blanks

– -f : ignore case

– -r : sort descending rather than ascending

– -M : sort using month order (i.e. Jan comes before Feb)

– -n : sort numerically rather than by string

– -u : unique, eliminate duplicate lines

– -tchar : use char as field separator (instead of whitespace)
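A short sketch of how the -n, -r and -u options change the result is given here. The numbers are made-up sample data, and LC_ALL=C is set only so that the string ordering is the same on every system.

```shell
# Sample data (made up for illustration): four numbers, one duplicated.
nums=$(mktemp)
printf '10\n9\n100\n9\n' > "$nums"

# Default sort compares lines as strings, so "100" comes before "9".
as_text=$(LC_ALL=C sort "$nums")

# -n compares by numeric value instead.
as_numbers=$(LC_ALL=C sort -n "$nums")

# -r reverses the order and -u drops the duplicate 9.
desc_unique=$(LC_ALL=C sort -n -r -u "$nums")

echo "$as_text"
echo "$as_numbers"
echo "$desc_unique"
rm -f "$nums"
```

Comparing as_text with as_numbers shows why -n matters whenever a column holds numbers of different lengths.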

sort

• We can also sort the files by specifying fields.

• When using fields to sort, you may also use characters within a field.

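• Field specifier has the following format: +number1 -number2. Fields are counted from 0; sorting starts at field number1 and stops just before field number2. This syntax is obsolete, and many current versions of sort accept only the -k form.

• Examples:

– sort +0 -1 file1 (uses the first field to sort the file; same as sort -k 1,1 file1)

– sort -n +4 -5 file1 (sorts on the fifth field, values in numerical order; same as sort -n -k 5,5 file1)

– sort -t, +1 -2 file1 (uses , as delimiter instead of white space and sorts on the second field; same as sort -t, -k 2,2 file1)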

Sort Fields

field1  field2     field3      field4    field5  field6  field7

0102    Smith,     Bob         May       12,     1992    2
3055    Ye,        Chan        April     1,      1987    8
2337    McFadden,  Mabel       December  12,     1991    21
8441    Gupta,     Soumyaroop  May       1,      1992    19
1198    Crockett,  Bob         June      5,      1989    1

sort

• When sorting on multiple fields, you may also use the -k pos1[,pos2] option.

• If you wish to indicate the format of the key to be sorted, place the format value adjacent to the field number.

• Examples:

Command: sort -t" " -k 2n -k 5M -k 6 filename

Result: sorts filename using a space delimiter, on the 2nd field in numeric order, then on the 5th field by month name, then on the 6th field

Command: sort -t"," -k 3,4 filename

Result: sorts filename on the 3rd and 4th comma-separated fields
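The -k option can be tried on the records from the Sort Fields slide. The sketch below assumes the fields are separated by single spaces, and it uses cut only to show the first field of each sorted line; the variable names are illustrative.

```shell
# Records from the Sort Fields slide, one space between fields.
people=$(mktemp)
cat > "$people" <<'EOF'
0102 Smith, Bob May 12, 1992 2
3055 Ye, Chan April 1, 1987 8
2337 McFadden, Mabel December 12, 1991 21
8441 Gupta, Soumyaroop May 1, 1992 19
1198 Crockett, Bob June 5, 1989 1
EOF

# Sort on field 7 only, in numeric order; keep just the ID column.
by_count=$(LC_ALL=C sort -k 7,7n "$people" | cut -d' ' -f1)

# Sort on field 6 (the year) numerically, breaking ties with field 1.
by_year=$(LC_ALL=C sort -k 6,6n -k 1,1 "$people" | cut -d' ' -f1)

echo "$by_count"
echo "$by_year"
rm -f "$people"
```

Writing the key as 7,7 rather than 7 limits it to that one field; a key written as -k 7 runs from field 7 to the end of the line.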

uniq

• uniq removes duplicate adjacent lines from a file

• To ensure that all of the data is unique you should sort the file first (recall that sort can also produce unique values with the -u option)

• Format: uniq [options] filename

• Options:

– -c : count the instances

– -d : print only the duplicates (once)

– -u : print only the unique lines

uniq Examples

uniq -u abc   prints only the lines of file abc that are not repeated, assuming duplicates are next to each other (without the -u, uniq prints every line, but only once; the second and later copies of a duplicated line are not printed)

uniq -c abc   prints each line once, preceded by the number of times it occurs, assuming duplicates are next to each other

sort abc | uniq -c   puts abc in order and reports how many times each line occurs
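The effect of sorting before uniq can be seen in a small sketch. The file contents are made up, and sed is used only to strip the padding in front of the counts, since its width differs between implementations.

```shell
# Sample data (made up): "apple" appears three times, never on adjacent lines.
fruit=$(mktemp)
printf 'apple\nbanana\napple\ncherry\napple\n' > "$fruit"

# No equal lines are adjacent, so uniq alone removes nothing.
unsorted=$(uniq "$fruit")

# After sorting, equal lines are adjacent and uniq can act on them.
distinct=$(LC_ALL=C sort "$fruit" | uniq)
repeated=$(LC_ALL=C sort "$fruit" | uniq -d)
singles=$(LC_ALL=C sort "$fruit" | uniq -u)

# -c prefixes each line with its count; sed removes the leading spaces.
counts=$(LC_ALL=C sort "$fruit" | uniq -c | sed 's/^ *//')

echo "$counts"
rm -f "$fruit"
```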

wc

• wc prints the number of newlines, words, and characters (bytes) in files

• Format: wc [options] filename

• Options:

– -c : print the number of bytes in the file (the same as the number of characters for single-byte text)

– -w : print the number of words in a file

– -l : print the number of lines in a file

• If no options are given, wc prints the newline, word, and byte counts for each file, in that order
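A minimal sketch of the three counts follows, using a made-up two-line file. Reading from standard input keeps the file name out of the output, and tr removes the padding that some versions of wc print before the number.

```shell
# Sample text (made up): two lines, three words, fourteen bytes.
text=$(mktemp)
printf 'one two\nthree\n' > "$text"

lines=$(wc -l < "$text" | tr -d ' ')
words=$(wc -w < "$text" | tr -d ' ')
bytes=$(wc -c < "$text" | tr -d ' ')

echo "lines=$lines words=$words bytes=$bytes"
rm -f "$text"
```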

cut

• cut extracts (cuts out) selected columns or fields from each line

• Format: cut [options] filename

• Options:

-c list* : the list of columns (character positions) to extract

-f list* : the list of fields to extract

-dchar : the delimiter for fields. Only one delimiter may be specified (tab is the default)

-s : suppress lines without delimiters

*Lists may be specified using integers separated by commas; ranges are separated by hyphens (-). Examples: 1,5-7 means units 1,5,6,7 and 1,5- means units 1, 5 and beyond to the end of the line

cut

• Examples:

cut -d: -f5 /etc/passwd (print only the 5th field, delimited by colon (:), of the /etc/passwd file)

cut -d" " -f2 shuffled (print only the 2nd field, delimited by a space, from file shuffled)

cut -c4-8 shuffled (print only columns 4 through 8 of each line in shuffled)
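The same options can be tried without touching real files. In the sketch below the passwd-style entry and the other strings are invented sample input, not taken from any actual system.

```shell
# A made-up line in /etc/passwd format (seven colon-separated fields).
entry='alice:x:1001:1001:Alice Example:/home/alice:/bin/sh'

# Field 5 only.
full_name=$(printf '%s\n' "$entry" | cut -d: -f5)

# List 1,6- selects field 1 plus field 6 through the end of the line.
login_and_paths=$(printf '%s\n' "$entry" | cut -d: -f1,6-)

# Second space-delimited field.
second_word=$(printf 'one two three\n' | cut -d' ' -f2)

# Character columns 4 through 8.
columns=$(printf 'abcdefghij\n' | cut -c4-8)

echo "$full_name"
echo "$login_and_paths"
echo "$second_word"
echo "$columns"
```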

paste

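• The paste command merges the corresponding lines of one or more files into vertical columns separated by a tab.

paste testfile testfile2 > outputfile

For example, if the first line of testfile is "this is firstline" and the first line of testfile2 is "this is testfile2", the first line of outputfile holds both, separated by a tab:

this is firstline    this is testfile2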
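A runnable sketch of the paste example is shown here. Only the file names testfile, testfile2 and outputfile come from the slide; the second line of each file and the temporary directory are assumptions added for illustration.

```shell
# Work in a temporary directory so no existing files are overwritten.
dir=$(mktemp -d)
printf 'this is firstline\nthis is secondline\n' > "$dir/testfile"
printf 'this is testfile2\nsecond line of testfile2\n' > "$dir/testfile2"

# Line 1 of each file is joined with a tab, then line 2, and so on.
paste "$dir/testfile" "$dir/testfile2" > "$dir/outputfile"

merged=$(cat "$dir/outputfile")
echo "$merged"
rm -rf "$dir"
```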

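diff, cmp and comm

• diff command: diff compares two files line by line and prints the differences between them.

• Syntax: diff [options] file1 file2

Options are:

-b : ignore changes in the amount of white space

-w : ignore all white space

-i : ignore differences in case

• cmp command compares two files byte by byte, with two options: -l (list the position and values of every differing byte) and -s (silent; print nothing and report the result only through the exit status)

• comm command compares two sorted files line by line; it can show the lines that are identical in both files (comm -12 file1 file2 prints only those common lines)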
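The three comparison commands can be contrasted on one pair of files. This is a sketch under the assumption of two small, already sorted, made-up files that differ in their second line.

```shell
# Two sorted sample files (made up) that differ only on line 2.
dir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$dir/a"
printf 'apple\nblueberry\ncherry\n' > "$dir/b"

# diff exits with status 1 when the files differ; "|| true" keeps a
# script that runs under "set -e" from stopping at this line.
changes=$(diff "$dir/a" "$dir/b" || true)

# cmp -s prints nothing; only the exit status says whether the files match.
if cmp -s "$dir/a" "$dir/b"; then identical=yes; else identical=no; fi

# comm -12 hides columns 1 and 2, leaving the lines common to both files.
common=$(LC_ALL=C comm -12 "$dir/a" "$dir/b")

# comm -23 leaves only the lines found in the first file alone.
only_a=$(LC_ALL=C comm -23 "$dir/a" "$dir/b")

echo "$changes"
echo "identical: $identical"
rm -rf "$dir"
```

In the diff output, 2c2 means that line 2 of the first file must be changed to match line 2 of the second.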

tr

• tr command manipulates individual characters in a character stream.

• Format: tr [options] string1 string2

• When executed, the program reads from the standard input and writes to the standard output.

• It takes as parameters two sets of characters, and replaces occurrences of the characters in the first set (string1) with the corresponding characters from the second set (string2).


tr: translating characters

Examples:

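$ tr "aeiou" "AEIOU" < computer.txt   (replace each lowercase vowel with its uppercase form)

$ tr -d '|/' < shortlist | head -3   (delete every | and / character)

$ tr '|' '\012' < shortlist | head -6   (replace every | with a newline, octal 012)

$ tr '|/' '~-' < shortlist | head -3   (replace | with ~ and / with -)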
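The four forms above can be checked on short strings. The contents of computer.txt and shortlist are not given here, so the sketch below feeds tr invented input through printf instead.

```shell
# Translate each lowercase vowel to its uppercase form.
vowels=$(printf 'computer\n' | tr 'aeiou' 'AEIOU')

# -d deletes every character listed in the set.
deleted=$(printf 'a|b/c\n' | tr -d '|/')

# '\012' is the octal code for newline, so each | starts a new line.
split=$(printf 'a|b|c\n' | tr '|' '\012')

# Characters map by position: | becomes ~ and / becomes -.
swapped=$(printf '12|01/02\n' | tr '|/' '~-')

echo "$vowels"
echo "$deleted"
echo "$split"
echo "$swapped"
```

Because tr reads only standard input, a file is always supplied with < or through a pipe, never as an argument.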
