Unix Filters
Text processing utilities
Filters
• Filter commands – Unix commands that serve dual purposes:
– standalone
– used with other commands and pipes
• Why are filters important?
– The output from a command may be overwhelming, e.g. recall the output from:
# command to list all of the files in the /home directory
ls -l /home
– Filters can reduce the output to only the pertinent data.
– We may wish to organize the data presented.
– We may wish to reformat the data presented.
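As a rough illustration of reducing output with filters, the sketch below pipes a directory listing through wc, sort and head. The directory and the report*.txt file names are made up for the example.

```shell
# Hypothetical scratch directory with five empty files.
dir=$(mktemp -d)
for n in 01 02 03 04 05; do
  : > "$dir/report$n.txt"
done

# Count the entries instead of reading the whole listing.
total=$(ls "$dir" | wc -l | tr -d ' ')

# Keep only the last two names in reverse alphabetical order.
last_two=$(ls "$dir" | sort -r | head -n 2)

printf '%s entries\n%s\n' "$total" "$last_two"
rm -rf "$dir"
```

Each command in the pipeline reads the previous command's output, so the listing is never shown in full.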
Filter Commands
• Filter commands include:
– head – tail – more – sort – uniq – wc – cut – paste
– grep – sed – awk – diff – cmp – comm – tr – cat
head
• head outputs the first part of files
• Format: head [options] filename
• Selected options:
-n N : print the first N lines (the default is 10)
head -5 abc displays the first 5 lines of file abc (older shorthand for head -n 5 abc)
head abc displays the first 10 lines of file abc
tail
• tail outputs the last part of files
• Format: tail [options] filename
• Selected options:
-n N : print the last N lines (the default is 10)
• head abc | tail -5 prints lines 6 through 10 of file abc (assuming abc has at least 10 lines)
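The head and tail combination can be checked on a small sample. This sketch assumes a made-up 20-line file containing the numbers 1 to 20, so the line numbers are easy to see in the output.

```shell
# Hypothetical sample file: one number per line, 1 through 20.
tmp=$(mktemp)
seq 1 20 > "$tmp"

# First 10 lines, then the last 5 of those: lines 6 through 10.
middle=$(head -n 10 "$tmp" | tail -n 5)
printf '%s\n' "$middle"

rm -f "$tmp"
```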
more
• more displays files on a page-by-page basis
• Format: more [options] filename
• Options (selected):
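– -n number : number of lines to show at a time
– -d : display a prompt and help hints instead of ringing the bell on an invalid key
– +/pattern : begin displaying at the first match of pattern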
• Commands to use while in more:
– space : advance a page at a time
– return : advance a line at a time
– q : quit more
more Examples
more abc displays file abc one page or one line at a time
ls -l | more displays the directory listing one page or one line at a time
sort
• sort sorts lines of text [files]
• usually (but not always) used with redirection
• Format: sort [options] inputfiles
• Options (selected):
– -b : ignore leading blanks
– -f : ignore case
– -r : sort descending rather than ascending
– -M : sort using month order (e.g. Jan comes before Feb)
– -n : sort numerically rather than by string
– -u : unique, eliminate duplicate lines
– -tchar : use char as field separator (instead of whitespace)
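A small sketch of how the options change the result. The sample numbers are made up, and LC_ALL=C is set here only so that the string comparison is the same on every system.

```shell
# Hypothetical sample data: five numbers, with 9 appearing twice.
tmp=$(mktemp)
printf '%s\n' 10 9 100 9 2 > "$tmp"

# Default sort compares lines as strings, so "10" and "100" come before "2".
as_text=$(LC_ALL=C sort "$tmp")

# -n compares numerically, -r reverses the order, -u drops the duplicate 9.
as_numbers=$(sort -n -r -u "$tmp")

printf 'as text:\n%s\nas numbers:\n%s\n' "$as_text" "$as_numbers"
rm -f "$tmp"
```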
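sort
• We can also sort files on specific fields.
• When sorting on fields, you may also select characters within a field.
• The field specifier has the following format: +number1 -number2 (skip number1 fields to find the start of the key; the key ends after field number2). This zero-based form is obsolete, and current versions of sort use the -k option instead.
• Examples:
– sort +0 -1 file1 (uses the first field to sort the file; same as sort -k 1,1 file1)
– sort -n +4 -5 file1 (sorts on the fifth field, values in numerical order; same as sort -n -k 5,5 file1)
– sort -t, +1 -2 file1 (uses , as the delimiter instead of white space and sorts on the second field; same as sort -t, -k 2,2 file1)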
Sort Fields
field1  field2     field3      field4    field5  field6  field7
0102    Smith,     Bob         May       12,     1992    2
3055    Ye,        Chan        April     1,      1987    8
2337    McFadden,  Mabel       December  12,     1991    21
8441    Gupta,     Soumyaroop  May       1,      1992    19
1198    Crockett,  Bob         June      5,      1989    1
sort
• When sorting on multiple fields, you may also use the -k pos1[,pos2] option.
• To indicate the format of the key to be sorted, place the format letter directly after the field number.
• Examples:
Command: sort -t" " -k 2n -k 5M -k 6 filename
Result: sorts filename, using a space delimiter, on the 2nd field in numeric order, then the 5th field by month name, then the 6th field
Command: sort -t"," -k 3,4 filename
Result: sorts filename on the 3rd through 4th comma-separated fields
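Multi-key sorting with -k is easier to follow on a concrete file. The comma-separated records below are made up for illustration (name, first name, year, count), and the key choices are only one possible arrangement.

```shell
# Hypothetical comma-separated records: last,first,year,count
tmp=$(mktemp)
printf '%s\n' \
  'smith,bob,1992,2' \
  'ye,chan,1987,8' \
  'gupta,soumyaroop,1992,19' \
  'crockett,bob,1989,1' > "$tmp"

# Primary key: field 3 only, numeric (n).
# Secondary key: field 4 only, numeric and descending (nr).
by_year=$(sort -t, -k3,3n -k4,4nr "$tmp")

printf '%s\n' "$by_year"
rm -f "$tmp"
```

Writing -k3,3 rather than -k3 limits the key to a single field; without the end position the key runs to the end of the line.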
uniq
• uniq removes duplicate adjacent lines from a file
• To ensure that all of the data is unique you should sort the file first (recall that sort can also produce unique values with the -u option)
• Format: uniq [options] filename
• Options (selected):
– -c : count the instances
– -d : print only the duplicates (once)
– -u : print only the unique lines
uniq Examples
uniq -u abc prints only the lines of file abc that are not repeated, assuming duplicates are next to each other (without -u, uniq prints every distinct line once and drops the second and later copies of adjacent duplicates)
uniq -c abc prints each line once, preceded by the number of times it occurs consecutively
sort abc | uniq -c puts abc in order, then prints each distinct line with the number of times it occurs
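The three uniq options can be compared side by side. The word list below is a made-up sample, and awk is used only to strip the padding that uniq -c puts in front of each count.

```shell
# Hypothetical sample: apple appears 3 times, pear twice, fig once.
tmp=$(mktemp)
printf '%s\n' apple pear apple fig pear apple > "$tmp"

# Sort first so that duplicate lines become adjacent.
counts=$(sort "$tmp" | uniq -c | awk '{print $1, $2}')

# -u keeps lines that occur once; -d keeps one copy of repeated lines.
only_once=$(sort "$tmp" | uniq -u)
repeated=$(sort "$tmp" | uniq -d)

printf '%s\n' "$counts"
rm -f "$tmp"
```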
wc
• wc prints the number of newlines, words, and characters (bytes) in files
• Format: wc [options] filename
• Options:
– -c : print the number of bytes in the file (-m counts characters, which can differ for multibyte text)
– -w : print the number of words in a file
– -l : print the number of lines in a file
• If no options given, wc prints newlines, words, and bytes in files in that order
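A quick check of the three counters on a made-up two-line file. Reading from standard input keeps the file name out of the output, and tr -d ' ' removes the padding some systems add.

```shell
# Hypothetical sample: two lines, three words, 14 bytes.
tmp=$(mktemp)
printf 'one two\nthree\n' > "$tmp"

lines=$(wc -l < "$tmp" | tr -d ' ')
words=$(wc -w < "$tmp" | tr -d ' ')
bytes=$(wc -c < "$tmp" | tr -d ' ')

printf 'lines=%s words=%s bytes=%s\n' "$lines" "$words" "$bytes"
rm -f "$tmp"
```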
cut
• cut extracts selected columns or fields from each line
• Format: cut [options] filename
• Options:
-c list* : the list of character columns to select
-f list* : the list of fields to select
-dchar : the delimiter for fields. Only one delimiter may be specified (tab is the default)
-s : suppress lines without delimiters
*Lists are integers separated by commas; ranges are written with hyphens (-). Examples: 1,5-7 means units 1, 5, 6, 7 and 1,5- means unit 1 and units 5 through the end of the line
cut
• Examples:
cut -d: -f5 /etc/passwd (print only the 5th field, delimited by colon (:), of the /etc/passwd file)
cut -d" " -f2 shuffled (print only the 2nd field, delimited by a space, from file shuffled)
cut -c4-8 shuffled (print only columns 4 through 8 of each line in shuffled)
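To try the field and column options without touching the real /etc/passwd, this sketch uses a made-up file in the same colon-separated layout. The two accounts are hypothetical.

```shell
# Hypothetical passwd-style records (7 colon-separated fields).
tmp=$(mktemp)
printf '%s\n' \
  'alice:x:1001:1001:Alice Example:/home/alice:/bin/sh' \
  'bob:x:1002:1002:Bob Example:/home/bob:/bin/sh' > "$tmp"

# Field 5 only (the full-name field).
names=$(cut -d: -f5 "$tmp")

# A list of fields: 1 and 7, still joined by the delimiter.
logins_and_shells=$(cut -d: -f1,7 "$tmp")

# Character columns 1 through 3 of each line.
first_three=$(cut -c1-3 "$tmp")

printf '%s\n' "$names"
rm -f "$tmp"
```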
paste
• The paste command merges the corresponding lines of one or more files into columns separated by a tab.
paste testfile testfile2 > outputfile
Sample line in outputfile (the two parts are separated by a tab): this is firstline	this is testfile2
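The paste example can be reproduced end to end. The first line of each file follows the slide; the second lines are made up so that the line-by-line pairing is visible.

```shell
# Scratch directory holding the two input files.
dir=$(mktemp -d)
printf '%s\n' 'this is firstline' 'this is secondline' > "$dir/testfile"
printf '%s\n' 'this is testfile2' 'second line of testfile2' > "$dir/testfile2"

# Line N of testfile and line N of testfile2 are joined with a tab.
paste "$dir/testfile" "$dir/testfile2" > "$dir/outputfile"
merged=$(cat "$dir/outputfile")

printf '%s\n' "$merged"
rm -rf "$dir"
```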
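diff, cmp and comm
• diff command: compares two files line by line and prints the differences between them.
• Syntax: diff [options] file1 file2
Options include:
-b : ignore changes in the amount of white space
-w : ignore all white space
-i : ignore differences in case
• cmp command: compares two files byte by byte, with two options: -l (list every differing byte) and -s (silent, report only through the exit status)
• comm command: compares two sorted files line by line and reports the lines unique to the first file, the lines unique to the second file, and the lines common to both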
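A short comparison of the three commands on two made-up, already sorted files. The file names a and b and their contents are assumptions for the example.

```shell
# Hypothetical sorted input files that share two lines.
dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/a"
printf '%s\n' banana cherry date > "$dir/b"

# comm -12 hides columns 1 and 2, leaving the lines common to both files.
common=$(comm -12 "$dir/a" "$dir/b")

# comm -23 hides columns 2 and 3, leaving lines found only in the first file.
only_a=$(comm -23 "$dir/a" "$dir/b")

# cmp -s prints nothing; exit status 0 means the files are identical.
if cmp -s "$dir/a" "$dir/b"; then same=yes; else same=no; fi

# diff exits with 0 for identical files and 1 when they differ.
diff_status=0
diff "$dir/a" "$dir/b" > "$dir/changes" || diff_status=$?

printf 'common:\n%s\nsame=%s diff_status=%s\n' "$common" "$same" "$diff_status"
rm -rf "$dir"
```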
• tr command manipulates individual characters in a character stream.
tr [options] string1 string2
• When executed, the program reads from the standard input and writes to the standard output.
• It takes as parameters two sets of characters, and replaces each occurrence of a character in the first set (string1) with the corresponding character from the second set (string2).
tr: translating characters
Examples:
$ tr "aeiou" "AEIOU" < computer.txt
$ tr -d '|/' < shortlist | head -3
$ tr '|' '\012' < shortlist | head -6
$ tr '|/' '~-' < shortlist | head -3
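The contents of computer.txt and shortlist are not shown, so this sketch applies the same four tr commands to made-up input: the word computer and one record that uses | and / as separators.

```shell
# Hypothetical record standing in for one line of shortlist.
line='wood|david/director|sales'

# Replace each lowercase vowel with its uppercase counterpart.
upper_vowels=$(printf '%s\n' 'computer' | tr 'aeiou' 'AEIOU')

# -d deletes every | and / character.
no_delims=$(printf '%s\n' "$line" | tr -d '|/')

# \012 is the octal code for newline: one field per line.
one_per_line=$(printf '%s\n' "$line" | tr '|' '\012')

# | becomes ~ and / becomes - (sets are matched position by position).
swapped=$(printf '%s\n' "$line" | tr '|/' '~-')

printf '%s\n%s\n%s\n%s\n' "$upper_vowels" "$no_delims" "$one_per_line" "$swapped"
```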