Table of Contents
Commands that are created to be used with a pipe are often called filters. These filters are very small programs that do one specific thing very efficiently. They can be used as building blocks.
This chapter will introduce you to the most common filters. The combination of simple commands and filters in a long pipe allows you to design elegant solutions.
When between two pipes, the cat command does nothing (except putting stdin on stdout).
[paul@RHEL4b pipes]$ tac count.txt | cat | cat | cat | cat | cat five four three two one [paul@RHEL4b pipes]$
Writing long pipes in Unix is fun, but sometimes you may want intermediate results. This is were tee comes in handy. The tee filter puts stdin on stdout and also into a file. So tee is almost the same as cat, except that it has two identical outputs.
[paul@RHEL4b pipes]$ tac count.txt | tee temp.txt | tac one two three four five [paul@RHEL4b pipes]$ cat temp.txt five four three two one [paul@RHEL4b pipes]$
The grep filter is famous among Unix users. The most common use of grep is to filter lines of text containing (or not containing) a certain string.
[paul@RHEL4b pipes]$ cat tennis.txt Amelie Mauresmo, Fra Kim Clijsters, BEL Justine Henin, Bel Serena Williams, usa Venus Williams, USA [paul@RHEL4b pipes]$ cat tennis.txt | grep Williams Serena Williams, usa Venus Williams, USA
You can write this without the cat.
[paul@RHEL4b pipes]$ grep Williams tennis.txt Serena Williams, usa Venus Williams, USA
One of the most useful options of grep is grep -i which filters in a case insensitive way.
[paul@RHEL4b pipes]$ grep Bel tennis.txt Justine Henin, Bel [paul@RHEL4b pipes]$ grep -i Bel tennis.txt Kim Clijsters, BEL Justine Henin, Bel [paul@RHEL4b pipes]$
Another very useful option is grep -v which outputs lines not matching the string.
[paul@RHEL4b pipes]$ grep -v Fra tennis.txt Kim Clijsters, BEL Justine Henin, Bel Serena Williams, usa Venus Williams, USA [paul@RHEL4b pipes]$
And of course, both options can be combined to filter all lines not containing a case insensitive string.
[paul@RHEL4b pipes]$ grep -vi usa tennis.txt Amelie Mauresmo, Fra Kim Clijsters, BEL Justine Henin, Bel [paul@RHEL4b pipes]$
With grep -A1 one line after the result is also displayed.
paul@debian5:~/pipes$ grep -A1 Henin tennis.txt Justine Henin, Bel Serena Williams, usa
With grep -B1 one line before the result is also displayed.
paul@debian5:~/pipes$ grep -B1 Henin tennis.txt Kim Clijsters, BEL Justine Henin, Bel
With grep -C1 (context) one line before and one after are also displayed. All three options (A,B, and C) can display any number of lines (using e.g. A2, B4 or C20).
paul@debian5:~/pipes$ grep -C1 Henin tennis.txt Kim Clijsters, BEL Justine Henin, Bel Serena Williams, usa
The cut filter can select columns from files, depending on a delimiter or a count of bytes. The screenshot below uses cut to filter for the username and userid in the /etc/passwd file. It uses the colon as a delimiter, and selects fields 1 and 3.
[[paul@RHEL4b pipes]$ cut -d: -f1,3 /etc/passwd | tail -4 Figo:510 Pfaff:511 Harry:516 Hermione:517 [paul@RHEL4b pipes]$
When using a space as the delimiter for cut, you have to quote the space.
[paul@RHEL4b pipes]$ cut -d" " -f1 tennis.txt Amelie Kim Justine Serena Venus [paul@RHEL4b pipes]$
This example uses cut to display the second to the seventh character of /etc/passwd.
[paul@RHEL4b pipes]$ cut -c2-7 /etc/passwd | tail -4 igo:x: faff:x arry:x ermion [paul@RHEL4b pipes]$
You can translate characters with tr. The screenshot shows the translation of all occurrences of e to E.
[paul@RHEL4b pipes]$ cat tennis.txt | tr 'e' 'E' AmEliE MaurEsmo, Fra Kim ClijstErs, BEL JustinE HEnin, BEl SErEna Williams, usa VEnus Williams, USA
Here we set all letters to uppercase by defining two ranges.
[paul@RHEL4b pipes]$ cat tennis.txt | tr 'a-z' 'A-Z' AMELIE MAURESMO, FRA KIM CLIJSTERS, BEL JUSTINE HENIN, BEL SERENA WILLIAMS, USA VENUS WILLIAMS, USA [paul@RHEL4b pipes]$
Here we translate all newlines to spaces.
[paul@RHEL4b pipes]$ cat count.txt one two three four five [paul@RHEL4b pipes]$ cat count.txt | tr '\n' ' ' one two three four five [paul@RHEL4b pipes]$
The tr -s filter can also be used to squeeze multiple occurrences of a character to one.
[paul@RHEL4b pipes]$ cat spaces.txt one two three four five six [paul@RHEL4b pipes]$ cat spaces.txt | tr -s ' ' one two three four five six [paul@RHEL4b pipes]$
You can also use tr to 'encrypt' texts with rot13.
[paul@RHEL4b pipes]$ cat count.txt | tr 'a-z' 'nopqrstuvwxyzabcdefghijklm' bar gjb guerr sbhe svir [paul@RHEL4b pipes]$ cat count.txt | tr 'a-z' 'n-za-m' bar gjb guerr sbhe svir [paul@RHEL4b pipes]$
This last example uses tr -d to delete characters.
paul@debian5:~/pipes$ cat tennis.txt | tr -d e Amli Maursmo, Fra Kim Clijstrs, BEL Justin Hnin, Bl Srna Williams, usa Vnus Williams, USA
Counting words, lines and characters is easy with wc.
[paul@RHEL4b pipes]$ wc tennis.txt 5 15 100 tennis.txt [paul@RHEL4b pipes]$ wc -l tennis.txt 5 tennis.txt [paul@RHEL4b pipes]$ wc -w tennis.txt 15 tennis.txt [paul@RHEL4b pipes]$ wc -c tennis.txt 100 tennis.txt [paul@RHEL4b pipes]$
The sort filter will default to an alphabetical sort.
paul@debian5:~/pipes$ cat music.txt Queen Brel Led Zeppelin Abba paul@debian5:~/pipes$ sort music.txt Abba Brel Led Zeppelin Queen
But the sort filter has many options to tweak its usage. This example shows sorting different columns (column 1 or column 2).
[paul@RHEL4b pipes]$ sort -k1 country.txt Belgium, Brussels, 10 France, Paris, 60 Germany, Berlin, 100 Iran, Teheran, 70 Italy, Rome, 50 [paul@RHEL4b pipes]$ sort -k2 country.txt Germany, Berlin, 100 Belgium, Brussels, 10 France, Paris, 60 Italy, Rome, 50 Iran, Teheran, 70
The screenshot below shows the difference between an alphabetical sort and a numerical sort (both on the third column).
[paul@RHEL4b pipes]$ sort -k3 country.txt Belgium, Brussels, 10 Germany, Berlin, 100 Italy, Rome, 50 France, Paris, 60 Iran, Teheran, 70 [paul@RHEL4b pipes]$ sort -n -k3 country.txt Belgium, Brussels, 10 Italy, Rome, 50 France, Paris, 60 Iran, Teheran, 70 Germany, Berlin, 100
With uniq you can remove duplicates from a sorted list.
paul@debian5:~/pipes$ cat music.txt Queen Brel Queen Abba paul@debian5:~/pipes$ sort music.txt Abba Brel Queen Queen paul@debian5:~/pipes$ sort music.txt |uniq Abba Brel Queen
uniq can also count occurrences with the -c option.
paul@debian5:~/pipes$ sort music.txt |uniq -c 1 Abba 1 Brel 2 Queen
Comparing streams (or files) can be done with the comm. By default comm will output three columns. In this example, Abba, Cure and Queen are in both lists, Bowie and Sweet are only in the first file, Turner is only in the second.
paul@debian5:~/pipes$ cat > list1.txt Abba Bowie Cure Queen Sweet paul@debian5:~/pipes$ cat > list2.txt Abba Cure Queen Turner paul@debian5:~/pipes$ comm list1.txt list2.txt Abba Bowie Cure Queen Sweet Turner
The output of comm can be easier to read when outputting only a single column. The digits point out which output columns should not be displayed.
paul@debian5:~/pipes$ comm -12 list1.txt list2.txt Abba Cure Queen paul@debian5:~/pipes$ comm -13 list1.txt list2.txt Turner paul@debian5:~/pipes$ comm -23 list1.txt list2.txt Bowie Sweet
European humans like to work with ascii characters, but computers store files in bytes. The example below creates a simple file, and then uses od to show the contents of the file in hexadecimal bytes
paul@laika:~/test$ cat > text.txt abcdefg 1234567 paul@laika:~/test$ od -t x1 text.txt 0000000 61 62 63 64 65 66 67 0a 31 32 33 34 35 36 37 0a 0000020
The same file can also be displayed in octal bytes.
paul@laika:~/test$ od -b text.txt 0000000 141 142 143 144 145 146 147 012 061 062 063 064 065 066 067 012 0000020
And here is the file in ascii (or backslashed) characters.
paul@laika:~/test$ od -c text.txt 0000000 a b c d e f g \n 1 2 3 4 5 6 7 \n 0000020
The stream editor sed can perform editing functions in the stream, using regular expressions.
paul@debian5:~/pipes$ echo level5 | sed 's/5/42/' level42 paul@debian5:~/pipes$ echo level5 | sed 's/level/jump/' jump5
Add g for global replacements (all occurrences of the string per line).
paul@debian5:~/pipes$ echo level5 level7 | sed 's/level/jump/' jump5 level7 paul@debian5:~/pipes$ echo level5 level7 | sed 's/level/jump/g' jump5 jump7
With d you can remove lines from a stream containing a character.
paul@debian5:~/test42$ cat tennis.txt Venus Williams, USA Martina Hingis, SUI Justine Henin, BE Serena williams, USA Kim Clijsters, BE Yanina Wickmayer, BE paul@debian5:~/test42$ cat tennis.txt | sed '/BE/d' Venus Williams, USA Martina Hingis, SUI Serena williams, USA
How many users are logged on to this system ?
[paul@RHEL4b pipes]$ who root tty1 Jul 25 10:50 paul pts/0 Jul 25 09:29 (laika) Harry pts/1 Jul 25 12:26 (barry) paul pts/2 Jul 25 12:26 (pasha) [paul@RHEL4b pipes]$ who | wc -l 4
Display a sorted list of logged on users.
[paul@RHEL4b pipes]$ who | cut -d' ' -f1 | sort Harry paul paul root
Display a sorted list of logged on users, but every user only once .
[paul@RHEL4b pipes]$ who | cut -d' ' -f1 | sort | uniq Harry paul root
Display a list of all bash user accounts on this computer. Users accounts are explained in detail later.
paul@debian5:~$ grep bash /etc/passwd root:x:0:0:root:/root:/bin/bash paul:x:1000:1000:paul,,,:/data/sites/web/cobbautbe/subsites:/bin/bash serena:x:1001:1001::/home/serena:/bin/bash paul@debian5:~$ grep bash /etc/passwd | cut -d: -f1 root paul serena
1. Put a sorted list of all bash users in bashusers.txt.
2. Put a sorted list of all logged on users in onlineusers.txt.
3. Make a list of all filenames in /etc that contain the string conf in their filename.
4. Make a sorted list of all files in /etc that contain the case insensitive string conf in their filename.
5. Look at the output of /sbin/ifconfig. Write a line that displays only ip address and the subnet mask.
6. Write a line that removes all non-letters from a stream.
7. Write a line that receives a text file, and outputs all words on a separate line.
8. Write a spell checker on the command line. (There may be a dictionary in /usr/share/dict/ .)
1. Put a sorted list of all bash users in bashusers.txt.
grep bash /etc/passwd | cut -d: -f1 | sort > bashusers.txt
2. Put a sorted list of all logged on users in onlineusers.txt.
who | cut -d' ' -f1 | sort > onlineusers.txt
3. Make a list of all filenames in /etc that contain the string conf in their filename.
ls /etc | grep conf
4. Make a sorted list of all files in /etc that contain the case insensitive string conf in their filename.
ls /etc | grep -i conf | sort
5. Look at the output of /sbin/ifconfig. Write a line that displays only ip address and the subnet mask.
/sbin/ifconfig | head -2 | grep 'inet ' | tr -s ' ' | cut -d' ' -f3,5
6. Write a line that removes all non-letters from a stream.
paul@deb503:~$ cat text This is, yes really! , a text with ?&* too many str$ange# characters ;-) paul@deb503:~$ cat text | tr -d ',!$?.*&^%#@;()-' This is yes really a text with too many strange characters
7. Write a line that receives a text file, and outputs all words on a separate line.
paul@deb503:~$ cat text2 it is very cold today without the sun paul@deb503:~$ cat text2 | tr ' ' '\n' it is very cold today without the sun
8. Write a spell checker on the command line. (There may be a dictionary in /usr/share/dict/ .)
paul@rhel ~$ echo "The zun is shining today" > text paul@rhel ~$ cat > DICT is shining sun the today paul@rhel ~$ cat text | tr 'A-Z ' 'a-z\n' | sort | uniq | comm -23 - DICT zun
You could also add the solution from question number 6 to remove non-letters, and tr -s ' ' to remove redundant spaces.