
04 pipes and filters

Carsten Fortmann-Grote edited this page Mar 15, 2024 · 1 revision

Pipes and Filters

Questions

  • How can I combine existing commands to do new things?

Objectives

  • Redirect a command’s output to a file.
  • Construct command pipelines with two or more stages.
  • Explain what usually happens if a program or pipeline isn’t given any input to process.
  • Explain the advantage of linking commands with pipes and filters.

Reminder: What the shell is good for

  • repetitive tasks
  • do stuff on a remote computer (e.g. a compute cluster)
  • combine tools into pipelines
  • automate tasks
  • keep work reproducible
  • reduce the risk of repetitive strain injury

Analyse some data

  • navigate to shell-lesson-data/exercise-data/alkanes/
    $ cd ~/Desktop/shell-lesson-data/exercise-data/alkanes
    $ ls
        

Counting lines, words, and characters: wc

Example

$ wc cubane.pdb

Count lines, words, and characters in all .pdb files

$ wc *.pdb
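To see what each column means without the lesson files, here is a minimal sketch that builds two tiny files in a throwaway directory (created with mktemp -d). The names a.pdb and b.pdb are hypothetical stand-ins, not part of shell-lesson-data.

```shell
# Work in a throwaway directory so no lesson files are touched
cd "$(mktemp -d)" || exit 1

# Hypothetical stand-ins: a.pdb has 2 lines, b.pdb has 3 lines,
# and every line holds 3 words and 9 characters (including the newline)
printf 'ATOM 1 C\nATOM 2 H\n' > a.pdb
printf 'ATOM 1 C\nATOM 2 H\nATOM 3 H\n' > b.pdb

# Columns: lines, words, characters, file name.
# With more than one file, wc adds a final "total" line.
wc *.pdb
```

For these files the counts are 2 6 18 for a.pdb, 3 9 27 for b.pdb, and 5 15 45 total; the exact column spacing differs between systems.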

Exercise

Task

Count only the lines/words/characters in pentane.pdb.

Solution

Capturing output

Nelle wants to know how many lines are in each .pdb file and to save that information in a new text file. With >, the shell writes the output of wc to lengths.txt instead of printing it on the screen:

$ wc *.pdb > lengths.txt
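A minimal sketch of what > does, in a throwaway directory with hypothetical stand-in files: nothing is printed while the command runs, and running it a second time replaces the file's content rather than adding to it.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical stand-in files with 2 lines and 1 line
printf 'x\nx\n' > a.pdb
printf 'x\n' > b.pdb

# Nothing appears on screen: wc's output goes into lengths.txt instead
wc -l *.pdb > lengths.txt

# Running it again overwrites lengths.txt instead of appending to it
wc -l *.pdb > lengths.txt

# Still three lines: one per file plus the "total" line
cat lengths.txt
```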

Printing a text file in the shell

$ cat lengths.txt

cat prints the entire content of a file in one go. For large files, the beginning scrolls past too quickly to read and only the last part stays on screen. A pager such as more or less shows the file one screen at a time instead.

To quit less, press q.

Sorting output

Exercise

Tasks

  1. In shell-lesson-data/exercise-data/, what is the content of numbers.txt?
  2. Run the command
    $ sort numbers.txt

    What does sort do?
  3. In contrast, what does the following command do?
    $ sort -n numbers.txt

Sorting the lengths of alkane files

$ cd alkanes
$ sort -n lengths.txt
$ sort -n lengths.txt > sorted-lengths.txt
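The reason for -n can be sketched with a few hypothetical numbers (not necessarily the contents of numbers.txt): plain sort compares lines as text, character by character, while sort -n compares their numeric values.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical data, one number per line
printf '10\n2\n19\n22\n6\n' > demo-numbers.txt

# Text order: 10 19 2 22 6 ("1" sorts before "2", whatever follows)
sort demo-numbers.txt

# Numeric order: 2 6 10 19 22
sort -n demo-numbers.txt
```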

Display the top n lines of a file

$ head -n 1 sorted-lengths.txt
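As a minimal sketch, here is head applied to a hypothetical stand-in for sorted-lengths.txt; because the file is already in numeric order, its first line names the shortest file.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical stand-in for sorted-lengths.txt, already sorted with sort -n
printf '  2 a.pdb\n  5 b.pdb\n  7 total\n' > sorted-lengths.txt

# Only the first line: "  2 a.pdb"
head -n 1 sorted-lengths.txt

# The first two lines
head -n 2 sorted-lengths.txt
```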

Exercise

Task

  • How do you display the last n lines of a file?

Solution

Task

Run

$ echo "Hello" > greeting.txt

Followed by (mind the >>)

$ echo "Hello World" >> greeting.txt

What’s in greeting.txt now?

Append a third line to the file.

What happens if you now run

$ echo "Uups" > greeting.txt

Task

What’s the content of animals-subsets.csv after running these two commands?

$ head -n 3 animals.csv > animals-subsets.csv
$ tail -n 2 animals.csv >> animals-subsets.csv

  1. The first three lines of animals.csv
  2. The last two lines of animals.csv
  3. The first three lines and the last two lines of animals.csv
  4. The second and third lines of animals.csv

Solution

Explain why the other solutions are incorrect.

Solution

Running wc without arguments

$ wc -l

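The command is now waiting for input typed at the terminal. Press Ctrl-C to cancel it, or press Ctrl-D on a new line to end the input, after which wc prints the number of lines you typed.

Reading from standard input when no file name is given is what makes it possible to pipe the output of one command into another command.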
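A minimal sketch: instead of typing lines at the keyboard, let printf or echo produce them and pipe them into wc, which counts them just as it would count typed input.

```shell
# wc -l gets no file name, so it counts the lines on its standard input;
# printf supplies three lines through the pipe, and wc -l reports 3
printf 'one\ntwo\nthree\n' | wc -l

# The same works for words: two words arrive on standard input
echo "hello world" | wc -w
```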

Passing output to another command |

Finding the smallest number in an unsorted file

$ sort -n lengths.txt | head -n 1
  • No need to save intermediate results (such as sorted-lengths.txt) in a file.
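The two approaches can be compared side by side in a minimal sketch with a hypothetical, unsorted stand-in for lengths.txt: the two-step version writes sorted-lengths.txt to disk, the pipeline does not, and both print the same line.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical unsorted stand-in for lengths.txt
printf ' 12 b.pdb\n  9 a.pdb\n 30 c.pdb\n 51 total\n' > lengths.txt

# Two steps with an intermediate file
sort -n lengths.txt > sorted-lengths.txt
head -n 1 sorted-lengths.txt

# One pipeline, no intermediate file; prints the same "  9 a.pdb"
sort -n lengths.txt | head -n 1
```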

Longer pipes

$ wc -l *.pdb | sort -n

or

$ wc -l *.pdb | sort -n | head -n 1
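A minimal sketch of this three-stage pipeline on hypothetical stand-in files with 3, 1 and 2 lines; the comments mark what each stage contributes.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical stand-in files with 3, 1 and 2 lines
printf 'x\nx\nx\n' > a.pdb
printf 'x\n' > b.pdb
printf 'x\nx\n' > c.pdb

# wc -l:     one count per file, plus a "total" line (6)
# sort -n:   smallest count first, so "total" ends up last
# head -n 1: keep only the first line, the shortest file (b.pdb)
wc -l *.pdb | sort -n | head -n 1
```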

Figure: redirects and pipes

Exercise

Task

In our current directory, we want to find the 3 files that have the fewest lines. Which command listed below would work?

  1. wc -l * > sort -n > head -n 3
  2. wc -l * | sort -n | head -n 1-3
  3. wc -l * | head -n 3 | sort -n
  4. wc -l * | sort -n | head -n 3

Solution

Task

A file called animals.csv (in the shell-lesson-data/exercise-data/animal-counts directory) contains the following data:

2012-11-05,deer,5
2012-11-05,rabbit,22
2012-11-05,raccoon,7
2012-11-06,rabbit,19
2012-11-06,deer,2
2012-11-06,fox,4
2012-11-07,rabbit,16
2012-11-07,bear,1

What text passes through each of the pipes and the final redirect in the pipeline below? Note that sort -r sorts in reverse order.

$ cat animals.csv | head -n 5 | tail -n 3 | sort -r > final.txt

Hint: build the pipeline up one command at a time to test your understanding.

Solution

Task

For the file animals.csv from the previous exercise, consider the following command:

$ cut -d , -f 2 animals.csv

The cut command is used to remove or ‘cut out’ certain sections of each line in the file. By default, cut expects the lines to be separated into columns by a Tab character. A character used in this way is called a delimiter. In the example above, we use the -d option to specify the comma as our delimiter character. We have also used the -f option to specify that we want to extract the second field (column). This gives the following output:

deer
rabbit
raccoon
rabbit
deer
fox
rabbit
bear
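To try cut without the lesson files, the sketch below recreates animals.csv from the eight lines listed above in a throwaway directory. The second command, which asks for fields 1 and 3, goes one step beyond the example: cut prints the selected fields joined by the same delimiter.

```shell
cd "$(mktemp -d)" || exit 1

# Recreate animals.csv from the data listed above, one line per argument
printf '%s\n' \
  2012-11-05,deer,5 \
  2012-11-05,rabbit,22 \
  2012-11-05,raccoon,7 \
  2012-11-06,rabbit,19 \
  2012-11-06,deer,2 \
  2012-11-06,fox,4 \
  2012-11-07,rabbit,16 \
  2012-11-07,bear,1 > animals.csv

# Second comma-separated field: the animal names
cut -d , -f 2 animals.csv

# Fields 1 and 3 (date and count), still separated by commas
cut -d , -f 1,3 animals.csv
```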

The uniq command filters out adjacent matching lines in a file. How could you extend this pipeline (using uniq and another command) to find out what animals the file contains (without any duplicates in their names)?
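A minimal sketch of what “adjacent” means here, using a hypothetical four-line input: uniq collapses the first two identical lines into one, but the last line survives because a different line separates it from them.

```shell
# Prints rabbit, deer, rabbit: the non-adjacent repeat is kept
printf 'rabbit\nrabbit\ndeer\nrabbit\n' | uniq
```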

Solution

Task

The file animals.csv contains 8 lines of data formatted as follows:

2012-11-05,deer,5
2012-11-05,rabbit,22
2012-11-05,raccoon,7
2012-11-06,rabbit,19
...

The uniq command has a -c option which gives a count of the number of times a line occurs in its input. Assuming your current directory is shell-lesson-data/exercise-data/animal-counts, what command would you use to produce a table that shows the total count of each type of animal in the file?

1. $ sort animals.csv | uniq -c
2. $ sort -t, -k2,2 animals.csv | uniq -c
3. $ cut -d, -f 2 animals.csv | uniq -c
4. $ cut -d, -f 2 animals.csv | sort | uniq -c
5. $ cut -d, -f 2 animals.csv | sort | uniq -c | wc -l

Solution

Take home

  • wc counts lines, words, and characters in its inputs.
  • cat displays the contents of its inputs.
  • sort sorts its inputs.
  • head displays the first 10 lines of its input by default (head -n N shows the first N lines).
  • tail displays the last 10 lines of its input by default (tail -n N shows the last N lines).
  • command > [file] redirects a command’s output to a file (overwriting any existing content).
  • command >> [file] appends a command’s output to a file.
  • [first] | [second] is a pipeline: the output of the first command is used as the input to the second.
  • The best way to use the shell is to use pipes to combine simple single-purpose programs (filters).
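The difference between > and >> can be checked in a minimal sketch; log.txt is a hypothetical file name in a throwaway directory.

```shell
cd "$(mktemp -d)" || exit 1

# > creates log.txt (or empties it if it exists) and writes one line
echo "first" > log.txt
# >> keeps the existing content and appends a second line
echo "second" >> log.txt
cat log.txt    # first, second

# > again: the earlier content is discarded, only "third" remains
echo "third" > log.txt
cat log.txt    # third
```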