04 pipes and filters
- How can I combine existing commands to do new things?
- Redirect a command’s output to a file.
- Construct command pipelines with two or more stages.
- Explain what usually happens if a program or pipeline isn’t given any input to process.
- Explain the advantage of linking commands with pipes and filters.
- repetitive tasks
- do stuff on a remote computer (e.g. compute cluster)
- combine tools into pipelines
- automate tasks
- keep work reproducible
- reduce the risk of repetitive strain injury
- navigate to
shell-lesson-data/exercise-data/alkanes/
$ cd ~/Desktop/shell-lesson-data/exercise-data/alkanes
$ ls
$ wc cubane.pdb
$ wc *.pdb
Count only the lines/words/characters in pentane.pdb.
Nelle wants to know how many lines are in each pdb file and to save that information in a new text file.
$ wc -l *.pdb > lengths.txt
$ cat lengths.txt
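A minimal sketch of what the redirect does, run in a scratch directory with two made-up files (a.pdb and b.pdb are stand-ins, not part of the lesson data) and counting lines with wc -l:

```shell
# Work in a throwaway directory so the lesson data stays untouched.
cd "$(mktemp -d)"

# Two hypothetical stand-in files with 3 lines and 1 line.
printf 'x\ny\nz\n' > a.pdb
printf 'x\n' > b.pdb

# Nothing appears on screen here: the output goes into lengths.txt instead.
wc -l *.pdb > lengths.txt

# Show what was saved: one line per file plus a "total" line.
cat lengths.txt
```

The file only exists after the redirect has run, which is why cat is needed to see the counts.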
cat prints the entire content of a file in one go. For large files, the top of the file rushes past on the screen and only the last part remains readable.
A pager such as more (or less) is an alternative that shows one screenful at a time.
To quit less, type |q|.
- In shell-lesson-data/exercise-data/, what is the content of numbers.txt?
- Run the command
$ sort numbers.txt
What does sort do?
- What does, in contrast,
$ sort -n numbers.txt
do?
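For comparison once the exercise has been tried, here is a small sketch with made-up numbers (the file made-up.txt is hypothetical and does not reproduce numbers.txt):

```shell
# Work in a throwaway directory.
cd "$(mktemp -d)"
printf '10\n9\n100\n' > made-up.txt

# Plain sort compares character by character, so the order is 10, 100, 9.
sort made-up.txt

# sort -n compares numeric values, so the order is 9, 10, 100.
sort -n made-up.txt
```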
$ cd alkanes
$ sort -n lengths.txt
$ sort -n lengths.txt > sorted-lengths.txt
$ head -n 1 sorted-lengths.txt
- How do you display the last n lines of a file?
Run
$ echo "Hello" > greeting.txt
followed by (mind the >>)
$ echo "Hello World" >> greeting.txt
What’s in greeting.txt now?
Append a third line to the file.
What happens if you now run the following?
$ echo "Uups" > greeting.txt
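The difference between the two operators can be checked safely in a scratch directory. This is only a sketch, and the file name notes.txt is made up for the illustration:

```shell
# Work in a throwaway directory.
cd "$(mktemp -d)"

echo "first" > notes.txt     # creates notes.txt (or replaces its content)
echo "second" >> notes.txt   # adds a line at the end
cat notes.txt                # both lines are there

echo "third" > notes.txt     # overwrites: the two earlier lines are gone
cat notes.txt                # only "third" is left
```

Because > silently discards existing content, >> is the safer choice whenever a file should keep growing.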
What’s the content of animals-subsets.csv after running these two commands?
$ head -n 3 animals.csv > animals-subsets.csv
$ tail -n 2 animals.csv >> animals-subsets.csv
- The first three lines of animals.csv
- The last two lines of animals.csv
- The first three lines and the last two lines of animals.csv
- The second and third lines of animals.csv
Explain why the other solutions are incorrect.
$ wc -l
The command is waiting for input on the terminal. Type |ctrl-c| to exit this state.
Because wc reads from standard input when it is given no file name, the output of one command can be piped in as the input of another.
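As a small illustration, the same wc -l returns immediately when something is piped into it. Here printf is used to produce three lines of made-up input:

```shell
# printf writes three lines to standard output; the pipe hands them to wc,
# which counts them instead of waiting for keyboard input.
printf 'one\ntwo\nthree\n' | wc -l
```

The count is 3; some systems pad the number with leading spaces.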
$ sort -n lengths.txt | head -n 1
- No need to save intermediate results in files.
$ wc -l *.pdb | sort -n
or
$ wc -l *.pdb | sort -n | head -n 1
file:img/Longer_pipes/2024-03-15_09-57-19_redirects-and-pipes.svg
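One way to see what each stage contributes is to build the pipeline up step by step. The sketch below does this in a scratch directory with three made-up files (long.pdb, medium.pdb, and short.pdb are hypothetical names, not lesson data):

```shell
# Work in a throwaway directory.
cd "$(mktemp -d)"

# Three hypothetical files with 3, 2, and 1 lines.
printf 'a\nb\nc\n' > long.pdb
printf 'a\nb\n' > medium.pdb
printf 'a\n' > short.pdb

# Stage 1 alone: line counts per file, plus a total.
wc -l *.pdb

# Stages 1 and 2: the same lines, ordered by count.
wc -l *.pdb | sort -n

# All three stages: only the first line, which names the shortest file.
wc -l *.pdb | sort -n | head -n 1
```

No intermediate file is created at any point; each command reads directly from the one before it.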
In our current directory, we want to find the three files with the fewest lines. Which of the commands listed below would work?
wc -l * > sort -n > head -n 3
wc -l * | sort -n | head -n 1-3
wc -l * | head -n 3 | sort -n
wc -l * | sort -n | head -n 3
A file called animals.csv (in the shell-lesson-data/exercise-data/animal-counts directory) contains the following data:
2012-11-05,deer,5
2012-11-05,rabbit,22
2012-11-05,raccoon,7
2012-11-06,rabbit,19
2012-11-06,deer,2
2012-11-06,fox,4
2012-11-07,rabbit,16
2012-11-07,bear,1
What text passes through each of the pipes and the final redirect in the
pipeline below? Note that the sort -r command sorts in reverse order.
$ cat animals.csv | head -n 5 | tail -n 3 | sort -r > final.txt
Hint: build the pipeline up one command at a time to test your understanding.
For the file animals.csv from the previous exercise, consider the following command:
$ cut -d , -f 2 animals.csv
The cut command is used to remove or ‘cut out’ certain sections of each line in
the file. By default, cut expects the lines to be separated into columns by a Tab
character. A character used in this way is called a delimiter. In the example
above we use the -d option to specify the comma as our delimiter character. We
have also used the -f option to specify that we want to extract the second field
(column). This gives the following output:
deer
rabbit
raccoon
rabbit
deer
fox
rabbit
bear
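The -f option accepts other field numbers as well. A brief sketch, using the first three records of animals.csv copied into a scratch file (the name sample.csv is made up):

```shell
# Work in a throwaway directory.
cd "$(mktemp -d)"

# First three records from animals.csv, written to a hypothetical sample file.
printf '2012-11-05,deer,5\n2012-11-05,rabbit,22\n2012-11-05,raccoon,7\n' > sample.csv

# Field 1: the dates only.
cut -d , -f 1 sample.csv

# Fields 2 and 3: animal and count, still separated by a comma.
cut -d , -f 2,3 sample.csv
```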
The uniq command filters out adjacent matching lines in a file. How could you extend this pipeline (using uniq and another command) to find out what animals the file contains (without any duplicates in their names)?
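What "adjacent" means here can be seen with a few made-up lines (single letters, not the animal data):

```shell
# "a" occurs three times, but only the first two occurrences are adjacent.
# uniq collapses those two into one and leaves the later "a" alone,
# so three lines come out: a, b, a.
printf 'a\na\nb\na\n' | uniq
```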
The file animals.csv contains 8 lines of data formatted as follows:
2012-11-05,deer,5
2012-11-05,rabbit,22
2012-11-05,raccoon,7
2012-11-06,rabbit,19
...
The uniq command has a -c option which gives a count of the number of times a line occurs in its input. Assuming your current directory is shell-lesson-data/exercise-data/animal-counts, what command would you use to produce a table that shows the total count of each type of animal in the file?
1. $ sort animals.csv | uniq -c
2. $ sort -t, -k2,2 animals.csv | uniq -c
3. $ cut -d, -f 2 animals.csv | uniq -c
4. $ cut -d, -f 2 animals.csv | sort | uniq -c
5. $ cut -d, -f 2 animals.csv | sort | uniq -c | wc -l
- wc counts lines, words, and characters in its inputs.
- cat displays the contents of its inputs.
- sort sorts its inputs.
- head displays the first 10 lines of its input by default.
- tail displays the last 10 lines of its input by default.
- command > [file] redirects a command’s output to a file (overwriting any existing content).
- command >> [file] appends a command’s output to a file.
- [first] | [second] is a pipeline: the output of the first command is used as the input to the second.
- The best way to use the shell is to use pipes to combine simple single-purpose programs (filters).