Pearson IT Certification

Process Text Streams Using Filters


Date: Oct 5, 2017


This sample chapter from CompTIA Linux+/LPIC-1 Portable Command Guide provides information and commands on a variety of topics including cat, cut, expand, fmt, head, join, less, nl, od, paste, and more.

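This chapter provides information and commands concerning the following topics: cat, cut, expand, fmt, head, join, less, nl, od, paste, pr, sed, sort, split, tail, tr, unexpand, uniq, and wc.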

cat

The cat command displays the contents of text files. Important options include the following:

Option Description
-A Same as -vET.
-e Same as -vE.
-E Displays a $ character at the end of each line (used to see trailing whitespace characters).
-n Numbers all lines of output.
-s Converts multiple blank lines into a single blank line.
-T Displays “^I” characters for each tab character (used to distinguish tabs from spaces).
-v Displays nonprinting characters (such as control characters) using ^ and M- notation.
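The following is a minimal sketch of three of these options. It uses printf to build made-up input, so no sample file is needed:

```shell
# A tab plus a trailing space: -A shows the tab as ^I and the line end as $
printf 'a\tb \n' | cat -A
# Three blank lines between two lines: -s squeezes them into one blank line
printf 'one\n\n\n\ntwo\n' | cat -s
# -n numbers every output line, including the blank one
printf 'one\n\ntwo\n' | cat -n
```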

cut

The cut command is used to display “sections” of data. Important options include the following:

Option Description
-b Used to define a section to print by bytes.
-c Used to define a section to print by characters.
-d Used to specify a delimiter character (used with the -f option).
-f Used to specify which fields to display.

Example using fields:

[student@localhost ~]$ head -2 /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
[student@localhost ~]$ head -2 /etc/passwd | cut -d: -f1,7
root:/bin/bash
bin:/sbin/nologin

Example using characters:

[student@localhost ~]$ ls -l /etc/passwd
-rw-r--r--. 1 root root 2607 Nov  3 10:15 /etc/passwd
[student@localhost ~]$ ls -l /etc/passwd | cut -c1-10,42-
-rw-r--r-- /etc/passwd

expand

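The expand command converts tabs into spaces. Use the -t option to set the tab stop interval (the default is every 8 columns). Each tab is replaced by however many spaces are needed to reach the next tab stop, so the number of spaces per tab varies, as the following example shows.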

Example:

[student@localhost ~]$ cat -T sample.txt
Example:^IOne
Test:^I^ITwo
[student@localhost ~]$ expand -t 4 sample.txt
Example:    One
Test:       Two
[student@localhost ~]$ expand -t 8 sample.txt
Example:        One
Test:           Two

fmt

The fmt command performs simple formatting of text data. Important options include the following:

Option Description
-u Changes the document so there is only one space between each word and two spaces after each sentence.
-w Used to specify the maximum number of characters in each line.

Example:

[student@localhost ~]$ cat data.txt
pam_motd — Display the motd file

DESCRIPTION

pam_motd is a PAM module that can be used to display arbitrary motd
(message of the day) files after a successful login. By default
the /etc/motd file is shown. The message size is limited to 64KB.

[student@localhost ~]$ fmt -w 40 data.txt
pam_motd — Display the motd file

DESCRIPTION

pam_motd is a PAM module that can be
used to display arbitrary motd (message
of the day) files after a successful
login. By default the /etc/motd file
is shown. The message size is limited
to 64KB.
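The following sketch uses made-up input to show both options. Because fmt chooses its own break points, the -w case is best checked by confirming that no line exceeds the limit, not by predicting exact line breaks:

```shell
# -u collapses runs of spaces between words to a single space
printf 'one   two   three\n' | fmt -u
# -w 20 refills the paragraph so that no line is longer than 20 characters
printf 'The quick brown fox jumps over the lazy dog again and again.\n' | fmt -w 20
```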

head

The head command displays the top part of text data. By default, the top ten lines are displayed. Use the -n option to display a different number of lines:

[student@localhost ~]$ ls -l | head -3
total 12
drwxrwxr-x. 2 student student  6 Aug 22 16:51 book
drwxrwxr-x. 2 student student  6 Aug 22 16:51 class

join

The join command merges lines from two files that share a common field (by default, the first field of each line) and writes the combined lines to standard output. Both files must already be sorted on the common field before the join command is run. See Figure 10.1 for an example.


Figure 10.1 Using the join Command
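Here is a text-only companion to Figure 10.1. The file names ids.txt and depts.txt and their contents are hypothetical. Both files are sorted on the first field, which join uses as the common field by default:

```shell
# Work in a scratch directory that is removed when the script exits
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

# Two hypothetical files, each already sorted on field 1 (the ID)
printf '1 tom\n2 nick\n3 sue\n' > ids.txt
printf '1 sales\n2 admin\n3 support\n' > depts.txt

# Lines with the same first field are merged into one output line:
# 1 tom sales / 2 nick admin / 3 sue support
join ids.txt depts.txt
```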

less

The less command is used to display large chunks of text data. Unlike the cat command, the less command will pause after displaying the first page of information. Keys on the keyboard allow the user to scroll through the document. The following table highlights the more useful movement keys:

Movement Key Description
h Displays a help screen (summary of the less command movement keys).
SPACEBAR Move forward one page in the current document.
b Move back one page in the current document.
ENTER Move down one line in the current document; the down-arrow key can also perform this operation.
UP ARROW Move up one line in the current document.
/term Search the document for term (this can be a regular expression or just plain text).
q Quit viewing the document and return to the shell.

nl

The nl command displays a file with numbered lines.
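By default, nl numbers only nonblank lines (body numbering style t). This differs from cat -n, which numbers every line. The -b a option, which is not covered elsewhere in this chapter, tells nl to number all lines. The following is a minimal sketch with made-up input:

```shell
# Default: the blank line is not numbered, so "b" becomes line 2
printf 'a\n\nb\n' | nl
# -b a: number all lines, so "b" becomes line 3
printf 'a\n\nb\n' | nl -b a
```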

od

The od command “dumps” the contents of a file in octal or in another format. By default, it displays the data as octal values:

[student@localhost ~]$ more people.txt
1 tom
2 nick
3 sue
4 tim
[student@localhost ~]$ od people.txt
0000000 020061 067564 005155 020062 064556 065543 031412 071440
0000020 062565 032012 072040 066551 000012
0000031

Important options include the following:

Option Description
-t Used to specify the output format: “d” for decimal, “f” for floating point, and “x” for hexadecimal.
-N x Limits the output to x number of bytes.
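The following short sketch shows the -t and -N options on arbitrary input strings. Two options used here are not in the table above: -An suppresses the offset column to make the output easier to read, and -c displays printable characters and backslash escapes:

```shell
# One byte per column in hexadecimal: "AB" is 41 42
printf 'AB' | od -An -t x1
# Unsigned decimal bytes: 65 66
printf 'AB' | od -An -t u1
# Characters, limited to the first 3 bytes of input: h e l
printf 'hello' | od -An -c -N 3
```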

paste

The paste command merges files line by line: line 1 of each file is combined into one output line, then line 2, and so on. See Figure 10.2 for an example.


Figure 10.2 Using the paste Command

Use the -d option to specify the output delimiter (the default is a tab character).
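Here is a text-only sketch of what Figure 10.2 illustrates. The file names nums.txt and names.txt are hypothetical:

```shell
# Work in a scratch directory that is removed when the script exits
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

printf '1\n2\n3\n' > nums.txt
printf 'tom\nnick\nsue\n' > names.txt

# Default: line N of each file is joined by a tab
paste nums.txt names.txt
# -d, : join with a comma instead (1,tom / 2,nick / 3,sue)
paste -d, nums.txt names.txt
```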

pr

The pr command formats text for printing, for example by breaking it into pages and adding a header to each page. Important options include the following:

Option Description
-l Used to indicate how many lines appear per page of output (for example, pr -l 44 file.txt).
-t Suppresses the page header (which includes a timestamp, the file name, and the page number) and the page trailer.
-d Double-space the output.
-o Used to specify the indent value (for example, pr -o 8 file.txt).
-w Used to specify the maximum width (number of characters) of each line.
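The following is a minimal sketch of -o and -d on made-up input. The -t option is included in both commands so that pr neither adds its header nor pads the output to a full page:

```shell
# Indent every line by 4 spaces, with no page header or trailer
printf 'alpha\nbeta\n' | pr -t -o 4
# Double-space the same text
printf 'alpha\nbeta\n' | pr -t -d
```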

sed

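Use the sed (stream editor) utility to make automated modifications to text. By default, sed reads a file (or standard input), applies the edits, and writes the result to standard output without changing the original file. The basic format for a substitution is sed 's/RE/string/' file, which replaces the first match of RE on each line with string.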

The “RE” refers to the term regular expression, a feature that uses special characters to match patterns. See Chapter 15, “Search Text Files Using Regular Expressions,” for more details about regular expressions.

Example of the sed command:

[student@localhost ~]$ head -n 5 /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
[student@localhost ~]$ head -n 5 /etc/passwd | sed 's/bin/----/'
root:x:0:0:root:/root:/----/bash
----:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/s----:/sbin/nologin
adm:x:3:4:adm:/var/adm:/s----/nologin
lp:x:4:7:lp:/var/spool/lpd:/s----/nologin

sed is a very powerful utility with a large number of features. The following table describes some of the more useful sed commands:

Feature Description
'/RE/d' Deletes lines that match the RE from the output of the sed command.
'/RE/c\string' Changes lines that match the RE to the value of string.
'/RE/a\string' Adds string on a new line after each line that matches the RE.
'/RE/i\string' Adds string on a new line before each line that matches the RE.
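The following sketch runs the four commands from the table against made-up input. The one-line forms such as '/RE/c\string' are accepted by GNU sed, which is the sed on most Linux systems. Other sed implementations may require the text to start on a new line after the backslash:

```shell
# Hypothetical helper that prints three sample lines
sample() { printf 'alpha\nbeta\ngamma\n'; }

# d: delete lines matching /beta/ -> alpha, gamma
sample | sed '/beta/d'
# c\: replace the matching line -> alpha, BETA, gamma
sample | sed '/beta/c\BETA'
# a\: add a line after each match -> alpha, beta, after-beta, gamma
sample | sed '/beta/a\after-beta'
# i\: add a line before each match -> alpha, before-beta, beta, gamma
sample | sed '/beta/i\before-beta'
```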

The sed command has two important modifiers (characters added to the end of the sed operation):


Figure 10.3 The g Modifier
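Here is a text-only sketch of the idea in Figure 10.3. Without the g modifier, s replaces only the first match on each line. With g, it replaces every match on the line. The input line is arbitrary:

```shell
# Only the first "bin" on the line is replaced: ----:/bin:/sbin
printf 'bin:/bin:/sbin\n' | sed 's/bin/----/'
# g replaces every "bin" on the line: ----:/----:/s----
printf 'bin:/bin:/sbin\n' | sed 's/bin/----/g'
```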

The sed command can also change the original file (instead of displaying the modified data to the screen). To change the original file, use the -i option.
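The following is a minimal sketch of -i on a throwaway file. The file name notes.txt is hypothetical. GNU sed also accepts a suffix attached to the option, such as -i.bak, and keeps the unmodified file under that name. Other sed implementations treat the suffix differently:

```shell
# Work in a scratch directory that is removed when the script exits
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

printf 'color=red\n' > notes.txt
# Edit in place: nothing is printed; the file itself now says color=blue
sed -i 's/red/blue/' notes.txt
cat notes.txt
# Same kind of edit, but keep the previous version as notes.txt.bak
sed -i.bak 's/blue/green/' notes.txt
cat notes.txt notes.txt.bak
```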

sort

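The sort command can be used to sort text data. By default, it compares entire lines using the collation order of the current locale. In a typical English locale (such as en_US.UTF-8), the result is an alphabetical order in which uppercase and lowercase versions of a letter sort together, as in the following example. In the C locale, all uppercase letters sort before lowercase letters. When you sort on a specific field (see the -k option), fields are separated by whitespace by default: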

[student@localhost ~]$ cat individuals.txt
tom
nick
sue
Tim
[student@localhost ~]$ sort individuals.txt
nick
sue
Tim
tom

Important options include the following:

Option Description
-f Fold case (essentially case-insensitive).
-h Human-based numeric sort (for example, 2K is lower than 1G).
-n Numeric sort.
-M Month-based sort.
-r Used to reverse the sort order.
-t Used to change the field separator (for example, sort -t ":" file.txt).
-u Used to remove duplicate lines.

To sort on a different field than the default, use the -k option. Here’s an example:

[student@localhost ~]$ more people.txt
1 tom
2 nick
3 sue
4 tim
[student@localhost ~]$ sort -k 2 people.txt
2 nick
3 sue
4 tim
1 tom
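The following sketch shows -n, -r, and -t with -k on made-up input. The key form -k2,2n goes slightly beyond the -k usage shown above: it limits the key to field 2 only and compares that field numerically:

```shell
# Default ordering compares text, so "10" and "100" sort before "9"
printf '10\n9\n100\n' | sort
# -n compares numerically: 9, 10, 100
printf '10\n9\n100\n' | sort -n
# -rn reverses the numeric order: 100, 10, 9
printf '10\n9\n100\n' | sort -rn
# -t: splits fields on colons; -k2,2n sorts on field 2, numerically
printf 'b:2\na:10\nc:1\n' | sort -t: -k2,2n
```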

split

To break up a large file into a series of smaller files, use the split command. In the following example, the linux.words file is split into smaller files of 100,000 lines each. Each file is named x--, where “--” is aa, ab, ac, and so on.

[student@localhost dictionary]$ ls
linux.words
[student@localhost dictionary]$ split -l 100000 linux.words
[student@localhost dictionary]$ ls
linux.words xaa xab xac xad xae

Important options include the following:

Option Description
-b Break up files based on the number of bytes in each file (for example, split -b 5000 file.txt).
-l Break a file into smaller files based on the number of lines per file; the default is 1,000 lines.

To use a different prefix than “x”, add a second argument:

[student@localhost dictionary]$ ls
linux.words
[student@localhost dictionary]$ split -l 100000 linux.words words
[student@localhost dictionary]$ ls
linux.words wordsaa wordsab wordsac wordsad wordsae

tail

The tail command displays the bottom part of text data. By default, the last ten lines are displayed. Use the -n option to display a different number of lines:

[student@localhost ~]$ cal 1999 | tail -n 9
     October                 November                 December
Su Mo Tu We Th Fr Sa   Su Mo Tu We Th Fr Sa   Su Mo Tu We Th Fr Sa
                1  2       1  2  3  4  5  6             1  2  3  4
 3  4  5  6  7  8  9    7  8  9 10 11 12 13    5  6  7  8  9 10 11
10 11 12 13 14 15 16   14 15 16 17 18 19 20   12 13 14 15 16 17 18
17 18 19 20 21 22 23   21 22 23 24 25 26 27   19 20 21 22 23 24 25
24 25 26 27 28 29 30   28 29 30               26 27 28 29 30 31

Important options include the following:

Option Description
-f Display the bottom part of a file and then follow it; that is, tail keeps running and displays new lines as they are added to the file (press Ctrl+C to stop).
-n +x Display from line number x to the end of the file.
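The following sketch contrasts -n x with -n +x, using seq to generate ten numbered lines. The -f option is left out because it keeps running until it is interrupted:

```shell
# The last 3 lines: 8, 9, 10
seq 1 10 | tail -n 3
# From line 8 to the end: also 8, 9, 10
seq 1 10 | tail -n +8
```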

tr

The tr command is useful for translating characters from one set to another. The syntax of the command is tr SET1 [SET2].

For example, the following will capitalize the output of the date command:

[student@localhost dictionary]$ date
Sat Dec  3 20:15:05 PST 2016
[student@localhost dictionary]$ date | tr 'a-z' 'A-Z'
SAT DEC  3 20:15:18 PST 2016

The tr command does not accept filenames as arguments. To process a file, you must redirect the file into tr:

tr 'a-z' 'A-Z' < file

Important options include the following:

Option Description
-d Used when the second set is omitted; it deletes the matching characters. For example, the following deletes all numbers from the output of the date command: date | tr -d '0-9'.
-s Squeezes each run of a repeated character into a single occurrence. When two sets are given, translation happens first, and squeezing then applies to characters in the second set. Thus, the command tr -s 'a' 'A' would translate “aaabc” into “AAAbc” and then squeeze it to “Abc”.
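The following sketch applies these forms to made-up input. The temporary file in the last step exists only to demonstrate the input redirection:

```shell
# Translate lowercase letters to uppercase: HELLO WORLD
echo 'hello world' | tr 'a-z' 'A-Z'
# -d with one set: delete every digit, leaving "room "
echo 'room 101' | tr -d '0-9'
# -s with one set: squeeze runs of spaces into a single space
echo 'a    b     c' | tr -s ' '
# -s with two sets: translate a to A, then squeeze the repeated A: Abc
echo 'aaabc' | tr -s 'a' 'A'
# tr reads only standard input, so a file must be redirected into it
tmpfile=$(mktemp)
printf 'abc\n' > "$tmpfile"
tr 'a-z' 'A-Z' < "$tmpfile"
rm -f "$tmpfile"
```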

unexpand

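The unexpand command converts spaces into tabs. By default, it converts only leading spaces (those at the start of a line). Use the -a option to convert other runs of spaces as well. Use the -t option to set the tab stop interval (the default is every 8 columns); in GNU unexpand, -t also turns on -a.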
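The following sketch uses made-up input. Piping the output through cat -A makes the resulting tabs visible as ^I:

```shell
# Eight leading spaces reach the default tab stop and become one tab
printf '        x\n' | unexpand | cat -A
# With -t 4, four leading spaces reach the first tab stop
printf '    x\n' | unexpand -t 4 | cat -A
# Without -a, spaces after non-blank text are left alone
printf 'ab      x\n' | unexpand | cat -A
# With -a, "ab" plus 6 spaces reaches column 8, so the spaces become a tab
printf 'ab      x\n' | unexpand -a | cat -A
```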

uniq

The uniq command removes duplicate lines that are adjacent to each other; in a sorted file, this removes all duplicates:

[student@localhost ~]$ cat names.txt
adm
bin
bin
operator
root
root
root
shutdown
[student@localhost ~]$ uniq names.txt
adm
bin
operator
root
shutdown

For the uniq command to remove every duplicate, the file must first be sorted, because uniq compares each line only with the line immediately before it. Because the sort command has a “unique only” option (-u; see the “sort” section in this chapter for details), it is more common to use the sort command, rather than the uniq command, to remove duplicates.

However, the uniq command has a useful feature: with the -c option, it reports how many times each line occurred. This feature is not available with the sort command:

[student@localhost ~]$ uniq -c names.txt
    1 adm
    2 bin
    1 operator
    3 root
    1 shutdown
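The following sketch shows why sorting matters. It uses made-up input in which the duplicate lines are not adjacent:

```shell
# The two "bin" lines are not adjacent, so uniq keeps both
printf 'bin\nroot\nbin\n' | uniq
# Sorting first makes the duplicates adjacent: bin, root
printf 'bin\nroot\nbin\n' | sort | uniq
# sort -u produces the same result in one command
printf 'bin\nroot\nbin\n' | sort -u
```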

wc

The wc command displays the number of lines, words, and bytes in data. By default, all three values are displayed, in that order:

[student@localhost ~]$ wc sample.txt
2 4 24 sample.txt

Important options include the following:

Option Description
-c Only display the number of bytes. (For plain ASCII text, one byte is one character; in encodings such as UTF-8, a single character can take several bytes.)
-m Only display the number of characters.
-l Only display the number of lines.
-w Only display the number of words.
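The following sketch shows the difference between -c and -m. The sample word héllo is written with octal escapes so that é is stored as its two UTF-8 bytes. The -m count depends on the locale, so its value is given only for a UTF-8 locale:

```shell
# "héllo" plus a newline: é is two bytes (octal 303 251) in UTF-8
printf 'h\303\251llo\n' | wc -c    # bytes: 7
printf 'h\303\251llo\n' | wc -m    # characters: 6 in a UTF-8 locale such as en_US.UTF-8
printf 'one two\nthree\n' | wc -l  # lines: 2
printf 'one two\nthree\n' | wc -w  # words: 3
```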

800 East 96th Street, Indianapolis, Indiana 46240