Using UNIX for String Manipulation
Three UNIX commands
Several UNIX commands are particularly useful when a shell script has to perform complex string manipulation.
This page covers only three of them. To analyze a string stored in a shell variable, you generally must use
embedded command execution (command substitution, written $(...) or with backquotes), which is covered in the course Unix Shell Programming.
- The wc command counts the lines, words, and characters in a stream of input (for example, a variable's value). With -c (bytes) or -m (characters) it reports only that count, which can be used to determine the length of a string.
- The cut command can extract either a delimited field (tab-separated by default; -d selects another single-character delimiter) or a range of character positions from a string. This can be used to extract a sub-string or to examine the fields of the wc command's output.
- The sed command can perform complex manipulations on a line of text, searching for patterns and deleting or adding text based on various rules.
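The three commands above can be combined with command substitution roughly as follows. This is a minimal sketch: the variable names and the sample string are hypothetical, and printf is used instead of echo so that no unwanted newline is counted.

```shell
#!/bin/bash
# Sketch: examining a string held in a shell variable with wc, cut and
# sed through command substitution, $( ... ).
# The variable names and the sample value are hypothetical.
str='Hello, UNIX world'

# Length: printf '%s' writes the value with no trailing newline, so
# wc -c counts only the string's bytes (echo would add one more byte).
# $(( )) strips the leading blanks that some wc versions print.
len=$(( $(printf '%s' "$str" | wc -c) ))

# Sub-string: character positions 1 through 5.
sub=$(printf '%s\n' "$str" | cut -c1-5)

# Field: the second field when the delimiter (-d) is a single space.
field=$(printf '%s\n' "$str" | cut -d' ' -f2)

# Pattern-based edit: replace the first "UNIX" on the line with "Linux".
edited=$(printf '%s\n' "$str" | sed 's/UNIX/Linux/')

echo "length=$len sub=$sub field=$field edited=$edited"
```

Because wc -c counts bytes, text containing multibyte characters needs wc -m (in a matching locale) to get a character count.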
Sed Stream Editor
sed - manual page for sed version 4.0.3
SYNOPSIS
sed [OPTION]... {script-only-if-no-other-script} [input-file]...
Sed is a stream editor. A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline).
While in some ways similar to an editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient. But it is sed's ability to filter text in a pipeline which particularly distinguishes it from other types of editors.
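The filtering role described above can be illustrated with a short sketch. The sample input and variable names are hypothetical; the commands use only basic sed features (the d and p commands, and s with \( \) groups).

```shell
#!/bin/bash
# Sketch of sed as a one-pass filter in a pipeline.
# The sample input and the variable names are hypothetical.
input='# comment line
apple=red
banana=yellow'

# Delete every line that starts with "#".
no_comments=$(printf '%s\n' "$input" | sed '/^#/d')

# Print only line 2: -n turns off automatic printing, "2p" prints line 2.
second=$(printf '%s\n' "$input" | sed -n '2p')

# Rewrite "key=value" as "value key" using basic-regex groups \( \);
# the trailing p prints only lines where the substitution succeeded.
swapped=$(printf '%s\n' "$input" | sed -n 's/^\([a-z]*\)=\(.*\)$/\2 \1/p')

printf '%s\n' "$no_comments" "---" "$second" "---" "$swapped"
```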
wc - word count
wc gives a "word count" on a file or I/O stream:
bash$ wc /usr/share/doc/sed-4.1.2/README
13 70 447 /usr/share/doc/sed-4.1.2/README
[13 lines, 70 words, 447 bytes]
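When wc reads a pipe instead of a named file, it prints the same three counts without a file name. A minimal sketch of capturing them into variables, assuming hypothetical sample text:

```shell
#!/bin/bash
# Sketch: wc on an I/O stream rather than a named file.
# The sample text is hypothetical.
sample='one two
three'

# With no options wc prints lines, words and bytes, in that order.
# read splits the (possibly space-padded) output into three variables.
read -r lines words bytes <<EOF
$(printf '%s\n' "$sample" | wc)
EOF

echo "lines=$lines words=$words bytes=$bytes"
```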
| Option | Effect |
|--------|--------|
| wc -w | gives only the word count |
| wc -l | gives only the line count |
| wc -c | gives only the byte count |
| wc -m | gives only the character count |
| wc -L | gives only the length of the longest line (not in POSIX wc) |
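A quick way to see the single-count options side by side is to feed one sample through each. This sketch assumes a wc that supports -L (GNU coreutils does; it is not required by POSIX), and the text is hypothetical.

```shell
#!/bin/bash
# Sketch of the single-count wc options on one sample string.
# The text is hypothetical; -L is not a POSIX option.
text='alpha beta
gamma
delta epsilon zeta'

# $(( )) strips the leading blanks that some wc versions print.
n_lines=$(( $(printf '%s\n' "$text" | wc -l) ))
n_words=$(( $(printf '%s\n' "$text" | wc -w) ))
n_bytes=$(( $(printf '%s\n' "$text" | wc -c) ))
longest=$(( $(printf '%s\n' "$text" | wc -L) ))

echo "lines=$n_lines words=$n_words bytes=$n_bytes longest=$longest"
```

The byte count includes the newline that ends each line, while -L measures line length without it.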
Using wc to count how many .txt files are in the current working directory:
$ ls *.txt | wc -l
# Will work as long as none of the "*.txt" files
#+ have a linefeed embedded in their name.
# Alternative ways of doing this are:
# find . -maxdepth 1 -name \*.txt -print0 | grep -cz .
# (shopt -s nullglob; set -- *.txt; echo $#)
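The linefeed caveat in the comments above can be reproduced in a throwaway directory. This is a sketch: the file names are hypothetical, and it assumes an ls that prints names literally when its output is a pipe (as GNU ls does).

```shell
#!/bin/bash
# Sketch contrasting ls | wc -l with a glob-based count.
# Works in a temporary directory; the file names are hypothetical.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
touch a.txt b.txt "$(printf 'bad\nname.txt')"

# ls writes the embedded linefeed to the pipe, so wc -l overcounts.
by_wc=$(( $(ls *.txt | wc -l) ))

# The glob yields one word per file, whatever the name contains.
by_glob=$(shopt -s nullglob; set -- *.txt; echo $#)

echo "ls | wc -l: $by_wc   glob: $by_glob"
cd / && rm -rf "$demo"
```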
Using wc to total up the size, in bytes, of all the files whose names begin with letters in the range d through h:
bash$ wc [d-h]* | grep total | awk '{print $3}'
71832
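The pipeline above relies on the "total" line, which wc prints only when it is given two or more files. A minimal sketch comparing it with concatenating the files first, using hypothetical file names and contents in a temporary directory:

```shell
#!/bin/bash
# Sketch: total bytes of files whose names start with d..h.
# Runs in a temporary directory; names and contents are hypothetical.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
printf 'abc\n' > delta     # 4 bytes
printf 'hello\n' > hotel   # 6 bytes
printf 'zzz\n' > zulu      # 4 bytes, outside d-h

# Two files match, so wc prints a "total" line; $3 is its byte column.
total_wc=$(wc [d-h]* | grep total | awk '{print $3}')

# cat produces a single stream, so this also works when only one file matches.
total_cat=$(( $(cat [d-h]* | wc -c) ))

echo "wc | grep | awk: $total_wc   cat | wc -c: $total_cat"
cd / && rm -rf "$demo"
```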
Using wc to count the lines that contain the word "Linux" in the main source file of a book (a line with several occurrences is counted only once):
bash$ grep Linux book.sgml | wc -l
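To count every occurrence rather than every matching line, grep -o prints each match on a line of its own before wc counts them. A sketch with hypothetical text standing in for book.sgml:

```shell
#!/bin/bash
# Sketch: matching lines versus individual occurrences.
# The sample text stands in for book.sgml and is hypothetical.
doc='Linux is a kernel. Linux is everywhere.
GNU tools run on Linux.
Nothing to see here.'

# Lines that contain "Linux" (the same number grep -c Linux reports).
lines=$(( $(printf '%s\n' "$doc" | grep Linux | wc -l) ))

# Every occurrence: grep -o writes each match on its own line.
hits=$(( $(printf '%s\n' "$doc" | grep -o Linux | wc -l) ))

echo "lines: $lines  occurrences: $hits"
```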