This document describes the Unix stream editor, sed(1), and provides examples.
1. Sed and perl regex
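Sed uses POSIX regular expressions (basic by default, extended with the -E option), not Perl regular expressions. GNU sed accepts some Perl-style escapes such as "\s" and "\w", and the POSIX character classes described below are also available in Perl.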
A. Character classes
Character classes, which are also documented in perlre, are similar
to escapes like "\s" for matching whitespace. A character class is
written as "[:name:]" (like [:space:]) and is only valid inside a
bracket expression, where it can be listed alongside ordinary
characters. For example, consider the difference in
the following 2 re's:
'[,[:space:]]' - match a single character that is either a comma or whitespace
',[[:space:]]' - match a comma followed by whitespace
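The difference between the two forms can be checked with a small sketch. The sample string and the variable names out1 and out2 are made up for illustration; note that the class always sits inside a bracket expression.

```shell
# Replace every single character that is a comma OR whitespace.
out1=$(printf 'a, b,c d\n' | sed 's/[,[:space:]]/_/g')
echo "$out1"    # a__b_c_d

# Replace only a comma that is immediately followed by whitespace.
out2=$(printf 'a, b,c d\n' | sed 's/,[[:space:]]/_/g')
echo "$out2"    # a_b,c d
```

In the first command the comma and the space after "a" are replaced separately, so two underscores appear. In the second command the pair ", " is replaced as a unit, and the lone comma and the lone space are left alone.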
B. Man pages
For more info on the Perl regular expressions (perlre) mentioned
above, see the following man pages:
perlunicode - For details about Unicode and for details on "\pP",
"\PP", and "\X" (e.g., "\x{85}", "\x{2028}",
"\x{2029}").
perluniintro - Unicode in general.
perllocale - Localization, which affects, for example, the
list of alphabetic characters generated by "\w".
2. Eric Pement's "One-Liners For sed"
The following sed document (Pement 2004) contains some useful sed one-liners. See the local copy in #sed1line.txt or the web copy at http://www.student.northpark.edu/pemente/sed/sed1line.txt
3. Multiple expressions
Sed can accept multiple expressions and apply them, in order, to each line of its input stream. This is useful, for example, when removing text from the beginning and end of lines in the input stream. Consider an input file foo.txt, with the following content:
SOL This is line 1 EOL
SOL This is line 2 EOL
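One way to remove the SOL and EOL symbols is to pass the contents of the file through an invocation of sed with 2 expressions, separated by ";", one for removing the SOL symbol and the other for removing the EOL symbol. The "\W" escape (a GNU sed extension) matches a single non-word character, here the space next to each symbol. The following sed invocation does exactly that: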
bash $ cat foo.txt | sed 's@^SOL\W@@;s@\WEOL$@@'
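As a sketch of an equivalent form, each expression can also be passed with its own -e option. The example below recreates the sample input in a temporary file (a hypothetical stand-in for foo.txt) and uses [[:space:]] in place of "\W", on the assumption that only whitespace separates the symbols from the text.

```shell
# Recreate the sample input in a temporary file (stand-in for foo.txt).
tmp=$(mktemp)
printf 'SOL This is line 1 EOL\nSOL This is line 2 EOL\n' > "$tmp"

# One -e per expression: strip the leading "SOL ", then the trailing " EOL".
result=$(sed -e 's@^SOL[[:space:]]@@' -e 's@[[:space:]]EOL$@@' "$tmp")
echo "$result"

# Clean up the temporary file.
rm -f "$tmp"
```

With the two-line sample input this should print "This is line 1" and "This is line 2" on separate lines. Reading the file directly also avoids the extra cat process.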