Tech Dogg's Dox Tox

Sep 17

I've been using sed, the “stream” or non-interactive text editor, since 1985. I remember reading, re-reading, and poring over the only complete documentation about it I could find at the time, entitled “SED - A Non-Interactive Text Editor,” dated August 15, 1978, by Lee E. McMahon, of Bell Laboratories in Murray Hill, New Jersey. I still have this terse, 10-page document in my decades-old, six-inch-thick Unix binder. The edges of the paper are worn, smooth, “boofy,” from constant thumbing and flipping.

I'm sure this document was originally composed in troff.

sed is a technical writer's best friend, in my opinion. I remember a developer with whom I worked saying to me about the same time I was discovering this utility “sed and nroff are programming for technical writers.” I agree with him.

Sometimes I think I spend too much time writing sed scripts and not enough time writing documentation. On second thought, my sed scripts are real timesavers. I've used sed to automate most of the mundane, time-sucking tasks that reduce my effectiveness as a writer. Shouldn't I be spending more time writing, and less time searching, gleaning, organizing, verifying, rechecking, and printing or outputting documentation?

I've written sed scripts that convert SGML into wiki markup, ASCII text (essentially nroff terminal output) into SGML, Interleaf ASCII (which looks eerily like HTML or SGML tags) to nroff files, and vice versa. I've also written sed scripts that globally replaced the year of copyright in a set of files, updated chapter and section titles, renamed products (which happens much too frequently at Sun), and updated the touch date in man pages.

One of the very first sed scripts I wrote was a Bourne shell script that removed comments from a troff macro definition file and dumped the output into a file that the user specified. It was a very basic sed script:

sed -e '/^\.\\\"/d
s/\\\" *-.*/\\\"/g' $i >$MACRODIR/tmac.$i

I don't recall exactly why I wrote this script, actually. Nor do I recall why I didn't also completely erase the nroff comment indicator (\").

Oh well.
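
For anyone who wants to see what those two commands do, here is a minimal sketch run against a made-up macro file. The sample contents and the variable names are assumptions for illustration, not the original tmac file. The second call is only one guess at how the comment indicator could have been erased along with the comment text.

```shell
# Hypothetical macro file: one full-line comment and one trailing comment.
input=$(cat <<'EOF'
.\" tmac.sample: a made-up macro file
.de XX \" - start a new section
.br
..
EOF
)

# Same two commands as the script above: delete full-line comments,
# then cut trailing '\" - text' comments back to a bare \" indicator.
kept=$(printf '%s\n' "$input" | sed -e '/^\.\\"/d' -e 's/\\" *-.*/\\"/')

# Variant (an assumption, not the original script): also erase the
# indicator itself and the blanks in front of it.
stripped=$(printf '%s\n' "$input" | sed -e '/^\.\\"/d' -e 's/ *\\" *-.*//')

printf 'indicator kept:\n%s\n\nindicator erased:\n%s\n' "$kept" "$stripped"
```

Note that both versions only touch trailing comments that start with a hyphen after the indicator, which is how the original pattern is written.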

My biggest pet peeve about sed is the way in which it handles lines that you modify with the N, D, P, H, h, G, g (get), and x functions. This pet peeve might be due to my misunderstanding of how sed works, but I strongly suspect that it has more to do with sed's limitations. Tell me if I'm wrong, but do lines that one modifies with these functions essentially die? In other words, if I modify a line with one of these functions, can I no longer further modify that modified line? I think not. It's dead. Like the wicked witch, I killed it.

Consequently, I'm forced to close that call to sed and pipe the output to another call to sed to process the line I just modified. Gah! I hate 'dat! Is this problem due to the fact that, in these cases, sed swaps the contents of the pattern space with the hold space? Is that the culprit here? Does anyone know?

One solution that occurred to me very recently is that I might be able to use the -e option to obviate my having to call sed more than once to modify the same line more than once. But, I haven't had the opportunity to test this hypothesis.

In any event, in scripts where I'd prefer to make a single call to sed to reduce my processing overhead, I've been forced to make multiple calls to sed to get the output I want. Annoying! Annoying!
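
For what it's worth, a quick experiment suggests that a line modified with N is not dead: commands that follow N in the same script keep working on the modified pattern space. This is a minimal sketch with made-up input and variable names. It only exercises N, although G and x also leave their result in the pattern space for later commands to edit.

```shell
# Made-up four-line input.
input='alpha
beta
gamma
delta'

# $!N appends the next input line to the pattern space (skipped on the
# last line). The three s commands that follow then edit that same,
# already-modified pattern space within a single call to sed.
result=$(printf '%s\n' "$input" |
  sed -e '$!N' -e 's/\n/ + /' -e 's/^/[/' -e 's/$/]/')

printf '%s\n' "$result"
```

If this holds up in a real script, a second piped call to sed should only be needed when the second pass has to see the final output of the first pass as separate lines.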

Another pet peeve I have about sed (and again, this might be due to my misunderstanding of how sed works) is that sed doesn't find ranges on the same line. For example, when I specify the following search pattern:

/pattern1/,/pattern2/

sed matches the following lines in a file:

This line contains pattern1.
This line contains pattern2.

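But it doesn't treat the following line as a complete range on its own, because sed starts looking for pattern2 only on the line after the one that matched pattern1: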

This line contains pattern1. This line also contains pattern2.

Gah! I hate 'dat, too!

In this situation, I'm forced to use sed's more urbane and sophisticated cousin awk, which really sucks up system resources, but does the deed nicely nonetheless.
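
The range behavior can be checked with a small experiment. The sketch below uses made-up input and variable names. It shows that a line containing both patterns opens the range, which then runs on to the next line that matches pattern2. It also shows a plain single-address pattern that may be enough when all you need is the lines that contain both patterns in that order.

```shell
# Made-up input: the second line contains both patterns.
input='one
This line contains pattern1. This line also contains pattern2.
two
This line contains pattern2.
three'

# The range opens on the line with both patterns and closes on the
# NEXT line that matches pattern2, so three lines are printed.
range=$(printf '%s\n' "$input" | sed -n '/pattern1/,/pattern2/p')

# A single address matches only lines that hold both patterns, in order.
same_line=$(printf '%s\n' "$input" | sed -n '/pattern1.*pattern2/p')

printf 'range:\n%s\n\nsame line:\n%s\n' "$range" "$same_line"
```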

In the 22 years that I've been using sed I've come across some handy sed functions I'd like to share. One of my favorites comes in very handy when converting SGML, HTML, or XML tags to some other format. One feature of sed to keep in mind when you use a wildcard (the asterisk) in a substitute command is that the match is greedy: sed matches the longest string it can. In other words, if the character that ends your pattern appears two or more times on a line, sed matches everything up to the last occurrence. So, say you have the following contents in a text file:

<sect1 id="SECTION-1"><title>Revision Record</title>
<para>The following table lists the <em>new</em> information in this chapter.</para>

And say that you want to remove all instances of SGML tags in a file, so you specify the following substitute function:

s/<.*>//g

Unfortunately, instead of removing only the SGML tags in your file, sed overzealously removes everything else as well, tossing every line of content shown in the preceding example. That's because “.*” matches as much as it can, so each match runs from the first “<” on the line to the very last “>” on the line.

The solution is the following substitute function:

s/<[^>]*>//g

This handy little function removes only the tags and leaves the content between them alone. So, when you run the preceding substitute function on the content shown in the example above, you get this:

Revision Record
The following table lists the new information in this chapter.

As Borat says, “Nice!”
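
To try both substitute functions side by side, here is a minimal sketch that runs them on the two sample lines above. The variable names are mine and purely illustrative.

```shell
# The two sample lines from the example above.
input='<sect1 id="SECTION-1"><title>Revision Record</title>
<para>The following table lists the <em>new</em> information in this chapter.</para>'

# Greedy: .* runs from the first < to the last > on each line,
# so both lines come out empty.
greedy=$(printf '%s\n' "$input" | sed 's/<.*>//g')

# Negated bracket expression: [^>]* cannot run past the first >,
# so each match stops at the end of a single tag.
careful=$(printf '%s\n' "$input" | sed 's/<[^>]*>//g')

printf 'greedy:  [%s]\n' "$greedy"
printf 'careful:\n%s\n' "$careful"
```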

Another handy technique combines the hold and get functions. They come in handy when you need to concatenate content onto a single line, as follows:

<para>This is some text on
multiple lines that I'd 
like to concatenate onto one line</para>
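
You can transform the preceding lines into a single line with the following set of functions. Note that the slash in the closing tag must be escaped with a backslash, because the slash is also the pattern delimiter:

/<para/,/<\/para/ {
   /<para/h
   /<para/!H
   /<\/para/!d
   /<\/para/g
   s/>\n/>/g
   s/\n</</g
   s/ *\n/ /g
}

The output looks something like this:
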
<para>This is some text on multiple lines that I'd like to concatenate onto one line</para>
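
For comparison, here is a different way to get the same result. It is not the hold-and-get approach above but a sketch of the common N-loop idiom, which keeps appending lines to the pattern space until the closing tag shows up. The label name join and the variable name are my own choices, and the sketch assumes that each paragraph's opening tag starts a fresh line.

```shell
# Join each multi-line <para>...</para> block onto one line with an
# N loop instead of the hold space.
result=$(sed '
/<para/ {
  :join
  /<\/para/!{
    $!N
    $!b join
  }
  s/ *\n/ /g
}' <<'EOF'
<para>This is some text on
multiple lines that I'd
like to concatenate onto one line</para>
EOF
)

printf '%s\n' "$result"
```

The $! guards keep the loop from running past the end of the file if a closing tag is missing. Tags that sit on lines by themselves would still need the two extra substitutions from the hold-and-get version to avoid a stray space.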

I'll provide more handy functions when I have the opportunity. Right now, I have to get back to working on Sun Cluster 3.2 Update 1 documentation.
