Arieh's Weblog

Plaintext Diff Change Bars Utility

Yesterday I had to publish an updated plaintext document for review by a technical group that I am part of.

I wanted to provide the latest revision of the document with its differences from the previously published version marked by change bars, so that I could simply email the updated document with the changes flagged. (No, I did not want to publish side-by-side comparisons.)

I was surprised when my Google search for "plain text diff change bars" did not turn up a solution to the problem. (Perhaps I erred in what I searched for.)

I decided to come up with a solution to that problem (one I will probably need again in the future). Below you will find the Perl program I hacked together, which produced satisfactory results.

The crux of the program is to:

  • generate the diff output (with sccs diffs)
  • traverse the output of diff, operating on the later revision of the file
  • generate changebar-ed output
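
To make those three steps concrete, here is a minimal sketch that works on in-memory data instead of SCCS. The sample lines, the sample diff, and the name mark_changes are all made up for illustration; it only assumes the usual diff descriptors such as 8,9c8,10.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data: the later revision of a document, and the
# diff output that a comparison with the earlier revision might produce.
my @new  = ( "alpha\n", "beta (revised)\n", "gamma\n", "delta\n" );
my @diff = ( "2c2\n", "< beta\n", "---\n", "> beta (revised)\n",
             "3a4\n", "> delta\n" );

# mark_changes (illustrative name): return the lines of the later
# revision, prefixed with "| " if a descriptor reports them as added
# or changed, and with "  " otherwise.
sub mark_changes {
    my ( $new, $diff ) = @_;

    # collect the line numbers (in the later revision) named after
    # the 'a' or 'c' of each descriptor; other diff lines are ignored
    my %changed;
    for my $d (@$diff) {
        next unless $d =~ /^\d+(?:,\d+)?[ac](\d+)(?:,(\d+))?$/;
        my ( $first, $last ) = ( $1, defined $2 ? $2 : $1 );
        $changed{$_} = 1 for $first .. $last;
    }

    my @out;
    for my $i ( 0 .. $#$new ) {
        my $bar = $changed{ $i + 1 } ? "| " : "  ";
        push @out, $bar . $new->[$i];
    }
    return @out;
}

print mark_changes( \@new, \@diff );
```

With the sample data above, the second and fourth lines come out with a leading "| " and the other two with two leading spaces.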

The code follows below. Apologies to perl-haters and perl-purists for the crude hack.

#!/usr/bin/perl -w
#
#  %W% %E%
#
#  chbars:	 generate change bars based on 'sccs diffs'
#
#		 The program will output the document whose name is passed
#		 with change bars in all locations different from the
#		 previous version (or the version specified).
#
#  	chbars [-r{rev}] path
#
use FileHandle;
use Getopt::Long;

use vars qw/ $opt_r $opt_verbose $opt_help $SCCSDIFF_PROG /;

#
#  Usage: display Usage
#
sub usage {

    print <<"E-O-F";
$progname: INFO: Usage: $basename [options] file
    where options are:
	  -r{rev}	the SCCS revision with which we compare
	  -verbose	print progress messages
	  -help		show this usage message
    and
	file 		the file to compare with

E-O-F
    exit;
}

#
#  do verbose printing
#
sub do_verbose {
    return unless $opt_verbose;

    #  use STDERR so the messages do not mix with the change-barred output
    print STDERR $basename, ": ", "@_\n";
}

#
#  do error printing
#
sub do_error {
    print STDERR $basename, ": ERROR: ", "@_\n";
}

#
#  doSCCSDiff - generate the differences between the two versions of the file 
#
#  Arguments:	file		pathname of file
#
#  Returns:	outputs the file with its change bars
#
sub doSCCSDiff {
    my ( $file ) = @_;

    &do_verbose ("doSCCSDiff ( path=$file )");

    #  if the argument is not a file, issue an error and return
    #
    if (! -f $file) {
	&do_error ("$file is not a file");
	return;
    }

    &do_verbose ( "$SCCSDIFF_PROG $file");

    my $PATH = new FileHandle "$file", "r";
    die "$basename: open error on file \"$file\": $!" 
					    unless defined $PATH;

    my $DIF= new FileHandle "$SCCSDIFF_PROG $file |";
	die "$basename: open error on $SCCSDIFF_PROG invocation: $!" 
					    unless defined $DIF;

    #  the type of lines 'sccs diffs' outputs is like:
    #
    #  8,9c8,10
    #  221c222
    #
    doDiff( $PATH, $DIF );
}

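sub doDiff {

    my ( $file, $dif ) = @_;

    #  number of the next unread line in $file
    my $line = 1;

    while( <$dif> )
    {
	#  only 'add' and 'change' descriptors mark lines in the new file;
	#  'delete' descriptors leave nothing to mark and are skipped
	#
	if ( /^(\d+|\d+,\d+)(a|c)/ )
	{
	    #  save the match results before doing anything else
	    my ( $orange, $op, $nrange ) = ( $1, $2, $' );
	    chomp $nrange;

	    #  first and last changed line in the new file
	    my @args = split ",", $nrange;

	    my $fdifline = $args[0];
	    my $ldifline = $fdifline;
	    $ldifline = $args[1] if ( defined $args[1] );

	    #  first and last affected line in the old file
	    @args = split ",", $orange;

	    my $ofdifline = $args[0];
	    my $oldifline = $ofdifline;
	    $oldifline = $args[1] if ( defined $args[1] );

	    my $buf;
	    my $dline;

	    # position the $file cursor on the line indicated by fdifline
	    #
	    for ( my $i = $line; $i < $fdifline; $i++ ) {
		$buf = <$file>;
		print "  $buf";
	    }

	    #  output the lines in $file between fdifline to ldifline
	    #
	    for ( my $i = $fdifline; $i <= $ldifline; $i++ ) {
		$buf = <$file>;
		print "| $buf";
	    }

	    $line = $ldifline+1;

	    #  skip the body of this hunk: an 'add' lists only the new
	    #  lines; a 'change' lists the old lines, a '---' separator,
	    #  and the new lines
	    #
	    my $diflines = $ldifline-$fdifline+1;
	    $diflines += $oldifline-$ofdifline+2 if ( $op eq 'c' );
	    for ( my $i = 0; $i < $diflines; $i++ ) {
		$dline = <$dif>;
	    }
	}
    }

    #  output the unchanged lines that follow the last change
    #
    while ( defined( my $buf = <$file> ) ) {
	print "  $buf";
    }
}
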
#
#  MAIN program - variable initialization
#
    $opt_help	= '';
    $opt_verbose= '';
    $opt_r      = '';
    my $result	= '';
    $result = GetOptions( 'r=s', 'verbose', 'help' );

    $progname   = $0;
    $basename	= substr ($0, rindex ($0, '/') + 1);

    $SCCSDIFF_PROG = 'sccs diffs ';
    $SCCSDIFF_PROG .= " -r$opt_r" if ( $opt_r );

#
#  MAIN program - body
#
MAIN: {

    @files = @ARGV;

    #
    #  show the usage if -help passed
    #
    &usage if ($opt_help);

    #
    #  issue usage message if no argument was passed
    #
    &usage unless (@files);

    #
    #  invoke the program on all the arguments passed
    #
    for $arg (@files) {

	&doSCCSDiff( $arg );
    }
}

I am sure I can improve the regular expression that matches the diff-description lines, as well as the subsequent parsing of the old-file and new-file line ranges. Another possible improvement is to operate on plain diff/gdiff file comparisons rather than only on sccs diffs.
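
As a rough idea of where that improvement could go, here is a sketch of a single regular expression that captures both line ranges and the operation in one match. The name parse_descriptor and the hash keys are my own invention for illustration, not part of the program above, and I have only reasoned it through against descriptors of the 8,9c8,10 kind.

```perl
use strict;
use warnings;

# parse_descriptor (illustrative name): split a diff descriptor such as
# "8,9c8,10" into its parts.  Returns a hash reference, or nothing if
# the line is not a descriptor.
sub parse_descriptor {
    my ($line) = @_;
    return
        unless $line =~ /^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$/;

    my %d = (
        old_first => $1,
        old_last  => defined $2 ? $2 : $1,
        op        => $3,
        new_first => $4,
        new_last  => defined $5 ? $5 : $4,
    );

    # number of "<" lines and ">" lines that follow the descriptor
    my $old = $d{op} eq 'a' ? 0 : $d{old_last} - $d{old_first} + 1;
    my $new = $d{op} eq 'd' ? 0 : $d{new_last} - $d{new_first} + 1;

    # a change also carries a "---" separator between the two groups
    $d{body_lines} = $old + $new + ( $d{op} eq 'c' ? 1 : 0 );

    return \%d;
}

my $d = parse_descriptor("8,9c8,10\n");
print "$d->{op}: new lines $d->{new_first}-$d->{new_last}, ",
      "skip $d->{body_lines} lines\n";
```

Because the same descriptor format comes out of an ordinary diff of two files, a parser like this would not care whether the input came from sccs diffs or from diff/gdiff.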

Enjoy, provide comments, reuse, modify, whatever ...

Posted by arieh @ 02:00 PM MST [ Comments [2] ]
 
Comments:

My approach to this problem (for a group project a few months ago) is a bit different, and doesn't do the SCCS extraction. It's based around cut(1) and the more obscure sdiff(1), like so:
$ sdiff -w 160 new old | cut -c 1-80
This will generate right hand side marks. If you replace '<' and '>' by '|' with sed(1), you end up with similar output to your Perl script. (You could flip new and old and cut 80-160 if you want your changebars on the left hand side.)

Posted by Stephen Hahn on July 30, 2004 at 02:18 PM MST #

It was obvious to me that some other solution was out there. Indeed, I later found the rfcdiff utility, which works similarly to sdiff and does generate changebars (although some hacking would be required to modify it). Stephen's approach seems a very simple one, and one in the spirit of Unix at that.

Posted by Arieh on July 30, 2004 at 03:04 PM MST #
