John Hoffmann's Weblog


Wednesday July 20, 2005

Outsmarting myself

Earlier today I was adding 2 additional languages to a localization utility I had written several months ago:
#!/bin/csh
#
## Invoke this script from the english branch in the directory
## of the file(s) that need to be copied to all the locales
##
## use -n switch to echo commands instead of executing them
##
## Example: where pwd is
## /web/www.java.com/en/renderers
##
## /web/tools/cp_l10n.csh -n javacom-rate-this-page.jsp javacom-rb.jsp
##
## output for dry run should be:
## /bin/cp javacom-rate-this-page.jsp /web/www.java.com/de/renderers/javacom-rate-this-page.jsp
## /bin/cp javacom-rb.jsp /web/www.java.com/de/renderers/javacom-rb.jsp
## /bin/cp javacom-rate-this-page.jsp /web/www.java.com/es/renderers/javacom-rate-this-page.jsp
## /bin/cp javacom-rb.jsp /web/www.java.com/es/renderers/javacom-rb.jsp
##
set dry_run=
if ( $#argv > 0 ) then
  if ($argv[1] == "-n") then
    set dry_run=echo
    shift
  endif
  foreach lang (de es fr it ja ko nl pt_BR sv zh_CN zh_TW)
    foreach file (${argv[*]})
      set destination_dir = `pwd | sed "s-/en/-/${lang}/-"`
      eval ${dry_run} /bin/cp ${file} ${destination_dir}/${file}
    end
  end
else
  cat $0 | grep "^##"
endif
While I was in there, the sed command caught my attention, probably because it's the only fun part of the script. Well, the eval of the dry run variable and the grep through $0 for a lazy usage message are kinda neat too, but back to the story. I thought, "Gee John, you were really playing fast and loose with your regex. Replacing any occurrence of 'en' is crazy. We need '/en/' to be safe, to match only the English directory name." So I hastily changed the sed line to:
sed 's-/\/en\//-/\/${lang}\//-g'
Thinking, "Great, now it will only match the exact string '/en/'". Note: '\/en\/' is the escaped form of the pattern, since, as everyone knows, sed uses '/' as its delimiter...

You all laughing yet?!

Yeah, you seasoned regex guys (and I'm one of you, just having one of those days), you're seeing that when I originally authored the sed command I chose '-' as the delimiter, since '/' was going to be heavily used in the pattern, and the pattern already was '/en/'. So now sed was looking for literally this mess: '/\/en\//', which in effect only matches a doubled-slash string like '//en//'. Naturally it found no such pattern in the output of pwd, and the script accomplished nothing.
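
To make the delimiter mix-up concrete, here is a small sketch that runs the substitutions against the example path from the usage comment. It is written in POSIX sh rather than csh, the function names and the fixed 'de' locale are made up for illustration, and the "no match" result assumes a sed that treats '\/' as a plain slash when '/' is not the delimiter (true of the common implementations I know of, such as GNU sed).

```shell
#!/bin/sh
# Illustrative only: the function names and the hard-coded "de" locale
# are assumptions for this sketch, not part of the original utility.

src_dir=/web/www.java.com/en/renderers

# Original form: '-' is the delimiter, so the slashes need no escaping.
with_dash_delimiter() {
  printf '%s\n' "$1" | sed 's-/en/-/de/-'
}

# The hasty form: slashes escaped as if '/' were the delimiter, but the
# delimiter is still '-'. The pattern now asks for "//en//", which the
# path never contains.
with_mixed_escaping() {
  printf '%s\n' "$1" | sed 's-/\/en\//-/\/de\//-g'
}

# The escaped pattern is only right when '/' really is the delimiter.
with_slash_delimiter() {
  printf '%s\n' "$1" | sed 's/\/en\//\/de\//'
}

with_dash_delimiter  "$src_dir"   # /web/www.java.com/de/renderers
with_mixed_escaping  "$src_dir"   # the path comes back unchanged
with_slash_delimiter "$src_dir"   # /web/www.java.com/de/renderers
```

Either pick a delimiter that does not appear in the pattern, or escape for the delimiter you actually chose. Doing both at once is what produced the no-op.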

I've had some half-baked idea that future coding in IDEs might free us from regular expression escaping problems, and from all syntax for that matter. I envision some visual cue that sets off a regular expression from the surrounding code, such that no escaping is needed because the expression is written in non-ASCII characters. I'll get back to that idea some day, or help me out here: has anyone else thought about this?

Not sure if there is any lesson to be learned. The good thing is I used the dry run switch when I invoked the script and therefore had a chance to see that the pattern did not work and that the script, had it not been in dry run, would have simply copied the same English file onto itself 11 times. I'm frequently surprised by how often code I have authored looks foreign to me. Could be related to the fact that I continuously switch between perl, csh, sh, ksh, java, jsp, and jstl. Hard to believe that, 50 years into the history of software programming, a single programmer is regularly using 7 or more different syntaxes for "if then else if" branching.

perl   if ( ) { } elsif ( ) { }
csh    if ( ) then else if ( ) then
sh     if [ ] then elif [ ] then fi
ksh    if [ ]; then elif [ ]; then fi
java   if ( ) { } else if ( ) { }
jsp    <% if ( ) {%> <%} else if ( ) {%> <%}%>

and my personal favorite, not!

jstl   <c:choose><c:when test='${}'></c:when><c:otherwise></c:otherwise></c:choose>

I've annotated another copy of the script, in the event anyone can learn from it:

#!/bin/csh
#
## Invoke this script from the english branch in the directory
## of the file(s) that need to be copied to all the locales
##
## use -n switch to echo commands instead of executing them
##
## Example: where pwd is
## /web/www.java.com/en/renderers
##
## /web/tools/cp_l10n.csh -n javacom-rate-this-page.jsp javacom-rb.jsp
##
## output for dry run should be:
## /bin/cp javacom-rate-this-page.jsp /web/www.java.com/de/renderers/javacom-rate-this-page.jsp
## /bin/cp javacom-rb.jsp /web/www.java.com/de/renderers/javacom-rb.jsp
## /bin/cp javacom-rate-this-page.jsp /web/www.java.com/es/renderers/javacom-rate-this-page.jsp
## /bin/cp javacom-rb.jsp /web/www.java.com/es/renderers/javacom-rb.jsp
##
# by default the dry_run variable is set to nothing
set dry_run=
# check to see if we have more than 0 arguments
if ( $#argv > 0 ) then
  # check to see if the dry run flag is the first argument
  if ($argv[1] == "-n") then
    # if it is, set its value to the command "echo"
    set dry_run=echo
    # remove -n from the argument list
    shift
  endif
  # Loop through the set of languages
  foreach lang (de es fr it ja ko nl pt_BR sv zh_CN zh_TW)
    # Inner loop through the remaining arguments which should be file
    # names in the current directory
    foreach file (${argv[*]})
      # create a variable that substitutes the English directory name
      # for the directory name of the language in the outer loop
      # use the back tick to cause the pwd (present working directory)
      # command to be run and
      # pipe the pwd output into sed which does the language
      # search and replace, the resultant value is stored in
      # the variable destination_dir
      set destination_dir = `pwd | sed "s-/en/-/${lang}/-"`
      # use the eval command to execute the value of the dry_run variable
      # if dry_run was empty, then /bin/cp gets executed
      # otherwise the echo command gets executed and /bin/cp
      # is simply printed to standard out as text
      eval ${dry_run} /bin/cp ${file} ${destination_dir}/${file}
    end
  end
# if we had less than 1 argument, cat the file and use grep to
# show only the lines marked as usage instructions designated by ##
# $0 is a predefined shell variable set to the path of the script
else
  cat $0 | grep "^##"
endif
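
For comparison, here is a rough sketch of the same dry-run idea in POSIX sh. It is not the script above: copy_to_locales is a made-up name, and it takes the English source directory as an argument instead of calling pwd, purely so the example can be run from anywhere. In sh an unquoted empty variable simply disappears from the command line, so no eval is needed.

```shell
#!/bin/sh
# Sketch only: copy_to_locales and its argument order are hypothetical,
# chosen for this example rather than taken from the original script.

# usage: copy_to_locales [-n] source_dir file...
copy_to_locales() {
  run=""                       # empty: the cp commands are executed
  if [ "$1" = "-n" ]; then
    run="echo"                 # dry run: the cp commands are printed
    shift
  fi
  src_dir=$1
  shift
  for lang in de es fr it ja ko nl pt_BR sv zh_CN zh_TW; do
    # double quotes, so the shell expands ${lang} before sed sees it
    dest_dir=$(printf '%s\n' "$src_dir" | sed "s-/en/-/${lang}/-")
    for file in "$@"; do
      # $run is deliberately unquoted: when empty it vanishes
      $run /bin/cp "$src_dir/$file" "$dest_dir/$file"
    done
  done
}

# Dry run: prints one /bin/cp line per locale and copies nothing.
copy_to_locales -n /web/www.java.com/en/renderers javacom-rb.jsp
```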

Note: my professor of "Unix Shell Programming" would be very disappointed that I have repeatedly referred to this utility as a script. He encouraged all his students to call them programs so that Unix administrators who wrote in shell would command the same salaries as programmers. They are slightly different skills, but I'm not sure I place a higher value on one over the other. Shell programming provides more instant gratification in that it usually gives very quick returns on time invested. It also often carries higher risk, in that you can easily create runaway programs that do very bad things if they fail to check their arguments or validate the paths they construct.

Something to watch out for is hooking up a script as a root cronjob. Make sure you test that script in a pure root environment before setting it loose. The ENV for root is often different from what you experience when su'd to root. Use 'su -' (with the dash) to make sure you're not bringing along any ENV baggage that cronjob root won't have. Whenever possible I also like to adjust the time the cronjob is set to run so that I can watch the results while I am at work, since you usually set your crons to run at night, and it's never fun to be greeted first thing the next morning, or on the 1st of the month, with an unwelcome surprise.

(2005-07-20 12:19:18.0)
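
As one possible take on validating a constructed path, the sketch below refuses to hand back a destination when the substitution changed nothing, which is exactly the case that would have copied each file onto itself. It is POSIX sh, and locale_dir is a hypothetical helper, not something from the utility above.

```shell
#!/bin/sh
# Hypothetical helper for illustration: prints the locale directory that
# corresponds to an English source directory, or fails when the sed
# substitution left the path untouched.

locale_dir() {
  src_dir=$1
  lang=$2
  dest_dir=$(printf '%s\n' "$src_dir" | sed "s-/en/-/${lang}/-")
  if [ "$dest_dir" = "$src_dir" ]; then
    echo "no /en/ component in $src_dir, refusing to copy" >&2
    return 1
  fi
  printf '%s\n' "$dest_dir"
}

locale_dir /web/www.java.com/en/renderers de    # prints the de directory
locale_dir /tmp/scratch de || echo "skipped"    # guard trips, nothing to copy
```

A caller would run the copy only when locale_dir succeeds, so a pattern that silently stops matching shows up as an error instead of a no-op.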

Trackback URL: http://blogs.sun.com/hoffie/entry/outsmarting_myself
Comments:

Bootnote: syntax escaping problems in getting this shell script posted were a royal pain. I suspect the Velocity macro parser was the problem: I determined that double hashes (##, and $#) had to be escaped using the HTML entity code for one hash. Also, this construct freaked it out: (${argv[*]}), so I just used the entity code for the $ and that worked. My very worst experience with escaping was using Vignette StoryServer back when it used Tcl. We sometimes had to use up to eight \ to preserve an escape through the multiple rounds of parsing that were taking place!

Posted by hoffie on July 20, 2005 at 12:31 PM PDT #

Hey there, here's a nit and another piece of advice (feel free to ignore both ;-)

The nit first: seriously, don't hard-code locale names in shell scripts. Bad things lie that way: when we start translating to another language, someone with smarts is going to need to edit your script. Better to have it read locale names from a text file or something like that (oh, and document which file it reads, and what the format is).

Next advice: run away fast from any tendency to write shell scripts in csh. csh is fine as a user-interactive shell, but it'll eventually bite when you try to use it for scripting (been there, been bitten). More info at http://www.faqs.org/faqs/unix-faq/shell/csh-whynot/

Hope this helps?

Posted by Tim Foster on July 20, 2005 at 01:05 PM PDT #

foreach lang (`cat /web/tools/locales.txt`)

I agree.

/bin/ksh

I agree. Typically I find myself converting csh to ksh if it exceeds a few lines. Unfortunately, I spend the day logging into dozens of remote hosts and setting my default shell to ksh prevents me from logging in :( So, I've not yet trained myself to stick with ksh.

Posted by hoffie on July 20, 2005 at 01:24 PM PDT #
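
A possible shape for the locale file suggested above, sketched in POSIX sh: one locale name per line, with blank lines and '#' comments ignored. That format and the read_locales name are assumptions for this example, and a temporary file stands in for /web/tools/locales.txt so the sketch can run anywhere that has mktemp.

```shell
#!/bin/sh
# Sketch: read locale names from a text file, one per line.
# Assumed format: '#' starts a comment, blank lines are ignored.

read_locales() {
  # strip comments, strip whitespace, then drop the lines left empty
  sed -e 's/#.*//' -e 's/[[:space:]]//g' -e '/^$/d' "$1"
}

# Demo with a temporary file standing in for /web/tools/locales.txt
list=$(mktemp)
printf '%s\n' '# locales to copy to' de es '' pt_BR > "$list"
for lang in $(read_locales "$list"); do
  echo "would copy to $lang"
done
rm -f "$list"
```

Adding a language then means adding a line to the text file rather than editing the loop.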
