Wednesday April 08, 2009
Here's what kernel developers talk about at coffee: bugs in our command shells. At least, that was the topic the other day when Stephen, Dave and I were complaining about the various troubles we'd had with different command shells.
While others moved to zsh some years ago, I have been a bash user since kicking the tcsh habit. But for years I have been plagued by a subtle nuisance in bash: sometimes it doesn't catch terminal window resizes properly. The result is that command line editing works very poorly until bash finally figures this out. After a while, I worked out that this behavior happens only when the window size change happens while you're in some application spawned by bash. So if you're in an editor like vim, change the window size, and then exit (or suspend) the editor, bash will be confused about the terminal size.

While this has always annoyed me, it never quite reached the threshold for me to do anything about it. But recently it has been bugging me more and more. After we returned from coffee, I dug into the bash manual and discovered a little-known shell option, checkwinsize. In a nutshell, you can set it with the shopt builtin as follows:
shopt -s checkwinsize
about which the bash manual says: "If set, Bash checks the window size after each command and, if necessary, updates the values of LINES and COLUMNS." Sounds great! As an aside, I think that as a modern shell, bash should set this option by default. (Others think so too.)
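For readers curious what such a check amounts to, here is a minimal sketch in C of "ask the terminal for its size, then update LINES and COLUMNS". It is an illustration under assumptions, not bash's implementation: the helper name `refresh_window_size` is hypothetical, and it uses environment variables via `setenv` as a stand-in for the shell variables bash actually maintains.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <termios.h>    /* declares TIOCGWINSZ on some platforms */
#include <sys/ioctl.h>  /* declares it on others */

/* Hypothetical helper: query the terminal behind fd and, on success,
 * export its size as LINES and COLUMNS.
 * Returns 0 on success, -1 if fd is not a terminal, the size is
 * unknown, or TIOCGWINSZ is not available at compile time. */
static int refresh_window_size(int fd)
{
#ifdef TIOCGWINSZ
    struct winsize ws;
    char buf[16];

    if (ioctl(fd, TIOCGWINSZ, &ws) != 0 || ws.ws_row == 0 || ws.ws_col == 0)
        return -1;

    snprintf(buf, sizeof buf, "%u", (unsigned)ws.ws_row);
    if (setenv("LINES", buf, 1) != 0)
        return -1;

    snprintf(buf, sizeof buf, "%u", (unsigned)ws.ws_col);
    if (setenv("COLUMNS", buf, 1) != 0)
        return -1;

    return 0;
#else
    (void)fd;
    return -1;
#endif
}
```

A shell-like loop would call `refresh_window_size(STDIN_FILENO)` each time a foreground command exits or is suspended, before drawing the next prompt.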
With much self-satisfaction I set this option and got ready for line editing bliss. But, no joy. I checked and rechecked, and finally started using truss, and then DTrace, to try to understand the problem. After some digging I eventually discovered the following bug in the shell. Here's the meat of the writeup I submitted to the bash-bug list:
On Solaris/OpenSolaris platforms, I have discovered what I believe is a
bug in lib/sh/winsize.c.
I discovered with a debugger that the get_new_window_size() function
has no effect on Solaris. In fact, here is what this file looks like if
you compile it:
$ dis winsize.o
disassembly for winsize.o
section .text
get_new_window_size()
get_new_window_size: c3 ret
That's it-- an empty function. The problem is that the appropriate header
file is not getting pulled in, in order to #define TIOCGWINSZ.
As a result, even with 'shopt -s checkwinsize' set on Solaris, bash
does not check the win size on suspend of a program, or on program
exit. This is massively frustrating, and I know of several Solaris
users who have switched to zsh as a result of this bug.
I have not tried bash 4.0, but looking at the source code, it appears
that the bug is present there as well.
Fix:
I added an ifdef clause which looks to see if the HAVE_TERMIOS_H define
is set, after the #include of config.h. If it is, then I #include the
termios.h header file. This solves the problem, which I confirmed by
rebuilding and dis'ing the function. I also ran my recompiled bash
and confirmed that it now worked correctly.
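The failure mode and the fix can be sketched in a few lines of C. This is not copied from bash's source: the function names `size_query_compiled_in` and `query_window_size` are hypothetical, and HAVE_TERMIOS_H is defined by hand here to stand in for what a generated config.h would provide. The point is that a body guarded by `#if defined(TIOCGWINSZ)` silently compiles to nothing when no included header defines that macro.

```c
/* Assumption: stands in for the HAVE_TERMIOS_H macro from config.h. */
#ifndef HAVE_TERMIOS_H
#define HAVE_TERMIOS_H 1
#endif

/* The fix described above: include termios.h when it is available, so
 * TIOCGWINSZ is defined on platforms that declare it there. */
#ifdef HAVE_TERMIOS_H
#include <termios.h>
#endif

#include <sys/ioctl.h>  /* on some systems this alone defines TIOCGWINSZ */
#include <unistd.h>

/* Reports whether the size query below was actually compiled in. */
static int size_query_compiled_in(void)
{
#if defined(TIOCGWINSZ)
    return 1;
#else
    return 0;
#endif
}

/* Fills *rows and *cols on success; leaves them untouched otherwise.
 * Without TIOCGWINSZ the body is empty, which is the bug in question. */
static void query_window_size(int fd, int *rows, int *cols)
{
#if defined(TIOCGWINSZ)
    struct winsize ws;

    if (ioctl(fd, TIOCGWINSZ, &ws) == 0 && ws.ws_row > 0 && ws.ws_col > 0) {
        *rows = ws.ws_row;
        *cols = ws.ws_col;
    }
#else
    (void)fd; (void)rows; (void)cols;
#endif
}
```

Disassembling the object file, as in the report, is a quick way to confirm which branch of the `#if` was compiled.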
Hopefully the bash maintainers will take note and fix this bug. In the meantime, I'm going to see if we can get the fix for this applied to the Nevada (and hence, OpenSolaris) version of bash.

Update: The bash maintainers have fixed this bug in the following patch to bash 4.x. Hurray!
(2009-04-08 14:48:17.0)


What's so great about bash with ksh93 available?
I've always found it to be a crappy scripting language because of various bugs that hit me, especially around signal handling and unexpected array management issues, and have written scripts in ksh ever since.
bash had always been superior when used interactively till ksh93 arrived; now ksh93 is all I use for both scripts and interactive usage.
Posted by nacho on April 08, 2009 at 04:03 PM PDT #
@nacho: I also use ksh for scripting. However I've got a whole mess of bash dotfiles, and spent a bunch of time getting it set up the way I like for interactive use. While I could switch, I'd have to do quite a bit of work to do so.
If I were to switch interactive shells, it might well be to zsh, for its programmable completion, which these days is (IIRC) aware of SMF, ZFS and Zones command sets right out of the box.
Posted by Daniel Price on April 08, 2009 at 04:08 PM PDT #
Thanks for spending the time to trace the cause
of this bash problem and fixing it.
Posted by Nigel Smith on April 08, 2009 at 04:19 PM PDT #
Hi,
Hehe, it's a very good point. Why do we still have so many bugs in one of the most used tools on a Unix system, the shell!
This is why we are writing our own in Ada, which will be fully POSIX compliant per the OpenGroup spec as well as properly designed so that it can be maintainable! The project is called deltash; should anyone be interested in helping, #auroraux is our IRC channel.
Thanks for bringing this point up, I think it's a very interesting one.
Edward.
Posted by EdwardO'Callaghan on April 08, 2009 at 06:20 PM PDT #
Thanks! The bug frustrates me.
Posted by Mike Aldred on April 08, 2009 at 07:02 PM PDT #
"Kicked the tcsh habit"? (:-(
And for bash, no less!
One of the best interactive shells ever, even though admittedly it can't be used for serious scripting. But for day-in-day-out interactive work, tcsh is very powerful; for scripting, Bourne shell "glue code" and AWK engines work great.
Posted by UX-admin on April 08, 2009 at 11:38 PM PDT #
Thanks for finding the time to investigate the issue. I was tempted to move to zsh too; now that I have finally done it, I'm loving it.
Posted by Guest on April 09, 2009 at 01:40 AM PDT #
Thanks for looking into this. There are more annoying bugs in bash (e.g. #3743), but this one is outstanding.
Posted by Jan Hnatek on April 14, 2009 at 11:15 AM PDT #