Bill Sommerfeld's Weblog

Still Under Construction. Watch for falling objects


20071101 Thursday November 01, 2007

Looking good, save for the name.

Ran into a few bugs installing the Indiana prototype.  

1) the installer got confused when I attempted to add the user "sommerfeld".  (An 8-character username limit is a figment of useradd's imagination.)  I had to reboot and try again.

 2) the lack of the nvidia binary driver in the distribution meant that it didn't cope with a 1920x1200 display.

But otherwise it installed with a ZFS root in almost no time flat from CD (the system refused to boot from a USB key).

It still needs a name change, though..

(2007-11-01 17:21:29.0) Permalink

Premature naming.

So, a preview of the new packaging & install technology produced by Project Indiana was just released. I'm shortly going to be installing it on a spare system in my office just to give it a shot.

Unfortunately, it's being called the "OpenSolaris Developer Preview" and is being portrayed as a distinctly special binary distribution on the opensolaris home page. The name is unfortunate for a number of reasons:

  1. The vast majority of the changes have not yet received the typical design and architecture review received by Solaris components
  2. There is not yet community consensus that OpenSolaris should have a reference binary distribution
  3. There is certainly not yet consensus that the Indiana technology is the right tool for the job.

I hope the folks who chose this name despite ample warning that it would cause trouble quickly reconsider. And I hope that the poor choice of name doesn't deter people from giving it a try. But the choice of names is forcing something of a constitutional crisis within opensolaris.

(2007-11-01 10:29:50.0) Permalink Comments [1]

20070207 Wednesday February 07, 2007

When a favorite restaurant closes

Valerie asks what she can do about a favorite restaurant which has lost its lease and will most likely need to move.

Don't Panic.

A while ago (must be over a decade ago by now), the canonical Chinese restaurant at the MIT end of Cambridge, Mary Chung's, lost its lease and was shut down for about a year before they found new space on the other side of Massachusetts Avenue. Mary's was open every day but Tuesday, though she took an annual one-week summer vacation (which was known as "the week of Tuesdays" to some of her regular patrons).

The Year of Tuesdays was painful for some but they came back from it stronger than ever in a better, larger space. Recently they were even one of the five Boston-area restaurants featured in an episode of The Hungry Detective on the Food Network.

There's not a heck of a lot you can do unless you've got connections in the commercial real estate arena, but there are a few things which come to mind:

  • Keep patronizing them until the bitter end.
  • Stay in touch with the proprietor during the shutdown period (easier with FdM than it was with Mary's since they have the secondary location).
  • Most likely there will be some amount of town-level zoning/licensing involved in the move. Generally the only people who comment on such matters are concerned abutters; statements in support of the applicants from satisfied customers will typically make a big impression on the licensing authority.
(2007-02-07 11:56:03.0) Permalink

20070206 Tuesday February 06, 2007

Signs the DRM house of cards is collapsing.

I'm happy to see Steve Jobs' open letter to the music industry, in which he calls for the end of DRM on downloadable music.

I'm happy to say that I have on the order of 5200 tracks on my iPod, none of which were purchased from iTunes. I have a legitimate fair-use right to all of them. The vast majority were ripped from CDs which I own and still possess. Some of the rest are podcasts (offered freely to all); some are MP3s of performances I participated in. None were downloaded from file sharing services.

Steve's open letter refers to "secrets" being the key to security. General principles of cryptography say that in secure systems, the only secrets should be changeable and limited in scope. The nature of DRM is such that you'll typically end up with the same set of secrets in every device/player which needs access to the plaintext content, which is what led to the collapse of the DVD CSS scheme and its follow-ons for HD DVDs. Time after time people learn the hard way that you can't effectively hide secrets in binary object code -- given enough time and digging, any keys and algorithms can be dug out of the blob of code.

(2007-02-06 18:50:18.0) Permalink

20061120 Monday November 20, 2006

if you thought lost bombs were bad, consider lost mustard gas..

Writing about the "Windows Genuine Advantage" program, Simon Phipps mentioned the recent discovery of explosives underneath a British airfield and drew an analogy to anti-piracy "kill switches" embedded in software. While not directly analogous to a "kill switch", a couple of years ago I heard of a somewhat more astonishing case of leftover lurking horrors: in 1993, World War I-era mustard gas shells were discovered in what is now an affluent residential neighborhood of Washington, DC. As of this summer, the cleanup was still in progress.

Returning to the real target: I share Tim Bray's concerns. License enforcement by intentional denial of service has no business going into mission-critical software; we have a hard enough time coping with denial of service from unintentionally introduced "features".

(2006-11-20 20:16:52.0) Permalink

20051116 Wednesday November 16, 2005

The End-to-end argument meets ZFS

I'm really a networking and security type at heart.  Why am I excited about ZFS?

Back when I was studying for a degree in computer science, I took what was then (and probably still is) the best undergraduate course in MIT's CS department: Computer Systems Engineering, better known as "6.033" or just "'033".

A major part of the course was a series of case studies -- we would read an important paper on a system, write a short analysis, and then discuss the system in class.

One of the key papers presented was Saltzer, Reed, and Clark's "End to End Arguments in System Design".

I'll quote the abstract:

This paper presents a design principle that helps guide placement of
functions among the modules of a distributed computer system.  This
principle, called the end-to-end argument, suggests that functions
placed at low levels of a system may be redundant or of little value
when compared with the cost of providing them at that low level.
Examples discussed in the paper include bit error recovery, security
using encryption, duplicate message suppression, recovery from system
crashes, and delivery acknowledgement.  Low level mechanisms to
support these functions are justified only as performance
enhancements.

The paper has spawned a lot of debate and more than a few followups over the years, and interminable arguments about what counts as an end, but overall I think it's held up pretty well.

Fast forward to a couple years ago when I first saw a high level overview of the ZFS design.  I immediately thought of this paper.

ZFS applies the end-to-end principle to filesystem design.  

End-to-end is normally applied to distributed systems, where two distinct "ends" are communicating with each other, often in real time or with relatively short delays.

Here, the "ends" are separated mainly by time: one "end" writes data to the filesystem, and the other "end" expects to get the exact same data back in the future.  (And the "middle" is the storage subsystem, which these days is itself a complex distributed system).

By placing the functionality required for robustness at a relatively high layer within the storage stack, ZFS can perform these functions with reduced overall system cost; you can use a much simpler disk subsystem to get a desired level of performance, availability and reliability.

For instance, the filesystem knows for sure which disk blocks are in use.  The disk doesn't.  If you replace a disk in a mirror or RAID-Z group, ZFS only needs to copy the blocks that are currently in use to the new disk; when lower layers are responsible for redundancy, you have to copy the whole disk.  With the upper layer responsible for redundancy, the repair takes less time, and your window of exposure to an additional failure can be significantly shorter.
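
To make the difference concrete, here is a toy sketch in Python. It is not how ZFS actually resilvers; the function names, the list-of-blocks "disks", and the 1000-block/120-in-use numbers are all made up for illustration. The only point is that the layer which knows the allocation map can skip the free space.

```python
def resilver_whole_disk(source_disk, new_disk):
    """Rebuild done below the filesystem: every block is copied, used or not."""
    copied = 0
    for block_no, data in enumerate(source_disk):
        new_disk[block_no] = data
        copied += 1
    return copied


def resilver_in_use(source_disk, new_disk, allocated):
    """Rebuild driven by the filesystem: only allocated blocks are copied."""
    copied = 0
    for block_no in sorted(allocated):
        new_disk[block_no] = source_disk[block_no]
        copied += 1
    return copied


# Hypothetical toy mirror: 1000 blocks, of which the filesystem uses 120.
DISK_BLOCKS = 1000
allocated = set(range(120))
surviving = [bytes(8) for _ in range(DISK_BLOCKS)]
for block_no in allocated:
    surviving[block_no] = f"data{block_no:04d}".encode()

replacement_a = [None] * DISK_BLOCKS
replacement_b = [None] * DISK_BLOCKS
whole = resilver_whole_disk(surviving, replacement_a)
in_use = resilver_in_use(surviving, replacement_b, allocated)
print(f"whole-disk copy: {whole} blocks, in-use copy: {in_use} blocks")
```

In this toy case the filesystem-driven rebuild moves 120 blocks instead of 1000, and every block the filesystem cares about still ends up on the new disk.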

I'm hoping this leads to simpler (and cheaper) storage hardware in the long run -- JBODs seem to be ideal for ZFS, and you can take the battery-backed NVRAM out of the RAID controllers and give it to the lumberjacks.

(2005-11-16 09:20:06.0) Permalink

20051020 Thursday October 20, 2005

packaging svk

So, Adam, never fear..

I have two bits of tech in hand which will make deploying svk on Solaris for development purposes pretty painless.

1) NetBSD's pkgsrc will build packages on Solaris and handle chasing down the dozens of dependencies.  Currently it has SVK 1.00, but I've got diffs to the pkgsrc config to take it to 1.05 under review right now (three packages needed to be upgraded and two more needed to be added; it took me about an hour and a half last night).

2) There's a "gensolpkg" inside pkgsrc which will create Solaris/SVR4-format packages.  It's a little rusty as it still assumes the 9-character package name limit, but that's easily repaired, and I should probably commit that fix as well..

Toss them all into a single packagestream-format blob and we're all set.

Only real misfeature at this point is that pkgsrc insists on building its own copy of perl.   But given that we lock down most aspects of the build/development environment, and occasionally get hurt when we don't, this might be another case where we should just take the hit of another copy.

       
(2005-10-20 05:56:45.0) Permalink

20051018 Tuesday October 18, 2005

Creative Hash Functions

Take a quick look at this macro definition.  Did you spot the bug?

Because of poor paren placement, the OUTBOUND_HASH_V6 macro in sadb.h
computes a hash value of:

        *x^(*x+1)^(*x+2)^(*x+3)
when:
        x[0]^x[1]^x[2]^x[3]
was intended, with the result that only a small number of outbound hash buckets are ever used.  Half end up in bucket 0.  All hash values have two low order bits of zero, then (going upwards) zero or more 1 bits, and then all zeros until the top of the word.  Distribution looks like:

value:        occurrences
       0         2147483648
       4         1073741824
       c          536870912
      1c          268435456
...
1ffffffc                 16
3ffffffc                  8
7ffffffc                  4
fffffffc                  4

Needless to say, this distribution is awful: only 31 unique hash values, 50% of entries in one bucket, and 99% of hits in only 7 buckets.
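
The arithmetic is easy to check with a small script. The sketch below is Python, not the C macro itself (the macro text isn't reproduced here); it models the IPv6 address as a list `x` of four 32-bit words and masks to 32 bits to imitate C unsigned wraparound. The function names are made up for illustration, and it sweeps only the first 65536 values of the first word rather than all 2^32.

```python
from collections import Counter

MASK32 = 0xFFFFFFFF  # imitate 32-bit unsigned wraparound


def buggy_hash(x):
    """The value described above: *x ^ (*x+1) ^ (*x+2) ^ (*x+3)."""
    v = x[0]
    return v ^ ((v + 1) & MASK32) ^ ((v + 2) & MASK32) ^ ((v + 3) & MASK32)


def intended_hash(x):
    """What was meant: x[0] ^ x[1] ^ x[2] ^ x[3]."""
    return x[0] ^ x[1] ^ x[2] ^ x[3]


def has_described_shape(h):
    """Two low zero bits, then a run of ones, then zeros to the top."""
    t = h >> 2
    return (h & 3) == 0 and (t & (t + 1)) == 0


# Sweep a 16-bit slice of the first word; the other words are ignored
# by the buggy version, so their values don't matter here.
counts = Counter(buggy_hash([v, 0, 0, 0]) for v in range(1 << 16))
for value, n in sorted(counts.items()):
    print(f"{value:8x} {n:10d}")
```

Even on this small slice the pattern is the same as in the table: half of the inputs land on 0, a quarter on 4, an eighth on c, and so on, and the last three words of the address never influence the result at all.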

Discovered this shortly before 10:30 this morning; filed bug 6338289; tested fix on x86 and sparc, code reviewed, and integrated into the development sources by 5:50pm this afternoon. 

UPDATE: In response to a comment: Yes, inline functions would be better here, but the compiler version we used during Solaris 9 development didn't support them in C.  If we're going to revisit this code, a more likely mini-project is to find all the various places within IP where we compute hash functions based on a protocol address, find the best one, and make that a common one used by all the address-based hash functions, possibly tossing in a key or equivalent as a defense against hash-bucket-clogging attacks.
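
For what "tossing in a key" might look like, here is a minimal Python sketch. It is an assumption-laden illustration, not a proposal for the kernel code: `HASH_KEY` and `addr_bucket` are hypothetical names, and a keyed BLAKE2 digest from the standard library stands in for whatever cheaper keyed mixing function a kernel fast path would really want.

```python
import hashlib
import ipaddress
import os

# Hypothetical per-boot secret. Without it, an attacker cannot work out
# in advance which addresses will collide in the same bucket.
HASH_KEY = os.urandom(16)


def addr_bucket(addr, nbuckets, key=HASH_KEY):
    """Map an IPv4 or IPv6 address to a bucket using a keyed hash."""
    packed = ipaddress.ip_address(addr).packed  # 4 or 16 raw bytes
    digest = hashlib.blake2b(packed, key=key, digest_size=4).digest()
    return int.from_bytes(digest, "big") % nbuckets


print(addr_bucket("2001:db8::1", 256))
print(addr_bucket("192.0.2.7", 256))
```

Every bit of the address feeds the digest, and because the key changes per boot, a list of colliding addresses worked out against one machine is of no use against another.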

(2005-10-18 17:39:44.0) Permalink Comments [1]

20050929 Thursday September 29, 2005

And something resembling a root cause analysis.

The Prius saga continues. Toyota sent the NHTSA a complete reply on August 26th.

The meat is in Responses 8 and 12.  It appears that Toyota released a patch in October 2004 which fixed a firmware bug - apparently the stall occurred when the firmware thought the engine wasn't taking in enough air, but the "not enough air" threshold was set too high.  Some of the details are in attachments that were not made public, but it's now clear that they're confident they understand the cause of the stall:

"Under certain circumstances, the engine ECM incorrectly determines that the gas engine is experiencing a failure to start when the engine intake air volume is lower than the ECM's programming criteria.  In this condition, the gasoline engine will not start (because the ECM believes it cannot) and the vehicle will go into a fail-safe mode of electric-only operation.  In conjunction with the ECM misjudgement, the warning lights ... will be illuminated when this occurs."

and there are two relevant fixes.  The first one was released as part of "Special Service Campaign 40A" in October 2003:

"Due to a programming error, if the vehicle is restarted in the "fail-safe" mode, a secondary condition may occur where the vehicle transmission may not operate smoothly."

Subsequently, they released TSB EG047-04:

 "Toyota discovered a software error within the engine intake air volume criteria ... Toyota developed a revised software version and introduced this software along with reprogramming methodology in a TSB in the middle of October 2004"

What's perhaps a bit strange is that the first bug and a third unrelated (and seemingly trivial) defect were the subject of two different "special service campaigns" where they actively asked customers to bring in their cars for a firmware upgrade, but the seemingly more critical bug (the apparent proximate cause of the stalls) is only subject to a TSB, which appears to be a "fix it if the customer complains" reactive patch.  If I buy a Prius I guess I'll feel obligated to check for TSBs on a regular basis...

(2005-09-29 15:12:04.0) Permalink Comments [2]

20050906 Tuesday September 06, 2005

How not to sell me a firmware-driven product...

Well, start off your sales pitch by describing how easy it is to reboot the product, and by talking about how I can avoid trips to the repair shop by rebooting it.

I was pretty close to being willing to put down a deposit on a Prius to replace my Saturn, but now I'm off doing a "due diligence" of a sort.  What I've learned so far: there's a software defect which causes the gasoline engine to shut off, which may have been fixed in a firmware upgrade.  The NHTSA's Office of Defects Investigation is on the case (investigation PE05029) but hasn't yet released a final report.  Some of the documents filed by Toyota in response to the ODI's request for investigation have been made available, but there's not that much "meat" in the main document of July 22nd -- which promises follow-on updates on August 5th and/or 26th which don't seem to be available from ODI just yet.

One friend of mine who has a Prius has experienced this stall condition, and then had the firmware upgrade which may -- or may not -- fix it.  He hasn't had a stall since the firmware upgrade but, well, anecdotes are not data.

I'm not so much worried that there are bugs in the firmware.  Of *course* there will be bugs in any software system of nontrivial complexity.  But are they set up to diagnose and fix defects found in the field by customers?  Instructing customers to "just hit ctrl-alt-del and drive on" doesn't sound consistent with an attitude towards software quality which will get those defects fixed.  I hope this particular sales guy is an outlier.

Given the limitations of repair shops, perhaps software-controlled cars like the Prius should be equipped to "phone home" with the moral equivalent of a crash dump whenever anything odd happens....

(2005-09-06 10:10:23.0) Permalink Comments [1]

20050825 Thursday August 25, 2005

How to destroy a brand: Saturn is dead.

As far as I'm concerned, GM's Saturn line is dead.

Some years ago, my parents' (non-GM) car caught fire in their garage due to a defective cruise control switch.  The fire went out but there was substantial smoke damage elsewhere.  They had been on vacation at the time, and a recall notice for the defect was in their held mail when they returned from vacation.  So I tend to take recall notices and the like as high urgency issues, worthy of immediate action.

My current car is a Saturn.

Today, in my mail, I received a plain white envelope with a Saturn return address and the ominous notices "Important Vehicle Information Enclosed" and  "Open Immediately Do Not Discard".   I was suspicious, but given the past family experience with recalls, I opened it immediately just in case.

Was it a recall notice? 

Nope, just a slimy marketing trick.  When I called the dealer to complain, they denied that it was a deceptive practice and then hung up on me.

It used to be that Saturn tried to be a brand for people who just wanted reliable transportation without the slimy behavior so common among auto dealers.  My experience buying in 1996 was good.  But now it seems they're no different from the rest.  For all practical purposes, they're dead.



(2005-08-25 08:52:24.0) Permalink Comments [0]

20050811 Thursday August 11, 2005

Symphony and Release Numbering

So, there I was last Sunday in rehearsal, minding my own business in the middle of the trombone section, and I look up and I see sheet music entitled "Symphony in E Minor (No. 5 Opus 95) / From the New World".  But wait, isn't the "New World" Dvořák's 9th Symphony?  Err, well, yes it is, at least in all the concert programs and liner notes I've ever seen....  the musicologists and the sheet music publishers seem to disagree..

This is more confusing than our release numbering scheme for SunOS/Solaris ...




(2005-08-11 13:27:23.0) Permalink Comments [1]

20050608 Wednesday June 08, 2005

On the conversion of working systems into warm bricks...

Operating systems development communities wind up inventing and using a fair bit of slang.  The existing Solaris development community within Sun tends to use one particular metaphor a fair bit: the brick.  That's what you get when you take your test machine, add your latest test bits, and, well, something goes wrong in a big way and your system (whether a low end PC or high end multiprocessor) winds up having all of the capability of a Warm Brick, at least until you get a chance to reinstall it.

Typical usage: "Oops, I bricked it."   "Hey, when you brickify a test machine, at least reinstall a good build on it before you move on..", and "Bugs in the packaging scripts may still result in brickification".

(Note: members of another OS development community have been known to use "brick" as short for  "throw a brick at".   As far as I can tell, these usages are completely unrelated).
(2005-06-08 16:22:54.0) Permalink Comments [1]

20050512 Thursday May 12, 2005

Old News (encryption without integrity protection may not yield confidentiality)

As one of Sun's IPsec developers, I've been getting queries about a recent advisory from a UK agency on common mistakes made when configuring IPsec-based VPN tunnels.  This advisory has gotten some press coverage, but isn't really news.

I first heard about it from Steve Bellovin at the IETF meeting in Danvers, Massachusetts over 10 years ago; he subsequently published "Problem Areas for the IP Security Protocols" describing this flaw.

And, if you try to set this up using Solaris's IPsec, you get warned:

# ifconfig ip.tun0 plumb encr_algs aes
ifconfig: WARNING - tunnel with only ESP and potentially no authentication.


I hope other vendors will add similar warnings now..
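
For anyone wondering why the warning matters, here is a toy Python sketch of the general problem. It is not the attack from the advisory and it is not how ESP works on the wire: the XOR "cipher", the packet layout, and all the names are invented for illustration. It only shows that an attacker who never learns the key can still rewrite a known field in unauthenticated ciphertext, and that a keyed integrity check catches the change.

```python
import hashlib
import hmac


def keystream(key, length):
    """Toy keystream (SHA-256 over a counter). Illustration only."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_bytes(a, b):
    return bytes(p ^ q for p, q in zip(a, b))


def encrypt(key, plaintext):
    return xor_bytes(plaintext, keystream(key, len(plaintext)))


decrypt = encrypt  # XOR with the same keystream undoes itself


def rewrite_field(ciphertext, offset, old, new):
    """Attacker's step: no key needed, just a guess at the old contents."""
    out = bytearray(ciphertext)
    for i, d in enumerate(xor_bytes(old, new)):
        out[offset + i] ^= d
    return bytes(out)


def seal(enc_key, mac_key, plaintext):
    """Encrypt, then append an HMAC over the ciphertext."""
    ct = encrypt(enc_key, plaintext)
    return ct + hmac.new(mac_key, ct, hashlib.sha256).digest()


def open_sealed(enc_key, mac_key, blob):
    ct, tag = blob[:-32], blob[-32:]
    expected = hmac.new(mac_key, ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return decrypt(enc_key, ct)


enc_key, mac_key = b"toy encryption key", b"toy integrity key"
packet = b"dst=10.0.0.5;payload=hello"

# Encryption only: the destination is silently rewritten in transit.
tampered = rewrite_field(encrypt(enc_key, packet), 4, b"10.0.0.5", b"10.9.9.9")
print(decrypt(enc_key, tampered))

# Encryption plus integrity: the same tampering is rejected.
sealed = seal(enc_key, mac_key, packet)
```

If the receiving end decrypts the tampered packet and forwards the cleartext to wherever the rewritten header says, the attacker gets the plaintext without ever touching the key, which is how missing integrity protection turns into missing confidentiality.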
(2005-05-12 14:46:24.0) Permalink Comments [1]

20050511 Wednesday May 11, 2005

Stealing my thunder.

A recent discussion on the main IETF mailing list surrounded the visibility of dependencies among not-yet-published documents in the RFC Editor's queue.  I did a quick hack job with awk and graphviz to plot the dependency graph, posted it, and got a response from Bill Fenner indicating that he'd been there, done that, and had clearly gotten the tool to sing and dance at his whim.
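
The idea is simple enough to sketch. The version below is Python rather than the awk I actually used, and the input is a made-up dictionary with placeholder draft names, not real RFC Editor queue data; it just turns "document depends on document" pairs into Graphviz DOT text.

```python
def to_dot(deps):
    """Render {document: [documents it is waiting on]} as a DOT digraph."""
    lines = ["digraph rfc_editor_queue {"]
    for doc in sorted(deps):
        lines.append(f'    "{doc}";')
        for ref in sorted(deps[doc]):
            lines.append(f'    "{doc}" -> "{ref}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical example data; the draft names are placeholders.
deps = {
    "draft-example-a": ["draft-example-b", "draft-example-c"],
    "draft-example-b": ["draft-example-c"],
    "draft-example-c": [],
}
dot_text = to_dot(deps)
print(dot_text)
```

Saving the output to a file and running it through Graphviz (for example `dot -Tpng`) gives the picture.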

(2005-05-11 15:13:12.0) Permalink Comments [0]
