Derrick's Security Weblog

Friday Feb 16, 2007

Responding to telnet

It's probably pretty obvious that I don't write many blog entries, but our recent activities around a telnet vulnerability have left me feeling a bit inspired.

What happened -

Last Sunday, an OpenSolaris community member (thanks "skunsul"!) pointed us to a website that demonstrated a pretty severe security vulnerability in our telnet daemon. A couple of engineers who participate in that community saw the post, and one of them, realizing the severity of the problem, coded a fix, had it reviewed, and put it back into the OpenSolaris source.

Then, taking advantage of time zones, an engineer in Australia picked up the ball and made the necessary code changes to Solaris 10. He also built an IDR (Interim Diagnostic and Relief) to provide to folks while we waited for official patches. He also wrote the first draft of a Sun Alert to inform customers of this problem.

By the time he went to bed, engineers in the UK were reviewing his code and the Sun Alert. Some of these engineers are part of my Security Engineering & Coordination team and were aware of the emergency procedures we've put in place for turning around a response to 0-day (no advance warning) security vulnerabilities in Sun's products.

By the end of the work day in the UK, the IDRs had been converted to ISRs (Interim Security Relief), which are the only things we ever make widely available on sunsolve.sun.com before running them through our normal test routines (which can eventually turn an IDR or ISR into a patch).

There were some minor issues during the day on Monday in the US that caused a bit of a delay in the ISRs and Sun Alert making it to the external servers, but by the end of the day in the US (a little more than 24 hours from the first report on a weekend) we had posted a Sun Alert and ISRs to fix the problem.

Even better, by the end of Tuesday, we had official patches released that closed the vulnerability.

Looking back -

Almost everything worked exactly as it should as we responded to this fire drill. We'd put in a number of processes to allow us to do quick releases of ISRs and Security Sun Alerts, and everybody knew their part and did it well. Sun took great advantage of having engineers located around the globe, and work progressed throughout our 24-hour response without needing to keep folks up past their bedtimes.

Going forward -

We did learn a couple of things from this experience. There are some aspects of the final push to external servers that we can get faster on (though I'm happy to report that these fire drills are fairly infrequent around here) and we're working on those.

Another interesting bit is the number of people who were looking to blogs.sun.com/security as their primary source of information. We put that blog together to mirror the primary Sun Alert page (entries get posted after a Security Sun Alert comes out), but there's no reason we can't post drafts there ahead of the official release. In the future, we'll do just that in emergency situations. In this particular case that would have meant a draft of the Security Sun Alert posted sometime before most of the US got into work on Monday. It would have mentioned an immediate workaround (shut down telnet) and a pointer to the place on sunsolve where the ISRs would appear later in the day.

It's also worth noting that this is the first time the OpenSolaris community was the source of us finding out about a security vulnerability in Solaris. As nice as it makes my job when we have sufficient time to fix things before going public, I understand that in an Open world, we've got to be able to react to public postings, and I think we did a pretty good job this time.

Monday Jun 13, 2005

Why did that security patch take so long? part 2

Speed vs. Quality

I think this is one of the biggest decision points in producing patches. Do you put something out as quickly as possible, or do you test the heck out of it and make sure the chance of a bad patch is as low as you can possibly make it?

Long ago Sun made a decision that quality was the most important feature of our patches. As a result we have a pretty extensive and thorough process that code changes must go through before they ultimately wind up in a nice little patch for customers to put on their systems. As business decisions go, I think this is a pretty wise thing to do. Few things can aggravate a customer more than rolling a patch onto hundreds or thousands of systems only to find out that it broke something and has to be taken back off all of them. Plus, when you consider that many customers test patches anyway, or only have limited maintenance windows during which to install patches, adding an extra week or two before it rolls out of Sun’s doors really doesn’t make all that much of a difference.

The problem is, making changes to operating system code can be dangerous. Sure, once you’ve identified a bug the fix for that particular issue may seem simple, but an OS is a big complicated thing. There are many interdependencies between libraries and protocols and the kernel, etc. Add to that Sun’s commitment to full backward compatibility and support for two (sparc & x86) hardware platforms and it just takes a while to test patches.

But, and this isn’t the only time I’ve had to say this, security is different. A customer who runs a normal patch through a 3 month testing cycle will put a security patch on today if it fixes a nasty vulnerability. Plus, many security vulnerabilities are really not all that difficult to fix, once you find them. Yes, it might take some extremely elegant coding or aligning of planets to make an attack work correctly, but in most cases (the recent hyperthreading issue is definitely an exception) once you’ve found the problem, fixing it is relatively straightforward, and most of the time there is little danger of the security fix breaking anything else. So, does it make sense to run a security patch through the same rigor as normal patches?

Add to that the fact that many security fixes are in common code (sendmail, kerberos, BIND, etc.) that many of the Unix/Linux flavors use. It’s pretty hard to explain why Sun would take a month to fix something the Linux vendors would fix in a matter of hours. Sure there were cases when the hour-to-develop fixes had problems, sometimes significant ones, but that alone wasn’t enough to stop the rising criticism directed at our long security turnaround times.

Enter ISRs (Interim Security Relief). When one of our software engineers makes a code change, they can bundle that change up in a tidy little package called an IDR (Interim Diagnostic and Relief). This IDR installs like a patch and is trackable through showrev, but it must be removed before the files it modifies are modified again. It’s a really handy way of distributing either a diagnostic binary for gathering extra data on a system or a potential fix that we want to test in a customer’s environment. Basically, the software engineer simply takes their code changes and packages them up into an IDR. What a perfect middle ground for security fixes. Now, when we have a sudden (aka public) security issue, our software engineers can fix it on the spot and produce a security version of an IDR. We can even make them available on sunsolve (http://sunsolve.sun.com/pub-cgi/show.pl?target=security/tpatches&nav=tpatches).

There are a bunch of things to realize about these ISRs. They are only lightly tested. They aren’t official patches, nor are they even a guarantee that a patch will come out to replace them. They have to be manually removed before you install a real patch (or even another ISR/IDR) over the same files. But we can put one of these things out within hours of discovering a problem. In other words, an ISR is no different in quality and testing than what you’d get from an hour-to-develop fix for a Linux distribution.
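The "remove it before anything else touches the same files" rule is easy to picture as a set-overlap check. What follows is only a rough sketch, not a Sun tool: the function name, the fix IDs, and the inventory format are all made up for illustration, and it does not query showrev or any real patch database.

```python
from typing import Dict, Iterable, List, Set


def interim_fixes_to_remove(
    installed: Dict[str, Set[str]], incoming_files: Iterable[str]
) -> List[str]:
    """Return IDs of installed interim fixes that touch any incoming file.

    `installed` maps a (hypothetical) interim-fix ID to the set of file
    paths it modified; `incoming_files` are the paths that the new patch,
    ISR, or IDR is about to modify.
    """
    incoming = set(incoming_files)
    return sorted(fix_id for fix_id, files in installed.items() if files & incoming)


# Made-up inventory for illustration only.
installed = {
    "IDR-example-1": {"/usr/sbin/in.telnetd"},
    "IDR-example-2": {"/usr/lib/libexample.so.1"},
}
print(interim_fixes_to_remove(installed, ["/usr/sbin/in.telnetd"]))
```

Anything the function returns would need to be backed out by hand before the new patch goes on; an empty list means there is no overlap to worry about.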

What we’ve done is give customers a choice in the speed vs. quality decision for security fixes. If you want to wait for a fully tested patch (and yes, we’ll be working hard to get that out as quickly as possible) you can, but if you have to have a fast, lightly tested fix on your machine immediately, you can do that as well.

I haven’t heard from a single customer who thinks ISRs are a bad thing.

Wednesday Jun 08, 2005

Why did that security patch take so long? part 1

I used to get asked all the time why it takes so long to put out a security patch. While it doesn’t happen all that often now (more on that in part 2), I thought I’d spill the beans on how we categorize and work on security bugs around here.

Probably the single most important classification for determining the process we use when rolling out security fixes is public vs. private. That is, is the vulnerability known about publicly, or is it (presumed to be) known only within Sun and perhaps by the friendly (aka willing to be quiet while we work on it) person/group that reported it to us? We consider a vulnerability publicly known if:

  • It’s been mentioned in a chatroom, website, IRC, article, etc.
  • Somebody has posted an advisory.
  • There’s any evidence or indication that it has been exploited anywhere.
  • We know customers have been told about it.
  • We know that miscreants/black hats/unfriendlies/(pick your term) know about it.

When a vulnerability is publicly known, we will:

  • Post a Sun Alert immediately, even if all we can tell customers is ‘there’s a vulnerability and the only workaround is to turn it off.’
  • Release patches as soon as they are ready (regardless of which version is patched first).
  • Release T-patches or IDRs (Interim Diagnostic & Relief) if they’re ready before patches.
  • If needed, and reasonable, waive some patch testing, soak time etc.
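
For readers who think in code, the two lists above can be sketched roughly as follows. This is an illustration only, not an actual Sun process or tool: the `Exposure` record, its field names, and the action strings are all hypothetical, and the real decision involves human judgment (for example, when waiving testing is "reasonable") that a few booleans can't capture.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Exposure:
    """Hypothetical record of the public-exposure indicators listed above."""
    mentioned_in_public_forum: bool = False   # chatroom, website, IRC, article
    advisory_posted: bool = False
    evidence_of_exploitation: bool = False
    customers_told: bool = False
    known_to_attackers: bool = False


def is_publicly_known(e: Exposure) -> bool:
    # Any one indicator is enough to treat the vulnerability as public.
    return any((
        e.mentioned_in_public_forum,
        e.advisory_posted,
        e.evidence_of_exploitation,
        e.customers_told,
        e.known_to_attackers,
    ))


def public_response(patch_ready: bool, interim_fix_ready: bool) -> List[str]:
    # The Sun Alert goes out immediately, even if it can only offer a workaround.
    actions = ["post Sun Alert"]
    if patch_ready:
        actions.append("release patch")
    elif interim_fix_ready:
        # T-patches or IDRs ship only when they are ready before the patch.
        actions.append("release T-patch or IDR")
    return actions
```

The point of `any(...)` is that a single indicator is enough to flip an issue to public; nobody waits for a second confirmation.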

Not surprisingly, we’re happier when we know about things before they become publicly known. When a vulnerability is privately known, we’ll actually hold off on releasing any patches until we have patches ready for all supported versions of the impacted software (for Solaris this means patches available for Solaris 2.7, 2.8, 9, and 10 on both sparc and x86). We’ll follow the same testing/soak time procedures we use for all patches, and the Sun Alert won’t go out until everything is ready. This is also true if we’re doing a coordinated release with a security company, a group like CERT/CC, or other vendors.

But wait a minute, doesn’t keeping this stuff quiet and not telling customers about it put them at risk? Perhaps, but we don’t think so. We know the bad guys are always looking for new ways to break into computers and we try very hard not to make it easier for them. If we went public with information as soon as we had it, we would be pointing the bad guys at least in the general direction of where to start looking for a way in.

Of course, if at any time an issue that was classified as private goes public, all bets are off and we release what we have at that time and put out a Sun Alert.
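
The private-versus-public release rule boils down to a simple gate. Here is a minimal sketch, assuming the supported list is exactly the Solaris releases and architectures named above; the function and variable names are invented for illustration and this is not how any Sun release system is actually implemented.

```python
from itertools import product
from typing import Set, Tuple

Target = Tuple[str, str]  # (Solaris release, architecture)

# Supported targets as listed in the text; a real list would change over time.
SUPPORTED: Set[Target] = set(product(["2.7", "2.8", "9", "10"], ["sparc", "x86"]))


def releasable_now(patched: Set[Target], publicly_known: bool) -> Set[Target]:
    """Return the targets whose patches may be released right now."""
    ready = patched & SUPPORTED
    if publicly_known:
        # Public issue: ship whatever is ready, in whatever order it arrives.
        return ready
    # Private issue: hold everything until every supported target is covered.
    return ready if ready == SUPPORTED else set()
```

With only the Solaris 10 patches finished, a private issue releases nothing, while the same issue going public releases those two patches right away.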

The bottom line is it can take a lot of time to put out patches that have the level of quality Sun is used to delivering. If we believe we can safely take that time we will. Many of the people who report things to us understand this and are willing to wait, sometimes a couple of months, while we get everything ready.

In a couple of days I’ll write part 2 about the decisions and tradeoffs between speed vs. quality.

Tuesday May 31, 2005

Hyperthreading Vulnerability - Worrisome or Just Hype?

Every so often a security story comes along that seems to attract more attention in the media than would seem warranted. I’ve been spending time lately dealing with the recent discussions about a hyperthreading vulnerability that allows information (eventually crypto key info) to be monitored when a CPU is swapping between users. I won’t go into the technical specifics here since you can find them covered in a number of places more thoroughly than I could manage in a blog.

Don’t get me wrong, I understand the security implications and I realize the vulnerability has been proven (rather than being merely theoretically possible according to code analysis), but all of security is a trade-off between risk and usability. In this case it seems highly unlikely in a real world setting that an attacker with local access to a server would be able to capture the information needed to decrypt sensitive data. Add to this the potential performance impact of disabling the hardware feature, or the impractical solution of rewriting all crypto applications, and it doesn’t seem reasonable to expect a hardware or software vendor to ruin usability to protect against this unlikely attack.

With that said, I do believe the vendors owe it to their customers to explain the risk and offer possible solutions in the event that a customer would prefer to engage the security at the cost of usability. In short, I think Intel got this one right when they pointed out that anyone who has sufficient access to a machine to be able to configure this attack is already in position to cause significant damage. We’re in the process of publishing a Sun Alert with this information.
