Tuesday Jun 28, 2005
So it does appear that register windows do offer some protection but I've never managed to demonstrate this with simple overflow code. If anybody has an example to back me up I'd be very interested to try it.
I had a few comments on the entry which spurred me on to write an example showing that overwriting the return address on the stack in memory isn't always successful on SPARC. Here it is:
/*
 * Copyright 2005 Sun Microsystems, Inc. All rights reserved.
 * Use is subject to license terms.
 */

/*
 * Example code showing how register windows on SPARC partially save
 * you from buffer overflows. This is *no* defence as this is a
 * contrived example
 */

#include <stdio.h>
#include <dlfcn.h>

/*
 * The SPARC stack layout means you can only overflow into the stack
 * above (ie the caller's stack) - in this case bar(), which will have
 * a return address into main().
 *
 * By compiling this and checking the assembler it was found that the
 * local a[] array was 0x14 bytes below the frame pointer. The return
 * address is 15 words above that. See /usr/include/sys/frame.h for
 * details.
 *
 * The line where we zero the return address (a[i] = 0) should by all
 * accounts cause the program to SEGV on exit. In fact it doesn't as
 * the return address is still stored in a register window.
 *
 * If we want it to SEGV we have to make some function calls from
 * within foo() to 'push' bar()'s frame out of the register window and
 * have it spill onto the stack. Once we've done that then it won't be
 * filled back into the register window until we call restore at the
 * end of this function.
 *
 * To see this in action, simply uncomment the printf() before the
 * return address is zeroed. There is sufficient function depth in
 * the call to printf() to spill bar()'s frame.
 */
void
foo(int x, int y, int z)
{
    Dl_info dli;    /* Used to extract symbol names */
    int a[1], i;

    /* printf("hello, world\n"); */

    i = 20;         /* 0x14 (5 words) to the %fp + 15 words */
    a[i] = 0;       /* Zero the return address */

    printf("Contents of return address in memory are:\n");
    if (dladdr((void *)a[i], &dli) <= 0) {
        printf("%lx\n", a[i]);
    } else {
        printf("%s + 0x%lx\n", dli.dli_sname,
            (ulong_t)a[i] - (ulong_t)dli.dli_saddr);
    }
}

/*
 * As explained above, we add an extra function call level to make it
 * clearer that the return address is into our program (ie main()).
 */
void
bar()
{
    /*
     * Add some function arguments that are easy to spot if you
     * fancy digging around in the stack.
     */
    foo(0x1234, 0xcafebabe, 0x1234);
}

int
main(int argc, char **argv)
{
    bar();
    return (0);
}
Not the most elegant piece of code but it's cstyle clean ... in other words, it passes Sun's coding style requirements.
The main point is that it demonstrates that simply overwriting the return address in memory isn't necessarily sufficient. The register window must have been spilled at some point before this can succeed.
Monday Jun 27, 2005
What set this off was a lunchtime discussion on how buffer overflows can affect processors with register windows. If you don't know what I'm talking about then check the Wikipedia article. Fascinating ... if you're technically minded.
The thing with register windows is that the return pointer you want to overflow may not be in memory. You have to overflow spilled register sets. I was struggling to find a reference to this and most people I've asked just give me a blank look. For some time now I've been wondering if I've been missing something obvious.
Thankfully, Google came to the rescue and found a paper all about this. It contains this quote:
As long as register windows are available, it is not possible for an overflow to overwrite the function's return address or frame pointer as they will still be contained in registers. However when the oldest window is saved to the stack, they are again vulnerable to overwriting.
Apologies to the authors if I should not have quoted this article; I couldn't find any distribution or copyright notices.
The paper discusses the state of the 'art'. The quote above came from a discussion of StackGhost, which attempts to validate return addresses when filling the register window.
So it does appear that register windows do offer some protection but I've never managed to demonstrate this with simple overflow code. If anybody has an example to back me up I'd be very interested to try it.
Friday Jun 24, 2005
We had the first UK OpenSolaris User Group (OSUG) meeting on Monday.
The meeting was remarkable in that everyone presented OpenSolaris consistently and exactly the way I'd like it to be seen. There is clearly some distrust of Sun's motivation but pleasing all of the people all of the time is not worth attempting as we know. The community needs a simple way to communicate so setting up a UK OpenSolaris forum seems essential.
Good things
Not so good things

I had to dash off to attend to a needy family (check out my Family blog entries) but I gather the discussion continued in the pub afterwards, which was probably at least as effective.
I definitely plan to be at the next one. The UK engineers I work with are very keen to share their knowledge and participate but we'd rather people asked than have us presume. Please let us know what you need.
Technorati Tag: OpenSolaris
The general advice is, as you might expect, obvious: spend lots of time with them and make sure they get outside and play.
I find that my technical interests have mostly to be confined to normal office hours which isn't so bad as I have a very technical job. Urgent out-of-hours stuff gets done when it must after Lara and children have gone to bed. This will most probably change as the children get older, but I see this being the case for the next 4 years or so.
Some things I've very much learnt to appreciate are:
One current joy is carrying the older boy on my shoulders to his nursery, which is a pleasant 10 minute walk. It's hard to take the blank-faced commuters seriously when there's a happy tousle-haired urchin jumping around pretending to throw snowballs at everything.
I wonder how Bryan is getting on.
Tuesday Jun 14, 2005
Synopsis: UNIX users can belong to UNIX groups and for many years the maximum number of groups in Solaris has been limited to 16. Increasing it sounds easy and of obvious benefit. It turns out to be neither; read on.
Start with bug¹ 4088757. I wrote in an internal-only section of that bug:
The bug has so many customers on it I'm surprised this hasn't been addressed before. It's also featured heavily on internal and external mail aliases. This makes me think that simply increasing the group limit isn't the answer.
...
Why are customers putting users in an excessive number of groups? The oft suggested fix of using ACLs clearly isn't meeting their needs or being communicated well.
...
I favour understanding what the customer is trying to achieve - I don't believe UNIX groups are particularly useful in today's networked, multi-platform IT infrastructures ... but then I don't get out much.
Of course, using ACLs to control file access isn't particularly cross-platform either but you get the idea. I'm trying to understand why people want to solve problems using large UNIX group membership so that we can design operating system features that meet that need.
All well and good, until Samba started integrating with Windows Active Directory and dealing with huge group memberships. As Samba has to map them on to the underlying OS it relies on the group membership offered. Result? We have a problem we need to fix, especially as the Linux 2.6 kernel now allows 65536 groups.
It isn't easy. Too much stuff would/could break.
The most obvious breakage is NFS. Strictly speaking it's not NFS that's at fault; it's more of a victim. The underlying problem is a limitation in AUTH_SYS (called AUTH_UNIX in the RFC), a commonly used authentication flavour that is pretty much the default. From RFC 1057:
9.2 UNIX Authentication
The client may wish to identify itself as it is identified on a
UNIX(tm) system. The value of the credential's discriminant of an
RPC call message is "AUTH_UNIX". The bytes of the credential's
opaque body encode the the following structure:
struct auth_unix {
    unsigned int stamp;
    string machinename<255>;
    unsigned int uid;
    unsigned int gid;
    unsigned int gids<16>;
};
In other words, the list of supplementary groups is a variable sized array of up to 16 entries. You simply cannot have more than 16 groups and use AUTH_SYS.
Of course, NFSv4 isn't affected by this as there are plenty of other authentication flavours that are mandatory for clients and servers which are not affected by the group limits.
If you've been paying attention you might be given to wonder how the Linux 2.6 kernel handles this. Answer? It doesn't, it just truncates the group list at NFS_NGROUPS (16).
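To make the limit concrete, here is a minimal user-space sketch in C of what any client ends up doing when it fills in an AUTH_SYS credential for a user with more than 16 supplementary groups. It is only an illustration: struct authsys_cred, AUTH_SYS_NGROUPS and authsys_fill_gids() are hypothetical names invented for this example, not Solaris or Linux identifiers, and the real wire format is XDR-encoded rather than a C struct.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* RFC 1057 bounds the supplementary group array: gids<16>. */
#define AUTH_SYS_NGROUPS 16

/* Hypothetical in-memory mirror of the RFC 1057 auth_unix body. */
struct authsys_cred {
    unsigned int stamp;
    char machinename[256];      /* string machinename<255> plus NUL */
    unsigned int uid;
    unsigned int gid;
    unsigned int ngids;         /* number of valid entries in gids[] */
    unsigned int gids[AUTH_SYS_NGROUPS];
};

/*
 * Copy at most AUTH_SYS_NGROUPS supplementary groups into the
 * credential. Returns the number of groups that did not fit and were
 * therefore dropped.
 */
size_t
authsys_fill_gids(struct authsys_cred *cred, const unsigned int *groups,
    size_t ngroups)
{
    size_t n = ngroups;

    assert(cred != NULL);
    assert(groups != NULL || ngroups == 0);

    if (n > AUTH_SYS_NGROUPS)
        n = AUTH_SYS_NGROUPS;   /* silent truncation */

    if (n > 0)
        memcpy(cred->gids, groups, n * sizeof (cred->gids[0]));
    cred->ngids = (unsigned int)n;

    return (ngroups - n);
}
```

A user in 20 groups loses the last 4 as far as the server is concerned, and which 4 depends purely on the order of the list handed in.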
Up until Solaris 10 our credentials structure was public and anybody could tinker with it. Then Casper introduced Least Privilege, which had to make struct cred private and place an API between the kernel routines using creds and the cred structure itself.
For reference (but not use!) here is the private credential structure:
struct cred {
    uint_t      cr_ref;         /* reference count */
    uid_t       cr_uid;         /* effective user id */
    gid_t       cr_gid;         /* effective group id */
    uid_t       cr_ruid;        /* real user id */
    gid_t       cr_rgid;        /* real group id */
    uid_t       cr_suid;        /* "saved" user id (from exec) */
    gid_t       cr_sgid;        /* "saved" group id (from exec) */
    uint_t      cr_ngroups;     /* number of groups returned by */
                                /* crgroups() */
    cred_priv_t cr_priv;        /* privileges */
    projid_t    cr_projid;      /* project */
    struct zone *cr_zone;       /* pointer to per-zone structure */
    gid_t       cr_groups[1];   /* cr_groups size not fixed */
                                /* audit info is defined dynamically */
                                /* and valid only when audit enabled */
    /* auditinfo_addr_t cr_auinfo; audit info */
};
Hurrah - it's looking more fixable now. Anyone tinkering with the cred structure directly would have had to fix their code for Solaris 10. In addition, it may even be possible to back port the change to Solaris 10.
The group list is an array and the maximum size is controlled by ngroups_max which itself is limited as follows:
/*
 * These define the maximum and minimum allowable values of the
 * configurable parameter NGROUPS_MAX.
 */
#define NGROUPS_UMIN    0
#define NGROUPS_UMAX    32

/*
 * NGROUPS_MAX_DEFAULT: *MUST* match NGROUPS_MAX value in limits.h.
 * Remember that the NFS protocol must rev. before this can be increased
 */
#define NGROUPS_MAX_DEFAULT 16
Err, not quite. Not if we want 10,000+ groups. Let's see ...
Memory
A number of Solaris components (user and kernel) allocate structures sized according to ngroups_max.
On my local Sun Ray the cred_cache² is currently 248 KB: 1426 allocations of 148 bytes each (sizeof (cred_t) [88] + sizeof (gid_t) [4] * (ngroups_max - 1)), plus a small overhead. If we increased ngroups_max to the current Linux limit (65536) this would be in excess of 350 MB (1426 * (88 + 4 * 65535)). Having said that, this machine has 64 GB of memory.
The main point is that a common kernel structure could increase in size from 148 bytes to potentially 256 KB.
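The arithmetic behind those figures can be written down as a couple of small C functions. This is just a worked calculation, not kernel code: cred_alloc_size() and cred_cache_bytes() are hypothetical helper names, and the constants 88 and 4 are the sizeof values quoted above for this particular machine.

```c
#include <assert.h>
#include <stdint.h>

/* Figures quoted in the text: sizeof (cred_t) == 88, sizeof (gid_t) == 4. */
#define CRED_BASE_SIZE  88u
#define GID_SIZE        4u

/*
 * Size of one cred allocation for a given ngroups_max. The cred_t
 * already contains one gid_t slot (cr_groups[1]), hence the "- 1".
 */
uint64_t
cred_alloc_size(uint64_t ngroups_max)
{
    assert(ngroups_max >= 1);
    return (CRED_BASE_SIZE + GID_SIZE * (ngroups_max - 1));
}

/* Total bytes for nallocs creds, ignoring the allocator's own overhead. */
uint64_t
cred_cache_bytes(uint64_t nallocs, uint64_t ngroups_max)
{
    return (nallocs * cred_alloc_size(ngroups_max));
}
```

With ngroups_max at 16, cred_alloc_size() gives 148 bytes and 1426 allocations come to 211048 bytes, which the cache overhead rounds up to the 253952 bytes reported. With ngroups_max at 65536 a single cred is 262228 bytes and the same 1426 allocations need 373937128 bytes.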
Other subsystems outside of the cred_cache that do this include procfs.
3rd party kernel modules may be affected.
Performance
Checking group access in the kernel uses groupmember() which currently scans the group list in a simple loop. Large group membership might impact performance without changes in the searching of group membership. An obvious resolution would be to keep the list sorted and use a binary search to reduce lookup time.
3rd party kernel modules may be affected.
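As a rough illustration of the sorted-list idea, here is a user-space sketch in C. It is not the Solaris groupmember() implementation: sketch_gid_t, groups_sort() and groups_member() are hypothetical names, and the sketch only covers the supplementary group list, assuming the list is sorted once when it is installed in the credential.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int sketch_gid_t;      /* stand-in for gid_t */

/* qsort() comparator that avoids overflow from subtracting gids. */
static int
gid_compare(const void *a, const void *b)
{
    sketch_gid_t ga = *(const sketch_gid_t *)a;
    sketch_gid_t gb = *(const sketch_gid_t *)b;

    return ((ga > gb) - (ga < gb));
}

/* Sort once, when the group list is installed in the credential. */
void
groups_sort(sketch_gid_t *groups, size_t ngroups)
{
    assert(groups != NULL || ngroups == 0);
    if (ngroups > 1)
        qsort(groups, ngroups, sizeof (groups[0]), gid_compare);
}

/*
 * Binary search of a sorted group list: O(log n) per access check
 * instead of O(n). Returns 1 if gid is in the list, 0 otherwise.
 */
int
groups_member(const sketch_gid_t *groups, size_t ngroups, sketch_gid_t gid)
{
    size_t lo = 0, hi = ngroups;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (groups[mid] == gid)
            return (1);
        if (groups[mid] < gid)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (0);
}
```

For 16 groups the difference from a linear scan is negligible; at 65536 groups a lookup drops from tens of thousands of comparisons in the worst case to about 17.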
NFS
As discussed.
3rd party code
- 3rd party user code
Some poorly written user programs might make assumptions about the size of the group list. A simple interposer library using LD_PRELOAD could be used to fix this. AFAIK nothing should make assumptions about group size and they should always check the ABI.
- 3rd party kernel modules
A complete unknown.
This kind of change could break many things, so we have an internal architectural review group that discusses this sort of thing. Here's what I had in mind for them; I just need to shape it up:
We are not proposing changing the default ngroups_max value of 16. This would break AUTH_SYS. We would propose adding comments in /etc/system and our documentation explaining how to increase group membership.
Internally this will be implemented by changing the credential structure so that the list of groups is a pointer to a separate kmem_alloc(). This means that the base cred structure is a (small) fixed size but the group list can be variable.
A single kernel allocation could have been made for the whole variable sized cred structure but this would have meant dropping the use of a dedicated kernel memory cache. The observability and potential performance gain from using a dedicated cache is desired.
Update crdup() (etc) to handle the new cred structure. This will also include rewriting groupmember() to efficiently search larger group lists.
Update ucred_t. It's private (hurrah!) but also includes the group membership list in the structure so we'd need to change that and lots of bits of procfs too.
Hang on, what's that about ucred_t? Ah, yes - not so much an exercise for the reader, more an exercise for me. Never a dull moment.
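The split described in the proposal (a small fixed-size cred plus a separately allocated group list) can be sketched in user space as follows. Everything here is an assumption for illustration: struct sketch_cred and the sketch_cr*() functions are hypothetical, most cred fields are omitted, and calloc()/malloc() stand in for what would be a dedicated kernel memory cache and kmem_alloc() in the kernel.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int sketch_gid_t;      /* stand-in for gid_t */

/* Fixed-size part: the piece that could stay in a dedicated cache. */
struct sketch_cred {
    unsigned int cr_ref;        /* reference count */
    unsigned int cr_ngroups;    /* entries in cr_groups */
    sketch_gid_t *cr_groups;    /* separate allocation, NULL if none */
};

/* Allocate a cred and a private copy of the caller's group list. */
struct sketch_cred *
sketch_cralloc(const sketch_gid_t *groups, unsigned int ngroups)
{
    struct sketch_cred *cr = calloc(1, sizeof (*cr));

    if (cr == NULL)
        return (NULL);
    if (ngroups > 0) {
        assert(groups != NULL);
        cr->cr_groups = malloc(ngroups * sizeof (sketch_gid_t));
        if (cr->cr_groups == NULL) {
            free(cr);
            return (NULL);
        }
        memcpy(cr->cr_groups, groups, ngroups * sizeof (sketch_gid_t));
    }
    cr->cr_ngroups = ngroups;
    cr->cr_ref = 1;
    return (cr);
}

/* Duplicate a cred: the group list must be deep-copied, not shared. */
struct sketch_cred *
sketch_crdup(const struct sketch_cred *src)
{
    assert(src != NULL);
    return (sketch_cralloc(src->cr_groups, src->cr_ngroups));
}

/* Free both allocations. */
void
sketch_crfree(struct sketch_cred *cr)
{
    if (cr == NULL)
        return;
    free(cr->cr_groups);
    free(cr);
}
```

The cost of the design shows up in the duplicate routine: every copy is now two allocations and two frees instead of one, which is the price paid for keeping the base structure a fixed size.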
¹ Bug or Request For Enhancement (RFE)? There's a long-standing internal aphorism related to fixing things in the current release: You can't escalate an RFE. This is broadly true as we like to focus our development on the next release, be that an update or a whole new version. It's usually blindingly obvious what is a bug and what is an RFE, except for a very small number of borderline cases ... and I work in sustaining where those cases tend to cluster. In other words, I see a lot of them. There's a danger of individual groups toggling the bug/RFE status to suit their needs and arguing about it. My take is that if you're arguing about whether it's a bug or an RFE you're trying to answer the wrong question. A better question is: just what is it that needs changing, and why?
² How can you find that? Easy ...
> ::kmastat ! head -3
cache                        buf    buf    buf    memory     alloc alloc
name                        size in use  total    in use   succeed  fail
------------------------- ------ ------ ------ --------- --------- -----
> ::kmastat ! grep cred_cache
cred_cache                   148   1356   1426    253952    601703     0
>
Getting the 3 top lines and the line matching cred_cache in a concise but not obfuscated single line is left as an exercise for the reader.
Technorati Tag: OpenSolaris
Technorati Tag: Solaris
Technorati Tag: Samba
Note that Samba is a file system technology and a racy Latin-American dance. You may find some unexpected pictures on the Technorati Samba tag page from Flickr.
Monday Jun 13, 2005
JP's not the most public person so I was surprised to see his first ever blog entry looking back at the history of Apple, Macs and why software, not hardware, is the key. He writes:
I can't be sure why Apple chose to switch - whether it is tied in with Movie Stores, or IBM complacency, or even a bid to beat Microsoft. Perhaps it is really a complex and subtle mix... to be sure it is a tough call. I do remember that one of the important rules in business is to make a decision - even if it is the wrong decision. Jobs could have stuck with IBM... only to later regret it. Whatever, I am pretty sure Jobs has picked the only possible time to make this switch - Apple has public recognition, a respected OS, money in the bank and a small but strong installed base. It'd be harder to do at any other time.
From a processor snobbery point of view I personally love RISC chips with their orthogonal, copious and windowed registers. I find the x86 instruction set clumsy and hard to navigate. It's all very, well, 1980s. But then it still runs code from then too. Hold on, isn't upwards binary compatibility one of the (many) excellent features of Solaris? - perhaps I shouldn't be too dismissive of the same feature in processors.
As for the Apple-Intel deal - very interesting. I'm looking forward to running Solaris on the new boxes.
Technorati Tag: Apple
Technorati Tag: Mac