Wednesday December 10, 2008
Restricting MySQL memory with Solaris resource controls and rcapd
Now that OpenSolaris 2008.11 is out, that means the next iteration from the OpenSolaris Web Stack project, a.k.a. Sun Web Stack, is available! I'll post another blog or two on this soon.
One of the underutilized features of Solaris (IMHO) is its resource management. Generally speaking, people know zones have resource controls, but they may not be aware that resource controls have been in Solaris for quite a while, since the Solaris 9 updates. In other words, you don't need zones to use resource management.
Recently a customer encountered a memory leak bug with MySQL. We'll obviously fix that, but getting it into the patch cycle and tested will take a little while. In the interim, we needed a solution to keep MySQL from leaking so much that it affects the system. Enter Solaris Resource Management. With a bit of experimentation, as expected, I did find that I could effectively limit the max address space and limit the resident memory with both rcapd and generic resource controls.
Assuming the user mysql, which is the default in the version we ship in OpenSolaris Web Stack, here's what I set (I'm running OpenSolaris build 101, but expect similar behavior on S10U6):
# projadd user.mysql
# projmod -s -K "rcap.max-rss=100MB" -K "process.max-address-space=(priv,100MB,deny)" user.mysql
# rctladm -e syslog=WARNING process.max-address-space
This will limit the max-rss (resident memory, via rcapd, which needs to be enabled: svcadm enable rcap) and the process max address space to 100MB for the user mysql (by configuring the user's default project). In practical terms, prstat showed the memory usage a bit lower, probably due to how the accounting is done for shared objects.
The rctladm command tells Solaris to syslog a warning (on a global basis) if the process tries to exceed the memory amount set out. Note that because I'm using deny on the max-address-space, it can't get any more memory anyway: Solaris will just act like there's no more memory available when the process tries to allocate some. In this case, we may want to restart the MySQL service through a cron job that watches the syslog, though we'd have to handle that carefully.
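A minimal sketch of what such a cron-driven check might look like. The exact wording of the syslog message and the service name `mysql` are assumptions here, not something verified on a real system, so the helper only prints the action it would take rather than running it:

```shell
#!/bin/sh
# Hypothetical helper for a cron job: look for the resource control name
# in a log file and report what we would do about it.
# ASSUMPTION: the syslog warning contains the rctl name verbatim; check
# your own /var/adm/messages for the real message text first.
check_mysql_rctl() {
    log=$1                                   # e.g. /var/adm/messages
    if grep "process.max-address-space" "$log" >/dev/null 2>&1; then
        # ASSUMPTION: "mysql" is a placeholder service name, not a known FMRI.
        echo "would run: svcadm restart mysql"
    else
        echo "no action"
    fi
}

# Example (from cron): check_mysql_rctl /var/adm/messages
```

The "handle that carefully" part still applies: as written, an old warning left in the log would trigger the action on every run, so a real job would need to remember how far it has already read.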
I experimented with mysql 5.0 and a small program to get mysql to try to use a lot of memory and show rcapd/resource controls doing their job. I ran rcapd in debug mode to verify it was doing the correct thing.
rcapd: collection types: 0x1
rcapd: vmusage sample flags 0x4
rcapd: getvmusage time: 170.14 milliseconds
rcapd: kernel nres 4
rcapd: vmusage_sample
rcapd: 0: id: 100, type: 0x4, rss_all: 108523520 (105980KB), swap: 2916417536
rcapd: 1: id: 3, type: 0x4, rss_all: 3014656 (2944KB), swap: 253952
rcapd: 2: id: 10, type: 0x4, rss_all: 1350504448 (1318852KB), swap: 1255268352
rcapd: 3: id: 0, type: 0x4, rss_all: 193441792 (188908KB), swap: 180629504
rcapd: project user.mysql rss/cap: 105980/102400, excess = 3580 kB
rcapd: any collection/project over cap = 1, 1
rcapd: enforcing caps
rcapd: project user.mysql scanner starting to scan, excess 3580k
rcapd: project user.mysql scanner resuming process 29686
rcapd: process 29686: 4/0kB rfd/mdfd since last read
rcapd: identified nonpageable schedctl mapping at fea04000
rcapd: identified nonpageable schedctl mapping at fea04000
rcapd: process 29686: 4/0kB rfd/mdfd since hand swept
rcapd: process 29686: 2857044/0kB scannable
rcapd: identified nonpageable schedctl mapping at fea04000
rcapd: project user.mysql scanner trying to resume from 0x33634000, next 0x33634000
rcapd: project user.mysql scanner paging out process 29686
rcapd: project user.mysql scanner paged out 0x33634000+0t(468/3580)kB
rcapd: project user.mysql scanner paged out 0x339b3000+0t(0/3112)kB
rcapd: project user.mysql scanner paged out 0x33cbd000+0t(0/3112)kB
rcapd: project user.mysql scanner paged out 0x33fc7000+0t(0/3112)kB
rcapd: project user.mysql scanner paged out 0x342d1000+0t(120/3112)kB
rcapd: project user.mysql scanner paged out 0x345db000+0t(0/2992)kB
rcapd: project user.mysql scanner paged out 0x348c7000+0t(0/2992)kB
rcapd: project user.mysql scanner paged out 0x34bb3000+0t(0/2992)kB
rcapd: project user.mysql scanner paged out 0x34e9f000+0t(0/2992)kB
rcapd: project user.mysql scanner paged out 0x3518b000+0t(0/2992)kB
rcapd: project user.mysql scanner paged out 0x35477000+0t(4/2992)kB
rcapd: project user.mysql scanner paged out 0x35763000+0t(0/2988)kB
rcapd: project user.mysql scanner paged out 0x35a4e000+0t(844/2988)kB
rcapd: project user.mysql scanner paged out 0x35d39000+0t(1732/2144)kB
rcapd: project user.mysql scanner paged out 0x35f51000+0t(412/412)kB
rcapd: project user.mysql scanner done, excess 0
rcapd: sleeping 0.71 seconds
rcapd: updating statistics...
rcapd: project user.mysql status: succeeded/attempted (k): 3580/42512, ineffective/scans/unenforced/samplings: 0/1/0/1, RSS min/max (k): 0/243872, cap 102400 kB, processes/thpt: 1/0, 1 scans over 907 ms
rcapd: sleeping 3.81 seconds
rcapd: collection types: 0x1
rcapd: vmusage sample flags 0x4
rcapd: getvmusage time: 3.34 microseconds
rcapd: kernel nres 4
rcapd: vmusage_sample
rcapd: 0: id: 100, type: 0x4, rss_all: 108523520 (105980KB), swap: 2916417536
rcapd: 1: id: 3, type: 0x4, rss_all: 3014656 (2944KB), swap: 253952
rcapd: 2: id: 10, type: 0x4, rss_all: 1350504448 (1318852KB), swap: 1255268352
rcapd: 3: id: 0, type: 0x4, rss_all: 193441792 (188908KB), swap: 180629504
rcapd: project user.mysql rss/cap: 105980/102400, excess = 3580 kB
rcapd: any collection/project over cap = 1, 1
rcapd: enforcing caps
rcapd: project user.mysql scanner starting to scan, excess 3580k
rcapd: project user.mysql scanner resuming process 29686
rcapd: process 29686: 4/0kB rfd/mdfd since last read
rcapd: identified nonpageable schedctl mapping at fea04000
rcapd: identified nonpageable schedctl mapping at fea04000
rcapd: process 29686: 4/0kB rfd/mdfd since hand swept
rcapd: process 29686: 2857044/0kB scannable
rcapd: identified nonpageable schedctl mapping at fea04000
rcapd: project user.mysql scanner trying to resume from 0x35fb8000, next 0x35fb8000
rcapd: project user.mysql scanner paging out process 29686
rcapd: project user.mysql scanner paged out 0x35fb8000+0t(3580/3580)kB
rcapd: project user.mysql scanner done, excess 0
rcapd: sleeping 0.88 seconds
rcapd: updating statistics...
rcapd: project user.mysql status: succeeded/attempted (k): 3580/3580, ineffective/scans/unenforced/samplings: 0/1/0/1, RSS min/max (k): 0/243872, cap 102400 kB, processes/thpt: 1/0, 1 scans over 299 ms
rcapd: sleeping 3.81 seconds
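The numbers in the "rss/cap" line of the debug output above are in KB: the 100MB cap shows up as 102400, and the excess is simply RSS minus cap. As a rough sketch (the function name is made up for illustration, and it assumes the line format shown in this particular debug output), the arithmetic can be checked like this:

```shell
#!/bin/sh
# Hypothetical helper: pull "rss/cap" out of an rcapd debug line and
# recompute the excess in KB. Assumes the line format seen above.
rcap_excess() {
    # $1 = a line such as "rcapd: project user.mysql rss/cap: 105980/102400, excess = 3580 kB"
    pair=$(echo "$1" | sed -n 's|.*rss/cap: \([0-9]*\)/\([0-9]*\),.*|\1 \2|p')
    set -- $pair                 # split into: rss cap
    [ $# -eq 2 ] || return 1     # line did not match the expected format
    echo $(( $1 - $2 ))
}

rcap_excess "rcapd: project user.mysql rss/cap: 105980/102400, excess = 3580 kB"
# prints 3580
```

That matches the "excess = 3580 kB" rcapd reports, which is also the amount it then pages out.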
Setting the resident set and max memory to the same amount is probably not the right approach for most uses; it's just what I experimented with here. I'd think in most cases you'd set the max higher and set the resident set to something sane for the system. If you want to be sure mysqld won't overflow to swap, you may want to actually set the max memory to something less than physical memory. Keep in mind, we were pretty coarse-grained there by setting it up with the user mysql. You can use a project and the newtask(1) command instead if you want the resource controls to apply to particular processes owned by a user.
This is just a simple example of the kinds of things you can do. Have runaway processes or threads occasionally? You can catch them and kill them with resource controls. You can also use coreadm(1M) to be sure you're capturing the errant behavior to analyze and fix the issues in useful core files, not just droppings all over the filesystem. Have a look at resource_controls(5) and related documentation for details.
( Dec 10 2008, 12:52:44 AM PST ) Permalink
The main reason this failure is cool is that it's not my box; I was just borrowing it. In the process, I was able to see how the Solaris FMA stuff works on Intel systems:
# fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
Jun 20 12:07:33 134d04c5-1961-ec12-bfb9-dd97668f42ca INTEL-8001-43 Critical
Fault class : fault.cpu.intel.nb.ie
Affects : hc:///motherboard=0
degraded but still in service
FRU : "MB" (hc:///component=motherboard)
Description : Northbridge has detected an internal error Refer to
http://sun.com/msg/INTEL-8001-43 for more information.
Response : System panic or reset by BIOS
Impact : System may be unexpectedly reset
Action : Replace motherboard
It'll remember this across reboots, report with a human-readable message, etc.
( Jul 14 2008, 04:45:01 PM PDT ) Permalink
As I type this, I've blown away my old Solaris partition on my laptop and I have OpenSolaris 2008.05 laying its bits down. I'd hoped to finish this blog by the time it was done, but it seems to be going too fast (it's at 68%).
I'd wanted to do this some time back, but between the MySQL conference, CommunityOne, the hands-on-labs at JavaOne (for which we received a 4.3 out of 5 review!) and working with customers, there's been little time to get things planned out to install the system. I wanted a good backup or two as well, not so much that I need one since I keep all important data replicated to my NFS home directory at Sun, but just in case.
My old installs have been good to me, but I've not lived day-to-day with OpenSolaris yet. I've installed it on a few systems (like this one) and in VirtualBox a number of times. Still, running on the metal should give me the ability to update easily, minimize the install (even I don't use all of the packages!), move to ZFS root, etc.
(the install just finished)
I'll also be able to more easily work with things in our Web Stack Experimental Repository.
Well, it's all rebooted, so now I'm off to bring back some selected files from the backups.
( May 29 2008, 09:17:49 PM PDT ) Permalink Comments [1]
DTrace visualization of MySQL: excellent tool for optimization
Neel has just posted an example of using DTrace with MySQL to examine what is going on in the system at runtime. Check out the SVG! This is very, very cool and can certainly show where we'll be able to take MySQL and other components we use in the Web Stack project.
And think, we're just hitting our stride. For users, DTrace for Apache HTTP Server has just been added to Web Stack, PHP is already there and Ruby is just around the corner. Very cool!
( Apr 02 2008, 06:50:16 PM PDT ) Permalink Comments [1]
OpenSolaris Web Stack: Setting Services on Apache
Solaris Express Developer Edition 01/08 (a.k.a. SXDE 01/08) has been released! One of the features of the new release is the integration of much of the work from the Web Stack project.
So you're probably wondering, what is different or better about the Web Stack (including, but not limited to, Apache, MySQL and PHP) integrated into OpenSolaris? Well, the code is the same stuff available at the various project sites. This, of course, is by design. Though it is Open Source of one license or another, there's no real value in deviating from what the upstream communities have released.
Still, there are some interesting places OpenSolaris can add value. OpenSolaris has things like DTrace, SMF, lots of security features, etc.
Jyri has already written about how easy it is to run PHP on OpenSolaris, including SXDE 01/08. Assume for a moment though that what you want is a 64-bit Apache with mod_proxy. Well, how do you set that up?
The team has done a great job of integrating the web stack components. One example is using svccfg to make changes to how a service is set up to run. So, let's look at modifying the service configuration of Apache. First, we start it:
# svcs http:apache22
STATE STIME FMRI
disabled 19:35:01 svc:/network/http:apache22
# svcadm enable http:apache22
The process is up and running, but is it 32 or 64 bit? Hmmm. Let's find out. First we need to know what the process is. We could grep for http or something like that, but there's a better way to see which processes are associated with a service:
# svcs -p http:apache22
STATE STIME FMRI
online 19:54:30 svc:/network/http:apache22
19:54:30 1154 httpd
19:54:31 1155 httpd
Okay, now we see the processes. How do I find out if it's 32 or 64 bit? pargs(1) can tell us what the binary is, then we can look at the file:
# pargs -x 1154
1154: /usr/apache2/2.2/bin/httpd -k start
AT_SUN_PLATFORM 0x08047fda i86pc
AT_SUN_EXECNAME 0x08047fe0 /usr/apache2/2.2/bin/httpd
AT_PHDR 0x08050034
AT_PHENT 0x00000020
AT_PHNUM 0x00000007
AT_ENTRY 0x0806cc10
AT_SUN_LDDATA 0xfeffa000
AT_BASE 0xfefc0000
AT_FLAGS 0x00000000
AT_PAGESZ 0x00001000
AT_SUN_AUXFLAGS 0x00000002
AT_SUN_HWCAP 0x0043dc6f SSSE3 | AHF | CX16 | MON | SSE3 | SSE2 | SSE | FXSR | MMX | CMOV | SEP | CX8 | TSC | FPU
# file /usr/apache2/2.2/bin/httpd
/usr/apache2/2.2/bin/httpd: ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, stripped
Okay, it's just 32-bit. Since it's so nicely integrated, perhaps there is a service property to switch it. Let's see what service properties we have to mess with:
# svcprop http:apache22
httpd/enable_64bit boolean false
httpd/server_type astring prefork
httpd/stability astring Evolving
httpd/startup_options astring ""
httpd/value_authorization astring solaris.smf.value.http/apache22
network/entities fmri svc:/milestone/network:default
network/grouping astring require_all
network/restart_on astring error
network/type astring service
filesystem-local/entities fmri svc:/system/filesystem/local:default
filesystem-local/grouping astring require_all
filesystem-local/restart_on astring none
filesystem-local/type astring service
autofs/entities fmri svc:/system/filesystem/autofs:default
autofs/grouping astring optional_all
autofs/restart_on astring error
autofs/type astring service
startd/ignore_error astring core,signal
general/action_authorization astring solaris.smf.manage.http/apache22
general/enabled boolean false
general/value_authorization astring solaris.smf.value.http/apache22
general/entity_stability astring Evolving
start/exec astring /lib/svc/method/http-apache22\ start
start/timeout_seconds count 60
start/type astring method
stop/exec astring /lib/svc/method/http-apache22\ stop
stop/timeout_seconds count 60
stop/type astring method
refresh/exec astring /lib/svc/method/http-apache22\ refresh
refresh/timeout_seconds count 60
refresh/type astring method
tm_common_name/C ustring Apache\ 2.2\ HTTP\ server
tm_man_httpd/manpath astring /usr/apache2/2.2/man
tm_man_httpd/section astring 8
tm_man_httpd/title astring httpd
tm_doc_apache_org/name astring apache.org
tm_doc_apache_org/uri astring http://httpd.apache.org
restarter/logfile astring /var/svc/log/network-http:apache22.log
restarter/contract count 95
restarter/start_pid count 1139
restarter/start_method_timestamp time 1202270070.318463000
restarter/start_method_waitstatus integer 0
restarter/auxiliary_state astring none
restarter/next_state astring none
restarter/state astring online
restarter/state_timestamp time 1202270070.320630000
There's a lot of stuff because of the various dependencies and other variables that SMF needs for some of the other magic. That one labeled "httpd/enable_64bit" looks like the one we need though. Let's change it using svccfg:
# svccfg -s http:apache22
svc:/network/http:apache22> listprop httpd/*
httpd/server_type astring prefork
httpd/stability astring Evolving
httpd/startup_options astring
httpd/value_authorization astring solaris.smf.value.http/apache22
httpd/enable_64bit boolean false
svc:/network/http:apache22> setprop httpd/enable_64bit=true
svc:/network/http:apache22> listprop httpd/*
httpd/server_type astring prefork
httpd/stability astring Evolving
httpd/startup_options astring
httpd/value_authorization astring solaris.smf.value.http/apache22
httpd/enable_64bit boolean true
svc:/network/http:apache22> exit
Now, when we change these service properties, we need to apply the changes by refreshing the service and then restarting it. Then we can see if we got the desired effect:
# svcadm refresh http:apache22
# svcadm restart http:apache22
# svcs -p http:apache22
STATE STIME FMRI
online 20:44:40 svc:/network/http:apache22
20:44:40 1683 httpd
20:44:41 1684 httpd
20:44:41 1685 httpd
20:44:41 1686 httpd
20:44:41 1687 httpd
20:44:41 1688 httpd
20:44:41 1689 httpd
# pargs -x 1683
1683: /usr/apache2/2.2/bin/amd64/httpd -D 64bit -k start
AT_SUN_PLATFORM 0xfffffd7fffdfffcf i86pc
AT_SUN_EXECNAME 0xfffffd7fffdfffd5 /usr/apache2/2.2/bin/amd64/httpd
AT_PHDR 0x0000000000400040
AT_PHENT 0x0000000000000038
AT_PHNUM 0x0000000000000008
AT_ENTRY 0x000000000042d450
AT_SUN_LDDATA 0xfffffd7fff3fa000
AT_BASE 0xfffffd7fff394000
AT_FLAGS 0x0000000000000000
AT_PAGESZ 0x0000000000001000
AT_SUN_AUXFLAGS 0x0000000000000002
AT_SUN_HWCAP 0x000000000041dc77 SSSE3 | CX16 | MON | SSE3 | SSE2 | SSE | FXSR
| MMX | CMOV | AMD_SYSC | CX8 | TSC | FPU
# file /usr/apache2/2.2/bin/amd64/httpd
/usr/apache2/2.2/bin/amd64/httpd: ELF 64-bit LSB executable AMD64 Version 1 [SSE FXSR CMOV FPU], dynamically linked, stripped
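Eyeballing file(1) output is fine for one box. If you wanted to script the same check, a small helper along these lines might do; it's only a sketch, the function name is made up, and it assumes the "ELF 32-bit" / "ELF 64-bit" wording seen in the two outputs above:

```shell
#!/bin/sh
# Hypothetical helper: classify one line of file(1) output as 32 or 64 bit.
elf_bits() {
    case "$1" in
        *"ELF 64-bit"*) echo 64 ;;
        *"ELF 32-bit"*) echo 32 ;;
        *) echo unknown; return 1 ;;   # not an ELF description we recognize
    esac
}

elf_bits "/usr/apache2/2.2/bin/httpd: ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, stripped"
# prints 32
```

On a live system the argument would come from the command itself, for example `elf_bits "$(file /usr/apache2/2.2/bin/amd64/httpd)"`.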
Looks good. Apache, as you may know, has both prefork and worker models. I happened to spot in there that there is a property for that. Since all of the modules I'm using are thread safe, I think I'll turn it to worker...
# svccfg -s http:apache22
svc:/network/http:apache22> listprop httpd/*
httpd/server_type astring prefork
httpd/stability astring Evolving
httpd/startup_options astring
httpd/value_authorization astring solaris.smf.value.http/apache22
httpd/enable_64bit boolean true
svc:/network/http:apache22> setprop httpd/server_type=worker
svc:/network/http:apache22> exit
# svcadm refresh http:apache22
# svcadm restart http
svcadm: Pattern 'http' matches multiple instances:
svc:/network/http:squid
svc:/network/http:apache22
# svcadm restart http:apache22
Very nice, now I can go edit my httpd.conf and enable proxying!
I went from 32 to 64 bit, and changed MPMs all without having to go rebuild Apache, edit rc files, etc. Plus, the current configuration is always easy to find with svcprop.
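The full svcprop listing is long, so when you only care about one property it helps to filter it. svcprop can do this itself with `-p` (e.g. `svcprop -p httpd/server_type http:apache22`); the sketch below does the same thing on saved or piped output, with a made-up function name, assuming the three-column "name type value" layout shown earlier:

```shell
#!/bin/sh
# Hypothetical filter: read svcprop-style lines on stdin and print the
# value of one property. Exits non-zero if the property is not present.
smf_prop_value() {
    # $1 = property name, e.g. httpd/enable_64bit
    awk -v prop="$1" '
        $1 == prop {
            $1 = ""; $2 = ""        # drop the name and type columns
            sub(/^ +/, "")          # trim the separators left behind
            print
            found = 1
        }
        END { exit !found }'
}

printf 'httpd/enable_64bit boolean true\nhttpd/server_type astring worker\n' |
    smf_prop_value httpd/server_type
# prints worker
```

Values containing escaped spaces, like the start/exec method, come through with the backslash still in place.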
Once other projects, like Visual Panels, come along we should then see a nice GUI way of understanding and editing our services. The other nice advantage is if I have a large number of systems I want to configure the service on in a similar way, SMF makes it simple to replicate that service configuration to other systems. Alternatively, I could just jumpstart them and apply a service config.
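As a rough illustration of that replication idea, here is a sketch that only prints the commands for the two properties changed in this post, so they can be reviewed before being run on another host. The function name is made up, and it assumes the same `setprop` form typed at the svccfg prompt above is also accepted as a subcommand on the svccfg command line:

```shell
#!/bin/sh
# Hypothetical dry-run generator: print (not run) the commands that would
# apply the two Apache service properties changed interactively above.
apache_smf_commands() {
    # $1 = true|false for httpd/enable_64bit, $2 = prefork|worker
    fmri="http:apache22"
    echo "svccfg -s $fmri setprop httpd/enable_64bit=$1"
    echo "svccfg -s $fmri setprop httpd/server_type=$2"
    echo "svcadm refresh $fmri"
    echo "svcadm restart $fmri"
}

apache_smf_commands true worker
```

How the output gets to the other systems (a loop over hosts, a jumpstart finish script, and so on) is left open here.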
You may be asking, but what about that stuff in httpd.conf? Well, remember our goal here in the Web Stack project is to integrate nicely with those upstream projects. Admins know how to edit their httpd.conf files, and it wouldn't add much value to start making that many modifications to the Apache HTTP Server. The philosophy is integrate with the features in OpenSolaris, but stay true to the community that produced the code.
Download and play with SXDE 01/08 or one of the Project Indiana builds after it synchs up with the latest code, or join us over in the OpenSolaris Web Stack project.
( Feb 05 2008, 11:42:39 PM PST ) Permalink Comments [3]
Opening .docx files with OpenOffice on Solaris
For the first time I've encountered one of the Microsoft Office 2007 (sometimes called Office Open XML, though open-ness is in dispute in various places) files. I suspect I and many others will probably soon encounter these for the first time since a bevy of new Vista PCs with MS Office demo software installed were sold over Christmas.
The answer was not easily found. The semi-official answer is on the OpenOffice wiki. Since I use StarOffice on Solaris and don't encounter these things often, I decided to use online document conversion. Zamzar, the top result from the Google search attached to the link in the wiki, did a fine job. I'm certain I could get the Novell converter to run on Solaris BrandZ, but it would be a major hassle and, looking at the web pages, it may not even be portable between different Linux releases.
Ideally there'd be a simple converter that I wouldn't need to deal with advertisements for. Perhaps one of those other OO.o projects with some of the not-yet-ready-for-primetime converters could set up a small site?
Oh yeah, where did the file come from? It was my girlfriend's cousin's daughter's from school. Disappointing since OpenOffice is free and properly Open...
( Jan 10 2008, 10:55:53 PM PST ) Permalink Comments [2]
Lots of good news for Solaris web stack runtimes
The big news on the web stacks for Solaris front is we've released Cool Stack 1.2!*
Both binaries and source are over here. Basant and Shanti did a great job getting things generally updated and adding a few key enhancements (like some SMF and DTrace integration). Also, all but one of the binaries are compiled with Studio 12 and are giving us great out of the box performance on x86, x64 and UltraSPARC systems like the new CoolThreads T2 systems. That should work well if, say, you wanted to consolidate a bunch of old AMP boxes to a nice efficient 64-thread system.
Give 'em a download and a try. If you're a Cool Stack 1.1 user, there are some guidelines on how to migrate on the site.
Over on the Web Stack project on OpenSolaris.org, Jyri, Aravind, Rahul, Ritu, cvr, Ludo, Prashant, Luojia, and Sriram from Sun, with some great community input have been working through the tough work to integrate many of these same common web stack components into the OpenSolaris SFW consolidation. That way, they'll be there out of the box in distributions like Sun's Solaris Express Developer/Community Editions. Components are rolling in.... Squid is in, Apache and a number of modules are in and updated, PHP is updated, Ruby is very close, MySQL and memcached are coming. Right around build 79 of Nevada, there should be a pretty complete stack in there!
The other thing that has been coming together is the developer experience. As some of you may know, NetBeans is getting PHP and Ruby capabilities, so it's only natural that Solaris Express Developer Edition has a nice out of box experience for end users. It's not quite integrated yet so there's nothing to play with, but you can get a good idea of where it's going if you grab a daily build of netbeans and the most recent Solaris Express Community Edition. There's a bit of wiring for now that won't be there when things come together shortly, but it's not too bad.
As always, if you have any questions or want to register a vote (or contribute!) your favorite extension or module, join us over here. Thanks also to all those who have sent me feedback directly or through the lists/forums.
* For those that may have followed things, this is the same as the thing we were going to call 1.1.1. I suggested to the team that we call it 1.2 after I noticed the big changes in packaging. It just didn't seem right for a micro release to be delivering such changes. Basant and Shanti had already had similar thoughts, so we changed it.
( Nov 03 2007, 12:03:48 AM PDT ) Permalink
Updated Cool Stack 1.1.1 deliverables in Web Stack wiki
When I first posted the Cool Stack 1.1.1 deliverables on the Web Stack wiki, I knew it wasn't quite complete. Thanks to some info and updates from Basant Kukreja (link not supplied, as he is a non-blogging heathen, as John Clingan would say), who has dutifully been working on the Cool Stack 1.1.1 updates, I've been able to update it.
We'd love any feedback. We've gotten some on the Cool Stack forums already. Please have a look over here and post any feedback to either the Web Stack mailing list or the Cool Stack forum.
( Oct 12 2007, 11:44:49 AM PDT ) Permalink Comments [3]
A Complete Open Source Stack: Hardware to Web 2.0
I'm a bit delayed in posting the slides, but not so long ago I gave a talk at both the UUASC and the local IEEE chapter CompSoc meeting.
The talk had originally been assembled for SCALE5x. Despite the "L" in the name, SCALE is an Open Source conference, so the talk demonstrated Sun's involvement in Open Source by showing various technological bits (OpenSPARC, OpenSolaris zones/DTrace, PostgreSQL, OpenJDK, NetBeans and Glassfish) all working together.
Both were well received. I think a lot of it was new to the IEEE folks, but many of the UUASC people had seen some of the bits and pieces in full presentations over the years.
Presentation is here.
( Aug 02 2007, 08:13:29 PM PDT ) Permalink Comments [2]
User Interface muscle memory is a bit of a problem. What do I mean by this? Your average OS/400 user is used to context-sensitive help being available through the "F4" key. Your average Windows user is used to it being available through "F1". Your average Sun Workstation user is used to it being available through a key labeled "Help". :)
There are, of course, all kinds of variations on this. Windows 2000 - XP - Aero. OpenLook - CDE - Gnome - KDE. sh - ksh - csh - bash. I won't even bring up editors.
Don MacAskill touched on this, as it relates to OpenSolaris, in his blog on The Enterprise Linux Problem. I admit there are differences, but the solutions are anything but obvious. Whatever works for one person may cut off or alienate things others are looking for or have grown to expect.
The good news, at least in OpenSolaris land, is that the community (and Sun) have bridged a lot of the simple gaps over the years (i.e. add /usr/sfw/bin to your path if it's not already there!) and other projects like the OpenSolaris GNU Communities /usr/gnu project should help to address some of the UI muscle memory problems. It definitely won't address everything, but it's not a strict engineering problem: Sun cannot address this in a vacuum and you can't do a marketing study to get the right answer. If there are UI differences you'd like to see, then please join (or at least rant to) us over at opensolaris.org
Solaris + AMP/Cool Stack proposed to move to OpenSolaris
The proposal has just come out over on OpenSolaris discuss, but the good news is Cool Stack, the basis for Sun's Solaris + AMP offering (more commonly known as SAMP), is moving to a new home.
There was nothing wrong with the existing community/site/project, but it's clear that the project there has a lot more in common with things going on over at OpenSolaris. I hope it'll lead to the consumers and users working together to build up the best practices. I have some experience with Cool Stack already from customer projects, and it's definitely been a good starting point for these projects, but it has grown beyond the CoolThreads platforms and there's room for some of the other users of this software and packaging projects to join in.
You'll find the official announcement over here, and I'm sure the project will be set up soon on OpenSolaris.org.
ZFS reliability to the rescue of the Centers for Disease Control
I've recently been working with a customer as part of my new job (which I guess I should blog about someday) on how ZFS can work in their environment. In the process, I got a bit more in depth with zfs than I'd had opportunity to before, and had joined the OpenSolaris zfs-discuss mailing list.
So this evening, when going through the day's email, reading and deleting, I came across this posting.
Apparently, the CDC had started to use ZFS in production and, through its normal use, ZFS found that their SAN equipment was doing exactly what ZFS states as the filesystem problem: every layer (erroneously) trusts every other layer, regardless of which way the data is flowing.
I don't even remember the name of the offering, but I do recall from some years ago Oracle and EMC (I think?) having an offering that allowed Oracle to checksum the data written to the tablespaces. It was insanely expensive and really only appropriate for the most mission critical environments as a result.
Fast forward several years, and now with zfs, I have the same level of reliability with the junk in my home directory on my laptop. To quote the post: "Another win for ZFS"...
Technorati Tags: ZFS, OpenSolaris
( Nov 27 2006, 10:41:00 PM PST ) Permalink
I didn't make it to Java One last year, so I never did get to see Aerith before it was released on java.net.
Anyway, as you all know from my previous entry on XOrg with my Tecra M3, I have decent 3D hardware. Since Joshua Marinacci and Richard Bair are coming out next month, I figured I should at least check it out.
The getting started page pretty much covered it all, but I did also have to download Javazoom's JLayer. I also had to create a Flickr account and register to use their APIs. Then, after creating a public album, Aerith started working just fine!
So, I can report Solaris Nevada, build 32, current NVidia driver as of this writing, and Aerith work nicely.
( Aug 16 2006, 10:04:12 PM PDT ) Permalink
XOrg with nVidia Geforce Go 6600TE/6200TE
Not so long ago, my old laptop, a Tecra M2, died, and I obtained a Tecra M3 to replace it. Loading Solaris was no problem; I've been running build 32 for some time with wifi and inetmenu to configure it. I'd been using a ... roundabout method to work with projectors when showing things to customers, but I finally sat down to set up TwinView recently.
The full xorg.conf can be found here, but the pertinent part under the "Screen" section is:
Option "TwinView" "True"
Option "TwinViewOrientation" "Clone"
Option "SecondMonitorHorizSync" "30-70"
Option "SecondMonitorVertRefresh" "56-85"
Option "UseEdidFreqs" "True"
Option "MetaModes" "1280x1024, 1280x1024; 1024x768, 1024x768; 800x600, 800x600"
Option "ConnectedMonitor" "DFP-0, CRT-0"
I've not gotten all the resolutions to work, but the 1024x768 res seems to be fine with every projector I've encountered. That's good enough for slide shows, but I'd like to get it set up with 1280x1024 for software demos.
Other notes, this is with the latest nVidia driver, 1.0-8762 as of this writing, and with the BIOS setting for the flat panel to send display to both CRT and DFP.
( Jul 26 2006, 12:00:53 PM PDT )
Permalink
Comments [3]
X2100 SMDC best practices for serial over lan with ipmitool
Well, John, you beat me to the punch. I had a great blog entry hanging out in my grey matter on using Solaris 10 with the SMDC. Still, there are a few nuggets I can relay.
First, the prereqs. This all assumes:
The release notes are mostly complete, but have a couple of errors and don't necessarily explain the "why" in some areas. I've already asked for some fixes against the release notes. Blogs are faster though. :) Just don't expect it to be a maintained doc. Check that link for the latest. The revision of the doc as of this writing is -15.
Now the errors. The release notes tell you there are edits required in the GRUB menu.lst. It turns out, you don't need to make any edits to GRUB. They tell you to set GRUB to use the serial device. That's not necessary. So the lines:
serial --unit=0 --speed=9600
terminal serial
The release notes also tell you to edit the multiboot line as follows:
kernel /platform/i86pc/multiboot -B console=ttya
Finally, this one is a bit of a major error, you need to use bootadm to determine where the menu.lst is. You cannot necessarily rely on it being in the directory mentioned. It's not as major an error if you read the comments in the file-- but I find not everyone reads documentation. :) Besides, if you take my approach, you don't end up changing it at all anyway.
The other thing required at this point is to edit the asy.conf. This is due to a BIOS bug on the X2100. If you have the BIOS in "redirect over serial" mode, it puts the real, onboard serial device entry in the ACPI table. If you put it in "redirect over SMDC" mode, it does not put the entry for the serial device in the ACPI table, so you need to tickle the asy driver to use the ISA device. This will change in OpenSolaris shortly (if not already) and will be in Solaris 10U3, it's being tracked as Bug ID 6416708. Other platforms have similar issues, so the driver is being modified to look for the common serial ports. The ISA address and interrupts of the serial ports haven't changed for many, many years, so this probably makes sense. I'm told this affects the IBM BladeCenter and Dell PowerEdge 1855MC system's virtual serial consoles (though, those use ttyb).
The downside is that until this fix is available, without a specially built boot archive or modified jumpstart, you won't be able to perform an install over the SMDC. So, until this fix hits, install over the serial or with a BIOS console (keyboard/mouse/video), then make the change to the asy.conf.
Finally, I believe (but don't know for certain) you need to run bootadm update-archive to be sure you'll have a nice fresh archive for your next boot. Solaris will do this on shutdown, but I figure it's best to do it now, just in case you don't have a clean shutdown.
After that, you'll have a nice, working ipmitool serial over lan. A command similar to:
ipmitool -H-U Admin tsol
There is one lingering issue though. The bug mentioned in the release notes about the bge driver had been fixed, but there seems to continue to be a bge problem. I found that after using the tsol for a little while, the whole SMDC would disappear unless I excluded drv/bge. So, add the following to /etc/system before you reboot:
exclude: drv/bge
That will eliminate sideband usage of the bge from Solaris, but you'll have a stable SMDC.