Random ramblings of a paranoid git
"The question is not if you are paranoid, it is if you are paranoid enough."


20091106 Friday November 06, 2009
Behavior Driven Infrastructure
Permalink | Comments [4] | 2009-11-06 04:37

One problem I'm wrestling with in my day job at Web Engineering is: how do you know when a system you are building is ready?

When we build a new system, it goes through the following steps:

  1. Jumpstart
    Installs the OS and sets up basic configuration, like hostname, domainname, network.
  2. Puppet
    System specific configuration
  3. Manual steps
    This includes things which are too system dependent to automate, like creating a separate zpool for application data on external storage

For me it has been enough to review the puppet logs to determine if the system has been correctly configured, but for my colleagues who aren't using puppet on a daily basis, it isn't. They have been asking "how do we know if a system is ready?", and I've realized that "review the puppet logs" isn't really a helpful answer for most people. What if you have forgotten to add a node definition for the system, and it gets the default node configuration? Then puppet will tell you everything is configured correctly, which is only partly true: the things puppet has been told to configure are configured, but what about the stuff I forgot to tell it about?

So I've been thinking about using the same approach as I use when I write code: Behavior Driven Development. That is, you start by specifying the behavior of the program you are developing, and only then do you start to code. This has the benefit of easily letting you know when you are done: if your code passes all the behavior tests, you can release it.

Translating this to Solaris installs isn't that hard: instead of describing program behavior you describe (operating) system behavior. You can use the same tools as you do for development. I've been using cucumber for my Ruby on Rails projects, so that is what I picked for my initial testing. Cucumber uses natural language to describe the behavior you want, which makes it easy for non-programmers to understand what it is testing.

When you write the definitions, you should not use technical language like "ssh to the host weblogs and grep for a passwd(4) entry for the user martin in /etc/passwd". Instead, use something like "I should be able to ssh to weblogs, and log in as the user martin", which is the behavior you want. Cucumber then matches each line of that definition against step definitions (small pieces of Ruby code) which perform the actual validation.

This is how it can look when you run it:

martin@server$ cucumber
Feature: sendmail configure
  Systems should be able to send mail

  Scenario: should be able to send mail                  # features/weblogs.sfbay.sun.com/mail.feature:5
    When connecting to weblogs.sfbay.sun.com using ssh   # features/steps/ssh_steps.rb:12
    Then I want to send mail to "martin.englund@sun.com" # features/steps/mail_steps.rb:1

Feature: NIS client
  Systems on SWAN should be NIS clients

  Scenario: should be able to match entries in NIS    # features/weblogs.sfbay.sun.com/nis.feature:4
    When connecting to weblogs.sfbay.sun.com using ssh # features/steps/ssh_steps.rb:12
    Then I want to lookup "xuan" in the passwd table   # features/steps/nis_steps.rb:1
    And I want to lookup "onnv" in the hosts table     # features/steps/nis_steps.rb:1

  Scenario: should be able to make lookups through NIS # features/weblogs.sfbay.sun.com/nis.feature:9
    When connecting to weblogs.sfbay.sun.com using ssh # features/steps/ssh_steps.rb:12
    Then I want to lookup "xuan" through nsswitch.conf # features/steps/nis_steps.rb:5

Feature: SSH access
  SSH should be configured

  Scenario: ssh user access                            # features/weblogs.sfbay.sun.com/ssh.feature:4
    Given a user named "martin"                        # features/steps/ssh_steps.rb:3
    When connecting to weblogs.sfbay.sun.com using ssh # features/steps/ssh_steps.rb:12
    Then the connection should succeed                 # features/steps/ssh_steps.rb:28

  Scenario: no lingering default OpenSolaris user      # features/weblogs.sfbay.sun.com/ssh.feature:9
    Given a user named "jack" with password "jack"     # features/steps/ssh_steps.rb:7
    When connecting to weblogs.sfbay.sun.com using ssh # features/steps/ssh_steps.rb:12
    Then the connection should fail                    # features/steps/ssh_steps.rb:32

5 scenarios (5 passed)
13 steps (13 passed)

This makes it really easy to see if the behavior of the system is what you expect. All green means it is ready!

What I am working on at the moment is making the failures understandable to a non-programmer. For example, when a scenario fails (because it managed to log in to a system where the login should have failed), it looks like this:

  Scenario: no lingering default OpenSolaris user      # features/weblogs.sfbay.sun.com/ssh.feature:9
    Given a user named "jack" with password "jack"     # features/steps/ssh_steps.rb:7
    When connecting to weblogs.sfbay.sun.com using ssh # features/steps/ssh_steps.rb:12
    Then the connection should fail                    # features/steps/ssh_steps.rb:28
      expected not nil, got nil (Spec::Expectations::ExpectationNotMetError)
      ./features/steps/ssh_steps.rb:29:in `/^the connection should succeed$/'
      features/weblogs.sfbay.sun.com/ssh.feature:12:in `Then the connection should succeed'

Failing Scenarios:
cucumber features/weblogs.sfbay.sun.com/ssh.feature:9 # Scenario: no lingering default OpenSolaris user

5 scenarios (1 failed, 4 passed)
13 steps (1 failed, 12 passed)

It is not obvious that "expected not nil, got nil" means that it could log in when it shouldn't be able to, so I am working on some custom rspec matchers to generate better error messages.

Once I've gotten a bit beyond playing around with this, I will publish the source if someone is interested in it.

   
 
   
20090623 Tuesday June 23, 2009
Planning to fail when using Puppet
Permalink | Comments [1] | 2009-06-23 14:46

We put a lot of thought into planning for failure when we set up our sites (like www.sun.com, blogs.sun.com and so on). Every component is redundant, from border firewalls to load balancers to front end web servers to root disks. We even put the gear in separate racks on separate power, just in case someone accidentally knocks both power cables out. This is arranged in odd and even sides, and servers are placed in the corresponding side, i.e. blogs1.sun.com is placed on the odd side and blogs2.sun.com is placed on the even side. If we use more than two servers they are added to the respective side.

But the chain is only as strong as its weakest link: if I screw up when I update the puppet profile for our base server class, things will quickly go south.

No matter how carefully I test things before I commit my changes to the master mercurial repository and on to the puppetmaster (we only ran one per site before), there is still a chance things go boink! There are always some servers which were set up a few years ago, long before we started using puppet, that aren't installed and configured the way I expect, and when they are modified by puppet - they break!

So it doesn't matter that we are running multiple systems: they all get changed by puppet within 30 minutes.

To work around this problem I've set up two puppetmasters, each serving its corresponding side (odd or even). This lets me push changes to one side first and let them stew for a while before I push them to the other side.
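
A minimal sketch of how a host could be mapped to its side, assuming the naming scheme above where the trailing number of the short host name decides the side. The function names and the puppetmaster host names are made up for illustration; they are not the ones used on the real sites:

```shell
#!/bin/sh
# Hypothetical helper: print "odd" or "even" for a host name such as
# blogs1.sun.com, based on the trailing digits of the short host name.
side_for_host() {
    short=${1%%.*}                                        # drop the domain part
    num=$(printf '%s\n' "$short" | sed 's/^.*[^0-9]//')   # keep trailing digits
    [ -n "$num" ] || { echo "unknown"; return 1; }        # no number, no side
    # expr does decimal arithmetic, so a name like web08 is handled too
    if [ "$(expr "$num" % 2)" -eq 1 ]; then echo odd; else echo even; fi
}

# Hypothetical naming: one puppetmaster per side (placeholder host names).
puppetmaster_for_host() {
    side=$(side_for_host "$1") || return 1
    echo "puppetmaster-$side.example.com"
}
```

With something like this, a change can be rolled out to the hosts whose side is odd first, and to the even side once it has proven itself.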

   
 
   
20090303 Tuesday March 03, 2009
Running puppet on OpenSolaris
Permalink | | 2009-03-03 10:32

I'm running puppet on the production servers I manage at Sun, and for Solaris 10 I've had to compile Ruby and create my own package (for easy distribution). I've also created my own puppet and facter packages, as I didn't want to set up rubygems.

Now on OpenSolaris this is much easier, as you can just run:

# pkg install -q SUNWruby18
# gem install -y puppet
Bulk updating Gem source index for: http://gems.rubyforge.org
Successfully installed puppet-0.24.7
Successfully installed facter-1.5.4
Installing ri documentation for puppet-0.24.7...
Installing RDoc documentation for puppet-0.24.7...
and you are all set to configure /etc/puppet/puppet.conf to get puppetmasterd and puppetd running!

   
 
   
20081211 Thursday December 11, 2008
Sendmail, may I introduce Alteon to you?
Permalink | | 2008-12-11 05:10

Yesterday we started using an Alteon VIP to load balance SMTP traffic to our two mail servers, and everything was fine and dandy, but when I took a look in /var/log/syslog I found loads of entries like this:

Dec 11 18:17:14 prod-git1 sendmail[20899]: [ID 801593 mail.info] j93FHDNX020899: alteon1.sun.com [192.168.10.1]
did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA

The Alteon health check connects and then just issues a QUIT, which sendmail finds suspicious and hence feels obliged to let me know about. This becomes very annoying when you have two Alteons doing the check every other second!

After scratching my head for a while and searching for a solution, I came across this patch to sendmail, which lets you select systems which shouldn't generate the above log entry. The only caveat was that I'd have to build my own sendmail, and I really don't want to roll my own stuff as it requires more work to support, so I continued to look for another solution.

I finally figured out (after reading the sendmail source code) that if I set the following in /etc/mail/sendmail.cf

O PrivacyOptions=authwarnings,needexpnhelo,needvrfyhelo

sendmail would be quiet if the Alteon health check did the equivalent of this:

mconnect localhost
connecting to host localhost (127.0.0.1), port 25
connection open
220 prod-git1.sun.com ESMTP Sendmail 8.13.8+Sun/8.13.8; Thu, 11 Dec 2008 13:58:48 +0100 (CET)
VRFY root
503 5.0.0 I demand that you introduce yourself first
QUIT
221 2.0.0 prod-git1.sun.com closing connection

So we changed the health check from being smtp to a custom script (note that you need the double backslashes):

open 25,tcp
expect "ESMTP"
send "VRFY root\\n"
expect "503"
send "QUIT\\n"
expect "221"
close

And after pushing this change out, sendmail stopped filling the log with messages I don't want to see.

   
 
   
20080508 Thursday May 08, 2008
Creating a user_attr puppet type
Permalink | | 2008-05-08 08:49

I've come a fair way in my puppet testing now, but one thing I lack is a user_attr type, i.e. a way to update the /etc/user_attr file using puppet.

This is what I have in mind for the syntax:

user_attr { "martin":
    type => normal,
    roles => [
        "root",
        "admin"
    ],
    profiles => "Zone Management",
    auths => [
        "solaris.mail.mailq",
        "solaris.system.shutdown"
    ]
}

One thing I haven't figured out yet is whether the definitions should be absolute, i.e. whether the entry must be exactly like the definition, or whether it is enough that the listed values are present. In the above example, should the role list be exactly root,admin, or should it just make sure that those two roles are in the list, so that you can have the role audit too? Perhaps it would be good to be able to use the absent/present syntax on individual items?
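
The two possible semantics can be sketched in shell, treating the values as comma-separated lists the way /etc/user_attr stores roles. This is only an illustration of the design question, not puppet type code, and the function names are made up:

```shell
#!/bin/sh
# Normalize a comma-separated list: one item per line, sorted, no duplicates.
normalize() {
    printf '%s\n' "$1" | tr ',' '\n' | sed '/^$/d' | sort -u
}

# Absolute semantics: the entry must hold exactly the wanted items.
# $1 = wanted list, $2 = list currently in the entry
matches_exact() {
    [ "$(normalize "$1")" = "$(normalize "$2")" ]
}

# Inclusive semantics: every wanted item must be present, extras are allowed.
matches_inclusive() {
    actual=$(normalize "$2")
    normalize "$1" | (
        while IFS= read -r item; do
            # -F fixed string, -x whole line, -q quiet
            printf '%s\n' "$actual" | grep -Fxq -- "$item" || exit 1
        done
    )
}
```

With wanted roles root,admin and an entry holding root,admin,audit, the inclusive check succeeds while the exact check fails, which is precisely the choice the type has to make.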

I haven't decided if I'm going to manage the other user attributes too, e.g. project, defaultpriv, limitpriv and lock_after_retries. I will probably leave that for a later release...


   
 
   
20080418 Friday April 18, 2008
Testing puppet configurations
Permalink | | 2008-04-18 23:57

I've set up a puppet environment which uses mercurial to store the configuration and manifests. Now I'm trying to build an environment to be able to test changes before I commit them to the repository, and they propagate to all our 400 servers - but I encountered a problem.

You can use a separate configuration directory with the --confdir option for both puppetd and puppetmasterd, and run everything on localhost, but the problem is the source parameter:

file { "/etc/profile":
    owner => root,
    group => root,
    mode => 644,
    source => "puppet://server/base/profile"
}

The above source parameter contains the hostname, so when I want to test it on my local mercurial repository, it still connects to the server instead of localhost when it fetches the files.

Luckily there is a solution! If you leave out the server part (i.e. write puppet:///base/profile), puppetd will insert the name of the server it is connecting to.
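
As a sketch, existing manifests could be converted with a small sed filter. It assumes source URLs of the form puppet://server/... as in the example above; the function name and the manifest path in the comment are made up:

```shell
#!/bin/sh
# Hypothetical helper: rewrite puppet://<server>/path to puppet:///path on
# stdin, so puppetd fills in whichever server it is connected to.
# URLs that already lack the server part pass through unchanged.
strip_puppet_server() {
    sed 's|puppet://[^/]*/|puppet:///|g'
}

# Example (placeholder path): strip_puppet_server < manifests/site.pp
```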

   
 
   
20080408 Tuesday April 08, 2008
Trying out puppet
Permalink | Comments [2] | 2008-04-08 23:04

I'm looking for ways to better manage our servers, and right now I'm playing with puppet.

I immediately ran into a problem: it picked the wrong domain name. Internally at Sun we use NIS (yes, I know it is insecure and sucks in almost all aspects, but I'm not in a position to change it - and believe me I have tried) and our NIS domain name doesn't match the DNS domain name.

This is something puppet (facter to be exact) doesn't figure out, at least not on Solaris. Instead of picking the correct fqdn for a host, e.g. puppetd.sfbay.sun.com, it picks puppetd.mpklab.sfbay.sun.com, since that is what the domainname command returns.

They tried to fix this, but unfortunately the fix doesn't work for Solaris, as it relies on the dnsdomainname command, which Solaris doesn't have.

I've worked around it by creating my own /usr/bin/dnsdomainname, which gets called before domainname:

#!/bin/sh
# Print the DNS domain by dropping the first label of the NIS domain name
DOMAIN="`/usr/bin/domainname 2> /dev/null`"
if [ -n "$DOMAIN" ]; then
    echo "$DOMAIN" | sed 's/^[^.]*\.//'
fi

So now I can continue to test my puppet configurations...

   
 
   
20080401 Tuesday April 01, 2008
The danger of growing too fast
Permalink | Comments [3] | 2008-04-01 05:04

Our esteemed director has pushed us too far for too long - he requires us to rack 'em and stack 'em all day long, and after the last spree of installing alpha hardware he got from engineering (the new 4-way, 16-core Rock based systems, code name lurad) for the www.sun.com cluster, we now have such a big mess in our server room that I thought I'd share it with you:

Picture by: VespaGT

We have added 72 of these little monsters since the beginning of last week and haven't had time to clean up the cables - so now it is time to bring out the dymo and start labeling...


   
 
   
20080307 Friday March 07, 2008
Converting HFS from case sensitive to case insensitive
Permalink | Comments [1] | 2008-03-07 12:04

I've managed to solve the problem I was blogging about earlier.

I started out by forcing Time Machine to do a backup, and since I wasn't sure I'd succeed in restoring my data using it, I did a gtar backup of all user directories too.

Once the backups were done I booted the Leopard install DVD, started Disk Utility, and reformatted the disk as HFS, journaled and case insensitive. After that I started Time Machine and chose the restore option. It immediately reformatted my disk to match the backup, and that wasn't what I wanted.

So I reformatted the disk again and then chose to do an install from scratch. When the installation completed and the system rebooted, the migration assistant asked if I would like to migrate old data, and I picked the option to restore from the last Time Machine backup.

This time it didn't do anything with my file system and all files & settings were restored - and I could start the Photoshop CS3 installation and get it installed!

I don't know how it would have handled a conflict, i.e. restoring foo and Foo, since I wrote a Perl script to make sure that I didn't have any conflicts.
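
The Perl script isn't shown here, but a rough shell equivalent of such a check could look like the sketch below. It is a swapped-in approach, not the original script; it assumes file names without embedded newlines, and the function name is made up:

```shell
#!/bin/sh
# Hypothetical helper: read one path per line on stdin and print, in lower
# case, every path that occurs more than once when case is ignored.
case_conflicts() {
    tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Example: find /Users -print | case_conflicts
# No output means no two paths differ only by case.
```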

   
 
   
20080303 Monday March 03, 2008
Insensitive file systems
Permalink | Comments [2] | 2008-03-03 16:01

cASe inSEnsITIvE file system - what an utterly stupid idea!

When I installed Leopard on my MacBook Pro it was a natural choice to make the file system case sensitive. Besides being a UNIX geek I had a legitimate reason for doing so: on a case insensitive file system you can't do

hg clone ssh://martin@hg.opensolaris.org/hg/audit/patches

as the OpenSolaris source code contains file names that differ only in case.

So what am I bitching about then? Yesterday I tried to install Adobe Photoshop CS3 on my wife's MacBook Pro (which I also installed with case sensitivity) and got this very unintuitive dialog:


This software cannot be installed because the file system of the OS volume is not supported

After scratching my head for a while, I figured out that it is due to the case sensitivity! Adobe hasn't bothered to fix their code, and it is not like it is a new feature in Mac OS X either... they have had several years to fix it.

Unfortunately there is no solution to this other than to reformat the file system and make it case insensitive! To make matters worse I can't use Time Machine to do it, as it doesn't support backing up a case sensitive file system and restoring it to a case insensitive one. All it would have to do is alert me if there is a conflict - which there isn't in my case, I've checked!

Luckily Mac OS X comes with all the UNIX tools we love and cherish, so I'll just use cpio or gtar to back up all my data and then nuke the / partition (while keeping my zpool).

Update: as suggested by zdz and Dick Davies I tried creating a disk image with a case insensitive HFS, but that didn't work either for the Photoshop installer. The hint is in the error message "OS volume is not supported". Back to the original plan of backup/reinstall/restore...

   
 
   
20071127 Tuesday November 27, 2007
Trying out mirrored zfs root on Indiana
Permalink | | 2007-11-27 10:05

I've been playing around with project Indiana, and the new installer and packaging system, and they are really nice.

When you install, it turns the root disk into a zpool called zpl_slim, but it doesn't let you select two disks and mirror the zpool. Luckily you can fix this once the installation is done. When the system has booted, you can use the zpool attach command:

# zpool attach zpl_slim c7d0s0 c8d0s0
# zpool status
  pool: zpl_slim
 state: ONLINE
 scrub: resilver in progress, 11.75% done, 0h3m to go
config:

        NAME        STATE     READ WRITE CKSUM
        zpl_slim    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c7d0s0  ONLINE       0     0     0
            c8d0s0  ONLINE       0     0     0

errors: No known data errors
   
 
   
20071109 Friday November 09, 2007
CSWmercurial 0.9.5
Permalink | | 2007-11-09 12:30

Now that CSWpython is upgraded I've finally got my act together and found some spare cycles lying around in a drawer, so I could finish the update of the CSWmercurial package. I've sent it out for alpha testing, so hopefully I'll be able to publish it by the end of next week.

   
 
   
20071103 Saturday November 03, 2007
13949712720901ForOSX
Permalink | | 2007-11-03 11:08

This post is a petition to Apple to get their act together and finish Java 6 for Leopard.

If you wonder what the strange title means, read this blog post.

   
 
   
20071029 Monday October 29, 2007
Time Machine & ZFS
Permalink | Comments [3] | 2007-10-29 14:32

I've just installed Leopard on my MacBook Pro, and was first disappointed that it only had read-only zfs, but after checking out ADC that was solved :)

I also wanted to try out Time Machine and thought that I could place the backups on zfs, but Time Machine doesn't let me select zfs as a destination. Hopefully I'll be able to trick it somehow ;)

Update:
After Jeff Harrell's comment I read up on Time Machine here and here, and as Jeff says it uses directory hard links, so that won't work with zfs. Bummer! :(

   
 
   
20070814 Tuesday August 14, 2007
The sound of a bone file
Permalink | Comments [1] | 2007-08-14 12:38

It has been hard to work today. Yesterday I had oral surgery because my jaw bone grew out of the gum! It started out as a sore spot in my mouth, which grew to a bulge, and a week ago the gum ruptured and the jaw bone shone like a bright white spot...

It turned out that when I had a molar pulled many years ago the Swedish dentist I went to didn't do a very good job, so yesterday they had to "fix" it. The fix was to cut open my gum and use a bone file to get rid of the outgrowth. When he was filing away on my jaw the sound resonated in my skull - not a very nice sound!

The Brazilian dentist who performed the surgery was excellent! He is a 4th generation dentist working with his father, which is very uncommon in Sweden. They are both exceptionally good and if you ever need to fix your teeth while in Rio drop me a line and I'll give you their name.

Eight stitches later we were sent home with a list of medications and procedures to follow. My wife had to have a tooth pulled too - she has (had) 34 teeth. The one that got pulled was some strange pre-historical tooth appearing behind the molars. The only good thing about all this was that we were told to eat ice cream, lots of ice cream!

What amazed me the most with the dentist was that he gave me the number to his home and to his mobile and told me to call if there was any problem or if I had any questions! That never happens in Sweden. He even called in the evening to check up on us. Talk about good service!

So today I've had a throbbing pain in my jaw which no amount of ice cream could get rid of. It has been hard concentrating on the work at hand: writing a set of scripts to run bart on all our Solaris 10 systems and generate alerts when there is a discrepancy. Not something which is very hard, but today I can't focus very well...

   
 
   