A geek's geek-log My not-so-secret diary

Friday Jun 26, 2009

The Sun Code For Freedom contest results are finally out and my pet project, HA-Cron, has been declared the first prize winner. :)

My old HP Compaq 6515b laptop served me well for 2 years, but now it's got zero battery life, no sound (it's always been like that) and a touchpad that goes crazy after 5 minutes of usage. With the new Dell laptop I'll be getting, I can actually set up a proper cluster! Good thing Open HA Cluster 2009.06 allows you to set up a cluster with just one NIC (cheers to Crossbow). Can't wait to give this a spin!

Congratulations to all the other winners as well!

Saturday May 23, 2009

The Jaipur Linux Users Group (LUG-J) organised its second event, FOSJAM 2009, on the 16th and 17th of this month at JECRC, Jaipur. Being one of the proud members myself, I knew I just had to be a part of it! Around 280 students from a multitude of colleges in Jaipur took part. The event also featured talks by familiar names from the Indian FOSS community, including Vivek Khurana (no_mind), Shakthi Kannan (mbuf), Atul Jha (koolhead17) and Varad Gupta. Although it was decided that I'd give only one talk (on Open HA Cluster, of course!), I ended up giving two. On the 17th, the event got a little delayed, so I decided to take up the task of keeping all the participants busy. After gathering them all into a hall, I had a long and informal session on FOSS: why it is the best thing in the world for students, how to go about contributing, and the whole idea of community-driven contributions. The students were quite enthusiastic, and I also ended up answering technical questions from all kinds of corners, like MINIX versus Linux, using Makefiles, Python, the quicksort algorithm and a lot more! The event then continued as per plan and I really enjoyed meeting more and more students. I was also surprised at the number of students who recognized me from the previous workshops I'd conducted, including Helios and Fotia. :)

The Open HA Cluster talk started at 3:30 PM and went perfectly. I also got a good set of questions from the students, including what would happen if the cluster was partitioned such that there are N/2 nodes in each partition (this was asked while I was explaining the split brain condition). There was another reasonably good question that I just can't manage to recall right now. :|
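For the curious, here's a rough sketch of the idea behind the answer to that N/2 question. It assumes the usual majority rule (a partition keeps running only if it holds strictly more than half of the configured votes) and a two-node cluster with one vote per node plus a one-vote quorum device. The function names and vote counts are made up for illustration; this is not how the cluster framework itself is implemented.

```shell
#!/bin/sh
# has_majority VOTES_SEEN TOTAL_VOTES
# Succeeds only when a partition holds strictly more than half of all votes.
has_majority() {
    [ $(( $1 * 2 )) -gt "$2" ]
}

# partition_verdict VOTES_SEEN TOTAL_VOTES
# Prints "survives" if the partition has a majority, "halts" otherwise.
partition_verdict() {
    if has_majority "$1" "$2"; then
        echo "survives"
    else
        echo "halts"
    fi
}

# Two nodes, one vote each, split 1/1: neither side has a majority.
partition_verdict 1 2

# Same split with a one-vote quorum device (3 votes in total, assumed):
# the node that claims the device sees 2 of 3 votes, the other sees 1.
partition_verdict 2 3
partition_verdict 1 3
```

With an even split and no tie-breaker, both halves fail the check, which is why an extra vote from something like a quorum device is needed to decide which side carries on.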

The event was quite an experience all in all. You can check out the pics over here. By the way, my college (Malaviya National Institute of Technology, Jaipur) is now the official host for all LUG-J meetings, the next of which will happen on the 26th of this month itself. Looking forward to meeting y'all again!

Wednesday Apr 29, 2009

I can't believe it took me so long to blog about this project of mine, considering that I'm almost done with its development and I also gave a talk on it at Sun Tech Days 2009. It was originally proposed by the Solaris Cluster team for a workout at FOSS.IN 2008 but wasn't selected, so I thought I'd take it up as my Sun Code For Freedom Contest project. This is one of my two proposals for the contest, the other being HA-Zabbix, which I haven't started working on. :P

Now that I've bored you with the history, I'll move on to telling you all about what HA-Cron is and its relevance.

Those of you familiar with high availability clustering will have guessed by now what HA-Cron does. Anyway, one problem with an HA cluster is that when a failover happens, the failed node's cron jobs stay behind on that node and do not carry over to the new one. This means the system administrator has to intervene manually every time a failover occurs, which goes against the whole idea of high availability clustering, where the key is to keep recovery from a failure smooth and automated. So HA-Cron is an agent for Open HA Cluster which keeps Cron highly available.

Developed over the GDS template, HA-Cron accomplishes its task through a set of simple procedures, which are as follows:

1) Upon turning an RG (resource group) online on a node, a backup is made of the original root crontab. Next, the cron jobs for that particular RG which are specified by the user in a file are added to the root crontab entry, and a test job is added to ensure that Cron itself is working properly.
2) Upon stopping an RG on a node, the cron jobs that belong to that RG are removed from the root crontab.
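To make the two procedures a bit more concrete, here's a minimal sketch of the add-and-remove idea. It works on plain files rather than the real root crontab so it can be tried safely, and it leaves out the test job. The marker comments, the backup file name and the function names are all made up for this example and are not taken from HA-Cron's actual code. It also assumes RG names contain no characters that are special to sed.

```shell
#!/bin/sh
# Sketch only: operates on ordinary files, not on a live crontab.

# add_rg_jobs CRONTAB_FILE RG_NAME JOBS_FILE
# Backs up the crontab, then appends the RG's jobs between two markers.
add_rg_jobs() {
    cp "$1" "$1.bak.$2" || return 1
    {
        echo "# BEGIN HA-CRON $2"
        cat "$3"
        echo "# END HA-CRON $2"
    } >> "$1"
}

# remove_rg_jobs CRONTAB_FILE RG_NAME
# Deletes the RG's markers and everything between them.
remove_rg_jobs() {
    tmp="$1.tmp.$$"
    sed "/^# BEGIN HA-CRON $2\$/,/^# END HA-CRON $2\$/d" "$1" > "$tmp" &&
        mv "$tmp" "$1"
}
```

Wrapping each RG's jobs in its own pair of markers is one way to make sure that stopping an RG removes only the entries that RG added, leaving the rest of the crontab untouched.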

You can check out the project's homepage here. Please feel free to pitch in with your suggestions. :)

Cheers!

Monday Jan 19, 2009

At last, I've managed to contribute code to open source! Although they're just two very trivial and super easy oss-bite-sized bugs, something's better than nothing, right? One bug fix was in the GDS coding template, where I corrected a minor mistake in the local_zone_zsh() function. I stumbled upon it while developing HA-Cron, one of my Code For Freedom projects. The other involved correcting some improper cluster boot messages.

Both these fixes are coming out with the next release of OHAC. w00t! ^^


Wednesday Sep 24, 2008

Although it's the field I find most enticing, it's kind of strange that I've never blogged about high availability clustering. Ever since my HOD, Professor M.S Gaur, asked me to deploy Linux HA (Heartbeat) to keep our servers highly available, I've developed a deep interest in this concept. Disaster can strike at any time, and when it disrupts a service, an organisation can incur huge losses. To overcome this, we make use of high availability clustering, where several nodes offering a given service appear as a single entity to a client. The nodes together may act as a failover cluster, wherein only one node (the active node) provides the service at a time and, if it fails, another node rises to take its place. Alternatively, they may form a scalable cluster, where the load is distributed across multiple nodes to maintain availability. Here, I'll be giving a step-by-step walkthrough on how to fail Apache over between zones.

The first step, of course, would be to install SXCE. I chose SXCE build 86. There isn't much to specify here apart from the fact that you need to create a 512MB partition to be mounted as /globaldevices. The cluster nodes will access devices connected to other nodes through it. Next, you have to set up a single node cluster. I won't be going into the details of that because we have this excellent piece of documentation over here:

http://opensolaris.org/os/community/ha-clusters/ohac/Documentation/SCXdocs/installsinglenode/

Now, we move on to the interesting part.

The plan here is to set up two zones named node1 and node2. We will be failing Apache over between the two of them.

#: zonecfg -z node1
node1: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:node1> create
zonecfg:node1> set zonepath=/export/home/lalith/zones/1
zonecfg:node1> set autoboot=false
zonecfg:node1> add net
zonecfg:node1:net> set address=<IP 1>
zonecfg:node1:net> set physical=<device name>
zonecfg:node1:net> end
zonecfg:node1> verify
zonecfg:node1> exit

#: zoneadm -z node1 install

This will install zone node1. Now to configure node2.

#: zonecfg -z node2
node2: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:node2> create
zonecfg:node2> set zonepath=/export/home/lalith/zones/2
zonecfg:node2> set autoboot=false
zonecfg:node2> add net
zonecfg:node2:net> set address=<IP 2>
zonecfg:node2:net> set physical=<device name>
zonecfg:node2:net> end
zonecfg:node2> verify
zonecfg:node2> exit

#: zoneadm -z node2 install
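Since the two zonecfg sessions differ only in the zone path and IP address, they can also be driven from a command file with zonecfg's -f option instead of typing at the prompt. Here's a small sketch of that approach. The helper name zone_commands and the /tmp file name in the usage comments are made up for this example, and the IP address and device name are still placeholders you have to fill in yourself.

```shell
#!/bin/sh
# zone_commands ZONEPATH ADDRESS NIC
# Prints the same settings used in the interactive sessions, one
# zonecfg subcommand per line, suitable for a zonecfg command file.
zone_commands() {
    cat <<EOF
create
set zonepath=$1
set autoboot=false
add net
set address=$2
set physical=$3
end
verify
commit
EOF
}

# Usage, run as root in the global zone (placeholders left as they are):
#   zone_commands /export/home/lalith/zones/1 '<IP 1>' '<device name>' > /tmp/node1.cfg
#   zonecfg -z node1 -f /tmp/node1.cfg
#   zoneadm -z node1 install
```

Keeping the settings in a file also makes it easy to recreate a zone later with exactly the same configuration.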

Now boot both the zones.

#: zoneadm -z node1 boot

(configure node1)

#: zoneadm -z node2 boot

(configure node2)

Apache by default won't have the httpd.conf file ready for you. So what we'll be doing here is to create a copy of the httpd.conf-example file with the name httpd.conf in the global zone and the non global zones.

In the global zone, zone node1 and zone node2,

#: cp /etc/apache/httpd.conf-example /etc/apache/httpd.conf

Now your cluster should have a logical hostname that will identify the currently active node. This will remain unique to the cluster. It is important that every node in the cluster has this hostname listed in its /etc/inet/hosts. I chose the name 'node' for my cluster (I just realised it doesn't make much sense :P )

In the non-global zones, node1 and node2,

#: echo '<Cluster IP> node' >> /etc/inet/hosts
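One small catch with the echo above is that running it twice leaves two identical lines in the hosts file. Here's a hedged sketch of a helper that appends the entry only if the name isn't already mapped. The function name add_host_entry is made up for this example, and it takes the file path as an argument so you can try it on a scratch copy first. It relies on awk's -v option, so on Solaris you may need nawk in place of awk.

```shell
#!/bin/sh
# add_host_entry HOSTS_FILE IP NAME
# Appends "IP<TAB>NAME" unless some non-comment line already lists NAME
# as a hostname (any field after the address).
add_host_entry() {
    if awk -v name="$3" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit found ? 0 : 1 }' "$1"
    then
        return 0    # already present, nothing to do
    fi
    printf '%s\t%s\n' "$2" "$3" >> "$1"
}

# Usage (placeholder left as it is):
#   add_host_entry /etc/inet/hosts '<Cluster IP>' node
```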

Now your Apache is all ready to go. The next step would be to set up the resource groups for your single node cluster. First, you'll need to set up the logical hostname resource that will use the hostname (the one you added to your /etc/inet/hosts file) to identify your cluster. It is essential that you do this, otherwise your IP won't fail over (not that the cluster will let you make the resource group without it :P). Second, we'll configure Apache to run on our single node cluster. We'll be using the Java Web Console for this (if you haven't enabled it, look into the 'how to setup a single node cluster' documentation again).

First, log in to the web console (http://127.0.0.1:6789 by default), and go to Sun Cluster.

The first step involved in making the cluster would be to set up a resource group. In the left menu, go to resource groups and then select 'New'. You will get another window where you'll be asked for details about your resource group. I've already made the resource group named web1, so my window will look different from yours. This is just to give you an idea of the details.


Now select the primary nodes. We'll be including our zones here.

Once you've set up your resource group, it's time to set up individual resources. Again, select resource groups and from the drop down, choose the resource group that you just made. Select 'New', and set up two resources, one for your logical hostname...


... and the other for your Apache webserver. Set logical hostname as a strong dependency and specify the binary path for your apache (/usr/apache/bin on mine).

I'll remind you again that I'm showing you the details AFTER I've set up my resources, so the drop downs are missing and the menus will be different.

Our resource configuration is done. Now go back to the resources group menu, select the group we just configured and click on 'restart'. Check if the services are online. If Murphy is kind enough, your cluster should be good to go!
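If you'd rather not click through the console, the same setup can probably be done from the command line. The sketch below only prints a command sequence so you can review it first. It assumes your build ships the Sun Cluster object-oriented commands (clresourcegroup, clreslogicalhostname, clresourcetype, clresource) and the SUNW.apache resource type. The resource names node-lh and apache-rs are made up for this example, and Port_list=80/tcp is my assumption for a default Apache setup. Check each command's man page before running anything.

```shell
#!/bin/sh
# Prints (does not run) a possible command-line equivalent of the
# console steps. node-lh and apache-rs are hypothetical resource names.
RG=web1                              # resource group from this walkthrough
NODES=mercury:node1,mercury:node2    # the two zones as primary nodes
LH=node                              # logical hostname in /etc/inet/hosts
BIN_DIR=/usr/apache/bin              # Apache binary path on my system

cluster_commands() {
    cat <<EOF
clresourcegroup create -n $NODES $RG
clreslogicalhostname create -g $RG -h $LH node-lh
clresourcetype register SUNW.apache
clresource create -g $RG -t SUNW.apache -p Bin_dir=$BIN_DIR -p Port_list=80/tcp -p Resource_dependencies=node-lh apache-rs
clresourcegroup online -M $RG
EOF
}

cluster_commands
```

The Resource_dependencies property on the Apache resource is what corresponds to setting the logical hostname as a strong dependency in the console.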

Next, we'll trigger a failover between our zones. First, let's get an idea of our cluster's status with the following command:

#: /usr/cluster/bin/cluster status

Here's the output on my system...


Notice that our first zone, mercury:node1, is now online. You can verify this by opening your web browser and entering first the cluster IP and then node1's IP in the address bar. Both will show you Apache's default test page, which implies that node1 is now in charge of the cluster. Now, to trigger the failover, we shut down node1 and check the status again:

#: zoneadm -z node1 halt

#: /usr/cluster/bin/cluster status

And the output is.... 

There you go! You've just failed Apache over to zone mercury:node2. You can verify this using your web browser. Repeat the process as many times as you want. Don't forget, have fun!

 Cheers! :)