Alok Aggarwal's Weblog
Tuesday May 24, 2005
RPC Versioning for rpc.metad
One of my earliest contributions to Solaris 10 was versioning the
rpc.metad daemon. In this post I'll describe the problem I was trying
to solve and why it needed solving, as a precursor to how I actually
solved it.
What's this rpc.metad?
rpc.metad is one of the SVM RPC daemons, and its entire purpose in life
is to help multiple hosts share common storage that will be used for
shared SVM disksets*. The daemon handles the communication between the
hosts while the volumes are being configured and whenever the
configuration changes. For example, when adding disks to a shared
diskset with two hosts A and B, it probes each of the hosts with the
question, "Hey, I see a disk c1t0d0 with these characteristics. Do you
see the same disk on your side? Okay, both of us see the same disk; now
I'm going to add this disk to a diskset called oracle, you okay with
that?" "Yeah, go ahead and use it, I'm not using it."
* Shared SVM disksets, here, refer to disksets that only one host can
access at any given time. This configuration is mostly used in high
availability (HA) environments or together with clustering software.
Why did it need to be versioned?
Early in the Solaris 10 development cycle, support was added to SVM so
you could create multi-terabyte volumes as well as use multi-terabyte
LUNs. As part of this effort, many of the structures SVM uses
internally (not the on-disk structures) were changed. Care was taken
to make sure nothing broke when machines were upgraded or downgraded.
However, during the external code review (I think that's what it was;
anyhow, you get the idea), one of the reviewers pointed out a case
where backward compatibility was broken. I'll expand on this shortly,
but first let me take this opportunity to explain a little about the
code review process here in the Solaris organization.
Solaris code review process - A detour
Every change to the Solaris source, i.e. every bug fix (no matter how
trivial) and every project, needs to be reviewed by someone other than
the engineer making the change. Typically, a bug fix is reviewed by one
or two engineers. A bigger change, i.e. a project or an RFE, needs to
be reviewed not only by engineers inside the same technology group but
also by engineers in different (but hopefully related) technology
groups (external code review). The motivation is to get multiple sets
of eyes on the change and healthy criticism of it. This helps make sure
that a) the right fix is being made and it won't cause future bugs,
b) other areas of code are not being overlooked, and c) it won't break
other Solaris functionality. Code reviews are just one of the
process-related tasks we in the Solaris organization undertake to keep
the quality of the Solaris code consistently high.
Back to the real topic
One of the external code reviewers pointed out that since rpc.metad
uses the changed structures, it probably wouldn't be able to talk to
other rpc.metad processes that have an "older" view of those
structures. This was going to be a particular problem when clustering
software is used in a rolling upgrade, where the cluster nodes are
upgraded one at a time, so some nodes can be running an older version
of Solaris (say Solaris 8) while others are running an upgraded version
(say Solaris 10). All of these nodes need to be able to communicate
with each other in such a scenario. Here's the problem: you've got a
Solaris 8 rpc.metad that knows about the Solaris 8 version of the
structures and a Solaris 10 rpc.metad that knows about the Solaris 10
version of the structures. The result: the two rpc.metad processes
can't communicate with each other!
The solution was to version the Solaris 10 rpc.metad, i.e. make it
understand both the older and the newer structure definitions by
leveraging the versioning capabilities of the RPC framework, in which
every call names the version of the program it expects. Simple as
that.
In a future post I'll go over the details of how I implemented the
versioning changes. So long!
( Jun 10 2005, 01:27:30 PM EDT / May 24 2005, 04:04:52 PM EDT )
Trackback: http://blogs.sun.com/aalok/entry/rpc_versioning_for_rpc_metad