Binu Jose Philip's Weblog
A rave for PxFS
Sacrilege. I realized only recently that I haven't blogged nice things about the technology that I work on.

While I am at it, may I also point your attention to the side bar on the left^h^h^h^h right, which is full of possibilities and not much in realization of those possibilities. That side bar led me to CSS and hours of wonderful time in front of the monitor creating rectangles of various colors, sizes and overlaps. The psychedelics started with a visit to http://www.csszengarden.com. I still remember the plans and grand designs I had for my next web creation. Don't you fret, those thoughts and designs are still locked up somewhere in there. But what I actually put in place is what you see here: a div that doesn't break or justify lines and a side bar full of defaults. What fun to create vaporware, eh?

Talking about vaporware, I still haven't said a single thing about PxFS. Aha, the cat is out of the bag and the probability wave hasn't collapsed yet.

So, about PxFS. PxFS is the general purpose distributed filesystem used internally by Solaris Cluster nodes. More cats out of the bag now. By this time next year you will have heard much more about Solaris Cluster in the open. Haha .. in the open: all of the code for Solaris Cluster will be open. As of today, http://opensolaris.org/os/community/ha-clusters/ohac will tell you what is open in Solaris Cluster and what is not. PxFS is not open yet, but I can talk about it.

What does the big "Distributed HA Filesystem" suit-speak really tell? PxFS is a Highly Available, Distributed and *POSIX compliant* file system layer above some disk based file system. The disk based file system can be UFS or VxFS for now. Layering it above something better like ZFS is technically feasible.

Okay. Now for details about what each of the above terms really means.
Before I go into the explanation: I am explaining the real basics of PxFS here, so that total newbies can also understand and I can pretend I know much, much more than what I talk about here.

Distributed. PxFS is a distributed file system internal to cluster nodes. To explain distributed, take the analogy of the electric supply to a house. If you have only one socket in the house, then the supply at your house is not distributed. If you add more outlets, then the supply becomes distributed. Similarly, a Solaris Cluster can have 1 to err.. 8 or 16 nodes. No, I am not going to quote an exact number, I like vague. PxFS allows the filesystem hosted on one of the nodes to be accessible on any of the other nodes. *Any* of the other nodes. It is like NFS in that the accessing node does not need a disk path of its own, yeah, so maybe it is just a file access protocol. To restate: if you globally mount a UFS or VxFS filesystem on a cluster node and the mount directory exists on all cluster nodes, you can access that file system on all cluster nodes at the mount point. Distributed.

Now for the Highly Available part. Let's go back to the analogy of vibrating electrons in a linear conductor. If your house's electricity supply has an inverter to back it up, then your electricity supply is highly available. If the main line goes down, the inverter (battery) kicks in and you don't notice any down time. To be exact, there are the few milliseconds the inverter needs to cut in when the power goes. Similarly, in a Solaris Cluster setup, if you have more than one node with a path to the storage hosting the underlying filesystem for PxFS, you have a highly available PxFS file system. If the node hosting PxFS goes down, the other node with a path to storage will automatically take over and your applications will not notice any down time. Similar to the inverter takeover delay, there will be a brief period when your fs operations are delayed, but there will be no errors or retries. And that is the highly available part.
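For the code-minded, here is a toy model of that "delayed, but no errors" behaviour. It is only a sketch to illustrate the idea and has nothing to do with how PxFS is actually implemented: the class name ToyGlobalFS, the node names, the path and the takeover delay are all made up for the illustration.

```python
import threading
import time


class ToyGlobalFS:
    """Toy model only: one primary node serves requests, a secondary can take over."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self._available = threading.Event()
        self._available.set()      # the primary starts out healthy
        self._lock = threading.Lock()
        self._data = {}            # stands in for the shared storage

    def fail_primary(self, takeover_seconds):
        """Simulate the primary dying; the secondary takes over after a delay."""
        self._available.clear()    # new requests now block instead of failing

        def takeover():
            time.sleep(takeover_seconds)   # the "inverter cut-in" time
            with self._lock:
                self.primary, self.secondary = self.secondary, None
            self._available.set()  # blocked requests resume

        threading.Thread(target=takeover, daemon=True).start()

    def write(self, path, payload):
        """Blocks during a takeover; returns the node that served the request."""
        self._available.wait()
        with self._lock:
            self._data[path] = payload
            return self.primary

    def read(self, path):
        self._available.wait()
        with self._lock:
            return self._data[path]


def demo():
    fs = ToyGlobalFS(primary="node1", secondary="node2")
    served_before = fs.write("/global/app/log", b"before failover")
    fs.fail_primary(takeover_seconds=0.2)
    start = time.monotonic()
    # This call is delayed until the takeover finishes, but it does not fail.
    served_after = fs.write("/global/app/log", b"after failover")
    delay = time.monotonic() - start
    return served_before, served_after, delay, fs.read("/global/app/log")
```

The point of the sketch is the caller's view: the second write simply takes longer and comes back served by the other node, and the application code has no error path or retry loop for the failover.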
What about POSIX compliant? Take writes to any POSIX compliant single node filesystem. There is a guarantee that every write is atomic. If there are multiple writes to the same file without synchronization between the writers, you have the guarantee that no writes will overlap. The only unknown is the order of the writes. Similarly, in a PxFS filesystem, writers on the same node or on multiple nodes can write with the guarantee that their writes will not get corrupted. That is one example of POSIX compliance. Guarantees like space for async writes and fsync semantics, everything POSIX (as far as I know), are guaranteed on PxFS. And that is POSIX compliance.

And the administration overhead? .. adding a "-g" to your mount command and making sure there are mount directories on each node. "man mount" will tell you about "-g". That part, the administrative simplicity, is worth many paragraphs of prose. The value of simplicity has already been proven by "zpool create tank mirror c1d0 c2d0 mirror c3d0 c4d0" and all of ZFS's other possibilities, which save you a lot of wear and tear on fingertips and neurons compared to using SVM and metaxxxx.
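If you want to poke at the write guarantee yourself, a small single node experiment along these lines should do. This is a sketch under my own assumptions, not a PxFS test: it runs against a temporary file on whatever local filesystem you have, uses threads as the unsynchronized writers, and opens the file with O_APPEND so that every record lands at the current end of file. The record size, writer count and helper names are arbitrary.

```python
import os
import tempfile
import threading

RECORD_SIZE = 64
RECORDS_PER_WRITER = 200
WRITERS = 4


def writer(path, tag):
    # Each writer has its own descriptor and takes no locks.
    # O_APPEND makes "move to end of file, then write" a single step.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    try:
        record = (tag * (RECORD_SIZE - 1) + "\n").encode("ascii")
        for _ in range(RECORDS_PER_WRITER):
            os.write(fd, record)
    finally:
        os.close(fd)


def run_unsynchronized_writers():
    """Let several writers append to one file, then return the records found."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        threads = [
            threading.Thread(target=writer, args=(path, chr(ord("A") + i)))
            for i in range(WRITERS)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        with open(path, "rb") as f:
            data = f.read()
    finally:
        os.unlink(path)
    return [data[i:i + RECORD_SIZE] for i in range(0, len(data), RECORD_SIZE)]


def is_intact(record):
    # An intact record is one tag repeated, followed by a newline.
    # A record containing two different tags would mean two writes overlapped.
    return (
        len(record) == RECORD_SIZE
        and record.endswith(b"\n")
        and len(set(record[:-1])) == 1
    )
```

Calling run_unsynchronized_writers() and checking is_intact() on every record tells you whether any writes got mixed up. The order of the tags in the file will differ from run to run, which is exactly the "only unknown" mentioned above.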
Posted at 07:08AM Oct 12, 2007 by binujp in cluster and PxFS