Why Pod based grid designs?
Wednesday Nov 01, 2006
I've just read my colleague and good friend's new blog about Pod based grid designs vs centralised switch designs. It has unearthed a torrent of thoughts on the subject and I need to dump them on you:
Tony Marshall
[...]
Pods are great as a modular system that allows growth in specific unit sizes. They need to be spec'ed to be the right size from the start, if they are too large then there is a lot of waste of nodes not being used, if they are too small then the core switch will run out of ports.
There is a limit on when it makes sense to have Pods as opposed to the nodes all connecting into a large core switch. Also it depends on the segregation required between Pods/Nodes which again comes back to the requirements of the grid.
[...]
However with pods there are more network devices to manage which brings its own issues and proper management tools for grids are a must.
Tony touches on several issues here that are essential to managing a large (or small) grid.
Firstly you need really good management tools. This is kind of obvious to anyone who's built a grid before, but if you want your 1000 systems to be useful you ought to have a good handle on them. You'd probably want to be able to provision an operating system, upgrade firmware, monitor hardware etc on all your systems from a single centralised place - otherwise you'd have such a hard time managing the system (think staffing costs) you'd be better off spending your money on large SMP servers and having far fewer OS instances to manage.
The same principles are true for switches and grid network designs in general. Swap SMP servers for large chassis based switches (e.g. the larger 1500 port Force 10 switches) and swap 1U general purpose servers for 1U general purpose switches (like the Summit 400). You end up with the same cost trade-offs - if you don't invest in decent management tools, you might as well invest in a big chassis based switch.
Just as large SMP servers are expensive and going out of fashion, so, in my opinion, are large centralised switches.
With SMP servers you get the advantage of a pre-cabled mid plane so you don't need to connect up all the processors yourself and you'd hope the manufacturer made it easy to share the resources between multiple users and applications (Solaris is pretty darn good at this).
However with large chassis switches you still have to cable up all the nodes yourself, and these cable runs are long! Growing your data center involves cabling guys standing on site, pulling hundreds of cables past all your production systems and most critically fiddling with your core switch. Or is it switches?
As more and more types of applications become grid enabled and more and more computer users realise the savings of a flexible, scalable, low cost infrastructure (think VMware), grids move from dedicated clusters for a specific workload to THE DATACENTER. I don't know about you, but I wouldn't build a data center that relied on a single core switch, no matter how resilient the manufacturer says it is (how many computer products are resilient to fire and floods?).
If you connect all your servers to central switches you create one big SPOF (Single Point Of Failure). You could connect each system to two identical separate switches, you could put these in separate rooms, you could do an awful lot, but what you can't do is avoid the need for double the amount of cabling, and if you've got 1000 systems that's a big load of cable ... unless ...
Yes, you've guessed it ... unless you use pods. If you use a 1U general purpose switch in each rack you can connect each server into that local switch. You can have your rack full of general purpose servers connected to your general purpose switch with no cabling outside the rack. You can order this pre-built and have it delivered as a complete unit. That should be your building block.
You can arrange these racks into Pods of, say, 4 racks (so at 32 1U systems per rack that's 128 systems) and you can base your network design around this. You can use 10Gb Ethernet to link racks and connect them up to your core switches. Your core switches can be separated as I described earlier, but the big difference is the number of cables you need to run: perhaps two or four fibres per Pod, depending on your requirements. Think how much less screwing about that is in your production datacenter.
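To put rough numbers on that, here is a back-of-envelope sketch in Python. The function names are made up for illustration, the 1000 systems are borrowed from the earlier example, and I've assumed four uplink fibres per Pod (the upper end of the range above). It only counts cable runs that leave the rack or Pod, not the links between racks inside a Pod.

```python
import math


def central_cable_runs(systems: int, switches_per_system: int = 2) -> int:
    """Cables pulled across the room when every server is wired
    directly to (by default two) central switches."""
    return systems * switches_per_system


def pod_cable_runs(systems: int, racks_per_pod: int = 4,
                   systems_per_rack: int = 32,
                   fibres_per_pod: int = 4) -> dict:
    """Uplink fibres to the core when servers connect to a switch in
    their own rack. fibres_per_pod is an assumption, not a rule."""
    pod_size = racks_per_pod * systems_per_rack
    pods = math.ceil(systems / pod_size)  # Pods come in whole units
    return {
        "pod_size": pod_size,
        "pods": pods,
        "uplink_fibres": pods * fibres_per_pod,
        # Slots bought but not yet filled: the waste Tony warns about
        # when the Pod size is too large for the grid.
        "spare_slots": pods * pod_size - systems,
    }


if __name__ == "__main__":
    print(central_cable_runs(1000))
    print(pod_cable_runs(1000))
```

With these assumptions the dual-homed central design needs 2000 long cable runs, while the Pod design needs 8 Pods and 32 fibres to the core, with 24 slots left spare. Change the Pod size and you can see both halves of Tony's sizing trade-off move.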
To return to Tony's remark about proper tools for management: yes, Pods give you more switches to manage, but grids are all about using lots of cheap fast devices as one, so why should the network design be any different?
Management of switches is particularly important for a utility computing grid, where it may be desirable to segregate one customer from another. To return to my SMP analogy, it might be argued that it's easier to use the expensive chassis switch because it's easier to partition groups of systems off (let's be blunt, we are talking VLANs and ACLs (Access Control Lists) here, not rocket science), but I'd disagree.
Any automated, or at least human operator friendly, tool to dynamically partition a network is going to have to understand the concept of there being more than one switch in a network. Heck, even a tool to manage a single chassis based switch is going to need to understand the concept of different switch blades. So if you need flexibility in your network, don't base this decision on what you can do through the CLI of a single central device. Instead go talk to your network vendor; there are plenty out there to choose from: Force 10, Extreme, Cisco, Nortel, Foundry etc.
Say to your vendor "I need some tools to manage a large number of switches", and see what they say. Look at what they offer, look at the costs, maybe even consider developing or extending some existing internal or open source tools, but don't seriously constrain your whole datacenter unless you really know what you are doing!
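If you do go down the road of rolling your own, the core idea is not complicated. Here's a toy sketch in Python of the "more than one switch" point: given an inventory of which server hangs off which switch port, work out the per-switch port lists for one customer's VLAN. The inventory, the switch and node names and the plan_vlan function are all hypothetical, and nothing here talks to a real device or uses any vendor's API.

```python
from collections import defaultdict

# Hypothetical inventory: server name -> (switch name, port number).
INVENTORY = {
    "node001": ("pod1-rack1-sw", 1),
    "node002": ("pod1-rack1-sw", 2),
    "node033": ("pod1-rack2-sw", 1),
    "node129": ("pod2-rack1-sw", 1),
}


def plan_vlan(inventory, servers, vlan_id):
    """Group a customer's servers by switch and return, for each
    switch, the ports that would need to be placed in vlan_id."""
    if not 1 <= vlan_id <= 4094:  # usable 802.1Q VLAN id range
        raise ValueError("VLAN id must be in 1..4094")
    ports_by_switch = defaultdict(list)
    for server in servers:
        switch, port = inventory[server]
        ports_by_switch[switch].append(port)
    return {
        switch: {"vlan": vlan_id, "ports": sorted(ports)}
        for switch, ports in ports_by_switch.items()
    }


if __name__ == "__main__":
    customer_a = ["node001", "node002", "node129"]
    for switch, change in plan_vlan(INVENTORY, customer_a, 100).items():
        print(switch, change)
```

The plan comes out keyed by switch, so it makes no difference to the tool whether a customer's systems sit in one rack or across several Pods. A real tool would also have to carry the VLAN over the uplinks between switches and push the changes to each device, which I've deliberately left out.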
Oh, and finally, if you find all this rather daunting and just want to buy the solution from people who have to understand this for a living, do drop Sun a line. We're happy to help.

Posted by Jeremy on February 01, 2007 at 11:08 PM GMT #