Redundant and Fault Tolerant are two terms that are often used interchangeably, but they actually mean very different things - and one certainly doesn't imply the other.
Redundant
As you'd pretty much expect, redundant means that you've got more of something than you need. The tyres on your car are redundant - you only need 4 to drive, but you have 5, including a spare.
Redundant doesn't imply that there is no impact to service when a component fails. It simply means that you are able to recover the service - to at least a working (although possibly degraded) state - without the need for any external components. If you get a flat tyre on your car, you need to stop and replace it with the spare (redundant) tyre. This has an impact, but it still allows you to recover from the flat without needing any external assistance.
Fault Tolerant
As the name implies, Fault Tolerant refers to the ability to tolerate a fault. The exact definition of Fault Tolerant will vary depending on who you ask, but generally it implies the ability for a service to continue running despite a fault. For example, "run flat" tyres on a car could be considered fault tolerant - despite the failure you are able to continue driving without an "outage".
Redundant components, Fault Tolerant systems
Fault tolerant systems are usually built from redundant components. Probably the most common form of fault tolerance we are used to is "RAID" - Redundant Array of Inexpensive Disks - but why does RAID have "redundant" in its name if it's actually fault tolerant?
The distinction comes down to the difference between the individual components and the system as a whole. Individually, the disks within a RAID array are redundant, but they are not fault tolerant - if a disk fails, then it is dead. However, as a system, a RAID array is fault tolerant - if a disk fails, the array, and your data, are able to continue without interruption (although probably with some degradation).
The Component - the disk - is redundant.
The System - the array - is fault tolerant.
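The component/system distinction can be sketched in a few lines of Python. This is only a toy model of a simple mirror, not how any real RAID implementation works, and every name in it (`Disk`, `MirroredArray`, `DiskFailure`) is made up for illustration. Each disk dies for good when it fails, but the array keeps answering reads as long as one copy survives.

```python
class DiskFailure(Exception):
    """Raised when a request hits a failed disk (hypothetical error type)."""


class Disk:
    """The component: redundant, but not fault tolerant. Once failed, it is dead."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.failed = False

    def write(self, block, data):
        if self.failed:
            raise DiskFailure(self.name)
        self.blocks[block] = data

    def read(self, block):
        if self.failed:
            raise DiskFailure(self.name)
        return self.blocks[block]


class MirroredArray:
    """The system: fault tolerant while at least one disk is still healthy."""

    def __init__(self, disks):
        self.disks = list(disks)

    def healthy(self):
        return [d for d in self.disks if not d.failed]

    def write(self, block, data):
        targets = self.healthy()
        if not targets:
            raise DiskFailure("all disks failed")
        # Every healthy disk gets a full copy of the data.
        for disk in targets:
            disk.write(block, data)

    def read(self, block):
        # Serve the read from the first healthy copy.
        for disk in self.healthy():
            return disk.read(block)
        raise DiskFailure("all disks failed")

    def degraded(self):
        return len(self.healthy()) < len(self.disks)


a, b = Disk("a"), Disk("b")
array = MirroredArray([a, b])
array.write(0, "important data")
a.failed = True              # the component fails...
print(array.read(0))         # ...but the system still serves the data
print(array.degraded())      # it is now running degraded
```

Reading directly from disk `a` after the failure raises `DiskFailure`, while the same read through the array succeeds - the caller never sees an outage.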
With a few historic exceptions, no Sun systems are completely "Fault Tolerant", although they frequently contain fault tolerant sub-systems, such as:
* Power Supplies
* Fans
* Disks (using RAID)
* RAM (using ECC - Error Checking and Correcting memory)
In most (all?) cases this fault tolerance is achieved using redundancy - multiple power supplies, multiple fans, multiple disks in a configuration where the failure of any one can be transparently handled without an outage.
Some high-end Sun systems can go a step further and be configured to be completely Redundant. This still doesn't mean that they can transparently handle any failure, but much like a flat tyre on your car, the system is able to re-configure itself to map out the failed component, and come back up in a (possibly) degraded configuration. Whilst there is obviously an impact in doing this, it's far better than the alternative of being non-redundant, and far far (far!) cheaper than the alternative of being fully fault tolerant in hardware.
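For contrast, here is a similarly rough Python sketch of a system that is redundant without being fault tolerant. It is not modelled on any particular Sun machine - the `RedundantSystem` class, its `minimum` parameter and the board names are all invented for the example. The point is only that the failure does cause an outage, after which the system maps out the failed component and comes back up with whatever is left.

```python
class RedundantSystem:
    """Toy model: recovers from a failure by itself, but not transparently."""

    def __init__(self, components, minimum=1):
        self.active = set(components)   # components currently in use
        self.minimum = minimum          # fewest components needed to run
        self.running = True
        self.outages = 0

    def component_failed(self, component):
        # The fault is not masked: the service goes down first.
        self.running = False
        self.outages += 1
        # Map out the failed component, then try to come back up.
        self.active.discard(component)
        self.recover()

    def recover(self):
        # Restart in a (possibly degraded) configuration if enough is left.
        if len(self.active) >= self.minimum:
            self.running = True


system = RedundantSystem(["board0", "board1", "board2"])
system.component_failed("board1")
print(system.running)          # back up again
print(system.outages)          # but there was an outage along the way
print(sorted(system.active))   # running on the remaining components
```

In the mirrored-disk case the outage count would stay at zero; here every failure costs a restart, which is the flat-tyre-and-spare situation described above.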