Probability Reliability Analysis in Action

By:  Doug Bors, VP
Sparling

Seattle saw it first! The Seattle Chapter's presentation on probability reliability analysis (PRA) went on to play at the national convention in San Francisco.

Probability reliability analysis (PRA) is abstract. It ignores several practical truths in order to simplify the mathematical model so that results are easy to calculate. That exact objection was voiced during the Q&A session in San Francisco. A member of the audience said that none of the system failures he had seen matched the predictions of PRA; all of them were related to other, more practical mistakes. Further, many IT organizations attribute only 20% of system failures to hardware and the other 80% to operator error. So why bother with theoretical system analysis?

The value of PRA is not in generating theoretical failure rates—it is in discovering large differences in failure rates between design options. For example, a fully segregated string system provides a predicted reliability about 100 times greater than the predicted reliability of a fully redundant bus system.
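To illustrate how this kind of comparison works, here is a minimal sketch using the standard series and parallel combination rules for independent components. The component unavailabilities (Q_PATH, Q_BUS) and the two simplified topologies are hypothetical values chosen for illustration only; they are not taken from the presentation or the IEEE Gold Book, and the resulting ratio is not meant to reproduce the 100-to-1 figure above.

```python
def series(*unavailabilities):
    """Unavailability of components in series: the chain fails if any one fails."""
    availability = 1.0
    for q in unavailabilities:
        availability *= (1.0 - q)
    return 1.0 - availability


def parallel(*unavailabilities):
    """Unavailability of redundant, independent components: fails only if all fail."""
    q_total = 1.0
    for q in unavailabilities:
        q_total *= q
    return q_total


# Hypothetical per-component unavailabilities (illustrative only).
Q_PATH = 1e-3  # one source path (utility, generator, UPS lumped together)
Q_BUS = 1e-4   # one distribution bus

# Redundant bus design: two redundant source paths feed one shared bus.
q_bus_design = series(parallel(Q_PATH, Q_PATH), Q_BUS)

# Segregated string design: two independent strings, each with its own bus.
one_string = series(Q_PATH, Q_BUS)
q_string_design = parallel(one_string, one_string)

print(f"redundant bus unavailability:     {q_bus_design:.3e}")
print(f"segregated string unavailability: {q_string_design:.3e}")
print(f"ratio (bus / string):             {q_bus_design / q_string_design:.1f}")
```

In this simplified model the shared bus is a single point of failure that dominates the redundant bus design, which is why the gap between the two options is large regardless of the exact inputs.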

We cannot ignore this difference. Instead, we must ask ourselves what the immediate, practical implications of this result are. Do we bet against 100-to-1 odds? Or do we go with the string system and then do our best to solve the practical problems in its implementation and operation?

Several less dramatic claims are based on PRA calculations:

1. The best place to invest in dual components in today's data center is near the IT device—starting with dual power cords at each device.

2. An extra standby generator is not always a good buy—i.e., the improvement in reliability is small in reliable utility areas.

3. A second utility feeder is seldom a good buy. A significant increase in reliability is only available near the boundary between utility service areas.

4. In Level 5 and Level 6 systems, control systems are limiting factors to improving reliability. For example, the EPO system must be carefully segregated to maintain the reliability of a dual string system.

5. The demand for predictable operating procedures increases when increased reliability is desired. Trained operators prepared with effective standard operating procedures (SOPs), controlled access for untrained persons, regular testing of all equipment, and effective maintenance programs are all necessary to obtain the highest possible reliability.
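Claim 2 above can be illustrated with a small calculation. The sketch below assumes the load loses power only when the utility is out and every standby generator fails, and that these events are independent; common-cause failures and switching equipment are ignored. The function names and all numeric values are hypothetical, chosen only to show the trend.

```python
def loss_of_power(q_utility, q_generator, n_generators):
    """Probability that the utility and all n standby generators are unavailable."""
    return q_utility * q_generator ** n_generators


def gain_from_extra_generator(q_utility, q_generator, n_generators):
    """Reduction in loss-of-power probability from adding one more generator."""
    before = loss_of_power(q_utility, q_generator, n_generators)
    after = loss_of_power(q_utility, q_generator, n_generators + 1)
    return before - after


# Hypothetical inputs: generator failure-to-carry-load probability and
# two utility service areas with different outage probabilities.
Q_GENERATOR = 0.02
UTILITY_AREAS = {"reliable utility": 1e-4, "unreliable utility": 1e-2}

for area, q_utility in UTILITY_AREAS.items():
    gain = gain_from_extra_generator(q_utility, Q_GENERATOR, n_generators=1)
    print(f"{area}: adding a second generator removes {gain:.2e} of outage probability")
```

Under these assumptions the absolute gain scales directly with the utility outage probability, so the same generator buys far less improvement where the utility is already reliable.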

Work with PRA has yielded a basis for making several valuable decisions about investing money in reliability. But each new project has different goals and a different budget, so PRA remains valuable for buying the maximum reliability possible within each new budget.

Doug's advice: Take calculated risks.

Doug Bors presented PRA methods based on the IEEE Gold Book at the 7 x 24 Exchange National Conference in San Francisco in May. He is VP of Technology Consulting & Research at Sparling, Inc.
