“There can be only one” – about all the places where you can’t have two of everything even when you have two of everything.
This is part of a series of articles about redundancy, and the articles should be read in order. If you have not read the previous article, start with the first one, Redundancy to solve all problems.
In most redundant systems you need a switch to change over between Main and Backup. Apart from some load-shared redundant systems, we will always need something or someone to detect that there is an error on Main and switch to Backup. In many situations this is done by automatically sensing the signal or the response from Main; if the sensing device is not satisfied with what it sees, it switches to Backup. The dilemma is that no matter what we do, it must end up in one single decision. This turns the decision-maker into a single point of failure. It can overlook a problem and fail to switch when it should have, or it can mistakenly interpret a valid response as invalid and switch when it shouldn’t have. Finally, the control unit can have failures of its own; for example, it might switch the output but fail to tell Backup to start.
Many detection failures, where a switch to Backup should have happened but didn’t, occur because we build a system that is good at detecting when Main doesn’t work at all. But what if Main only partly works? Then we depend on Main failing in the “right” way, a way our detector recognizes, before we can detect it and switch to Backup. If Main fails in the “wrong” way, we detect nothing, and Main just carries on not doing what it is supposed to do. Since we are trying to build a system that must take care of all possible errors, it becomes very hard to build the right control unit. It should do the right thing in all situations, both the situations we can imagine and those we can’t. This can get very complex, and the more complex it gets, the greater the risk of making wrong decisions: switching when it is not supposed to, or switching back and forth so that nobody can figure out what is going on. At a minimum, it should be possible to manually override the automatic switching so that we can force the use of Main or Backup. And then there is the human factor – we will get back to that.
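The core of such a control unit can be sketched as a small state machine. This is a minimal Python sketch under stated assumptions, not a description of any real product: `SwitchController`, `update()`, `fail_ticks` and `recover_ticks` are hypothetical names, and `main_ok` stands for whatever check the sensing device performs on Main. It illustrates two of the points above. It only switches to Backup after several bad checks in a row, and only returns to Main after several good ones, which reduces the risk of switching back and forth. A manual override always wins over the automatic decision. What it cannot do is solve the partial-failure problem: if the check reports Main as OK while Main is only partly working, the controller will happily keep Main on air.

```python
from enum import Enum


class Source(Enum):
    MAIN = "main"
    BACKUP = "backup"


class Mode(Enum):
    AUTO = "auto"
    FORCE_MAIN = "force_main"
    FORCE_BACKUP = "force_backup"


class SwitchController:
    """Hypothetical Main/Backup changeover logic, for illustration only."""

    def __init__(self, fail_ticks: int = 3, recover_ticks: int = 10) -> None:
        self.fail_ticks = fail_ticks        # bad checks in a row before going to Backup
        self.recover_ticks = recover_ticks  # good checks in a row before returning to Main
        self.mode = Mode.AUTO               # an operator can set this to force a source
        self.active = Source.MAIN
        self._bad = 0
        self._good = 0

    def update(self, main_ok: bool) -> Source:
        """Feed one health check of Main and return the source to put on air."""
        if main_ok:
            self._good += 1
            self._bad = 0
        else:
            self._bad += 1
            self._good = 0

        # A manual override always wins over the automatic decision.
        if self.mode is Mode.FORCE_MAIN:
            self.active = Source.MAIN
        elif self.mode is Mode.FORCE_BACKUP:
            self.active = Source.BACKUP
        # Automatic mode: require a run of identical readings before changing
        # anything (hysteresis), so a flaky signal does not make the switch flap.
        elif self.active is Source.MAIN and self._bad >= self.fail_ticks:
            self.active = Source.BACKUP
        elif self.active is Source.BACKUP and self._good >= self.recover_ticks:
            self.active = Source.MAIN
        return self.active


if __name__ == "__main__":
    ctl = SwitchController(fail_ticks=3, recover_ticks=5)
    readings = [True, False, False, False, True, True]
    print([ctl.update(ok).value for ok in readings])
    # ['main', 'main', 'main', 'backup', 'backup', 'backup']
```

The counters are the design choice here: a single bad reading is ignored, and the return to Main is deliberately slower than the switch away from it. Whether the unit should return to Main automatically at all is a policy decision; setting `mode` lets an operator pin the output to one source regardless.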
As mentioned earlier, there will always be at least one single point of failure in a redundant system (except for load-shared systems, where there doesn’t have to be one). So it’s not a good idea to save money on the control logic. As a minimum, you should make sure that broadcasting can continue if the power to the control logic or the switch fails.
Many radio stations have a silence detector sitting somewhere between the studio and the transmitter. Silence detectors can help avoid dead air in many situations, because they can switch to (and start) an emergency playout device if there is no signal from the studio. But you should make sure that your silence detector still connects the Main input to the output when its power is turned off or its power supply fails.
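The decision logic inside a silence detector can be sketched in a few lines. This is a minimal Python illustration, assuming the studio audio arrives as blocks of floating-point samples between -1.0 and 1.0; the names `SilenceDetector`, `level_dbfs`, `on_switch` and `reset`, the -50 dBFS threshold and the three-block delay are all hypothetical, not taken from any real product. The sketch switches to the emergency source and starts it once, then stays there until an operator resets it, so it cannot switch back and forth by itself. Note that no software can provide the pass-through when the power fails; that has to be handled in the hardware itself, for example with a relay that connects the Main input to the output whenever it is not powered.

```python
import math
from typing import Callable, Optional, Sequence


def level_dbfs(block: Sequence[float]) -> float:
    """RMS level of one block of samples (floats in -1.0..1.0), in dBFS."""
    if not block:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in block) / len(block))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


class SilenceDetector:
    """Hypothetical silence detector, for illustration only.

    threshold_dbfs: a block quieter than this counts as silent.
    silent_blocks:  number of silent blocks in a row that count as dead air.
    on_switch:      called once when switching, e.g. to start the emergency player.
    """

    def __init__(
        self,
        threshold_dbfs: float = -50.0,
        silent_blocks: int = 3,
        on_switch: Optional[Callable[[], None]] = None,
    ) -> None:
        self.threshold_dbfs = threshold_dbfs
        self.silent_blocks = silent_blocks
        self.on_switch = on_switch
        self.on_emergency = False
        self._silent_run = 0

    def process(self, block: Sequence[float]) -> bool:
        """Feed one block from the studio; return True while emergency playout is on air."""
        if level_dbfs(block) < self.threshold_dbfs:
            self._silent_run += 1
        else:
            self._silent_run = 0
        if not self.on_emergency and self._silent_run >= self.silent_blocks:
            self.on_emergency = True
            if self.on_switch is not None:
                self.on_switch()  # switch to *and start* the emergency player
        return self.on_emergency

    def reset(self) -> None:
        """Operator action: go back to the studio feed."""
        self.on_emergency = False
        self._silent_run = 0
```

Latching on the emergency source is one possible policy; many stations may prefer an automatic return when audio comes back, which would need the same kind of hold time as the switch away from the studio.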