affect different devices. The worst case recovery time assumes that
the failure is the switch (or a link connecting to the switch) that is the
redundancy master, i.e. the device in charge of the redundancy opera-
tion. Recovery in the worst case scenario is assumed to be defined as
once all devices on the network are able to communicate to all other
devices after the failure has taken place.
As discussed, certain redundancy protocols may provide a quick
recovery but limit you on the topology (ring only) or maximum
number of devices it can handle (note that in this context, devices
refers to networking hardware rather than end devices). Others may
be able to handle a large amount of devices but could be proprietary
and thus vendor lock you to a certain manufacturer. Selecting the
correct redundancy protocol requires taking all these details into ac-
count while seeing what options are available to you. In other words,
every network should be application specific, and should be designed
independently with that application and its requirements in mind.
A final point that needs to be considered when regarding redun-
dancy is how the redundancy will be monitored. Having a complex
interconnected mesh network that can handle six link failures before
interrupting communications can prove to be useless if the network
and current link statuses are not being properly monitored. Without
monitoring you will arrive at a point where five of the links could have
failed, leading to the network no longer being redundant. However,
because no monitoring is taking place, this will not be discovered
until the sixth link fails, at which point communications will be inter-
rupted. Not monitoring the network redundancy is simply delaying
the inevitable failure.
Monitoring of redundancy can be performed in different ways.
Inexpensive, yet most time consuming, is to manually monitor all
devices periodically to determine whether any links have failed. This
may be feasible on a smaller network; however it still requires time,
diligence and provides the most opportunity for human error.
Other, slightly more set-up intensive methods include a central
silo server which periodically collects logs from the devices, or using
a failure relay that most industrial switches provide. This is a simple
relay that can be triggered in the event of a failure on the switch, which
could then be connected to an alarm light or noise producing device.
Both options require extra time and effort to set up, and still require
manual checking (of logs or alarm devices), but they do automate
and simplify the process somewhat.
The best (and most costly) method is a proper network manage-
ment system. These are software systems that use a variety of differ-
ent protocols to monitor the network on a 24/7 basis. Basic protocols
such as ‘ping’ are used for testing device connectivity, however what
really gives these set-ups their power and control is SNMP (Simple
Network Management Protocol). SNMP works by either having the
monitoring server poll the devices periodically for status updates, or
alternatively, by setting the end devices themselves to alert the server
when a problem is found. Most commonly a combination of these
methods is used. The NMS (Network Management System) can then
be used to examine all these statuses, often with visual topologies or
‘dashboard’ views to provide a quick overview of the network, usu-
ally using color coding to identify errors and faults. Some even allow
emails/sms’ to be sent to users in the event of any failure, allowing
a quick, out of office notification.
When set up correctly, a NMS is an invaluable tool that will moni-
tor not only redundancy and link status, but most other details about
a device, including things such as temperature, storage space of
devices with Adds, logins (both failed and successful) and more. For
larger and more complex networks, these systems can be beneficial
and should not be overlooked.
Two recently released redundancy protocols provide what is
known as ‘bump less redundancy’, namely HSR (High-availability
Seamless Redundancy) and PRP (Parallel Redundancy Protocol).
Bump less redundancy means that in the event of a link failure on
the network, these protocols do not require any amount of time to
recover. They achieve this in similar fashion, in that the switches
transmit two copies of any data they receive, and the receiving end
will simply discard any duplicate frames received. These protocols
are highly dependent on specific hardware, however devices known
as ‘red-boxes’ (redundancy boxes) allow non-HSR/PRP compliant
devices to be connected to the network correctly, and some end
device hardware manufacturers are beginning to investigate these
protocols for direct inclusion in their hardware.
Figure 2: High-availability Seamless Redundancy (HSR).
HSR, as shown in Figure 2, works on a single ring topology. However
unlike a normal redundancy protocol that would ‘break’ the ring in a
point by introducing a redundant link, HSR keeps the entire ring ac-
tive at all times. Any data that needs to traverse the network is then
CONTROL SYSTEMS + AUTOMATION
Abbreviations
HSR
– High-availability Seamless Redundancy
IP
– Internet Protocol
NAT
– Network Address Translation
NMS – Network Management System
PRP
– Parallel Redundancy Protocol
RSTP – Rapid Spanning Tree Protocol
SNMP – Simple Network Management Protocol
TCP
– Transmission Control Protocol
USB
– Universal Serial Bus
5
July ‘14
Electricity+Control