Electricity + Control

affect different devices. The worst-case recovery time assumes that the failure occurs at the redundancy master, i.e. the switch in charge of the redundancy operation, or on a link connected to that switch. In this worst-case scenario, the network is considered recovered once every device on the network is again able to communicate with every other device after the failure has taken place.
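The recovery criterion described above (every device able to reach every other device) can be checked with a simple connectivity test. The following is a minimal sketch only; the function name, the device labels and the representation of links as pairs are assumptions made for illustration and do not come from any particular redundancy protocol.

```python
from collections import deque


def all_devices_connected(devices, active_links):
    """Return True if every device can reach every other device.

    devices      -- iterable of device names (hypothetical labels)
    active_links -- iterable of (device_a, device_b) pairs that are
                    currently forwarding traffic
    """
    devices = list(devices)
    if not devices:
        return True
    neighbours = {d: set() for d in devices}
    for a, b in active_links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Breadth-first search from an arbitrary starting device.
    seen = {devices[0]}
    queue = deque([devices[0]])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen) == len(devices)


# A four-switch ring: losing one link leaves a working path,
# losing two links isolates a switch.
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
one_failure = all_devices_connected(["A", "B", "C", "D"], ring[1:])
two_failures = all_devices_connected(["A", "B", "C", "D"], ring[2:])
```

Under this definition, the recovery time is the interval between the failure and the first moment at which such a check would succeed again.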

As discussed, certain redundancy protocols may provide a quick recovery but restrict the topology (ring only) or the maximum number of devices they can handle (note that in this context, devices refers to networking hardware rather than end devices). Others may be able to handle a large number of devices but could be proprietary, locking you in to a single manufacturer. Selecting the correct redundancy protocol requires weighing all of these details against the options available to you. In other words, every network is application specific and should be designed independently, with that application and its requirements in mind.

A final point to consider regarding redundancy is how the redundancy will be monitored. A complex interconnected mesh network that can handle six link failures before communications are interrupted can prove useless if the network and its current link statuses are not properly monitored. Without monitoring, you may reach a point where five of the links have failed and the network is no longer redundant. Because no monitoring is taking place, this will not be discovered until the sixth link fails, at which point communications will be interrupted. Not monitoring the network redundancy simply delays the inevitable failure.
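The six-link example can be expressed as a small calculation of how much redundancy remains. This is an illustrative sketch only; the function name, the state labels and the idea of supplying the number of failures at which communications are interrupted are assumptions, not part of any product.

```python
def redundancy_state(failed_links: int, failures_until_outage: int) -> str:
    """Classify the network by how many further failures it can absorb.

    failures_until_outage is the failure count at which communications
    are interrupted (6 in the mesh example); it is assumed to be known
    from the network design.
    """
    remaining = failures_until_outage - failed_links
    if remaining <= 0:
        return "outage"      # communications are interrupted
    if remaining == 1:
        return "critical"    # no redundancy left, next failure is an outage
    if failed_links > 0:
        return "degraded"    # still redundant, but repairs are due
    return "healthy"


# Without monitoring, only the last of these states is ever noticed.
states = [redundancy_state(n, 6) for n in range(7)]
```

A monitored network reports the "degraded" and "critical" states while there is still time to repair the failed links.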

Redundancy can be monitored in different ways. The least expensive, yet most time-consuming, is to check all devices manually at regular intervals to determine whether any links have failed. This may be feasible on a smaller network; however, it still requires time and diligence, and it provides the most opportunity for human error.

Other, slightly more set-up intensive methods include a central syslog server that periodically collects logs from the devices, or the failure relay that most industrial switches provide. This is a simple relay that is triggered in the event of a failure on the switch and can be connected to an alarm light or a noise-producing device. Both options require extra time and effort to set up, and still require manual checking (of logs or alarm devices), but they do automate and simplify the process somewhat.
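Once logs are collected centrally, part of the manual checking can be scripted. The sketch below assumes a made-up log line format ("<device> port <n> link up|down"); real switches each have their own message formats, so the pattern would need to be adapted.

```python
import re

# Hypothetical log line format, assumed for illustration only.
LINK_EVENT = re.compile(
    r"^(?P<device>\S+) port (?P<port>\d+) link (?P<state>up|down)$"
)


def links_currently_down(log_lines):
    """Return the (device, port) pairs whose latest logged state is 'down'."""
    latest = {}
    for line in log_lines:
        match = LINK_EVENT.match(line.strip())
        if match is None:
            continue  # ignore lines that are not link events
        key = (match["device"], int(match["port"]))
        latest[key] = match["state"]  # later lines overwrite earlier ones
    return sorted(key for key, state in latest.items() if state == "down")


collected = [
    "sw1 port 3 link down",
    "sw2 port 1 link down",
    "sw1 port 3 link up",
    "sw2 fan speed normal",
]
down_now = links_currently_down(collected)
```

Here only port 1 on sw2 is reported, because the link on sw1 came back up later in the log.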

The best (and most costly) method is a proper network management system. These are software systems that use a variety of protocols to monitor the network on a 24/7 basis. Basic tools such as ‘ping’ are used for testing device connectivity; however, what really gives these set-ups their power and control is SNMP (Simple Network Management Protocol). SNMP works either by having the monitoring server poll the devices periodically for status updates or by setting the devices themselves to alert the server when a problem is found. Most commonly, a combination of these methods is used. The NMS (Network Management System) can then be used to examine all these statuses, often with visual topologies or ‘dashboard’ views that provide a quick overview of the network, usually using color coding to identify errors and faults. Some even allow emails or SMS messages to be sent to users in the event of a failure, providing quick notification to staff who are out of the office.
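The combination of periodic polling and device-initiated alerts can be sketched as follows. This is not an SNMP implementation: the class, its method names and the read_status callable are hypothetical stand-ins for whatever an actual NMS uses to query a device and to receive its alerts.

```python
from typing import Callable, Dict, List


class LinkMonitor:
    """Tracks device status from both polling and device-sent alerts."""

    def __init__(self, devices: List[str], read_status: Callable[[str], str]):
        self.devices = devices
        self.read_status = read_status  # hypothetical stand-in for a status query
        self.status: Dict[str, str] = {d: "unknown" for d in devices}
        self.alerts: List[str] = []

    def _update(self, device: str, new_status: str, source: str) -> None:
        old_status = self.status.get(device, "unknown")
        self.status[device] = new_status
        # Raise an alert only when a device changes to a non-"up" status.
        if new_status != "up" and new_status != old_status:
            self.alerts.append(f"{device}: {new_status} (via {source})")

    def poll_once(self) -> None:
        """Server-initiated: ask every device for its current status."""
        for device in self.devices:
            try:
                status = self.read_status(device)
            except OSError:
                status = "unreachable"
            self._update(device, status, "poll")

    def on_alert(self, device: str, status: str) -> None:
        """Device-initiated: the device reports a problem itself."""
        self._update(device, status, "alert")


readings = {"sw1": "up", "sw2": "down"}
monitor = LinkMonitor(["sw1", "sw2"], lambda name: readings[name])
monitor.poll_once()             # finds sw2 down
monitor.on_alert("sw1", "down")  # sw1 reports its own failure
```

Polling catches devices that have failed silently, while device-sent alerts shorten the delay between a failure and its detection, which is one reason the two are usually combined.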

When set up correctly, an NMS is an invaluable tool that monitors not only redundancy and link status but most other details about a device, including temperature, storage space (on devices with HDDs), logins (both failed and successful) and more. For larger and more complex networks, these systems can be beneficial and should not be overlooked.

Two recently released redundancy protocols provide what is known as ‘bumpless redundancy’, namely HSR (High-availability Seamless Redundancy) and PRP (Parallel Redundancy Protocol). Bumpless redundancy means that, in the event of a link failure on the network, these protocols do not require any time to recover. They achieve this in a similar fashion: the switches transmit two copies of any data they receive, and the receiving end simply discards any duplicate frames. These protocols are highly dependent on specific hardware; however, devices known as ‘red-boxes’ (redundancy boxes) allow devices that are not HSR/PRP compliant to be connected to the network correctly, and some end-device hardware manufacturers are beginning to investigate these protocols for direct inclusion in their hardware.
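The send-twice, discard-duplicates idea can be illustrated in a few lines. This is a simplified sketch: in real HSR/PRP equipment the duplication and discarding are done in hardware using a sequence number carried in each frame, and the names, the tuple used as a "frame" and the window size below are assumptions made for illustration.

```python
from collections import OrderedDict


def send_duplicated(source, sequence, payload, path_a_up=True, path_b_up=True):
    """Return the copies of a frame that actually arrive at the receiver."""
    arrived = []
    if path_a_up:
        arrived.append((source, sequence, payload))
    if path_b_up:
        arrived.append((source, sequence, payload))
    return arrived


class DuplicateDiscard:
    """Accept the first copy of each (source, sequence) and drop the rest."""

    def __init__(self, window=1024):
        self.window = window       # how many recent frames to remember (assumed)
        self.seen = OrderedDict()

    def accept(self, source, sequence):
        key = (source, sequence)
        if key in self.seen:
            return False           # second copy: discard
        self.seen[key] = None
        if len(self.seen) > self.window:
            self.seen.popitem(last=False)  # forget the oldest entry
        return True


receiver = DuplicateDiscard()

# Both paths healthy: two copies arrive, one is delivered.
normal = [p for (s, n, p) in send_duplicated("A", 1, "reading-1")
          if receiver.accept(s, n)]

# One path failed: the surviving copy is delivered with no recovery delay.
after_failure = [p for (s, n, p) in send_duplicated("A", 2, "reading-2", path_b_up=False)
                 if receiver.accept(s, n)]
```

In both cases exactly one copy reaches the application, which is why a single link failure causes no interruption.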

Figure 2: High-availability Seamless Redundancy (HSR).

HSR, as shown in Figure 2, works on a single ring topology. However, unlike a normal redundancy protocol, which would ‘break’ the ring at one point by introducing a redundant link, HSR keeps the entire ring active at all times. Any data that needs to traverse the network is then

CONTROL SYSTEMS + AUTOMATION

Abbreviations

HSR – High-availability Seamless Redundancy
IP – Internet Protocol
NAT – Network Address Translation
NMS – Network Management System
PRP – Parallel Redundancy Protocol
RSTP – Rapid Spanning Tree Protocol
SNMP – Simple Network Management Protocol
TCP – Transmission Control Protocol
USB – Universal Serial Bus


July ‘14
