Managing machine safety and productivity with QorIQ multicore processors
October 25, 2013
Current legislation is now driving system feature requirements, specifically in the area of health and safety. This article focuses on the control sub... needed in industrial control.
In North America and Europe, UL or CE marks are familiar sights in most households. Less widely known is what these marks stand for, and the stringent processes that manufacturers must follow to qualify for those marks. A key motivation behind these stringent processes is safety, which applies to equipment and machinery ranging from handheld power tools, elevators, railway systems, and robots on the factory floor to nuclear, oil, and gas installations. Safety is paramount in the latter group, but similar care and attention is also given to equipment that people use on a daily basis.
The objective behind what the industry calls “functional safety” is to limit the risk of physical injury or health impact on people operating or using equipment and machinery. Not only must equipment operate correctly in response to inputs, but it must also safely manage any operator errors, hardware failures, or environmental changes. While functional safety covers the end-to-end system, the control subsystem covers the sensors, the programmable logic control function, and the actuation subsystems. This article examines the impact of functional safety on the control subsystem architecture, and how a QorIQ multicore processing solution can address safety requirements more efficiently while also improving machine productivity.
Making machines safe
Standards covering functional safety are specified by governing bodies in each country. For countries in the European Union, the baseline is IEC 61508. In North America, the ISO 13849 specification is enforced. Achieving certification involves a number of steps that identify the required safety functions, the potential hazards, and any risk reduction required. These steps determine the required Safety Integrity Level (SIL) under IEC 61508, or the Performance Level (PL) in the case of ISO 13849. Other key factors include the Hardware Fault Tolerance (HFT) and the Safe Failure Fraction (SFF). The HFT is the number of faults the equipment can tolerate while still performing its safety function. The SFF is the fraction of all failures that either leave the system in a safe state or are detected by diagnostics. Responsibility for these aspects lies with appropriately skilled engineers who, like the standards, must take a holistic system approach.
The HFT and SFF are significant in that they are a measure of the redundancy and diagnostic capabilities of the subsystem. The HFT depends on the amount of redundancy and voting policy used in the system. The SFF is a measure of the fail-safe design and quality of built-in diagnostics.
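Both measures can be computed directly. This sketch uses the usual IEC 61508 split of failure rates into three classes: safe (λS), dangerous detected (λDD), and dangerous undetected (λDU). From these, SFF = (λS + λDD) / (λS + λDD + λDU). It also uses the common rule that an MooN voting architecture has HFT = N - M. The struct and function names are hypothetical, and the example numbers are illustrative only.

```c
/* Failure rates for one subsystem, in any consistent unit (e.g. FIT).
 * The three classes follow the usual IEC 61508 classification. */
typedef struct {
    double lambda_s;   /* safe failures */
    double lambda_dd;  /* dangerous failures detected by diagnostics */
    double lambda_du;  /* dangerous failures that go undetected */
} failure_rates;

/* Safe Failure Fraction: share of all failures that are either safe
 * or dangerous but caught by diagnostics. */
double safe_failure_fraction(failure_rates r)
{
    double total = r.lambda_s + r.lambda_dd + r.lambda_du;
    if (total <= 0.0)
        return 0.0; /* no failure data: report zero instead of dividing by zero */
    return (r.lambda_s + r.lambda_dd) / total;
}

/* Hardware Fault Tolerance of an MooN (M-out-of-N) voting architecture:
 * the number of channel faults it can absorb and still act. */
int hardware_fault_tolerance(int m, int n)
{
    return n - m;
}
```

Suppose λS = 400, λDD = 500, and λDU = 100. The SFF is then 900 / 1000 = 0.9, and a 1oo2 architecture gives an HFT of 1. Better diagnostics move failures from λDU into λDD, which is why the SFF reflects diagnostic quality.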
Figure 1 shows a dual-redundant system using a one-out-of-two voting architecture with diagnostics (1oo2D). In 1oo2D, two channels process the same inputs and each requests an action. The voter compares the requests from both channels, but uses only the data from a channel whose diagnostics report good health. This implementation also provides a redundant fallback to 1oo1D if one channel fails its diagnostics.
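The voting rule just described can be sketched in C. This is a minimal illustration under two assumptions. First, each channel produces a boolean trip request plus a boolean diagnostic result per cycle. Second, a trip request from either healthy channel is enough to enter the safe state. The type and function names are hypothetical. A production voter would also report discrepancies between the two channels.

```c
#include <stdbool.h>

/* Output of one safety channel for a single processing cycle. */
typedef struct {
    bool trip_request;  /* channel asks for the safe state (e.g. stop) */
    bool diag_ok;       /* channel's self-diagnostics passed this cycle */
} channel_output;

typedef enum { MODE_1OO2D, MODE_1OO1D, MODE_NO_VALID_CHANNEL } vote_mode;

/* Returns true if the safe state must be entered; *mode reports which
 * voting configuration was effectively used this cycle. */
bool vote_1oo2d(channel_output a, channel_output b, vote_mode *mode)
{
    if (a.diag_ok && b.diag_ok) {
        *mode = MODE_1OO2D;
        /* One out of two: a trip request from either healthy channel wins. */
        return a.trip_request || b.trip_request;
    }
    if (a.diag_ok || b.diag_ok) {
        *mode = MODE_1OO1D;
        /* Fall back to the single channel whose diagnostics are good. */
        return a.diag_ok ? a.trip_request : b.trip_request;
    }
    *mode = MODE_NO_VALID_CHANNEL;
    return true; /* no trustworthy channel left: fail safe */
}
```

Reporting the effective mode lets the system raise a maintenance alarm when it degrades to 1oo1D, instead of silently running with reduced redundancy.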
The diagnostics report on software or random failures, incorrect operator inputs, and common cause failures arising from environmental effects (for example, a memory or data bus corruption caused by EMC, vibration, temperature, or pressure changes). On a QorIQ multicore processor, a channel can be implemented on a core using its local cache/memory. The basic hardware features used here are Error-Correcting Code (ECC) and parity on the memories. Making systems safe requires additional processing performance to deliver real-time computation and diagnostics. Duplicating hardware resources provides the redundancy needed to increase the probability that the system reaches a safe state when a failure occurs.
Redundancy and real-time diagnostics
Herein lies the challenge for equipment or machine manufacturers: adding redundancy has a direct impact on hardware costs, as it traditionally involves replicating controller modules or processor components. This leads to two questions:
- Can functional safety be provided without the replication of hardware?
- Can smarter and more integrated diagnostics help manage functional safety and productivity at the same time?
To illustrate the second question, consider a robot on an assembly line that is protected by light curtains (Figure 2).
There is a clear difference between Figures 2a and 2b. Figure 2a illustrates an incident in which an operator is at risk of injury and the robot should de-energize or stop. This is not the case in Figure 2b, where slowing the robot down may be sufficient. A system that can distinguish between the two cases improves productivity because it maintains safety while keeping the production line running. In this situation, an additional light curtain could be used to designate a “buffer” zone and an “operational” zone, or a vision-based system might be the answer. In both cases, the controller must now consider additional sensor inputs and perform additional processing in real time.
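The two-zone idea can be expressed as a small decision function. This sketch makes three assumptions. One light curtain guards an outer buffer zone, a second guards the inner operational zone around the robot, and a single flag summarizes sensor health. The names and the three-level response are illustrative assumptions rather than a certified safety function.

```c
#include <stdbool.h>

typedef enum {
    ACTION_RUN_FULL_SPEED,
    ACTION_REDUCED_SPEED,
    ACTION_STOP
} robot_action;

/* Hypothetical inputs: true means that curtain's beams are interrupted.
 * sensors_healthy is false if diagnostics doubt either curtain. */
robot_action select_action(bool buffer_zone_broken,
                           bool operational_zone_broken,
                           bool sensors_healthy)
{
    if (!sensors_healthy || operational_zone_broken)
        return ACTION_STOP;          /* operator at risk, or inputs untrustworthy */
    if (buffer_zone_broken)
        return ACTION_REDUCED_SPEED; /* someone is near: slow down, keep producing */
    return ACTION_RUN_FULL_SPEED;
}
```

The ordering matters. An unhealthy sensor or an intrusion into the operational zone always wins. This preserves the stop behavior of Figure 2a while allowing the slower, productive response of Figure 2b.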
Consider QorIQ multicore processors
Part of the solution to both questions lies with QorIQ multicore processors. These are a family of processing platforms ranging from single-, dual-, and quad-core devices to higher core-count multicore devices. All are based on 32- or 64-bit Power Architecture cores with integrated double-precision floating point. Figure 3 shows how programmable control and safety functions could coexist on a single QorIQ multicore processor. The different functions can run on this architecture using shared or dedicated interconnects, memory, and I/O resources. Hardware enforcement allows each function to run without interference from functions on other cores or on external hosts.
The main element in the safety application is the channel, and the hardware implementing a channel must provide some form of isolation. In the QorIQ multicore processor, the channel would be built from a core complex that is physically instantiated on the device. The core complex has two levels of cache (Level 1 instruction/data caches and a Level 2 cache). Used in a different mode, these caches can provide local storage. This local storage is combined with each core's Memory Management Unit (MMU), the hardware hypervisor and virtualization support, and the Peripheral Access Management Unit (PAMU). Together, they ensure that the rights given to a software task are visible to the hardware, allowing the hardware to enforce isolation.
The hardware hypervisor and virtualization support isolate logical containers, while the PAMU isolates intelligent peripherals such as Ethernet controllers, crypto acceleration, and others in the processor. In effect, a QorIQ multicore processor can be viewed as a group of independent CPUs, each with its own intelligent I/O and memory.
Like any CPU, each core is informed of hardware-detected exceptions in its core complex, for example single- or multi-bit ECC errors, core timer events, or watchdog timer expiry. The core is also made aware of device exceptions beyond the core complex, such as ECC errors in internal and external memories or memory access violations. The PAMU also monitors all external memory-mapped accesses to the device, blocking and flagging unauthorized attempts. An exception can be routed to any number of cores in the device and can also be flagged externally. A channel built on a core complex can therefore have its exceptions routed to one or more of the following:

- itself;
- another internal monitor (another core);
- an external monitor, such as a master on the PCI Express (PCIe) or Serial RapidIO interfaces.
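The PAMU behavior described above can be modeled in software to make the logic concrete. The PAMU blocks an unauthorized access and flags it to one or more monitors. This is a hypothetical model only. The `access_window` table, the core bitmask, and `check_access` are invented for illustration. They do not reflect how PAMU tables or the QorIQ interrupt routing are actually programmed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access window: a DMA-capable master (e.g. an Ethernet
 * controller) may only touch [base, base + size) with these rights. */
typedef struct {
    uint64_t base;
    uint64_t size;
    bool     allow_read;
    bool     allow_write;
} access_window;

typedef uint32_t core_mask; /* bit n set = core n */

typedef struct {
    core_mask route_to;   /* cores configured to receive the exception */
    core_mask flagged;    /* cores that have been notified so far */
    unsigned  violations; /* number of blocked accesses */
} violation_monitor;

/* Returns true if some window grants the access; otherwise the access is
 * blocked, counted, and flagged to every core in mon->route_to. */
bool check_access(const access_window *win, size_t nwin,
                  uint64_t addr, uint64_t len, bool is_write,
                  violation_monitor *mon)
{
    for (size_t i = 0; i < nwin; i++) {
        const access_window *w = &win[i];
        /* Overflow-safe test that [addr, addr + len) lies inside the window. */
        bool inside = addr >= w->base && len <= w->size &&
                      addr - w->base <= w->size - len;
        bool permitted = is_write ? w->allow_write : w->allow_read;
        if (inside && permitted)
            return true;
    }
    mon->violations++;
    mon->flagged |= mon->route_to;
    return false;
}
```

Routing through a bitmask mirrors the idea in the text. A single violation can notify the channel's own core and an independent monitor core at the same time.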
In addition to detecting the standard hardware errors mentioned above, the QorIQ trust architecture can establish a secure, trusted node from power-on, a capability also known as secure boot. It also enables the hardware to verify during normal runtime that code or static data blocks in memory anywhere on the device have not been changed. This hardware method complements one of the diagnostic routines typically run in software.
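The software routine that this hardware check complements can be sketched as a checksum comparison over a static block. This minimal sketch makes two assumptions. The integrity check is CRC-32 with the common IEEE 802.3 polynomial (reflected form 0xEDB88320), and a reference value is computed when the image is built. The article does not say which algorithm either the hardware or the software routine uses. `crc32_ieee` and `block_unchanged` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected, init and final XOR 0xFFFFFFFF). */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            /* Shift right; XOR in the polynomial if the bit shifted out was 1. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Periodic diagnostic: true if the static block still matches the
 * reference CRC recorded at build time. */
bool block_unchanged(const uint8_t *block, size_t len, uint32_t reference_crc)
{
    return crc32_ieee(block, len) == reference_crc;
}
```

The standard check value for this CRC over the ASCII string "123456789" is 0xCBF43926. Software like this consumes CPU time on every pass, which is the cost a hardware runtime check can take off the safety cores.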
Internal processor communications enable integrated diagnostics
In safety applications, communication between channels is likely to be implemented via a black channel, which, on a multicore platform, is internal to the processor. To support communication inside the device, a QorIQ multicore processor provides three independent communication paths between the cores:
- Shared memory protected by MMU/PAMU
- Hardware queuing system linking individual physical ports to each core complex
- Inter-core messaging registers
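A black channel treats the transport underneath it as untrusted, so the safety layer at each end adds its own protection. The sketch below assumes a hypothetical frame exchanged over one of the paths above, for example shared memory. The frame carries a sender ID, a sequence counter, and a CRC-16/CCITT-FALSE checksum, and the receiver also enforces a timeout. The frame layout, field sizes, and function names are illustrative assumptions. They are not a QorIQ API or the format of any particular safety protocol.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

#define SAFE_PAYLOAD_LEN 8

/* Hypothetical safety-layer frame carried over the untrusted path. */
typedef struct {
    uint16_t sender_id;                 /* identifies the source channel */
    uint16_t sequence;                  /* increments with every frame */
    uint8_t  payload[SAFE_PAYLOAD_LEN];
    uint16_t crc;                       /* CRC over sender_id, sequence, payload */
} safety_frame;

typedef enum { FRAME_OK, FRAME_BAD_CRC, FRAME_WRONG_SENDER,
               FRAME_BAD_SEQUENCE, FRAME_TIMEOUT } frame_status;

typedef struct {
    uint16_t expected_sender;
    uint16_t last_sequence;
    uint32_t last_rx_ms;   /* time of the last accepted frame */
    uint32_t timeout_ms;   /* maximum gap between accepted frames */
} rx_state;

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF. */
static uint16_t crc16_ccitt(const uint8_t *p, size_t len)
{
    uint16_t crc = 0xFFFF;
    while (len--) {
        crc ^= (uint16_t)(*p++) << 8;
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Serialize the protected fields explicitly so struct padding never
 * enters the CRC. */
static uint16_t frame_crc(const safety_frame *f)
{
    uint8_t buf[4 + SAFE_PAYLOAD_LEN];
    buf[0] = (uint8_t)(f->sender_id >> 8);
    buf[1] = (uint8_t)f->sender_id;
    buf[2] = (uint8_t)(f->sequence >> 8);
    buf[3] = (uint8_t)f->sequence;
    memcpy(&buf[4], f->payload, SAFE_PAYLOAD_LEN);
    return crc16_ccitt(buf, sizeof buf);
}

void frame_build(safety_frame *f, uint16_t sender_id, uint16_t sequence,
                 const uint8_t payload[SAFE_PAYLOAD_LEN])
{
    f->sender_id = sender_id;
    f->sequence = sequence;
    memcpy(f->payload, payload, SAFE_PAYLOAD_LEN);
    f->crc = frame_crc(f);
}

/* Accepts a frame only if it is timely, intact, from the expected sender,
 * and the next one in sequence. */
frame_status frame_check(rx_state *rx, const safety_frame *f, uint32_t now_ms)
{
    if ((uint32_t)(now_ms - rx->last_rx_ms) > rx->timeout_ms)
        return FRAME_TIMEOUT;
    if (frame_crc(f) != f->crc)
        return FRAME_BAD_CRC;
    if (f->sender_id != rx->expected_sender)
        return FRAME_WRONG_SENDER;
    if (f->sequence != (uint16_t)(rx->last_sequence + 1))
        return FRAME_BAD_SEQUENCE; /* lost, repeated, or reordered frame */
    rx->last_sequence = f->sequence;
    rx->last_rx_ms = now_ms;
    return FRAME_OK;
}
```

A real receiver would also check the timeout periodically when no frame arrives at all. This sketch only checks it when a frame is received.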
This multicore approach offers many advantages over discrete solutions, allowing for multiple safety channels and consolidation of components or modules. The integration enables improved diagnostic capabilities, which can in turn support better machine availability and productivity.
Freescale Semiconductor, Inc.