Security and the Cortex-M MPU, part 5: Step-by-step MPU security
February 06, 2017
Previous blogs have presented an introduction to the MPU and terminology, MPU multitasking, defining MPU regions, and a software interrupt (SWI) API. In the first blog, privileged tasks (ptasks) and unprivileged tasks (utasks) were defined. The former run in privileged thread mode and the latter run in unprivileged thread mode. The mode of a task is determined by the umode flag in its TCB and takes effect when it is dispatched by the real-time operating system (RTOS) scheduler.
This blog presents a step-by-step procedure to provide memory protection unit (MPU) security to late- and post-project systems. It, of course, can also be applied to new projects. The goal is to achieve the reliability, security, and safety that modern embedded systems require, as they become connected to the Internet of Things (IoT).
The discussion that follows assumes the SMX RTOS and EWARM tool suite for the sake of specificity. However, the MPU-Plus software package can be ported to other RTOSs and tool suites. Although the following may look like a cookbook recipe, it is intended as a feasibility demonstration of adding MPU security to an existing product.
To start, it is assumed that the MPU-Plus source files, including xmpu_iar.c, have been added to the RTOS library and that mpu.c has been added to the application project. Add a call to sb_MPUInit() near the beginning of the startup code, and temporarily disable loading of the system regions (sys_code and sys_data) in it. This turns on the MPU and enables its background region. Your application should run normally.
2. System regions
sys_code should contain all handler and interrupt service routine (ISR) shell code. If an ISR does not use a shell, then the ISR itself must be included. This is done by placing the code in the sys_code section, in both assembly and C source files. The background region (BR) macros that these shells use are discussed in the next section.
sys_data contains the system stack (also called the main stack by ARM Ltd.). Blocks for both regions are then defined in the linker command file.
The actual block sizes depend upon the application. Each size should be the smallest power of two that holds the region's contents (if a block is too small, the linker will complain). The alignment must equal the size. Now enable loading of sys_code and sys_data in sb_MPUInit(). These are permanent regions that are present for every task, and they allow privileged access only. Hence, they are not accessible by utasks.
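The size and alignment rules can be checked with a few lines of C. The following is a minimal host-side sketch, not MPU-Plus code: the function names are hypothetical, and the only hardware facts assumed are the ARMv7-M ones (minimum region size of 32 bytes, and a RASR SIZE field that encodes a region of 2^(SIZE+1) bytes).

```c
#include <stdint.h>
#include <stdbool.h>

/* Smallest region size the ARMv7-M MPU supports, in bytes. */
#define MPU_MIN_REGION_SIZE 32u

/* Round a byte count up to the next power of two, with a 32-byte floor.
   Hypothetical helper; results are only meaningful for bytes <= 2 GB. */
uint32_t mpu_region_size(uint32_t bytes)
{
    uint32_t size = MPU_MIN_REGION_SIZE;
    while (size < bytes && size < 0x80000000u)
        size <<= 1;
    return size;
}

/* True if base is a multiple of size (size must be a power of two). */
bool mpu_region_aligned(uint32_t base, uint32_t size)
{
    return (base & (size - 1u)) == 0u;
}

/* RASR SIZE field for a power-of-two size: region = 2^(SIZE+1) bytes. */
uint32_t mpu_size_field(uint32_t size)
{
    uint32_t n = 0;
    while ((1u << n) < size)
        n++;
    return n - 1u;
}
```

For example, a system stack of 1500 bytes would need a 2048-byte block aligned on a 2048-byte boundary.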
3. BR switching
Unfortunately, the MPU does not have enough slots to serve ptasks, handlers, and ISRs. ARM Ltd. added the background region to permit pcode to access all memory, eliminating the need for pregions. However, using BR undercuts isolation and protection for ptasks. Therefore, we have developed a technique of switching BR on for handlers and ISRs and off for tasks.
The .sys_code and .sys_data regions allow exceptions and interrupts to be serviced up to the point where MPU_BR_ON() turns on the background region. Then, all necessary code and data can be accessed by the handler or ISR to perform its function. When done, MPU_BR_OFF() turns off the background region if mpu_br_off is on. At this point in the conversion, mpu_br_off is always off, so BR is always enabled. Hence, the application should still run normally; nothing has changed. These macros add a total of 13 cycles, worst case, to each handler and ISR.
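The behavior of the two macros can be modeled as follows. This is a sketch under stated assumptions, not the MPU-Plus implementation: br_on() and br_off() are hypothetical stand-ins for MPU_BR_ON() and MPU_BR_OFF(), and a plain variable replaces the MPU_CTRL register (located at 0xE000ED94 on a Cortex-M target) so that the code runs on a host. The background region is controlled by the PRIVDEFENA bit of MPU_CTRL.

```c
#include <stdint.h>

#define MPU_CTRL_ENABLE     (1u << 0)  /* MPU enable */
#define MPU_CTRL_PRIVDEFENA (1u << 2)  /* background region for privileged code */

/* Host stand-in for MPU_CTRL; starts with the MPU and BR both enabled. */
static volatile uint32_t mpu_ctrl = MPU_CTRL_ENABLE | MPU_CTRL_PRIVDEFENA;

/* Global flag from the text: nonzero means BR should be off while tasks run. */
static volatile uint32_t mpu_br_off = 0;

/* Hypothetical equivalent of MPU_BR_ON(): always enables BR. */
static inline void br_on(void)
{
    mpu_ctrl |= MPU_CTRL_PRIVDEFENA;
}

/* Hypothetical equivalent of MPU_BR_OFF(): disables BR only if mpu_br_off is set. */
static inline void br_off(void)
{
    if (mpu_br_off)
        mpu_ctrl &= ~MPU_CTRL_PRIVDEFENA;
}
```

On real hardware, a write to MPU_CTRL would normally be followed by memory barrier instructions; they are omitted here.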
4. Super regions
The next step is to define super regions for SRAM, ROM, and DRAM for your system. These serve as temporary replacements for BR until task-specific regions are defined. Consult the linker map to determine the starting address and how much memory is being used in each of these memory areas. Then, pick the next larger power of two for the size. The following template is an example for an existing system:
Super regions encompass all other regions in their memory areas. Hence, it is simpler to use physical addresses and sizes as shown above.
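A super-region template could look roughly like the sketch below. The macro names, the template name, and every address and size are hypothetical and chosen only for illustration; real values come from the linker map. The bit positions are those of the ARMv7-M RBAR and RASR registers, and memory-type attributes (TEX, C, B, S) are left out for brevity.

```c
#include <stdint.h>

/* RBAR: base address, VALID bit, and slot number. */
#define RBAR(base, slot)  (((base) & 0xFFFFFFE0u) | 0x10u | ((slot) & 0xFu))

/* RASR fields (ARMv7-M). */
#define RASR_ENABLE       (1u << 0)
#define RASR_SIZE(log2)   ((((log2) - 1u) & 0x1Fu) << 1)  /* region = 2^log2 bytes */
#define RASR_SRD(mask)    (((mask) & 0xFFu) << 8)         /* subregion disable bits */
#define RASR_AP_RW        (3u << 24)  /* read/write, privileged and unprivileged */
#define RASR_AP_RO        (6u << 24)  /* read-only, privileged and unprivileged */
#define RASR_XN           (1u << 28)  /* execute never */

typedef struct {
    uint32_t rbar;
    uint32_t rasr;
} mpu_region_t;

/* Hypothetical template: physical addresses and power-of-two sizes. */
static const mpu_region_t super_tmplt[3] = {
    { RBAR(0x20000000u, 0), RASR_AP_RW | RASR_XN | RASR_SIZE(18) | RASR_ENABLE }, /* SRAM, 256 KB */
    { RBAR(0x00000000u, 1), RASR_AP_RO | RASR_SIZE(19) | RASR_ENABLE },           /* ROM, 512 KB  */
    { RBAR(0x60000000u, 2), RASR_AP_RW | RASR_XN | RASR_SIZE(24) | RASR_ENABLE }, /* DRAM, 16 MB  */
};
```

Data regions are marked execute never and ROM is read-only, so even these coarse regions already enforce attributes that the background region does not.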
This template is loaded into the memory protection array (MPA) for each task, after the task is created. When a task's MPA is loaded, its mpav flag is set. When a task is started, or resumed, the global flag, mpu_br_off, is set if the task's mpav flag is 1, or it is reset if mpav is 0. PendSV_Handler() turns BR off before starting the current task if mpu_br_off is set, else BR is left on. Note that this handler is running in the sys_code region, so it does not need BR. Hence, a task with mpav on runs only in the super regions and its own task stack.
If a task gets a memory manage fault (MMF), then the task needs access to something else, such as a peripheral. In this case, put the additional region into the task's MPA, as shown above. Or, merely disable loading mpa_tmplt_app for that task so it will run with BR and deal with the problem later.
Note that whenever an exception or interrupt occurs, BR is turned on for the handler or ISR. When the handler or ISR is done, BR is left on or turned off depending upon mpu_br_off and upon whether the handler or ISR is nested, in which case BR is left on.
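The interaction between mpav, mpu_br_off, and nesting can be summarized in a small state model. This is a simplified host-side sketch: task_t, task_dispatch(), isr_enter(), isr_exit(), and nest_depth are hypothetical names, and a boolean stands in for the BR enable bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the part of the TCB that matters here. */
typedef struct {
    bool mpav;  /* set once the task's MPA has been loaded */
} task_t;

static bool     br_enabled = true;   /* models the BR enable bit in the MPU */
static bool     mpu_br_off = false;  /* global flag described in the text */
static uint32_t nest_depth = 0;      /* number of active handlers and ISRs */

/* Models what happens when the scheduler starts or resumes a task. */
void task_dispatch(const task_t *t)
{
    mpu_br_off = t->mpav;
    br_enabled = !mpu_br_off;
}

/* Handler or ISR entry: BR is always turned on. */
void isr_enter(void)
{
    nest_depth++;
    br_enabled = true;
}

/* Handler or ISR exit: BR goes off only for the outermost exit, and only
   if the interrupted task runs with BR off. */
void isr_exit(void)
{
    nest_depth--;
    if (nest_depth == 0 && mpu_br_off)
        br_enabled = false;
}
```

A nested ISR therefore never removes BR from the handler it interrupted.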
At this point, it is desirable, but not necessary, to have all tasks running with BR off.
5. Cropping region sizes
Using subregions allows tightening region boundaries and reducing wasted memory. In the example above, the SRAM region in the MPA is 256 KB, but actual SRAM used is 210 KB. The subregion size is 32 KB. Cropping subregion 7 (N7 above) reduces the region size to 224 KB, which is big enough. Only 14 KB is wasted: still a lot, but security has a cost.
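The arithmetic is easy to verify. In the ARMv7-M MPU, a region of 256 bytes or more is divided into eight equal subregions, and each bit of the SRD field in RASR disables one of them. The helper below is a hypothetical sketch that computes how much of a region stays accessible for a given SRD mask.

```c
#include <stdint.h>

/* Bytes left accessible in a region after disabling the subregions in srd.
   Bit n of srd disables subregion n; each subregion is region_size / 8. */
uint32_t mpu_effective_size(uint32_t region_size, uint8_t srd)
{
    uint32_t sub_size = region_size / 8u;
    uint32_t enabled = 0;

    for (unsigned n = 0; n < 8u; n++) {
        if ((srd & (1u << n)) == 0u)
            enabled++;
    }
    return enabled * sub_size;
}
```

With a 256 KB region and only subregion 7 disabled (SRD mask 0x80), seven subregions of 32 KB remain, giving 224 KB, which leaves 14 KB unused when 210 KB is occupied.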
A significant gain has been made at this point: handlers and ISRs are running as they were before, but all or most tasks are running in reduced memory spaces with strictly controlled attributes (e.g., read only (RO), execute never (XN), etc.). This is likely to reveal errors you didn’t know you had. In addition, large unused memory areas are protected from access by wild pointers and malware.
6. Task-specific regions
The next step is to identify the most untrusted or vulnerable task or group of tasks that you wish to isolate from the rest of the system. This might be a networking subsystem or third-party code. We recommend an incremental approach to improving system security. Significant gains can be made by isolating one bad actor at a time, and as you go, your skill at this will improve. So, start easy. For simplicity, in the following discussion, we will assume that a single task, taskA, is being isolated. See previous blogs for how to define sections, blocks, templates, MPAs, etc.
The first step is to group code and data into task-specific regions and to define blocks in the linker command file to hold these regions. These regions are separated from the app regions defined previously. It is convenient to name them after the task (e.g., taskA_data). If not already the case, it may be helpful to put all task-specific code into a single module, and it may also be helpful to put task-specific data, if any, at the start of the same module.
Next, define common code and data regions to hold RTOS and other system services and to hold common data needed by them. These might be named pcom_code and pcom_data, respectively. At this point, taskA is a ptask, so pcom_code needs to include RTOS and other system services needed by taskA, and pcom_data needs to include data needed by these services.
Next, define the template mpu_tmplt_taskA and add code to load it into the MPA for taskA. This code is normally placed after the smx_CreateTask() call for the task. At this point, mpa_tmplt_app has been replaced by mpu_tmplt_taskA for this task.
Now taskA is standing alone and is partially isolated from all other tasks. Will it run? This is where the rubber meets the road. Memory manage faults (MMFs) from taskA are likely to be due to references outside of its regions or attribute violations (e.g., writing to ROM).
The C-SPY debugger is helpful in tracking these down. Put a breakpoint at the start of MMF_Handler() so that execution will stop immediately on an MMF. In the Registers window, open System Control Block. The CFSR register shows the causes of all faults (see ARM Application Note 209). The PC value stacked on exception entry points to the faulting instruction. The MMFAR register shows the address of a data violation. The Memory Protection Unit group in the Registers window allows looking at the MPU. To see a specific slot, enter its number into RNR; RBAR and RASR will then display the desired region in easy-to-read form.
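The same information can be decoded in software, for example to log the cause of an MMF. The sketch below is hypothetical and not part of MPU-Plus: mmf_info_t and mmf_decode() are illustrative names, while the bit positions are those of the MemManage status byte, which is the low byte of CFSR in ARMv7-M. On a target, CFSR would be read from 0xE000ED28 and MMFAR from 0xE000ED34.

```c
#include <stdint.h>
#include <stdbool.h>

/* MemManage status bits in the low byte of CFSR (ARMv7-M). */
#define MMFSR_IACCVIOL  (1u << 0)  /* instruction fetch violation */
#define MMFSR_DACCVIOL  (1u << 1)  /* data access violation */
#define MMFSR_MUNSTKERR (1u << 3)  /* fault while unstacking on exception return */
#define MMFSR_MSTKERR   (1u << 4)  /* fault while stacking on exception entry */
#define MMFSR_MMARVALID (1u << 7)  /* MMFAR holds the faulting data address */

typedef struct {
    bool instr_fetch;
    bool data_access;
    bool unstacking;
    bool stacking;
    bool mmfar_valid;
} mmf_info_t;

/* Decode a CFSR value into the MemManage fault causes. */
mmf_info_t mmf_decode(uint32_t cfsr)
{
    mmf_info_t info;

    info.instr_fetch = (cfsr & MMFSR_IACCVIOL)  != 0u;
    info.data_access = (cfsr & MMFSR_DACCVIOL)  != 0u;
    info.unstacking  = (cfsr & MMFSR_MUNSTKERR) != 0u;
    info.stacking    = (cfsr & MMFSR_MSTKERR)   != 0u;
    info.mmfar_valid = (cfsr & MMFSR_MMARVALID) != 0u;
    return info;
}
```

A CFSR value of 0x82, for instance, indicates a data access violation whose address can be read from MMFAR.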
Solving MMFs may consist, in many cases, of just moving taskA-specific code and data into the .taskA_code and .taskA_data regions, respectively. Assigning regions to tasks is task-specific. Some tasks may not need some of the standard regions but may need other regions, such as I/O regions.
One MPU slot is tentatively reserved for a system region common to all tasks that would hold common subroutines (e.g., a C library), common tables, text strings, etc. This would be a read-only region, so it should be safe from tampering.
However, if there are not enough task regions, MPA_SIZE may be increased to 6, which results in all MPAs having six regions. Another alternative is to split tasks into smaller tasks that require fewer regions. For example, a task doing both input and output might be split into an input task and an output task, linked by a message exchange or a pipe. Then, the input task requires only an input region and the output task only an output region.
The final step is to move taskA to umode. This is done by setting its umode flag. Now when it is dispatched, PendSV_Handler() will set CONTROL = 0x3, which causes the processor to run in unprivileged thread mode using the task’s stack. In addition, add #include "xapiu.h" ahead of the task’s code in its module. This forces the software interrupt (SWI) application programming interface (API) to be used for RTOS service calls and possibly other system service calls.
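The value 0x3 follows from the layout of the CONTROL register: bit 0 (nPRIV) selects unprivileged thread mode and bit 1 (SPSEL) selects the process stack. The sketch below shows how a dispatcher might pick the value from the umode flag. The function name is hypothetical, and it assumes that ptasks also run on the process stack (CONTROL = 0x2), which the text does not state.

```c
#include <stdint.h>
#include <stdbool.h>

#define CONTROL_nPRIV (1u << 0)  /* 1 = thread mode is unprivileged */
#define CONTROL_SPSEL (1u << 1)  /* 1 = thread mode uses the process stack */

/* CONTROL value for a task about to be dispatched.
   Assumption: every task, privileged or not, uses the process stack. */
uint32_t control_for_task(bool umode)
{
    uint32_t control = CONTROL_SPSEL;

    if (umode)
        control |= CONTROL_nPRIV;
    return control;
}
```

Because unprivileged code cannot clear nPRIV itself, the only way back to privileged mode is through an exception, such as the SWI used by the API.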
Before actually running taskA as a utask, mpu_tmplt_taskA must be changed. The taskA_code, taskA_data, and taskA stack regions should stay the same. However, replace the pcom regions with ucom_code and ucom_data. The first contains the system service shells in xmpu.c. You may need to move routines from pcom_code to ucom_code and move data from pcom_data to ucom_data. This may not be possible if these routines and data are used by other ptasks. Solving this problem may require remedies such as:
- Simultaneously converting other ptasks to utasks.
- Moving common routines to the common MPU region and making it accessible in umode.
- Replicating common routines with different names.
- Passing global values via messages or pipes.
When taskA first starts running as a utask, you are likely to see PRIVILEGE VIOLATION errors indicating that restricted service calls are being made. This may necessitate recoding to not use these services. Or, it may work better to split taskA into a ptask, which calls these services (e.g., TaskCreate()), and a utask, which does not. Alternatively, taskA could start as a ptask, make all of the restricted service calls, then restart itself as a utask (it must restart itself so that PendSV_Handler() will change the processor mode).
Once you get taskA running as a utask, you have a task which cannot harm critical system resources. It can only access its own code, data, and stack, plus common code, which may consist only of system service shells, and possibly only common data shared with other utasks in its subsystem. This final solution may not be perfect, but it is a big improvement over doing nothing.
8. Final tuning
If all has gone well, all untrusted code is running in utasks and trusted code is running in ptasks. The ptasks are isolated and protected from each other, as well as from utasks. Unfortunately, there is a chink in the armor: handlers and ISRs can access everything via BR. Handlers are internal and therefore relatively trustworthy. Of particular concern are ISRs, which can be manipulated by hackers from outside of the system. It is therefore desirable to move ISRs entirely within the sys_code region, if possible, and to not enable BR for them. If this is not possible, try to do most of their work in tasks, preferably utasks.
Unfortunately, there is no way to block access to the private peripheral bus in privileged mode. Hence, if malware can gain access to pmode, it can change the MPU and then access anything it wants. The strongest protection is to move as much code into umode as possible, reducing vulnerability to a small amount of code that can be fortified.
The case for utasks is obvious, but what about ptasks? The following are reasons why ptasks may be necessary:
- Avoid changing highly-trusted, critical software with a high debug investment
- Better performance
- Direct access to all operating system (OS) and board support package (BSP) services
- Direct access to hardware
- Stepping stones to utasks
There is a step-by-step process to incrementally improve the security of Cortex-M embedded systems that have MPUs, and this process can be performed on already-released systems. Although some recoding and restructuring may be necessary, it is likely to be minor, and there are many remedies for problems that arise. Furthermore, proceeding in an incremental manner with frequent testing helps to ensure that new bugs are found and fixed as soon as they are introduced. Being able to easily shuttle tasks between umode and pmode helps further in tracking down problems.