Security and the Cortex-M MPU, part 4: SWI API for MPU systems

By Ralph Moore

President

Micro Digital

January 19, 2017

The Cortex-M v7 memory protection unit (MPU) is difficult to use, but it is the main means of hardware memory protection available for Cortex-M3, -M4, and -M7 processors[1]. These processors are in widespread use in small- to medium-size embedded systems. Hence, it is important to learn to use the Cortex-M v7 MPU effectively in order to achieve the reliability, security, and safety that modern embedded systems require.

Previous blogs presented an introduction to the MPU and its terminology, MPU multitasking, and defining MPU regions. The first blog defined privileged tasks (ptasks) and unprivileged tasks (utasks). The former run in privileged thread mode and the latter run in unprivileged thread mode[2]. The mode of a task is determined by the umode flag in its task control block (TCB) and takes effect when the task is dispatched by the real-time operating system (RTOS) scheduler. The umode flag can be set in pmode at any time after the task is created.
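As a rough illustration of how the umode flag could take effect at dispatch, the sketch below derives a CONTROL register value from a task's TCB. On ARMv7-M, bit 0 of CONTROL (nPRIV) selects unprivileged thread mode. The tcb_t layout and the function name apply_umode() are assumptions made for this example; they are not taken from smx.

```c
#include <stdbool.h>
#include <stdint.h>

#define CONTROL_nPRIV (1u << 0)   /* ARMv7-M: 1 = thread mode is unprivileged */

/* Hypothetical, trimmed-down TCB holding only the field discussed here. */
typedef struct {
    bool umode;                   /* set in pmode, read when the task is dispatched */
} tcb_t;

/* Return the CONTROL value to use for this task, leaving other bits alone. */
static uint32_t apply_umode(uint32_t control, const tcb_t *task)
{
    if (task->umode)
        return control | CONTROL_nPRIV;    /* utask: drop privilege */
    return control & ~CONTROL_nPRIV;       /* ptask: stay privileged */
}
```

On a target, the resulting value would be written to CONTROL by privileged code; here it is only computed so that the logic can be checked on a host.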

ptasks can directly call system services, but utasks cannot. There are two reasons for this:

  1. To protect the RTOS and its data from the less-trusted software in utasks. This may be software of unknown pedigree (SOUP), or it may be vulnerable to malware (e.g., a TCP/IP stack).
  2. To limit the RTOS services that this software can use. It is undesirable for utasks to be able to perform operations that can harm normal system operation, such as power off or task delete.

This blog discusses the mechanisms by which the foregoing protections are achieved. It should be noted that a principal objective of MPU security is to put as much application code as possible into utasks.

utask MPA regions

Each task has its own memory protection array (MPA), which is initialized from an MPA template. A typical MPA template for a utask (e.g., ut2a) is as follows:

 

This template is loaded into the MPA for the task after the task is created, and then into the MPU when that task is dispatched. The region for the task stack is defined and put into MPA[0] when the task is first started. Thus, this utask has access to its own stack, to its own code and data regions, to common code and data regions, and to nothing else. Regions 5, 6, and 7 are either disabled or privileged. Hence, this utask is prevented from accessing system services and data directly. The same restriction applies to all utasks, though their templates may differ.
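The original template listing is not reproduced here. As a stand-in, the following sketch shows what a utask template of this shape could look like when expressed as ARMv7-M RBAR/RASR register pairs. The bit positions follow the ARMv7-M MPU register layout; the type mpa_slot_t, the macro names, the addresses, and the region sizes are all hypothetical, and the memory attribute bits (TEX, S, C, B) are omitted for brevity.

```c
#include <stdint.h>

typedef struct {
    uint32_t rbar;   /* region base address | VALID | slot number */
    uint32_t rasr;   /* XN, access permissions, size, enable */
} mpa_slot_t;

/* RBAR: base in bits 31:5, VALID in bit 4, region number in bits 3:0. */
#define RBAR(slot, base)  (((base) & ~0x1Fu) | (1u << 4) | ((slot) & 0xFu))

/* RASR: XN bit 28, AP bits 26:24, SIZE bits 5:1 (log2(bytes) - 1), ENABLE bit 0. */
#define RASR(ap, log2size, xn) \
    (((uint32_t)(xn) << 28) | ((uint32_t)(ap) << 24) | \
     (((uint32_t)(log2size) - 1u) << 1) | 1u)

#define AP_RW  3u    /* read/write for privileged and unprivileged code */
#define AP_RO  6u    /* read-only for privileged and unprivileged code */

/* Slot 0 is left empty for the task stack, which is filled in when the task
   starts. Slots 5 to 7 stay zero in this sketch, so their ENABLE bit is clear.
   Each base address must be aligned to its region size. */
static const mpa_slot_t mpa_tmplt_example[8] = {
    [1] = { RBAR(1, 0x08010000u), RASR(AP_RO, 14, 0) },  /* task code, 16 KB   */
    [2] = { RBAR(2, 0x20004000u), RASR(AP_RW, 12, 1) },  /* task data, 4 KB    */
    [3] = { RBAR(3, 0x08020000u), RASR(AP_RO, 15, 0) },  /* common code, 32 KB */
    [4] = { RBAR(4, 0x20006000u), RASR(AP_RW, 11, 1) },  /* common data, 2 KB  */
};
```

Data regions are marked execute-never (XN) in this sketch, which is a common hardening choice rather than something the template above is known to require.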

umode services

utasks must use a software interrupt (SWI) application programming interface (API) to access system services, and they can never access system data directly. In addition, only unrestricted system services can be accessed by utasks. These barriers help to protect the operating system (OS) from untrusted code.

The SWI API is implemented via the Cortex-M SVC instruction “SVC n”, where n specifies the system service to be performed.

For the smx RTOS kernel, the header file xapi.h contains the function prototypes of all smx services. Including this file at the start of pcode allows the pcode to call any of them. For ucode, a second header, xapiu.h, is provided. It consists of mapping macros for the system services that are permitted in umode. For example:

This macro overrides the function prototype in xapi.h so that, for the rest of the application module, a call to the system service goes to a shell function instead of to the service itself:

This shell function executes the SVC instruction with n equal to the ID of the system service. NI (Not Inline) is a macro that prevents the compiler from inlining the function. Note that the shell function has the same name as the service, except that its prefix is smxu_ instead of smx_. Shells are in the ucom_code region so that they are accessible by utasks.
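The original macro and shell listings are not reproduced here. The following host-runnable sketch only illustrates the remapping idea: a macro redirects a service name to its smxu_ shell, so ucode that is written against the smx_ name ends up in the shell. The service smx_SemSignal(), its signature, the sem_t type, the ID value, and the expansion of NI are assumptions for this example. On a target, the body of the shell would be an SVC instruction; here it just records the service ID.

```c
#include <stdint.h>

#define NI __attribute__((noinline))   /* assumed GCC/Clang expansion of NI */

typedef struct sem sem_t;              /* hypothetical opaque semaphore type */
enum { ID_SEM_SIGNAL = 7 };            /* hypothetical service ID */

static uint32_t last_svc_id;           /* host stand-in for the SVC trap */

/* As in the pcode header: prototype of the real, privileged service. */
int smx_SemSignal(sem_t *sem);

/* Shell function. On a target it would live in ucom_code and issue SVC n. */
NI int smxu_SemSignal(sem_t *sem)
{
    (void)sem;
    last_svc_id = ID_SEM_SIGNAL;       /* target: SVC #ID_SEM_SIGNAL */
    return 1;                          /* target: result comes back in r0 */
}

/* As in the ucode header: remap the service name to its shell. */
#define smx_SemSignal smxu_SemSignal

/* ucode: the source still says smx_..., but the shell is what gets called. */
int ucode_signal(sem_t *sem)
{
    return smx_SemSignal(sem);
}
```

Because the substitution happens in the preprocessor, no reference to the real service remains in the compiled ucode, which is the property the article relies on.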

An application module can start with pcode followed by ucode, or it may be entirely pcode or entirely ucode. Either way, the ucode is prefaced with an #include of xapiu.h.

No system services can be directly called after that point. All of the above is done at compile time and thus becomes hard-coded and therefore resistant to malware and bugs, especially if the code is located in ROM.

Mixed pcode/ucode modules are convenient because a functional section of a system will typically have a root task, which is a ptask that creates, initializes, and starts all other tasks for the section, most of which may be utasks. Thus, all related tasks can be kept together. Inherent in this is the idea that some tasks of a system section might be carefully-constructed tasks that perform mission-critical functions. These tasks would probably be ptasks. Other tasks of the system section might be performing non-critical functions, such as gathering statistics to be sent to the cloud. These would be utasks.

Some tasks might start their existence as ptasks and be migrated to utasks as a project develops. It is typically easier to debug code in pmode and then move it to umode. Also, tasks can start as ptasks and execute pcode, set their own umode flags, and then restart themselves as utasks and execute ucode.

Another interesting feature is that multiple xapiu.h files can be deployed and used by different utasks. This allows for different levels of trust. Thus, more-trusted tasks can be given access to more RTOS services than less-trusted tasks. This permits tightening the noose on SOUP or highly vulnerable tasks. However, this only works at compile time. To protect against malware, it is also necessary to have different jump tables (see Figure 5 below) corresponding to the different xapiu.h files and a mechanism to select the jump table per task.
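One way to picture the per-task jump table idea is sketched below. Each task carries a pointer to its own table, and a service that the task is not allowed to use has an empty entry, so the restriction holds at run time as well as at compile time. The names task_t, svc_call(), SVC_DENIED, and the two example services are hypothetical; this is not the smx implementation.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*ssr_t)(uint32_t p0, uint32_t p1, uint32_t p2, uint32_t p3);

enum { ID_SEM_SIGNAL, ID_TASK_DELETE, NUM_IDS };   /* hypothetical service IDs */
#define SVC_DENIED 0xFFFFFFFFu                     /* hypothetical error code */

static uint32_t sem_signal(uint32_t p0, uint32_t p1, uint32_t p2, uint32_t p3)
{
    (void)p0; (void)p1; (void)p2; (void)p3;
    return 1u;
}

static uint32_t task_delete(uint32_t p0, uint32_t p1, uint32_t p2, uint32_t p3)
{
    (void)p0; (void)p1; (void)p2; (void)p3;
    return 2u;
}

/* More-trusted utasks get both services; SOUP tasks cannot delete tasks. */
static const ssr_t ssrt_trusted[NUM_IDS] = { sem_signal, task_delete };
static const ssr_t ssrt_soup[NUM_IDS]    = { sem_signal, NULL };

typedef struct {
    const ssr_t *ssrt;   /* jump table selected for this task */
} task_t;

/* Dispatch service n for the given task, rejecting unknown or barred IDs. */
static uint32_t svc_call(const task_t *task, uint32_t n,
                         uint32_t p0, uint32_t p1, uint32_t p2, uint32_t p3)
{
    if (n >= (uint32_t)NUM_IDS || task->ssrt[n] == NULL)
        return SVC_DENIED;
    return task->ssrt[n](p0, p1, p2, p3);
}
```

The check runs in privileged code, so malware in a utask that forges an SVC number still cannot reach a service missing from that task's table.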

utask service call mechanism

The basic concept of a software interrupt API, as presented above, is simple. But when the called system service might cause a task switch, things get more complicated, particularly for the Cortex-M architecture, which requires that the RTOS scheduler reside inside PendSV_Handler(). A further complication is that handlers run in privileged mode and use the system stack (SS)[3] instead of the current task stack (TS).

As shown in the following diagram, SVC_Handler() is invoked by the SVC instruction and runs in handler mode:

When SVC_Handler() starts running, the system service parameters are in TS because the processor stacked them on exception entry. The handler moves parameters 0 through 3 into r0 through r3 and moves the 5th parameter, if any, to the top of the system stack, SS, which is where the system service expects to find these parameters. SVC_Handler() then calls the system service (SSR) via the ssrt[] jump table, using the index n (ID) passed to it (see above).

The system service executes normally and returns to SVC_Handler(), which moves the system service return value from r0 to its correct position in TS. The handler return operation, performed by the processor, unstacks all stacked registers in TS, thus the return value ends up in r0.
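The register handling described above can be sketched on a host by treating the exception stack frame as an array. On ARMv7-M the processor stacks r0 to r3, r12, lr, pc, and xPSR in that order, and the SVC number is the low byte of the Thumb SVC instruction located two bytes before the stacked pc. In this sketch the code memory is a byte array and the stacked pc is an offset into it; the two example services and the function names are hypothetical, and the 5th parameter is left out.

```c
#include <stdint.h>

/* Order in which ARMv7-M stacks registers on exception entry. */
enum { F_R0, F_R1, F_R2, F_R3, F_R12, F_LR, F_PC, F_XPSR };

typedef uint32_t (*ssr_t)(uint32_t, uint32_t, uint32_t, uint32_t);

static uint32_t svc_first(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    (void)b; (void)c; (void)d;
    return a;
}

static uint32_t svc_sum(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    return a + b + c + d;
}

static const ssr_t ssrt[] = { svc_first, svc_sum };   /* jump table */
#define SSRT_LEN (sizeof ssrt / sizeof ssrt[0])

/* The SVC immediate is the low byte of the 16-bit instruction before pc. */
static uint8_t svc_number(const uint8_t *code, uint32_t pc)
{
    return code[pc - 2u];
}

/* Fetch parameters from the frame, call the service, store the result. */
static void svc_dispatch(uint32_t *frame, const uint8_t *code)
{
    uint8_t n = svc_number(code, frame[F_PC]);

    if (n < SSRT_LEN)
        frame[F_R0] = ssrt[n](frame[F_R0], frame[F_R1],
                              frame[F_R2], frame[F_R3]);
}
```

Writing the result into the stacked r0 slot, rather than returning it, is what makes the value appear in r0 after the processor unstacks the frame on exception return.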

If the system service has resulted in the need for the task scheduler to run (sched > 0) or an interrupt has resulted in a need for the link service routine (LSR) scheduler to run (lqctr > 0), PendSV_Handler() will have been pended. In this case, the processor tail-chains[4] from SVC_Handler() to PendSV_Handler() (shown by the dotted line in the diagram), instead of returning to the utask.

Control then does not go back to the point of call in the utask yet, but rather to the scheduler running inside PendSV_Handler(). This may result in the current task being suspended and another task being resumed to run instead (shown to the right in the diagram). The preempting task can be either a utask or a ptask. Eventually the suspended utask will be resumed, unstacked, and continue running from the point of call, provided that it was neither stopped nor deleted by a preempting task. (Note: All of the above is done in privileged mode and thus is protected from malware that has infected ucode.)
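The pending step can be sketched as follows. On ARMv7-M, PendSV is pended by writing 1 to the PENDSVSET bit (bit 28) of the Interrupt Control and State Register, and writing 0 to the other bits has no effect. The variable names sched and lqctr come from the text above; their types, the function name pend_if_needed(), and the use of a plain variable in place of the real register are assumptions for a host-runnable example. Where smx actually performs this write is not shown in the article.

```c
#include <stdint.h>

#define ICSR_PENDSVSET (1u << 28)   /* ARMv7-M ICSR bit that pends PendSV */

static uint32_t fake_icsr;          /* host stand-in for the ICSR register */

static uint32_t sched;              /* > 0: the task scheduler must run */
static uint32_t lqctr;              /* > 0: the LSR scheduler must run  */

/* Pend PendSV if either scheduler has work; otherwise leave ICSR alone. */
static void pend_if_needed(volatile uint32_t *icsr)
{
    if (sched > 0u || lqctr > 0u)
        *icsr = ICSR_PENDSVSET;
}
```

Because PendSV is normally given the lowest exception priority, the pended handler runs only after SVC_Handler() finishes, which is when the tail-chaining described above occurs.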

ptask service call mechanism

By contrast, the following diagram shows operation when a system service is called from a ptask.

Note that this is much simpler (and faster): SVC_Handler() is not involved. The ptask calls the system service directly, and if sched is set, PendSV_Handler() is pended. From there, operation is identical to that for a utask.

Cross operation

System services operate the same regardless of whether they are invoked from utasks or ptasks. For example, a utask may test a semaphore and become suspended on it; a ptask may then signal the semaphore and the utask will resume, or vice versa. A ptask may have higher or lower priority than a utask, and the scheduler will dispatch it according to its priority (privilege has no priority here!). What is different is that the ptask executes trusted code (pcode) and usually has full access to memory, peripherals, and system services, whereas the utask executes unprivileged code (ucode) and has access to only what the MPU permits. Furthermore, the MPU can only be changed by pcode.

Lest there be concern that ptasks are unbridled agents, note that it is possible to prevent access to a region via the MPU even though the background region is enabled. Hence, a region that is read/write (RW) to one task could be read only (RO) to another and execute never (XN) to both. On the other hand, ptasks do have direct access to all smx services. As security of a system is tightened, consideration should be given to limiting ptasks as well as utasks.

Next:

  • Porting Existing Applications to an MPU

Additional information on the MPU software architecture can be found in the previous blogs of this series and at www.smxrtos.com/mpu.

Ralph Moore, President and Founder of Micro Digital, graduated with a degree in Physics from Caltech. He spent his early career in computer research, then moved into mainframe design and consulting.

I am no longer running the daily business at Micro Digital. Instead, I have been involved for the past four years in improving the smx RTOS kernel. smx is a hard-real-time multitasking kernel, which is intended for embedded systems that require high efficiency and high performance.
