Custom MBR/bootloader for NRF52

The official Nordic DFU bootloader is nice, but huge. On smaller devices like the NRF52811 it takes a large portion of memory, and combined with a softdevice, only a few kB of Flash are left for the application. But what if I don't need a full-blown secure wireless bootloader and a tiny UART bootloader is all I want? And what if I could squeeze it into the 4 kB MBR section of the softdevice?

I was faced with developing a product that runs a beefy main MCU for all the hard work and a small Nordic NRF52 based radio for wireless connectivity to nearby devices. The problem was that the radio firmware would not be ready for release before the product's mass production date. The idea was to ship the device with only a bootloader in the radio chip and update it later together with the main CPU firmware. It was not yet decided whether the nRF firmware would be based on the NRF5 SDK everyone is familiar with, or on the shiny new Zephyr based nRF Connect SDK that takes like an hour just to download and build a blinky example.

The official NRF5 SDK bootloader example kind of worked, but it takes about 30 kB of Flash for the bootloader and 8 kB for configuration. Add a 100+ kB softdevice (a binary blob from Nordic with the BLE stack) and you are left with about 40 kB of Flash for the application, or 8 kB when using a larger softdevice. After removing all the firmware signing / crypto code and leaving only the UART protocol, I was able to shave the bootloader down to about 15 kB, which is still quite a lot! I've also hit a few walls trying to convince the bootloader to write just the application (no softdevice, e.g. fw for the Thread radio protocol) over app+softdevice, etc. And I'm not talking about Zephyr, that would probably add another layer of insanity.

So, the idea was to avoid the Nordic bootloader completely and squeeze all the fw upgrade related code into the first 4 kB of the Flash, which are usually occupied (on NRF5 SDK based firmware) by a binary blob from Nordic called the MBR. That's 38 kB of Flash saved with almost no loss of functionality (well, image signing is still a nice thing, but that can be offloaded to the main MCU). The custom MBR wouldn't care what data is written to flash, so app, app+softdevice, Zephyr app and any other weird combinations of binary blobs are not a problem. Let's future proof this crap.

The only issue is that the MBR does a little more than jumping to app or bootloader when requested... Let's dive deeply into the mighty Nordic binary blob called the MBR.

NRF5 Flash Layout

The NRF5 SDK based app usually ends up with this Flash layout:

[Figure: NRF5 Flash layout]

MBR: Usually a part of a softdevice, but can be flashed without it. It handles switching between application and bootloader, fw update finalization (e.g. copying bootloader to correct place after bootloader update),...
SoftDevice: A binary blob with Nordic proprietary BLE stack, always starts at 0x1000
App: User application, start address of the application depends on the softdevice used and it's mentioned in the given softdevice documentation
Bootloader: Optional, it handles firmware updates (e.g. over BLE, UART,...), start address is arbitrary.

NRF5 Boot process

Based on the disassembly of the MBR (btw, Ghidra is a very good tool for such work) and browsing the documentation, forums and Google results, I came up with the following boot process (for the MBR from the latest 17.1.0 SDK):

The MBR is just ordinary ARM code: the MCU loads the reset vector from the vector table at 0x0 and passes execution to it
The MBR attempts to load a configuration area, MBRPARAMADDR. The address of this area is either stored at MBR_PARAM_ADDR (defined in nrf_mbr.h - 0xffc) or, if that location contains 0xffffffff, taken from UICR.NRFFW[1] if valid. This configuration basically contains the details needed for copying a bootloader image to the correct place after a bootloader update, etc., and it is only written by the MBR. If not found, the MBR ignores it and continues
The MBR attempts to find the start address of the bootloader, the BOOTLOADERADDR. It can be found either at MBR_BOOTLOADER_ADDR (defined in nrf_mbr.h - 0xff8) or in UICR.NRFFW[0]. If a bootloader address is found, the MBR passes execution to the bootloader and the bootloader later passes execution to the reset vector at 0x1004 (0x1000 is the initial stack pointer)
If bootloader is not present, the execution is passed to the reset vector at fixed address 0x1004
The 0x1004 contains reset vector either of the softdevice or application when softdevice is not present, the softdevice does some hidden dark magic and finally passes execution to the application.
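The lookup order in the steps above can be sketched in plain C. This is a minimal illustration, not the actual MBR code: the function names are hypothetical, the words are passed in as arguments so the logic can be tried on a PC, and it assumes the flash location takes priority over the UICR register in the same way the text describes for the parameter page.

```c
#include <stdint.h>

#define ERASED_WORD 0xFFFFFFFFu  /* erased flash / unwritten UICR register */
#define APP_BASE    0x00001000u  /* vector table right after the 4 kB MBR */

/* flash_word: word read from MBR_BOOTLOADER_ADDR (0xff8)
 * uicr_word:  word read from UICR.NRFFW[0]
 * Returns the bootloader start address, or 0 when none is configured.
 * Assumption: the flash location is checked before the UICR register. */
static uint32_t resolve_bootloader_addr(uint32_t flash_word, uint32_t uicr_word)
{
    if (flash_word != ERASED_WORD) {
        return flash_word;
    }
    if (uicr_word != ERASED_WORD) {
        return uicr_word;
    }
    return 0u;
}

/* Address of the vector table that gets control after the MBR:
 * the bootloader if one is configured, else whatever sits at 0x1000. */
static uint32_t boot_target(uint32_t flash_word, uint32_t uicr_word)
{
    uint32_t bootloader = resolve_bootloader_addr(flash_word, uicr_word);
    return (bootloader != 0u) ? bootloader : APP_BASE;
}
```

On a freshly flashed chip both words are erased, so boot_target() falls through to 0x1000, which is the only path a custom MBR without a separate bootloader needs.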

The MBR_PARAM_ADDR and MBR_BOOTLOADER_ADDR are usually written by the bootloader when it starts for the first time. When flashing the MBR/softdevice and bootloader to a fresh MCU, the UICR.NRFFW[0] register is written with the bootloader address (check the bootloader linker script and the resulting .hex file, the register write is defined there) so the MBR knows where to find the bootloader although the MBR_BOOTLOADER_ADDR is not set yet.

Interrupts forwarding

The MBR is responsible for passing execution to the app/softdevice/bootloader based on configuration, but it also does some other dark magic. Let's ignore the fw upgrade support code, as that's not needed for the custom bootloader, and focus on the bare minimum: interrupt forwarding.

Before executing the reset vector of the application, the MCU needs to switch the vector table address in the MCU registers, so the correct application functions are called when an interrupt is triggered. This can be done by writing the new vector table address to the VTOR register. On older/smaller MCUs with cores like the Cortex M0, VTOR is not available, so another hack must be used: the new vector table is copied to the start of the RAM and the RAM is remapped to 0x0, effectively changing the vector table to the new one. On even dumber MCUs, the bootloader must implement all possible interrupt handlers to catch them all and forward them to the application manually.
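For reference, the VTOR approach is little more than a single register write. The sketch below is an illustration under assumptions: set_vector_table() is a hypothetical helper, and the register is passed in as a pointer so the logic can be exercised off-target. On a Cortex M4 with the CMSIS headers you would pass &SCB->VTOR.

```c
#include <stdbool.h>
#include <stdint.h>

/* On Cortex M3/M4 the low 7 bits of VTOR are reserved, so the table must
 * sit on at least a 128-byte boundary (bigger tables need more). */
#define VTOR_MIN_ALIGN 128u

/* Point the core at a new vector table.
 * vtor:       address of the VTOR register (&SCB->VTOR on real hardware)
 * table_addr: start of the new vector table, e.g. 0x1000
 * Returns false and leaves VTOR untouched if the address is misaligned. */
static bool set_vector_table(volatile uint32_t *vtor, uint32_t table_addr)
{
    if ((table_addr % VTOR_MIN_ALIGN) != 0u) {
        return false;
    }
    *vtor = table_addr;
    return true;
}
```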

The NRF5 SDK still works on older Cortex M0 based MCUs, so VTOR is not always available. Additionally, some interrupts are to be processed by the softdevice and some by the application, so there must be a way to select where an interrupt is forwarded based on its type. This is where the MBR steps in. Nordic designed the interrupt forwarding like this:

The 0x20000000 (start of RAM) contains the address where the target vector table is (softdevice, bootloader or app)
The MBR catches all interrupts, the execution is then passed to function pointed by address *((uint32_t *)0x20000000) + interrupt_id*4, usually softdevice, that either processes the interrupt or forwards it to the application
The SVCall interrupt is processed in a different way, if the SVCall ID (passed in SVCall instruction) is 0x11, then it's processed by MBR (request for MBR), else it's forwarded like the rest
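The address arithmetic behind the forwarding can be written out as a small C function. It is a sketch for illustration only (forward_entry_addr() is a hypothetical name), with the register and RAM values passed in as arguments so it can be checked off-target. The active exception number is taken from the VECTACTIVE field, the low 9 bits of SCB->ICSR.

```c
#include <stdint.h>

#define VECTACTIVE_MASK 0x1FFu  /* low 9 bits of SCB->ICSR: active exception number */

/* Address of the vector table entry that holds the handler to forward to.
 * icsr:       value read from SCB->ICSR (0xE000ED04)
 * table_base: word stored at 0x20000000 (target vector table address) */
static uint32_t forward_entry_addr(uint32_t icsr, uint32_t table_base)
{
    uint32_t exception_number = icsr & VECTACTIVE_MASK;
    return table_base + exception_number * 4u;  /* one 32-bit pointer per entry */
}
```

For example, SysTick is exception number 15, so with the table at 0x1000 the handler pointer is read from 0x103C.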

Custom MBR

To make everything work with a custom MBR, the MBR must:

Forward all interrupts to vector table stored at address defined in 0x20000000. It should probably catch the SVCall to MBR, but without Nordic bootloader in place, it's probably not strictly necessary
Set first 4 bytes (1 uint32_t word) at 0x20000000 to 0x1000
Load MSP with data at 0x1000
Jump to address stored in 0x1004 (reset vector)

Putting it all together results in something like this:

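#include <stdint.h>
#include "nrf.h" // CMSIS device header, provides __set_MSP()

extern uint32_t _estack; // top of the stack, provided by the linker script

__attribute__((naked)) static void Int_Handler(void)
{
    __asm("ldr r1, =0xE000ED04\n"
          "ldr r0, [r1]\n"        // load SCB->ICSR
          "ldr r1, =0x1ff\n"
          "and r0, r0, r1\n"      // mask VECTACTIVE bits
          "lsls r0, r0, #2\n"     // multiply interrupt id by 4
          "ldr r1, =0x20000000\n"
          "ldr r2, [r1]\n"        // load vector table address stored at start of RAM
          "add r2, r2, r0\n"
          "ldr r3, [r2]\n"        // load pointer at *((uint32_t *)0x20000000) + int_id*4
          "bx r3\n"               // pass execution to app/softdevice handler
    );
}

static void Reset_Handler(void)
{
    // TODO you shall initialize .data and .bss segments here, pretty standard code

    // vector table of whatever is flashed at 0x1000 (app or softdevice)
    const uint32_t *app_vectors = (const uint32_t *)0x1000;
    uint32_t app_sp = app_vectors[0];    // initial stack pointer at 0x1000
    uint32_t app_reset = app_vectors[1]; // reset vector at 0x1004

    // tell Int_Handler where to forward interrupts
    *(volatile uint32_t *)0x20000000 = 0x1000;

    __set_MSP(app_sp);
    ((void (*)(void))app_reset)();
}

void (* const vectors[])(void) __attribute__((section(".isr_vector"), used)) = {
    (void (*)(void))&_estack,
    Reset_Handler,
    [2 ... 99] = Int_Handler,
};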

This is a very basic implementation, but that's all that's needed for a tiny MBR that works with both a bare app and a softdevice based system. Each interrupt costs 10 instructions on top of the actual application handler code. And no, it's not possible to use VTOR if you intend to use the softdevice: it seems to change the address stored at 0x20000000 sometimes, and I always ended up at some fixed address inside the softdevice.
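One thing the listing does not do is check that there is anything at 0x1000 before jumping, which matters if the chip ships with only the bootloader. A minimal sketch of such a check follows. It is my own assumption of what a sanity test could look like, not something taken from the Nordic MBR: app_image_looks_valid() is a hypothetical helper, and the memory sizes are those of an NRF52811 (192 kB Flash, 24 kB RAM) and need adjusting for other parts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed memory map for an NRF52811; adjust per device. */
#define APP_BASE   0x00001000u  /* first byte after the 4 kB MBR */
#define FLASH_END  0x00030000u  /* 192 kB of flash */
#define RAM_BASE   0x20000000u
#define RAM_END    0x20006000u  /* 24 kB of RAM */

/* Plausibility check on the first two vector table words of the image:
 * initial_sp is the word at 0x1000, reset_vector the word at 0x1004.
 * Erased flash (0xFFFFFFFF in both words) is rejected. */
static bool app_image_looks_valid(uint32_t initial_sp, uint32_t reset_vector)
{
    /* Stack pointer must be word aligned and inside RAM (top of RAM allowed). */
    if ((initial_sp & 0x3u) != 0u) {
        return false;
    }
    if (initial_sp <= RAM_BASE || initial_sp > RAM_END) {
        return false;
    }
    /* Cortex M executes Thumb code only, so bit 0 of the vector must be set. */
    if ((reset_vector & 0x1u) == 0u) {
        return false;
    }
    /* The reset handler itself must live in flash after the MBR. */
    uint32_t target = reset_vector & ~0x1u;
    return target >= APP_BASE && target < FLASH_END;
}
```

In Reset_Handler this would be called with the two words read from 0x1000 and 0x1004, staying in the DFU code when it returns false.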

Conclusion

Take the code above, add some basic UART code, flash writing and some DFU protocol and you've made yourself a nice bootloader! Just make sure it fits into the first 4 kB of flash and that, before jumping to the application, you disable all the peripherals you've enabled. I've managed to squeeze everything, including a custom UART DFU protocol, a 16 bit CRC of the written data and some minor debug functionality, into 3 kB of the flash, leaving me with most of the Flash memory for the application.
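A 16 bit CRC is cheap in terms of code size. The post does not say which CRC variant was used, so the sketch below assumes the common CRC-16/CCITT-FALSE parameters (polynomial 0x1021, initial value 0xFFFF, no reflection). The bitwise form is slower than a table-driven one but needs no lookup table, which suits a 4 kB budget. The function name crc16_update() is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16/CCITT-FALSE (assumed variant, polynomial 0x1021).
 * Start with crc = 0xFFFF and feed the data in as many chunks as needed,
 * passing the previous return value back in as crc. */
static uint16_t crc16_update(uint16_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}
```

Because the running value is passed back in, the CRC can be updated per received packet or per written flash page, and compared against the value sent by the host at the end of the transfer.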

The custom MBR implementation should be able to run anything you place at 0x1000: a bare application, an application bundled with a softdevice, a Zephyr based application,... There are just a few details to remember:

The application must be linked with flash start at 0x1000 (or address required by softdevice if you use it)
First 4 bytes of RAM (0x20000000 - 0x20000003) are reserved for MBR and must not be written
All interrupts are delayed by 10 instructions on top of the standard MCU processing time

Resources

Github project nrf-dfu - this is the very foundation of this blog post. The initial thought that something like this is possible came from here, and I've used it as a reference while digging in the MBR disassembly. Not really up to date with the latest NRF5 SDK MBR implementation, but it still helped a lot.
NRF Softdevice internals - Blog with details of the MBR internals, from the nrf-dfu author.