Step‑by‑Step Guide to Building a Flash‑Based Bootloader for ARM Cortex‑M Microcontrollers

A tiny bootloader is the unsung hero that lets a board wake up, check its own memory, and hand control over to the main program. In the age of over‑the‑air updates and rapid prototyping, having a reliable flash‑based bootloader on your Cortex‑M can save hours of debugging and a lot of headaches. Let’s walk through a practical, hands‑on build that you can copy into your next project.

Why a Flash‑Based Bootloader Matters Right Now

Most hobbyists start with a simple “run‑from‑flash” model: compile, flash, run. That works for a single demo, but as soon as you need to upgrade firmware in the field, you hit a wall. A bootloader lives in a protected region of flash, can verify new images, and can recover from a bad update. In short, it turns a one‑time demo board into a product that can evolve.

Overview of the Bootloader Architecture

Before we dive into code, let’s sketch the big picture.

Bootloader region – The first few kilobytes of flash, usually locked from accidental erase.
Application region – The rest of flash where your main program lives.
Update source – Could be UART, USB, SPI flash, or even a wireless link.
Verification step – A CRC or SHA‑256 check to make sure the new image is not corrupted.
Jump routine – A small routine (plain C with CMSIS intrinsics is enough) that sets the stack pointer and branches to the application’s reset vector.

Think of it as a tiny train station: the bootloader is the station master, the update source is the incoming train, verification is the ticket check, and the jump routine is the signal that lets the train onto the main line.

Step 1: Reserve the Bootloader Space in the Linker Script

The first thing you need is a clean separation between bootloader and application. Open your linker script (usually a *.ld file) and add a dedicated region at the start of flash.

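MEMORY
{
    /* 64K of flash split into two non-overlapping regions */
    BOOT (rx) : ORIGIN = 0x08000000, LENGTH = 8K    /* bootloader */
    APP  (rx) : ORIGIN = 0x08002000, LENGTH = 56K   /* application */
}

Here we keep 8 KB for the bootloader and leave the remaining 56 KB for the application. Adjust the sizes to match your MCU’s total flash and its erase granularity, and place the bootloader’s output sections in BOOT. The key is that the bootloader must sit at the very start of flash (0x08000000 on STM32, which is aliased to address 0 when booting from flash), because the Cortex‑M fetches the initial stack pointer and reset vector from there.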
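The same layout can be mirrored in C so the bootloader can reject an image that would not fit. The sketch below assumes the 8 KB / 56 KB split described here; the names BOOT_BASE, BOOT_SIZE, APP_BASE, APP_SIZE and image_fits_app_region are illustrative and not part of any SDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants mirroring the linker script (assumed values). */
#define BOOT_BASE 0x08000000U
#define BOOT_SIZE (8U * 1024U)
#define APP_BASE  (BOOT_BASE + BOOT_SIZE)   /* 0x08002000 */
#define APP_SIZE  (56U * 1024U)

/* Compile-time check: the two regions must add up to the 64 KB of flash. */
_Static_assert(BOOT_SIZE + APP_SIZE == 64U * 1024U,
               "BOOT and APP must cover flash exactly");

/* Returns true if an image of img_size bytes fits in the application region. */
static bool image_fits_app_region(uint32_t img_size)
{
    return img_size > 0U && img_size <= APP_SIZE;
}
```

Keeping these numbers in one header shared by both projects is one way to stop the linker script and the C code from drifting apart.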

Step 2: Create a Minimal Bootloader Project

Start a new project folder called bootloader. Use the same compiler and SDK you use for the main app – that way the same startup files and CMSIS headers apply.

Add a simple main.c:

#include "stm32f4xx.h"   // replace with your MCU header

#define APP_START_ADDRESS 0x08002000U
#define FLASH_PAGE_SIZE   0x400U   // erase granularity: 1 KB on some parts; STM32F4 uses larger, unequal sectors

/* Simple CRC routine – you can replace with a library */
static uint32_t crc32(const uint8_t *data, uint32_t length)
{
    uint32_t crc = 0xFFFFFFFFU;
    for (uint32_t i = 0; i < length; ++i) {
        crc ^= data[i];
        for (int j = 0; j < 8; ++j) {
            uint32_t mask = -(crc & 1U);
            crc = (crc >> 1) ^ (0xEDB88320U & mask);
        }
    }
    return ~crc;
}

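/* Function that jumps to the application.
   De-initialize bootloader peripherals and interrupts before calling it. */
static void jump_to_application(void)
{
    uint32_t app_sp = *(volatile uint32_t *)APP_START_ADDRESS;
    uint32_t app_reset = *(volatile uint32_t *)(APP_START_ADDRESS + 4U);
    typedef void (*pAppEntry)(void);
    pAppEntry app_entry = (pAppEntry)app_reset;

    SCB->VTOR = APP_START_ADDRESS;   // point the core at the app's vector table
    __DSB();
    __ISB();
    __set_MSP(app_sp);   // set main stack pointer
    app_entry();         // branch to app reset handler
}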

/* Simple UART receive stub – replace with your driver */
static int uart_receive(uint8_t *buf, uint32_t len)
{
    // placeholder: return 0 on success
    return 0;
}

/* Main bootloader loop */
int main(void)
{
    /* Initialize peripherals needed for update (UART, GPIO, etc.) */
    // init_uart();

    /* Check if a new image is waiting (e.g., a flag in RAM or a command byte) */
    uint8_t cmd;
    if (uart_receive(&cmd, 1) == 0 && cmd == 0x55) {
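        /* Receive image size and CRC from host */
        uint32_t img_size;
        uint32_t img_crc;
        uart_receive((uint8_t *)&img_size, 4);
        uart_receive((uint8_t *)&img_crc, 4);

        /* Reject images that cannot fit in the 56 KB application region */
        if (img_size == 0U || img_size > 56U * 1024U) {
            // send NACK; nothing was erased, so the existing app is intact
            jump_to_application();
        }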

        /* Erase application flash pages */
        for (uint32_t addr = APP_START_ADDRESS;
             addr < APP_START_ADDRESS + img_size;
             addr += FLASH_PAGE_SIZE) {
            // flash_erase_page(addr);
        }

        /* Receive image data and program flash */
        uint8_t buffer[256];
        uint32_t received = 0;
        while (received < img_size) {
            uint32_t chunk = (img_size - received > sizeof(buffer)) ?
                             sizeof(buffer) : (img_size - received);
            uart_receive(buffer, chunk);
            // flash_program(APP_START_ADDRESS + received, buffer, chunk);
            received += chunk;
        }

        /* Verify CRC */
        uint32_t calc_crc = crc32((const uint8_t *)APP_START_ADDRESS, img_size);
        if (calc_crc == img_crc) {
            // send ACK to host
        } else {
            // send NACK and stay in the bootloader: the application
            // region holds a bad image and must not be started
            while (1) { }
        }
    }

    /* If no update, jump to existing application */
    jump_to_application();

    while (1) { }   // should never reach here
}

The code above is deliberately simple. It shows the four core tasks: receive a new image, program it, verify it, then jump. The UART and flash calls are stubs or commented out, so replace them with the drivers that match your board.
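Before jumping, it can help to check that the application region actually holds a plausible vector table, since erased flash reads back as 0xFFFFFFFF. The following is a minimal sketch, not a definitive check. It assumes RAM at 0x20000000 with 128 KB and the 56 KB application region used in this guide. The RAM figures and the name app_vectors_look_valid are assumptions, so substitute your part's values.

```c
#include <stdbool.h>
#include <stdint.h>

#define APP_BASE 0x08002000U
#define APP_SIZE (56U * 1024U)
#define RAM_BASE 0x20000000U        /* assumed RAM start */
#define RAM_SIZE (128U * 1024U)     /* assumed RAM size  */

/* sp and reset are the first two words of the application's vector table. */
static bool app_vectors_look_valid(uint32_t sp, uint32_t reset)
{
    /* The initial stack pointer must be word-aligned and lie in RAM;
       pointing just past the last RAM byte is the usual "top of stack". */
    bool sp_ok = (sp & 3U) == 0U &&
                 sp > RAM_BASE &&
                 sp <= RAM_BASE + RAM_SIZE;

    /* The reset vector must have the Thumb bit set and point into the
       application region, past the first two vector table words. */
    uint32_t target = reset & ~1U;
    bool reset_ok = (reset & 1U) == 1U &&
                    target >= APP_BASE + 8U &&
                    target < APP_BASE + APP_SIZE;

    return sp_ok && reset_ok;
}
```

In main, the two words would be read from APP_START_ADDRESS and APP_START_ADDRESS + 4, and the jump skipped when the check fails.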

Step 3: Protect the Bootloader Region

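Most Cortex‑M MCUs let you write‑protect flash pages or sectors against accidental erase. After you program the bootloader, protect the region that holds it. Protection is applied per page or sector, so the protected area may be larger than 8 KB: on STM32F4 parts, for example, sector 0 is 16 KB, and the application would then have to start at the next sector boundary. On STM32 devices you set the protection through the option bytes, either via the ST‑Link utility or programmatically: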

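HAL_FLASH_OB_Unlock();               // option bytes are locked by default
FLASH_OBProgramInitTypeDef ob = {0};
ob.OptionType = OPTIONBYTE_WRP;
ob.WRPSector = OB_WRP_SECTOR_0;      // first sector = bootloader
ob.WRPState = OB_WRPSTATE_ENABLE;
ob.Banks = FLASH_BANK_1;
HAL_FLASHEx_OBProgram(&ob);
HAL_FLASH_OB_Launch();               // apply the new option bytes
HAL_FLASH_OB_Lock();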

Locking the bootloader ensures that a stray erase command from the application cannot wipe out the recovery code.
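Because protection and erase both work on whole sectors, it helps to know which sector an address falls in. The sketch below assumes the sector map of 1 MB STM32F405/407‑class parts (four 16 KB sectors, one 64 KB sector, seven 128 KB sectors), so check your reference manual before reusing it. The names FLASH_BASE_ADDR and flash_sector_for_address are hypothetical.

```c
#include <stdint.h>

#define FLASH_BASE_ADDR 0x08000000U

/* Assumed sector sizes in KB for a 1 MB STM32F405/407-class part. */
static const uint32_t sector_sizes_kb[] = {
    16, 16, 16, 16, 64, 128, 128, 128, 128, 128, 128, 128
};

/* Returns the index of the sector containing addr, or -1 if addr is
   outside the flash range described by sector_sizes_kb. */
static int flash_sector_for_address(uint32_t addr)
{
    uint32_t start = FLASH_BASE_ADDR;
    unsigned count = sizeof(sector_sizes_kb) / sizeof(sector_sizes_kb[0]);

    if (addr < start) {
        return -1;
    }
    for (unsigned i = 0; i < count; ++i) {
        uint32_t size = sector_sizes_kb[i] * 1024U;
        if (addr - start < size) {
            return (int)i;
        }
        start += size;   /* move to the first address of the next sector */
    }
    return -1;
}
```

With this map, flash_sector_for_address(0x08002000) is 0: an application placed 8 KB into flash shares a sector with the bootloader, so protecting sector 0 on such a part would also protect the start of the application.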

Step 4: Build the Main Application to Start at APP_START_ADDRESS

Create a second project for the main firmware. In its linker script, set the flash origin to the same address you used for the application region:

FLASH (rx) : ORIGIN = 0x08002000, LENGTH = 56K

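The rest of the script stays the same. When you compile, the image is linked to run right after the bootloader, with its vector table at 0x08002000. Also check that the application’s own startup code does not reset VTOR to the base of flash; some vendor startup files do this unless you configure a vector table offset.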

Step 5: Test the Whole Flow on a Real Board

Program the bootloader first. Use your usual flashing tool (e.g., openocd -f board.cfg -c "program bootloader.elf verify reset exit"). Verify that the bootloader runs and then jumps to the application (if one is present).
Program the application. Flash the app binary to the application region. Reset the board – you should see the app start.
Simulate an update. Connect a PC to the UART and send the command byte 0x55, followed by the image size and CRC (each a 32‑bit little‑endian value, matching how the bootloader reads them), and then the image data. Watch the bootloader erase, program, and verify. If the CRC matches, the bootloader should hand control to the new image without a full power cycle.
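The frame the host sends can be made explicit. The sketch below encodes and decodes the 9‑byte header (command byte 0x55, then size, then CRC) byte by byte, which avoids relying on the host's endianness or on pointer casts. Little‑endian field order is an assumption that matches what the Cortex‑M code in Step 2 reads directly into a uint32_t. The function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define UPDATE_CMD        0x55U
#define UPDATE_HEADER_LEN 9U

/* Store a 32-bit value least significant byte first. */
static void put_le32(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v & 0xFFU);
    out[1] = (uint8_t)((v >> 8) & 0xFFU);
    out[2] = (uint8_t)((v >> 16) & 0xFFU);
    out[3] = (uint8_t)((v >> 24) & 0xFFU);
}

/* Read a 32-bit value stored least significant byte first. */
static uint32_t get_le32(const uint8_t *in)
{
    return (uint32_t)in[0] |
           ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) |
           ((uint32_t)in[3] << 24);
}

/* Host side: command byte, then size, then CRC. */
static void build_update_header(uint8_t out[UPDATE_HEADER_LEN],
                                uint32_t img_size, uint32_t img_crc)
{
    out[0] = UPDATE_CMD;
    put_le32(&out[1], img_size);
    put_le32(&out[5], img_crc);
}

/* Device side: returns false if the command byte is not 0x55. */
static bool parse_update_header(const uint8_t in[UPDATE_HEADER_LEN],
                                uint32_t *img_size, uint32_t *img_crc)
{
    if (in[0] != UPDATE_CMD) {
        return false;
    }
    *img_size = get_le32(&in[1]);
    *img_crc = get_le32(&in[5]);
    return true;
}
```

The same two helpers can be compiled into a PC‑side test tool, so the host and the bootloader cannot disagree about the frame layout.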

During my first attempt I forgot to set the vector table offset register (VTOR) before jumping. The result? The app started but immediately crashed on the first interrupt. A quick fix was to add SCB->VTOR = APP_START_ADDRESS; right before the jump. That little line saved me a whole afternoon.
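VTOR also has an alignment rule that is easy to miss: on Cortex‑M3/M4 cores the table address must be a multiple of the table size rounded up to a power of two, with a minimum of 128 bytes. A small sketch of that check follows. The function name and the vector counts used in the example are assumptions, so take the real count (16 system exceptions plus your device's interrupts) from the reference manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if table_addr is a legal VTOR value for a vector table
   with num_vectors entries (system exceptions included). */
static bool vtor_address_ok(uint32_t table_addr, uint32_t num_vectors)
{
    /* 16 system exceptions plus at most 496 interrupts. */
    if (num_vectors == 0U || num_vectors > 512U) {
        return false;
    }

    uint32_t table_bytes = num_vectors * 4U;
    uint32_t align = 128U;              /* minimum alignment: 32 words */
    while (align < table_bytes) {
        align <<= 1;                    /* round up to a power of two */
    }
    return (table_addr & (align - 1U)) == 0U;
}
```

For a table of 98 entries (392 bytes) the required alignment is 512 bytes, so 0x08002000 passes while 0x08002100 would not.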

Step 6: Add a Fallback Mechanism

A robust bootloader never assumes the new image is good. One common pattern is to keep a backup copy of the last known‑good firmware in a separate flash region. If the new image fails its CRC check, the bootloader can roll back automatically. The logic looks like:

if (calc_crc != img_crc) {
    // restore backup image
    // maybe blink an LED to indicate rollback
}

A backup slot needs as much flash as the largest image you expect to ship, so it can roughly halve the space available to the application, but the peace of mind is worth it, especially for remote deployments.
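The rollback decision itself can be kept as a small pure function, which makes it easy to review and to test off‑target. This is a sketch that assumes the bootloader can tell independently whether the application image and the backup pass their CRC checks. The enum and function names are hypothetical.

```c
#include <stdbool.h>

typedef enum {
    BOOT_RUN_APP,             /* application image verified: jump to it */
    BOOT_RESTORE_BACKUP,      /* copy the backup over the bad image     */
    BOOT_STAY_IN_BOOTLOADER   /* nothing valid to run: wait for update  */
} boot_action_t;

/* Chooses what the bootloader should do after verification. */
static boot_action_t choose_boot_action(bool app_crc_ok, bool backup_crc_ok)
{
    if (app_crc_ok) {
        return BOOT_RUN_APP;
    }
    if (backup_crc_ok) {
        return BOOT_RESTORE_BACKUP;
    }
    return BOOT_STAY_IN_BOOTLOADER;
}
```

Separating the decision from the flash operations means the hardware‑dependent code only has to carry out one of three actions.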

Step 7: Keep the Code Small and Auditable

Because the bootloader is the first code to run after reset and decides what runs next, keep it minimal. Avoid pulling in large libraries, and stick to plain C. A small, well‑reviewed bootloader is easier to audit and to certify for safety‑critical applications.

Wrapping Up

Building a flash‑based bootloader for an ARM Cortex‑M is not as daunting as it seems. By reserving flash space, writing a concise receive‑program‑verify loop, protecting the bootloader region, and adding a simple fallback, you get a solid foundation for OTA updates, field upgrades, and safer products. The next time you start a new board, give the bootloader a place at the front of the line – it will thank you with fewer field failures.