Step-by-Step Guide to Building a Custom PLD-Based UART in VHDL

Why build your own UART today? Because the built‑in peripherals in many cheap development boards are either hidden behind proprietary libraries or simply don’t match the exact baud rate you need. A small, custom UART on a PLD (CPLD or low‑cost FPGA) gives you full control, teaches you the inner workings of serial communication, and leaves you with a reusable block for future projects. In this post I’ll walk you through the whole process – from defining the interface to getting a working bit on the board – using plain VHDL and a few practical tips from my own lab bench.

What a UART Actually Does

A UART (Universal Asynchronous Receiver/Transmitter) is a tiny state machine that turns parallel data from your logic into a serial stream, and vice‑versa. It adds a start bit, optional parity, and one or more stop bits so the receiver can tell where each byte begins and ends. The key timing element is the baud rate – the number of bits sent per second. All the rest is just moving bits in and out at the right moments.
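
To make the framing concrete, here is a minimal sketch of how one 8N1 frame (one start bit, eight data bits, no parity, one stop bit) is laid out. The package name uart_frame_pkg and the function frame_8n1 are hypothetical helpers invented for illustration; they are not part of the UART built later in this post.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper package, for illustration only.
package uart_frame_pkg is
    -- Returns the 10 line levels of one 8N1 frame.
    -- Index 0 is sent first (start bit), index 9 last (stop bit).
    function frame_8n1(data : std_logic_vector(7 downto 0))
        return std_logic_vector;
end package uart_frame_pkg;

package body uart_frame_pkg is
    function frame_8n1(data : std_logic_vector(7 downto 0))
        return std_logic_vector is
        variable frame : std_logic_vector(9 downto 0);
    begin
        frame(0)          := '0';   -- start bit pulls the idle-high line low
        frame(8 downto 1) := data;  -- data(0), the LSB, lands in frame(1)
        frame(9)          := '1';   -- stop bit returns the line to idle
        return frame;
    end function frame_8n1;
end package body uart_frame_pkg;
```

For the letter 'A' (x"41", binary 01000001) the line levels in time order are 0, then 1 0 0 0 0 0 1 0 (data, least significant bit first), then 1.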

Choose Your PLD and Toolchain

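I usually start with a low‑cost Xilinx CPLD (e.g., XC9500) or a small Spartan‑6 FPGA. Both are supported by the free ISE WebPACK tools; Vivado only targets 7‑series and newer FPGAs, so use it if your board carries one of those instead. Any small FPGA has plenty of logic for a UART; on a CPLD, check the macrocell count first, because every flip‑flop costs one macrocell. The steps below are largely tool‑agnostic; just replace the project creation commands with the ones you use.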

Step 1: Define the UART Interface

Create a new VHDL file called uart.vhd. The entity should expose the classic signals:

library ieee;
use ieee.std_logic_1164.all;

entity uart is
    generic (
        CLK_FREQ   : integer := 50_000_000;  -- system clock in Hz
        BAUD_RATE  : integer := 115200
    );
    port (
        clk        : in  std_logic;
        rst_n      : in  std_logic;
        tx_data    : in  std_logic_vector(7 downto 0);
        tx_start   : in  std_logic;
        tx_busy    : out std_logic;
        rx_data    : out std_logic_vector(7 downto 0);
        rx_ready   : out std_logic;
        rx_error   : out std_logic;
        uart_tx    : out std_logic;
        uart_rx    : in  std_logic
    );
end uart;

Notice the use of generic parameters for clock frequency and baud rate. This lets you reuse the same code on a 100 MHz board or a 25 MHz board without touching the logic.

Step 2: Build a Baud‑Rate Generator

The UART needs a tick that occurs once per bit period. The simplest way is a counter that divides the system clock down to the baud rate.

architecture rtl of uart is
    constant DIVISOR : integer := CLK_FREQ / BAUD_RATE;
    signal baud_tick : std_logic;
    signal baud_cnt  : integer range 0 to DIVISOR-1 := 0;
begin
    process(clk, rst_n)
    begin
        if rst_n = '0' then
            baud_cnt  <= 0;
            baud_tick <= '0';
        elsif rising_edge(clk) then
            if baud_cnt = DIVISOR-1 then
                baud_cnt  <= 0;
                baud_tick <= '1';
            else
                baud_cnt  <= baud_cnt + 1;
                baud_tick <= '0';
            end if;
        end if;
    end process;

The baud_tick signal goes high for one system clock cycle at the start of each bit period. I like to keep it a single‑cycle pulse – it makes the state machines that follow much cleaner.
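
One thing worth checking is how far the integer divisor lands from the baud rate you asked for. The sketch below is a stand-alone entity (the name baud_check is hypothetical) that rounds the divisor to the nearest integer instead of truncating and fails at elaboration if the resulting rate is off by more than 2%. The 2% limit is my own assumption as a conservative rule of thumb, not a figure from any standard.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical stand-alone checker; the same constants and assert
-- could live inside the uart architecture instead.
entity baud_check is
    generic (
        CLK_FREQ  : integer := 50_000_000;  -- system clock in Hz
        BAUD_RATE : integer := 115200
    );
end baud_check;

architecture rtl of baud_check is
    -- Round to nearest: add half the denominator before dividing
    constant DIVISOR      : integer := (CLK_FREQ + BAUD_RATE / 2) / BAUD_RATE;
    -- Baud rate the counter actually produces (truncated to an integer)
    constant ACTUAL_BAUD  : integer := CLK_FREQ / DIVISOR;
    -- Relative error in units of 0.1 %
    constant ERR_PERMILLE : integer :=
        (abs(ACTUAL_BAUD - BAUD_RATE) * 1000) / BAUD_RATE;
begin
    -- Assumed limit: 20 permille = 2 %
    assert ERR_PERMILLE <= 20
        report "baud rate error above 2%: actual baud = " &
               integer'image(ACTUAL_BAUD)
        severity failure;
end rtl;
```

With the defaults, 50 MHz / 115200 is about 434.03, so both truncation and rounding give 434 and the real rate is about 115207 baud. Rounding starts to matter at lower clock rates: a 16 MHz clock gives 138.89, where truncation picks 138 (about 0.64% fast) and rounding picks 139 (about 0.08% slow).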

Step 3: Transmitter State Machine

The transmitter shifts out the start bit, eight data bits (least significant bit first), and a stop bit. A simple three‑state machine (IDLE, SEND, STOP) does the job.

    type tx_state_type is (TX_IDLE, TX_SEND, TX_STOP);
    signal tx_state   : tx_state_type := TX_IDLE;
    signal tx_shift   : std_logic_vector(9 downto 0);
    signal tx_bitcnt  : integer range 0 to 9 := 0;

    -- uart_tx is driven only inside the process below. A separate concurrent
    -- "uart_tx <= '1';" would put a second driver on the same signal.

    process(clk, rst_n)
    begin
        if rst_n = '0' then
            tx_state  <= TX_IDLE;
            tx_busy   <= '0';
            tx_shift  <= (others => '1');
            tx_bitcnt <= 0;
            uart_tx   <= '1';                   -- idle line is high
        elsif rising_edge(clk) then
            case tx_state is
                when TX_IDLE =>
                    tx_busy <= '0';
                    if tx_start = '1' then
                        -- Bit 0 leaves first: start bit, data LSB first, stop bit last
                        tx_shift  <= '1' & tx_data & '0';
                        tx_bitcnt <= 0;
                        tx_state  <= TX_SEND;
                        tx_busy   <= '1';
                    end if;
                when TX_SEND =>
                    if baud_tick = '1' then
                        uart_tx <= tx_shift(0);
                        tx_shift <= '1' & tx_shift(9 downto 1);
                        if tx_bitcnt = 9 then
                            tx_state <= TX_STOP;
                        else
                            tx_bitcnt <= tx_bitcnt + 1;
                        end if;
                    end if;
                when TX_STOP =>
                    if baud_tick = '1' then
                        uart_tx <= '1';
                        tx_state <= TX_IDLE;
                    end if;
            end case;
        end if;
    end process;

A quick anecdote: the first time I wrote this block on a CPLD, I forgot to set the idle line to ‘1’. The result was a constantly low line that looked like a stuck key on my terminal. A good reminder that UART is “idle high” – a tiny detail that can save hours of debugging.

Step 4: Receiver State Machine

Receiving is a bit trickier because you must detect the start bit, then sample the data bits in the middle of each bit period. The classic approach is to use a counter that waits half a bit period after seeing the falling edge of the start bit, then samples once per full bit period.

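    -- The receiver times itself with sample_cnt, restarted at the start-bit
    -- edge. The free-running baud_tick is not aligned to the incoming frame,
    -- so sampling on it could land right on a bit boundary.
    type rx_state_type is (RX_IDLE, RX_START, RX_DATA, RX_STOP);
    signal rx_state   : rx_state_type := RX_IDLE;
    signal rx_shift   : std_logic_vector(7 downto 0);
    signal rx_bitcnt  : integer range 0 to 7 := 0;
    signal sample_cnt : integer range 0 to DIVISOR-1 := 0;
    signal rx_sync    : std_logic_vector(1 downto 0) := "11";  -- 2-FF synchronizer

    process(clk, rst_n)
    begin
        if rst_n = '0' then
            rx_state   <= RX_IDLE;
            rx_ready   <= '0';
            rx_error   <= '0';
            rx_bitcnt  <= 0;
            sample_cnt <= 0;
            rx_sync    <= "11";
        elsif rising_edge(clk) then
            rx_sync <= rx_sync(0) & uart_rx;            -- rx_sync(1) is the safe copy
            case rx_state is
                when RX_IDLE =>
                    rx_ready <= '0';
                    if rx_sync(1) = '0' then            -- line went low: possible start bit
                        sample_cnt <= 0;
                        rx_state   <= RX_START;
                    end if;
                when RX_START =>
                    if sample_cnt = DIVISOR/2 - 1 then  -- middle of the start bit
                        sample_cnt <= 0;
                        if rx_sync(1) = '0' then        -- still low, valid start
                            rx_bitcnt <= 0;
                            rx_state  <= RX_DATA;
                        else
                            rx_state <= RX_IDLE;        -- false start (glitch)
                        end if;
                    else
                        sample_cnt <= sample_cnt + 1;
                    end if;
                when RX_DATA =>
                    if sample_cnt = DIVISOR-1 then      -- one bit period later: mid-bit
                        sample_cnt <= 0;
                        rx_shift(rx_bitcnt) <= rx_sync(1);  -- LSB arrives first
                        if rx_bitcnt = 7 then
                            rx_state <= RX_STOP;
                        else
                            rx_bitcnt <= rx_bitcnt + 1;
                        end if;
                    else
                        sample_cnt <= sample_cnt + 1;
                    end if;
                when RX_STOP =>
                    if sample_cnt = DIVISOR-1 then      -- middle of the stop bit
                        sample_cnt <= 0;
                        if rx_sync(1) = '1' then
                            rx_data  <= rx_shift;
                            rx_ready <= '1';
                            rx_error <= '0';
                        else
                            rx_error <= '1';            -- framing error: missing stop bit
                        end if;
                        rx_state <= RX_IDLE;
                    else
                        sample_cnt <= sample_cnt + 1;
                    end if;
            end case;
        end if;
    end process;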

The receiver sets rx_ready high for one clock cycle when a full byte is captured. In a larger design you would typically latch this into a FIFO, but for a simple demo the flag is enough.
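
Because rx_ready lasts only one clock cycle, slower logic can miss it. Short of a full FIFO, a one-byte holding register is often enough. The sketch below is one way to do it; the entity name rx_hold and the ports rd_ack, hold_data, hold_valid, and overrun are hypothetical names of my own, not part of the UART above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical one-byte holding register behind the UART receiver.
entity rx_hold is
    port (
        clk        : in  std_logic;
        rst_n      : in  std_logic;
        rx_data    : in  std_logic_vector(7 downto 0);  -- from the UART
        rx_ready   : in  std_logic;                     -- one-cycle pulse from the UART
        rd_ack     : in  std_logic;                     -- consumer has taken the byte
        hold_data  : out std_logic_vector(7 downto 0);
        hold_valid : out std_logic;                     -- stays high until rd_ack
        overrun    : out std_logic                      -- a byte was overwritten unread
    );
end rx_hold;

architecture rtl of rx_hold is
    signal valid_i : std_logic := '0';
begin
    hold_valid <= valid_i;

    process(clk, rst_n)
    begin
        if rst_n = '0' then
            valid_i   <= '0';
            overrun   <= '0';
            hold_data <= (others => '0');
        elsif rising_edge(clk) then
            if rd_ack = '1' then
                valid_i <= '0';
                overrun <= '0';
            end if;
            if rx_ready = '1' then
                if valid_i = '1' and rd_ack = '0' then
                    overrun <= '1';       -- previous byte was never read
                end if;
                hold_data <= rx_data;
                valid_i   <= '1';         -- overrides the clear above
            end if;
        end if;
    end process;
end rtl;
```

The consumer polls hold_valid at its own pace and pulses rd_ack once it has read hold_data. If bytes can arrive faster than that, a real FIFO is the better choice.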

Step 5: Top‑Level Integration

Now wrap the UART in a top‑level entity that connects to the PLD pins and is easy to drive from a simple testbench. I like to expose a single tx_data register and a tx_start pulse, plus a rx_data output that the rest of the design can read.

library ieee;
use ieee.std_logic_1164.all;

entity uart_top is
    port (
        clk      : in  std_logic;
        rst_n    : in  std_logic;
        tx_data  : in  std_logic_vector(7 downto 0);
        tx_start : in  std_logic;
        rx_data  : out std_logic_vector(7 downto 0);
        rx_ready : out std_logic;
        uart_tx  : out std_logic;
        uart_rx  : in  std_logic
    );
end uart_top;

architecture rtl of uart_top is
begin
    u_uart : entity work.uart
        generic map (
            CLK_FREQ  => 50_000_000,
            BAUD_RATE => 115200
        )
        port map (
            clk      => clk,
            rst_n    => rst_n,
            tx_data  => tx_data,
            tx_start => tx_start,
            tx_busy  => open,
            rx_data  => rx_data,
            rx_ready => rx_ready,
            rx_error => open,
            uart_tx  => uart_tx,
            uart_rx  => uart_rx
        );
end rtl;

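Before compiling, note that the snippets in Steps 2 to 4 form a single architecture: in uart.vhd, move the type and signal declarations from Steps 3 and 4 up into the declarative part (before begin) and close the file with end rtl; after the receiver process. Then compile the design, run a quick behavioral simulation (I use ModelSim) with uart_tx looped back to uart_rx, and verify that a transmitted byte appears on rx_data after the expected number of clock cycles. If the simulation looks good, move on to synthesis.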
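
A loopback testbench for that simulation could look roughly like the sketch below. It assumes the uart_top entity from this step with its 50 MHz / 115200 settings; the testbench name uart_top_tb, the test byte x"41", and the 200 µs timeout are my own choices. It only passes if both the transmitter and the receiver frame the byte correctly.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity uart_top_tb is
end uart_top_tb;

architecture sim of uart_top_tb is
    constant CLK_PERIOD : time := 20 ns;  -- 50 MHz, matching CLK_FREQ in uart_top

    signal clk      : std_logic := '0';
    signal rst_n    : std_logic := '0';
    signal tx_data  : std_logic_vector(7 downto 0) := (others => '0');
    signal tx_start : std_logic := '0';
    signal rx_data  : std_logic_vector(7 downto 0);
    signal rx_ready : std_logic;
    signal serial   : std_logic;          -- loopback wire: uart_tx feeds uart_rx
    signal done     : boolean := false;
begin
    dut : entity work.uart_top
        port map (
            clk      => clk,
            rst_n    => rst_n,
            tx_data  => tx_data,
            tx_start => tx_start,
            rx_data  => rx_data,
            rx_ready => rx_ready,
            uart_tx  => serial,
            uart_rx  => serial
        );

    -- Clock generator; stops when the stimulus process is finished
    clk_gen : process
    begin
        while not done loop
            clk <= '0'; wait for CLK_PERIOD / 2;
            clk <= '1'; wait for CLK_PERIOD / 2;
        end loop;
        wait;
    end process;

    stim : process
    begin
        wait for 5 * CLK_PERIOD;          -- hold reset for a few cycles
        rst_n <= '1';
        wait until rising_edge(clk);
        tx_data  <= x"41";                -- ASCII 'A'
        tx_start <= '1';                  -- one-cycle start pulse
        wait until rising_edge(clk);
        tx_start <= '0';

        -- One 10-bit frame at 115200 baud takes about 87 us
        wait until rx_ready = '1' for 200 us;
        assert rx_ready = '1'
            report "timeout: no byte received" severity failure;
        assert rx_data = x"41"
            report "received byte does not match" severity failure;
        report "loopback OK";
        done <= true;
        wait;
    end process;
end sim;
```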

Step 6: Synthesize and Assign Pins

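In your vendor tool (ISE for the XC9500 and Spartan‑6 parts mentioned earlier, Vivado for 7‑series and newer FPGAs), create a new project, add the VHDL files, and set the target device to your CPLD/FPGA. Run synthesis. Counting the registers in the code gives on the order of 60 flip‑flops plus some combinational logic, which is negligible on any FPGA. On a CPLD every flip‑flop takes a macrocell, so check the fitter report: a 36‑macrocell part such as the XC9536 is too small, and a 72‑macrocell part leaves little headroom.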

Next, assign the pins (the I/O Planning view in Vivado, or a UCF constraints file in ISE): map uart_tx and uart_rx to the physical pins you will connect to a USB‑to‑TTL adapter or to a simple loopback jumper between the two pins. Remember to set the I/O standard to LVCMOS33 (or whatever your board uses).

Step 7: Load and Test on Hardware

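Program the PLD with the generated bitstream. Hook up a USB‑to‑TTL cable: connect the cable’s TX to uart_rx, the cable’s RX to uart_tx, and the two grounds together, and make sure the cable’s logic level matches your I/O voltage. Open a terminal program (e.g., PuTTY) at 115200 8N1. When you press a key, the terminal sends a byte; the PLD receives it and pulses rx_ready, and you can echo it back by driving tx_data from rx_data and tx_start from rx_ready. With local echo turned off in the terminal, each character you type should appear once, sent back by the PLD; with local echo on you will see it twice, once from the terminal itself and once from the PLD.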
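
The echo wiring described above can be sketched as a small top level of its own. The entity name uart_echo and the internal signal names are hypothetical; the uart entity and its generics are the ones from Step 1.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical echo top level: every received byte is sent straight back.
entity uart_echo is
    port (
        clk     : in  std_logic;
        rst_n   : in  std_logic;
        uart_tx : out std_logic;
        uart_rx : in  std_logic
    );
end uart_echo;

architecture rtl of uart_echo is
    signal rx_byte  : std_logic_vector(7 downto 0);
    signal rx_pulse : std_logic;
begin
    u_uart : entity work.uart
        generic map (
            CLK_FREQ  => 50_000_000,
            BAUD_RATE => 115200
        )
        port map (
            clk      => clk,
            rst_n    => rst_n,
            tx_data  => rx_byte,    -- send back what was just received
            tx_start => rx_pulse,   -- rx_ready doubles as the start pulse
            tx_busy  => open,
            rx_data  => rx_byte,
            rx_ready => rx_pulse,
            rx_error => open,
            uart_tx  => uart_tx,
            uart_rx  => uart_rx
        );
end rtl;
```

This is fine for typing at a keyboard. The transmitter ignores tx_start while it is busy, so a host streaming bytes back-to-back could lose characters; a small buffer between the receive and transmit sides would be needed for that case.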

If you encounter framing errors (rx_error high), double‑check the baud‑rate divisor and make sure the clock frequency matches the CLK_FREQ generic. A common mistake is forgetting to account for the PLL’s multiplication factor when you generate a 100 MHz clock from a 50 MHz crystal.

Step 8: Extend the Design

Now that the core UART works, you can add features:

  • Parity – add a single parity bit in the shift registers.
  • FIFO buffers – smooth out bursts of data.
  • Multiple baud rates – expose a selector that changes the divisor on the fly.
  • Hardware flow control – RTS/CTS lines for high‑speed links.

Each addition follows the same pattern: define a clear state machine, keep the code modular, and test with a small simulation before loading to hardware.
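
As a starting point for the parity option, here is a minimal sketch of an even-parity helper. The package name uart_parity_pkg and the function even_parity are hypothetical names of my own.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper package for the parity extension.
package uart_parity_pkg is
    -- Returns the bit that makes the total number of ones even.
    function even_parity(data : std_logic_vector) return std_logic;
end package uart_parity_pkg;

package body uart_parity_pkg is
    function even_parity(data : std_logic_vector) return std_logic is
        variable p : std_logic := '0';
    begin
        -- XOR all data bits together
        for i in data'range loop
            p := p xor data(i);
        end loop;
        return p;  -- '1' when data holds an odd number of ones
    end function even_parity;
end package body uart_parity_pkg;
```

For x"41" (two ones) the parity bit is '0'; for x"43" (three ones) it is '1'. In the transmitter, the parity bit would sit between the data and the stop bit, which means widening tx_shift to 11 bits and letting tx_bitcnt count to 10. The receiver would need one extra state to sample the parity bit and compare it against the same function. For odd parity, invert the result.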

Wrap‑Up Thoughts

Building a UART from scratch is a great way to demystify serial communication and to get comfortable with VHDL state machines. The whole project fits on a mid‑sized CPLD or in a small corner of a low‑cost FPGA; on the FPGA that leaves plenty of room for other logic you might need, say a simple SPI flash controller or a tiny soft‑core CPU. The next time you need a custom baud rate or a special framing format, you’ll already have a tested block you can drop into any PLD design.

Happy coding, and may your bitstreams always synthesize on the first try!
