DaftSoft: The real «Hello World» from embedder (2): practice, prepare

Between theory and code

In the previous post we've reviewed bare-metal development: what it is, the theory of it, its workflow, what information we need and where to get this information, its profits, bottlenecks and limitations. This — how SoC is organised and it works, most of all somehow know from theory (school or university) and/or practice (working on a high level or via some HAL). The next post will describe the well known to all of us process — coding. But what is between that areas of theory and practice? What is between the knowing how PLLs and clocks work, knowing that UART is configured by writing some values to some registers and process of writing device-tree nodes and calling functions of HAL? How does that magic of making certain SoC functioning really arise? The process of preparing to code is described in this post — the practice of getting information, gathering it and planning jobs. Excuse me for not feeling sorry for you — not a single line of code will be written in this post, but I will describe this (middle) part of job up to the single bit. This is to let you know clear enough how it is done.

The plan

The plan we need to carry out to reach our goals looks like:
1. Choose the SoC we will work on and get its documentation.
2. Find out the condition BROM leaves SoC in, and see what is initialised for us and what is not.
3. Find out what exactly PLLs and clocks we have to configure to start up hardware-blocks.
4. Configure UART TX and RX pads.
5. Configure UART hardware-block.
6. Output some string.
7. Finish with infinite loop outputting characters received from UART.
8. Make proper boot image and put it in the place BROM of our SoC expects it to be.

The practice

Let's get it started.

1. Choose the SoC we will work on and get its documentation.

We will work on NXP i.MX 8M Plus (iMX8MP). This is multi-core (mine is quad-core) ARM Cortex-A53 (ARMv8-A), SoC (with additional Cortex-M7 core). Some kind of what we need and interesting to play with. Remember and bear in mind what we were talking about in previous post — bare-metal development is strongly tied to certain SoC. Thus, if you are about to develop stand-alone application for any other SoC, then the practice, we will do in this post, is not for you. You can read it as an example of workflow only. Once, we have chosen SoC, we download datasheet describing it. In our case it is «i.MX 8M Plus Applications Processor Reference Manual» (IMX8MPRM.pdf, I have Rev. 3, 08/2024). And, to entertain you a little, here is the SoC itself:

NXP iMX 8M Plus

2. Find out the condition BROM leaves SoC in, and see what is initialised for us and what is not.

To find out the condition BROM leaves SoC in we look for section, which describes what BROM enables and what does not. This is section 6.1.4.2 — «Boot block activation» (p. 706). This section claims that BROM of iMX8MP activates (in addition to some others) these blocks: Clock Control Module (CCM), Input/Output Multiplexer Controller (IOMUXC). We will boot from SD-Card, thus Ultra-Secure Digital Host Controller (USDHC) will be enabled also (but this is obvious). Let's proceed to clocks BROM has initialised for us. In the next section 6.1.4.3 — «Clocks at boot time» (p. 707), table 6-3 — «PLL setting by ROM» we see which PLLs are enabled: ARM PLL at 1GHz, System PLL1 at 800MHz and System PLL2 at 1GHz. Let's remember PLLs we have enabled: System PLL1 and System PLL2. We don't care about ARM PLL at this point, as it controls ALU only and is enabled and configured already by BROM. Proceed to Table 6-6 — «CCGR setting by ROM» (p. 708), and see which clocks are enabled and which are not. Scrolling down to UARTs (p. 710) and see that BROM enables none of UARTs. iMX8MP boards usually use second UART for debug, thus, let's remember that its clock number is 74.

3. Find out what exactly PLLs and clocks we have to configure to start up hardware-blocks.

From the previous paragraph we see that none of UARTs hardware-blocks is enabled by BROM. Well, we have to find out how to enable it by ourselves. Let's start with CCM structure, it is described in section 5.1 — «Clock Control Module (CCM)» (p. 227). Looking at Figure 5-1 — «CCM Block Diagram» (p. 228) we see that clock ticks pass from clock generators (on the left side) via PLLs (or bypassing them), then to CCM's Clock Root Generator which has clock slices, then clock slices form out clock roots and, finally, come out to hardware-blocks (on the right side). Well, this scheme looks more complicated than that we've discussed in the previous post. The idea of Clock Roots becomes more clear if we look at 5.1.2 — «Clock Root Selects» (p. 228). Let's scroll down to Slice Index №95 (p. 241). 95 — is the slice Clock Root of UART2. In the column «Source Select» we see that it can be driven by few outputs. We will drive our UART by SYSTEM_PLL2_DIV5. Let's remember its value — 010b. As we know already, System PLL2 is enabled at 1GHz. Here we need to configure its outputs — ensure its DIV5 (1GHz div 5 is 200MHz — we'll need this value later) output is enabled. This is done by configuring System PLL2 General Function Control Register which is described in section 5.1.8.32 — «SYS PLL2 General Function Control Register» (p. 509). We will set all PLL_DIVx_CLKE bits and PLL_CLKE. The address of this register is ANAMIX base + 104h. After we have enabled PLL outputs we have to select proper clock root for UART2 hardware block. This is done by configuring CCM_TARGET_ROOT №95. It is described in section 5.1.7.10 — «Target Register (CCM_TARGET_ROOTn)» (p. 412). We see that here we need to set enable (28th bit) to 1 and MUX (24-26 bits) to the value we've remembered earlier — 010b. The address of this register is CCM base + 8000h + 95 (slice index we need) * 80h. The resulting value we have to write to the register is 12000000h.

4. Configure UART TX and RX pads.

Well, PLLs and clocks are configured and enabled. Now we have to find out how to configure UART pads. First, let's set proper AF for our UART. Alternative functions are described in table 8.1.1.1 — «Muxing Options» (p. 1287). Let's scroll to UART2 (p. 1307). Here we see that UART2_RX port can be routed to one of these pads: UART2_RXD, SD2_DATA0, SD1_DATA3 and SAI3_TXFS. The first one is what we need. UART2_TX port can be routed to one of these pads: UART2_TXD, SD2_DATA1, SD1_DATA2 and SAI3_TXC. The first one is what we need. Both UART2_TXD and UART2_RXD have mode called ALT0. Let's proceed to section 8.2.4 — «IOMUXC Memory Map/Register Definition» (p. 1344). In this table we need to find our UART2_RXD and UART2_TXD — they are on the bottom of page 1350 and on the top of page 1351 correspondingly. Here we see that their reset values both are 5h (remember that value for a while) and absolute addresses are 30330228h for UART2_RXD, and 3033022Ch for UART2_TXD. Then click on the link in the right column. From section 8.2.4.134 — «SW_MUX_CTL_PAD_UART2_RXD SW MUX Control Register» (p. 1540) and section 8.2.4.135 — «SW_MUX_CTL_PAD_UART2_TXD SW MUX Control Register» (p. 1542) we see that MUX_MODE is represented by lowest 3 bits of this registers. Also, we see that 5h (the value we've remembered recently) corresponds to 101b. That means that both pads we need are routed to pads we don't need — GPIO5 24 and GPIO5 25 in this case. Thus, we have to configure those pads correctly for our needs — set both to zero (ALT0). To achieve that, we need to write zeroes to 30330228h and 3033022Ch to set proper alternative functions for that pads. But that's not all we have to do to make UART pads functioning correctly. In addition to setting AF, we need to configure physical parameters of that pads. This is done by setting two SW_PAD_CTL Registers: UART2 RXD pad control register, section 8.2.4.286 — «SW_PAD_CTL_PAD_UART3_RXD SW PAD Control Register» (p. 1837) and UART2 TXD pad control register, section 8.2.4.287 — «SW_PAD_CTL_PAD_UART2_TXD SW PAD Control Register» (p. 1839). After inspecting these description, we conclude that zero is a good value for both of them. And the last one step we have left to do. UART RX is a little special, because it works as an input function. Thus, we need to select input for it. This is done by configuring DAISY Register, which is represented in section 8.2.4.376 — «UART2_UART_RXD_MUX_SELECT_INPUT DAISY Register» (p. 1922). Here we see that 110b «SELECT_UART2_RXD_ALT0 — Selecting Pad: UART2_RXD for Mode: ALT0» (p. 1923) is what we need. Thus, we'll write 6h to 303305F0h.

5. Configure UART hardware-block.

The last thing we have left is to configure UART hardware-block. It is done by familiar steps like — reading registers values (optional), modifying that values (optional) or forming out them from scratch, writing values to registers, waiting for conditions flags (optional). Actually, UART is a simple hardware-block, thus, I will not explain the specific process of configuring it — you will see it in the code, which will be presented in the next post.

6. Output some string.

After UART is configured and running, the game starts. Now we are prepared and ready to output some strings. This is also done by writing a byte to some address (UART register) and controlling TX empty flag to avoid buffer overrun. Here I'll skip detailed description of this process too — see it in the code.

7. Finish with infinite loop outputting characters received from UART.

We will finish with infinite loop outputting characters received from UART. This is done by controlling RX empty flag and reading received byte (UART register) when flag becomes unset.

8. Make proper boot image and put it in the place BROM of our SoC expects it to be.

To make our application load and run on the SoC we've chosen, we have to prepare proper boot image and put it in the place BROM of our SoC expects it to be. This is done by a tool (mkbb_imx8), which is derived from standard NXPs mkimage_imx8. I will not explain how it works and how it was developed at all, but will show how to use it to generate boot block (and how and where to place it) for our SoC in the next post.

Conclusion

We've made it. Now we have gathered all the information we need to start writing code for iMX8MP SoC and we are ready to proceed. And now you know what lies between the theory and the daily routine of embedded developer. In the next post we will develop the application — we will write in an assembly language and C-code, compile, link objects to binary, make a bootable image of it, and put it in the right place on a storage media. It'll be a very small program that will run on the iMX8MP and on this SoC only. But it will give a platform for learning ARM64 machine. You'll be able to play with the ARMv8 machine from the ground, as in assembly language as in C-code — start (kick) or not start its cores, switch or not switch exception levels, output values of registers, and so on. Finally, we'll have a wide-open window to ARM64 machine!

DaftSoft

16/01/2025

The real «Hello World» from embedder (2): practice, prepare