17/12/2024

The real «Hello World» from embedder (1): preface, theory

Preface

Any learning process eventually comes down to examples and practice. In software development, practice starts with the so-called «Hello World». «Hello World» is the very first piece of code every developer writes for a new platform, API, or framework in their very first days or steps. Embedded software engineers are not an exception.

There are a lot of «Hello Worlds» out there on the Internet and in books. Those «Hello Worlds» cover all kinds of programming languages, APIs, and frameworks. But the vast majority of those «all kinds» are pure software — Java, JS, C++, etc. Thus, embedded software developers — those of them who want to study machine architecture, get in touch with it and play with it — suffer from a lack of examples and have no starting point.

You can argue with that — there are examples in assembly language for all types of architectures, and assembly language should let us study machine architecture and its behaviour. Yes, there are a lot of examples in assembly language, but those examples are written for very high (let's say — top) levels of the runtime environment — Linux and Mac user-space. This approach to learning machine architecture doesn't lead to an understanding of how it works and how it is organised.

Actually, it gives us only a very narrow slit onto machine architecture. Even within kernel-space, we are very limited in our ability to learn the machine. That's because all the hardware initialisation has already been done for (and before) us by the bootloader and the kernel itself. The second problem is the so-called «conceptions of operating systems». Those «conceptions» hide a lot of machine architecture from us and introduce a lot of extra software. Thus, working even in kernel-space, we learn more about those «conceptions» and principles than about the machine.

Let's go on and discuss our last chance — MCUs, which are cheap, popular, easy to start up, and do let us be as close to the machine as possible. MCUs have always had a crucial difference from CPUs and SoCs — they have no MMU and are usually single-core. After the beginning of the ARM64 era — the times when not only phones and tablets but also desktops (and even servers) and laptops run on ARM64 — MCUs eventually got another drawback: ARM32 (the most common architecture among MCUs worth anything at all) became outdated even as a single-core platform for studying. Now we are living in the days when it is almost useless to learn ARM32 architecture. It's more a waste of time, because the difference between the actual and almost omnipresent ARM64 and the rapidly obsolescing ARM32 is enormous.

Thus, we come to the point where we need a comfortable, big enough window (let's say — better a door or a nice gate) onto a machine that represents modern ARM64 architecture. Usually this is done by programming on an emulator. Developing for an emulator is like a game because of its abstraction. Games (and gaming) give you minimal or even zero risk at the price of insignificant wins. To get real, valuable practice and results, we need a real «Hello World» on real hardware.

Learning a machine architecture (and its behaviour) is done by so-called «Bare Metal» programming. «Bare» means that you are working on a machine without an OS, without a kernel, and even without a bootloader. Actually, you develop code that works instead of a bootloader, or runs where and when the very first part of the user bootloader (SPL — Secondary Program Loader) usually does. «Metal» stands for hardware or machine and means that you work on real hardware — not a virtual machine or an emulator.

In this series of posts we'll cut a good window for studying a modern ARM64 machine. This will be the «Hello World» from an embedded engineer and a nice toy for an embedded developer.

Theory

Let's start by summarising what we already know about a computing machine, specifically about its heart — the CPU or SoC. We know that a SoC provides a set of functions (functionalities); each functionality is provided by a corresponding, separate hardware block. Thus, a SoC consists of hardware blocks, or represents a set of hardware blocks. Each hardware block is driven by a clock; each clock is derived from a PLL; each PLL is driven by an oscillator. So, turning a hardware block on, besides powering it, is just enabling the clock it is connected to. Actually, you will not find an element called a «clock» on your board. A «clock», as an element, is just a conception of embedded developers. Really, a «clock» is one of the outputs of a PLL, which can be enabled or disabled by software. The real clocking sources are oscillators and PLLs.
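
In code, enabling a clock usually comes down to setting a gate bit in a register of the clock controller. Below is a minimal sketch of such an operation under an invented memory map: CCM_GATE_REG and UART_CLK_GATE are placeholders, not registers of any real SoC; the actual addresses and bit layouts come from the datasheet.

    #include <stdint.h>

    /* Hypothetical addresses and bits: placeholders, not a real SoC's memory map. */
    #define CCM_BASE        0x02000000u            /* clock controller module          */
    #define CCM_GATE_REG    (CCM_BASE + 0x68u)     /* imaginary clock-gating register  */
    #define UART_CLK_GATE   (1u << 3)              /* imaginary gate bit of a UART clock */

    static inline void reg_write32(uintptr_t addr, uint32_t val)
    {
        *(volatile uint32_t *)addr = val;
    }

    static inline uint32_t reg_read32(uintptr_t addr)
    {
        return *(volatile uint32_t *)addr;
    }

    /* Enable the clock of one hardware block by opening its gate bit. */
    static void clock_enable(uintptr_t gate_reg, uint32_t gate_mask)
    {
        reg_write32(gate_reg, reg_read32(gate_reg) | gate_mask);
    }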

But enabling the clock is not enough to start using the functionality of that hardware block. In most cases there are two more things. First: usually, SoCs provide more functionalities than they have pins, or, as we more often say in embedded software, pads. This leads to the so-called «alternative function» (AF) conception. It means that some pads, as we say in software theory, can be configured to a specific (alternative) function — I2C SDA or UART TX, for example. Configuring AFs is done by the Input/Output Multiplexing Controller (IOMUXC). The IOMUXC is a built-in hardware block, so it has to be turned on (clocked) too, like any other hardware block. From the hardware point of view, the IOMUXC simply routes signals from the SoC's balls to the inputs/outputs of the specified hardware block — to the I2C bus controller or to the UART controller in the example above. In other words, the IOMUXC switches signals between hardware blocks on one end and the SoC's balls on the other.

The second thing is the configuration of the hardware block itself. This is done by code. After a hardware block is clocked (enabled), it stays in some default condition. We need to write specific values to specific registers to make it function according to our needs.
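
As a sketch of what «configuring an AF» looks like in code, assume a hypothetical IOMUXC where each pad has its own 32-bit mux register and the alternative function is selected by a small code written into it; all names and offsets below are invented for illustration.

    #include <stdint.h>

    /* Hypothetical IOMUXC layout: invented for illustration only. */
    #define IOMUXC_BASE           0x02290000u
    #define IOMUXC_PAD_UART_TX    (IOMUXC_BASE + 0x1A0u)  /* mux register of one pad    */
    #define PAD_AF_UART           0x2u                    /* imaginary AF code for UART */

    static inline void reg_write32(uintptr_t addr, uint32_t val)
    {
        *(volatile uint32_t *)addr = val;
    }

    /* Route one pad to its UART alternative function. */
    static void pad_set_af(uintptr_t pad_mux_reg, uint32_t af)
    {
        reg_write32(pad_mux_reg, af);
    }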

Clocks for hardware blocks and the corresponding PLLs are enabled by code. Hardware blocks are configured by code too. But no program can run on a non-clocked CPU. So who starts the main PLLs and clocks that drive the ALU, and how? This functionality is hardcoded into the so-called BootROM, or BROM. The BROM is code built into the SoC itself; it is located inside the SoC. The BROM initialises the essential hardware minimum required to start user machine code. It starts up a minimal amount of hardware blocks and after that acts as the FPL (see below). BROMs differ a lot in functionality — some of them just start code from built-in memory, not even loading it into RAM (like the BROMs of MCUs do), some of them initialise a bunch of hardware blocks and the DDRC, and can even operate filesystems (like Raspberry Pi's Broadcom BROM does). The most common condition a BROM leaves a SoC in is: one ALU core started, SRAM (or OCRAM — OnChip RAM) initialised, and one of the boot sources initialised. Initialisation of the rest of the functionality is left to user code.

Any SoC uses its own, proprietary FPL (First Platform Loader). The FPL is the part of the BROM we've discussed earlier. To run our code on a SoC we need a so-called «boot image», which is made out of our code. Any FPL expects it in a specific format and in a specific place. Thus, after we've compiled, linked, and stripped our source code into pure machine code, we have to pack it so that it complies with the specific requirements of a particular SoC and its BROM. This is done by a tool usually named «mkimage», sometimes something like «mkimage_XXX», where XXX is replaced by the name of the SoC or the name of a family of SoCs. Usually, mkimage adds some headers, which the BROM reads, and sometimes CRCs, to the machine code. The BROM uses this information to verify the boot image. Then we have to put this image in a particular place — usually on an SD card or eMMC at some offset from zero. The offset is needed to make the bootable media also usable as storage media — it leaves space for the MBR and the partition table.
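
To make the idea of a boot image more concrete, here is a purely illustrative sketch of what such a header might look like; it is not the format of any real BROM, and all field names and sizes are invented.

    #include <stdint.h>

    /* Invented boot-image header: for illustration only, not a real BROM format. */
    struct boot_image_header {
        uint32_t magic;        /* constant the BROM checks to recognise the image */
        uint32_t load_addr;    /* where in SRAM/OCRAM the payload must be copied  */
        uint32_t entry_point;  /* address the BROM jumps to after loading         */
        uint32_t image_size;   /* size of the machine-code payload in bytes       */
        uint32_t payload_crc;  /* checksum of the payload, verified by the BROM   */
        uint32_t header_crc;   /* checksum of this header itself                  */
    };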

This is what we have on the hardware side. Let's proceed to software. There's not too much to do and discuss here. We can work in assembly language forever. This is interesting and may be very useful. But anyway, at some point we want to move up into C code. The machine does not need the conceptions that humans use, but writing in C requires one of them — the «stack». We have to initialise it — that's all we need to know at the moment. Initialisation of the stack is done by setting the stack pointer register. The stack grows down, so we have to choose an address for it according to this behaviour — to prevent it from reaching the bottom of RAM and destroying our application. And, of course, we can't set it higher than the top of accessible RAM.
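
A minimal sketch of such an entry point, written as top-level GNU assembly inside a C file: the symbol _stack_top is assumed to be provided by the linker script, and main() is assumed to be our C entry; neither is a requirement of any particular SoC.

    /* Bare-metal entry point: set the stack pointer, then jump to C code. */
    void main(void);

    __asm__(
        "    .section .text.start, \"ax\"\n"
        "    .global  _start\n"
        "_start:\n"
        "    ldr  x0, =_stack_top\n"   /* address chosen for the top of the stack */
        "    mov  sp, x0\n"            /* the stack grows down from here          */
        "    bl   main\n"              /* from now on we can run C code           */
        "1:  wfe\n"                    /* if main() ever returns, just park here  */
        "    b    1b\n"
    );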

Now let's proceed to our specific task. Let's say we want to make our SBC output a «Hello World». How can we do that? Let's confess that outputting (drawing) strings on a display is a somewhat complicated task for a Bare-Metal beginner. So we'll output our «Hello World» via the most common debug interface in the world — UART. UART functionality is provided by a particular hardware block. So we need to start clocking that hardware block, set up its pads, and configure it.

First, to enable a clock for the UART, we have to know which clock exactly we need to enable on our exact SoC. As we mentioned earlier, every clock is derived from a particular PLL. Thus, we need to find out which PLL provides the clock we need. All this information is presented in the datasheet and is rigidly tied to a certain SoC. The second part is to set the AF — to configure the pads for the UART, its TX and RX. Here we need to turn on the IOMUXC (PLL, clock) and configure the pads we need to the functions we need. And the last part — configuration — is done by writing values to memory locations mapped to the addresses of the hardware blocks — the IOMUXC (pads) and the UART (baud rate, parity, etc.) in our case. Which configuration values must be written is described in the datasheet and is rigidly tied to a certain SoC.
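
Putting the three steps together, a hedged sketch of the whole path might look like the code below. Every base address, register offset, bit, and divisor value here is a placeholder standing in for what the datasheet of the chosen SoC will actually give us.

    #include <stdint.h>

    /* All addresses, bits, and values below are placeholders, not a real SoC's map. */
    #define CCM_GATE_REG       0x02000068u          /* clock gates (IOMUXC + UART)   */
    #define IOMUXC_CLK_GATE    (1u << 1)
    #define UART_CLK_GATE      (1u << 3)

    #define IOMUXC_PAD_UART_TX 0x022901A0u          /* mux register of the TX pad    */
    #define PAD_AF_UART        0x2u                 /* imaginary AF code             */

    #define UART_BASE          0x02120000u
    #define UART_CTRL          (UART_BASE + 0x00u)  /* enable, parity, word length   */
    #define UART_BAUD          (UART_BASE + 0x04u)  /* baud-rate divisor             */
    #define UART_STAT          (UART_BASE + 0x08u)  /* status: is the TX FIFO ready? */
    #define UART_DATA          (UART_BASE + 0x0Cu)  /* write a byte here to send it  */
    #define UART_CTRL_ENABLE   (1u << 0)
    #define UART_STAT_TX_READY (1u << 5)

    static inline void     wr32(uintptr_t a, uint32_t v) { *(volatile uint32_t *)a = v; }
    static inline uint32_t rd32(uintptr_t a)             { return *(volatile uint32_t *)a; }

    static void uart_init(void)
    {
        /* 1: enable clocks of the IOMUXC and the UART hardware blocks. */
        wr32(CCM_GATE_REG, rd32(CCM_GATE_REG) | IOMUXC_CLK_GATE | UART_CLK_GATE);
        /* 2: route the TX pad to its UART alternative function. */
        wr32(IOMUXC_PAD_UART_TX, PAD_AF_UART);
        /* 3: configure the UART itself (imaginary divisor and control bits). */
        wr32(UART_BAUD, 26u);
        wr32(UART_CTRL, UART_CTRL_ENABLE);
    }

    static void uart_puts(const char *s)
    {
        while (*s) {
            while (!(rd32(UART_STAT) & UART_STAT_TX_READY))
                ;                              /* wait until the TX FIFO can take a byte */
            wr32(UART_DATA, (uint32_t)*s++);
        }
    }

    void main(void)
    {
        uart_init();
        uart_puts("Hello World\r\n");
        for (;;)
            ;
    }

In the next parts, these placeholders will be replaced by real registers and values taken from the datasheet of the chosen SoC.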

Theory ends at this point. In the next part, we'll choose a specific SoC and gather the information needed to design the «Hello World» for it. From this information we will form the plan of work for the third part.
