DaftSoft: C++


12/02/2025

The real «Hello World» from embedder (3): result, play — Barium No-Boot!

Finally...

Well, we've discussed the goals and benefits of bare-metal development, gathered all the information we need to work on a specific SoC, and are ready to start writing code. In this post, we will practise. We will go through the whole process of bare-metal coding, compiling and linking, and will finally get a real stand-alone application for a real, specific ARM64/ARMv8-A SoC. Can you imagine: it will be less than 1 kB in size (including all necessary headers)! Yet this tiny application will perform its minimal task, which is to output strings via UART. And even more: it will read bytes from UART and echo them back! We will also add some functionality to our application, do some interesting stuff and perform some experiments.

The plan

The plan we need to carry out to reach the final result looks like:
1. Organise our code — .s and .c files.
2. CPU start code — assembly code.
3. Initialise the stack pointer — assembly code.
4. Initialise PLLs, clocks, pads, UART hardware-block — C code.
5. Write some payload code — write to and read from UART.
6. Compile and link our code into binary file.
7. Make our binary acceptable by our SoC as boot-code.
8. Put our boot-block in a proper place.
9. Configure the board for booting from media we need.
10. Power on and play!

Hands-on code

Let's get it started.

Organise our code — .s and .c files.

This is bare metal: we have no bootloader and no kernel behind us. Thus, there are some routines we have to do in assembly language prior to any C code, so we will have assembly files. Two, actually. They could be combined into a single .s file, but we will stick to tradition. These files (under their common names on 64-bit platforms) are start_64.s and crt0_64.s. But what exactly should be done in assembly on bare metal? The answer is: the things we can't do in C. We can access the registers of hardware-blocks via volatile pointers, as this is done via so-called memory-mapped input/output. But we can't access CPU registers from C directly, thus start_64.s contains the very first code, the part that cannot be written in C. Specifically, this is the CPU start-up code: applying errata workarounds, switching modes and exception levels, setting up interrupt vector tables, etc. Now let's go on with crt0. You are probably familiar with it (or have at least heard of it): the so-called «C-runtime zero». In user-space this is usually an object file which the linker implicitly places in front of the file containing the main() function, whatever high-level language it is written in. In user-space crt0 prepares argc and argv, which are passed to the developer's main() function, sets up the pointers to the environment variables the program starts with, etc. As you can guess, this file is unnecessary if we are going to write in assembly language only. Its job is to prepare the environment (be it for a user-space program or for bare-metal code) for C code; in our case, to set the stack pointer to some valid address. After all the necessary assembly routines are done, we can break out to C code. Today, everything we discussed in the previous post will be done in C code. The assembly files are for start-up code and remain our window to the ARMv8 architecture, our sandbox, which is one of our goals.
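
To illustrate the memory-mapped access mentioned above, here is a minimal sketch in C. The helper names (reg_write32, reg_read32, reg_set_bits32) are made up for this illustration and are not taken from the Barium sources. On the SoC the address would come from the reference manual; on a host machine an ordinary variable can stand in for the register.

```c
#include <stdint.h>

/* Hypothetical helpers (not from the Barium sources): access a 32-bit
 * hardware register through a volatile pointer, so that the compiler
 * can neither cache nor drop the access. */
static inline void reg_write32(uintptr_t addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}

static inline uint32_t reg_read32(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}

/* Read-modify-write: set the bits given in mask, keep all the others. */
static inline void reg_set_bits32(uintptr_t addr, uint32_t mask)
{
    reg_write32(addr, reg_read32(addr) | mask);
}
```

The volatile qualifier is the important part: without it the compiler would be free to merge or remove accesses that look redundant to it but matter to the hardware.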

CPU start code — assembly code.

CPU start code. In this example we don't need to do anything here, but let's save some values for further research; believe me, they will be interesting to explore. According to the ABI, x0 carries the first parameter of a called function, x1 the second and x2 the third. Let's save the initial program counter, stack pointer and exception level while they are intact, at the very beginning of our code, by storing them in the x0, x1 and x2 registers. Later you will see how they get into our main C function and are output to UART. So, start_64.s will look like this:


.arch armv8-a

.globl _start
_start:

	# Store the address of the very first instruction. This will be the
	# address BROM puts our code to. According to ABI x0 will be the
	# first parameter of called function. Save current PC to x0:
	adr x0, .

	# Store initial stack pointer address as the second parameter:
	mov x1, sp

	# Save Current EL to x2, and it will get to third parameter:
	mrs x2, CurrentEL

	# Branch to crt0's main:
	b _main
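
On the C side, the three saved values simply arrive as the first three arguments. The sketch below shows an assumed prototype for the C entry point together with a small libc-free helper that formats a 64-bit value as 16 hexadecimal digits. The parameter names and the helper u64_to_hex are illustrative assumptions, not necessarily what the repository uses.

```c
#include <stdint.h>

/* Assumed prototype: x0, x1 and x2 arrive as the first three arguments. */
void barium_main(uint64_t initial_pc, uint64_t initial_sp, uint64_t current_el);

/* Hypothetical helper: format a 64-bit value as 16 upper-case hex digits
 * plus a terminating NUL, without any libc. out needs room for 17 chars. */
static void u64_to_hex(uint64_t value, char out[17])
{
    static const char digits[] = "0123456789ABCDEF";

    for (int i = 15; i >= 0; i--) {
        out[i] = digits[value & 0xF];   /* lowest nibble goes rightmost */
        value >>= 4;
    }
    out[16] = '\0';
}
```

A fixed-width formatter like this keeps the output columns aligned and needs neither printf nor division.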

Initialise the stack pointer — assembly code.

In our case, there is only one thing we have to do in crt0 to get prepared for C code: set the stack pointer. The stack pointer is set by simply writing a memory address to the SP register. We want our x0, x1 and x2 registers to pass through this file to the C function untouched, so we avoid using them here. So, crt0_64.s will look like this:


.arch armv8-a

.globl _main
_main:
	# Set stack pointer at the top of OCRAM 97FFFFh (p.706):
	# Move it to x3 without last Fh (aligned to 16):
	mov x3, 0xFFF0
	movk x3, 0x0097, LSL #16

	# Set stack pointer:
	mov sp, x3

	# Finally, break out to C-code:
	b barium_main
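
The two immediates used above can be double-checked with a few lines of C. This is only a host-side sketch of the arithmetic; the helper names are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Round an address down to a multiple of align (a power of two). */
static uint64_t align_down(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~(align - 1);
}

/* Low and high 16-bit halves, as consumed by the mov/movk pair. */
static uint16_t imm_lo16(uint64_t v) { return (uint16_t)(v & 0xFFFF); }
static uint16_t imm_hi16(uint64_t v) { return (uint16_t)((v >> 16) & 0xFFFF); }
```

For the top of OCRAM, align_down(0x97FFFF, 16) gives 0x97FFF0, whose halves are 0xFFF0 (the mov immediate) and 0x0097 (the movk immediate shifted left by 16).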

Initialise PLLs, clocks, pads, UART hardware-block — C code.

After that, we have to initialise PLLs, clocks, pads and the UART controller. I suppose the way to do this on our SoC was explained well enough in the previous post, so I will not spend time describing it again here. You will see it in the code.

Write some payload code — write to and read from UART.

Payload code. As with step 4, since we have initialised all the hardware and have some high-level functions, the code becomes self-explanatory. The only things I want to explain are uart_chars_counter and UART_FIFO_MAX. On each output we increment a special counter, and when it reaches the maximum length of the FIFO we wait for the real transfer to finish by polling the TXDC bit in the UART status register.
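
The counting scheme can be sketched as follows. Only the names uart_chars_counter, UART_FIFO_MAX and TXDC come from the text; the register structure, the FIFO depth of 32 and the bit position are assumptions made for the illustration, so take the real values from the reference manual and the repository.

```c
#include <stdint.h>

#define UART_FIFO_MAX   32u         /* assumed TX FIFO depth */
#define UART_STAT_TXDC  (1u << 3)   /* assumed position of the TXDC bit */

/* Hypothetical register view; the real layout comes from the manual. */
struct uart_regs {
    volatile uint32_t txd;    /* transmit data */
    volatile uint32_t stat;   /* status register containing TXDC */
};

static unsigned uart_chars_counter;

static void uart_putc(struct uart_regs *uart, char c)
{
    if (uart_chars_counter == UART_FIFO_MAX) {
        /* The FIFO may be full: wait until the transmitter reports
         * that everything queued so far has really been sent. */
        while ((uart->stat & UART_STAT_TXDC) == 0)
            ;
        uart_chars_counter = 0;
    }
    uart->txd = (uint32_t)(unsigned char)c;
    uart_chars_counter++;
}

static void uart_puts(struct uart_regs *uart, const char *s)
{
    while (*s)
        uart_putc(uart, *s++);
}
```

Counting characters instead of polling a FIFO-full flag before every byte keeps the common path to one store and one increment.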

Compile and link our code into binary file.

Compile and link. Compilation, and even cross-compilation, is nothing new: we compile the separate source files, link them into a single ELF file, then dump its pure code into a binary file. You'll see a linker script in the repository. Actually, you can build the project without it; I've used it just to remove some unneeded section that GCC adds to the binary. But we should bear in mind that this is bare metal. That means the machine starts from the very first instruction it finds in our code: BootROM branches to the address it loads our application to. Thus, we have to put our start function in the very first place. This is done by respecting the order of object files when linking them into the resulting code: we put the file containing our first code first, and the function we want the SoC to start with as the first function in that file.

Make our binary acceptable by our SoC as boot-code.

Making the binary acceptable to the SoC is a very important step, but I won't bother you with all the details. I've made a host tool named mkbb_imx8. It is derived from NXP's mkimage_imx8. It gathers some information, fills in the necessary structures and writes them to a proprietary header, which it adds to the specified binary. The only interesting thing here is the address to load the image to. This field tells BootROM where to put the loaded application (and where to jump). In my tool it is defined as DEFAULT_LOAD_ADDR in the source code, or it can be passed as the third, optional parameter. The command-line parameter takes priority over the define. If you ever need to develop such a tool for a particular SoC, you can follow the same path I took. No tricks here: all you need is to find the structure of the proprietary header for the specific SoC. Usually this information can be found in the vendor's BSP build system, in a host tool often named mkimage_XXX. You will see the mkbb_imx8 code in a separate directory in the repository.
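
The priority rule for the load address can be sketched like this. The function pick_load_addr and the value given to DEFAULT_LOAD_ADDR are assumptions for the illustration; the real tool may parse its arguments differently.

```c
#include <stdint.h>
#include <stdlib.h>

#define DEFAULT_LOAD_ADDR 0x920000u   /* assumed value of the define */

/* Hypothetical helper: the third command-line parameter (argv[3]),
 * if present and a valid number, overrides the compile-time default. */
static uint32_t pick_load_addr(int argc, char **argv)
{
    if (argc > 3) {
        char *end = NULL;
        unsigned long v = strtoul(argv[3], &end, 0);  /* base 0: accepts 0x... */

        if (end != argv[3] && *end == '\0')
            return (uint32_t)v;
    }
    return DEFAULT_LOAD_ADDR;
}
```

Falling back to the define when the parameter is missing or malformed keeps a plain invocation working while still letting a Makefile pass the address explicitly.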

Put our boot-block in a proper place.

The result of mkbb_imx8 is a binary file that will be accepted by the iMX8MP BootROM. The last thing we have to do on our host machine is to put our boot-block in the proper place. iMX8MP expects it on the media at an offset of 32 kB. Thus, we write our application to the SD card with this command:


  dd if=./barium of=/dev/sdX bs=1k seek=32 ; sync

I said it would be less than 1 kB in size. The code (with its data) is 954 bytes, and it comes to 1018 bytes with the iMX headers. Maybe we've made one of the most beautiful kilobytes in the world.

Configure the board for booting from media we need.

If you are reading this, you perhaps have a board (proprietary or a kit) with an iMX8MP SoC. If so, you may already have it configured to boot from an SD card, or already know how to do it. If not, you need to configure the board to boot from an SD card. From the iMX8MP SoC's perspective, boot device selection is described in section 6.1.5, «Boot devices (internal boot)» (p. 713). Refer to your product manual to see how it is done on a particular board.

Power on and play!

Insert SD-card into your board, connect your UART converter to UART2 interface of the board, start your favorite serial communication program and power on (or reset) the board. You'll see:


  Barium No-Boot V0.1 (iMX8MP)
  Build: 18:00:00, Feb 12 2025
  Initial PC: 0000000000920000
  BootROM SP: 0000000000916ED0
  Current EL: 0000000000000003

What about the name? «Barium» stands for «Bare», because barium, besides sounding like «bare», is a metal element. Thus «bare-metal» is replaced with «Barium». «No-Boot» means it does not boot any kernel, or anything at all.

Well, what do we see here? First, the initial PC is 920000h. This is our DEFAULT_LOAD_ADDR (in my Makefile the address is passed via a command-line argument). You can play with it, setting it here and there according to the OCRAM memory map (p. 707). The datasheet claims that the free OCRAM area starts at 918000h, but the lowest address I could put the application at is 920000h. BootROM does not allow it to be placed at 918000h.

But what is more interesting is the stack pointer BootROM leaves for us. It does not allow us to place our application in its reserved area, but it leaves SP at 916ED0h. This can be used: we can leave SP intact and have a stack from 916ED0h down to 900000h (16ED0h bytes, about 92 kB), which is plenty of space and lies in the area BootROM keeps off-limits for our code, so it is unusable for anything else anyway.

And we see that we run at the third exception level, EL3, the highest one. We have carte blanche to do whatever we want on this machine.
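
A side note on the printed value: the CurrentEL system register keeps the exception level in bits [3:2], so at EL3 the raw value read by mrs is Ch, not 3. The output above shows 3, which implies a shift somewhere before printing. A minimal sketch of such a decoding step, assuming it is done in C (the function name is illustrative):

```c
#include <stdint.h>

/* CurrentEL holds the exception level in bits [3:2];
 * the remaining bits read as zero. */
static unsigned current_el_from_raw(uint64_t raw)
{
    return (unsigned)((raw >> 2) & 0x3);
}
```

With this, a raw value of 0xC decodes to 3 and a raw value of 0x4 decodes to 1.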

What conclusion can be drawn from this? The First Platform Loader (BootROM) of iMX8MP leaves the SoC in a condition where we can omit the initialisation that is usually done in assembly language: as mentioned earlier, the stack pointer is already set to a valid address. Since setting the stack pointer is the only crucial thing we have to do in assembly for such a simple example, we can skip this step by excluding all .s files. Nothing else needs to change; just don't forget to put the main .c file (in our case barium.c) first in your list of objects, and the main function (in our case barium_main()) must be the first function in this file. Keep in mind that this is bare metal, and the first function in your code is the entry point regardless of what the ENTRY() linker directive points to. (You won't even see it in my .lds file.) Although our goal was the exact opposite, the whole bare-metal project can work without a single .s file on iMX8MP! You can try it. Thus, we can call iMX8MP a very high-level SoC, or some kind of «C-SoC».

You can clone the final repository from Barium No-Boot (iMX8MP) (see «Stage I» directory).

That's all. Finally we have a good window for learning ARMv8, aka ARM64, aka AArch64, on real hardware: we have assembly files and C files to play with, a build system for the project (Makefile, linker script), a host tool to make a proper boot image, and a kind of debug interface, UART. That looks like a sufficient setup for further study.

17/12/2024

The real «Hello World» from embedder (1): preface, theory

Preface

Any learning process eventually comes down to examples and practice. In software development, practice starts with the so-called «Hello World». «Hello World» is the very first code every software developer writes on every platform, API or framework during their very first days or steps. Embedded software engineers are no exception.

There are a lot of «Hello Worlds» out there on the Internet and in books, in every kind of programming language, API and framework. But the vast majority of them are pure software: Java, JS, C++, etc. Thus, those embedded software developers who want to study machine architecture, get in touch and play with it, suffer from a lack of examples; they have no starting point.

You can argue with that: there are examples in assembly language for all types of architectures, and assembly language should let us study machine architecture and its behaviour. Yes, there are a lot of examples in assembly language, but those examples are for very high (let's say, top) levels of the runtime environment: Linux and Mac user-space. This approach to learning machine architecture doesn't lead to an understanding of how it works and how it is organised.

Actually, it gives us a very narrow slit into machine architecture. Even in kernel-space, we are very limited in our ability to learn the machine. That's because all the hardware initialisation is already done for (and before) us by the bootloader and the kernel itself. The second problem is the so-called «concepts and abstractions of operating systems». Those «abstractions» hide a lot of machine architecture from us, while the «concepts» give rise to a lot of software that serves the OS itself and makes it what it is from the user's point of view. Thus, working even in kernel-space, we learn more about those concepts and principles than about the machine.

Let's go on and discuss our last chance: MCUs (Micro-Controller Units), which are cheap, popular, easy to start with and do let us get as close to the machine as possible. MCUs have always had a crucial difference from CPUs and SoCs (Systems on Chip): they have no MMU and are usually single-core. With the beginning of the ARM64 era, the times when not only phones and tablets but also laptops, desktops (and even servers) run on ARM64, MCUs got yet another drawback: ARM32 (the most common architecture among MCUs that are worth anything at all) became outdated even as a single-core platform for studying. We now live in days when learning the ARM32 architecture is almost useless. It's more of a waste of time, because the difference between the current and almost omnipresent ARM64 and the rapidly ageing ARM32 is enormous.

Thus, we come to the point where we need a comfortable, big enough window (or better, a door or a nice gate) to a machine that represents the modern ARM64 architecture. Usually this is done by programming on an emulator. Developing for an emulator is like a game because of its abstraction. Games (and gaming) give you minimal or even zero risk at the price of insignificant wins. To get really valuable practice and results, we need a real «Hello World» on real hardware.

Learning a machine architecture (and its behaviour) is done by so-called «Bare Metal» programming. «Bare» means that you are working on a machine without an OS, without a kernel, and even without a bootloader. Actually, you develop code that works instead of a bootloader, or runs where and when the very first part of the user bootloader (SPL, Second Platform Loader) usually does. «Metal» stands for hardware or machine and means that you work on real hardware, not a virtual machine or an emulator.

In these posts we'll cut a good window for studying a modern ARM64 machine. This will be the «Hello World» from an embedded engineer and a nice toy for an embedded developer.

Theory

Let's start with a little overview of a computing machine, specifically of its heart: the CPU or SoC. A SoC provides a set of functions; each function is provided by its own separate hardware-block. Thus, a SoC consists of hardware-blocks, or represents a set of hardware-blocks. Each hardware-block is driven by a clock; each clock is derived from a PLL (Phase-Locked Loop), a small circuit that generates one or more frequencies from an input clock source. Each PLL is driven by an oscillator. So, turning a hardware-block on, besides powering it, just means enabling the clock it is connected to. Actually, you will not find such an element as a «clock» on your board. A «clock» as an element is just a concept used by embedded developers. In reality, a «clock» is one of the outputs of a PLL, which can be enabled or disabled by software. The real clock sources are oscillators and PLLs. You can see the clock generation scheme below.

Clock generating scheme

But enabling the clock is not enough to start using a hardware-block. In most cases there are two more things to do. First: SoCs usually provide more functions than they have pins or, as we more often say in embedded software, pads. This leads to the so-called «alternative function» (AF) concept. It means that some pads can be configured for a specific (alternative) function, I2C SDA or UART TX for example. AFs are configured by the Input/Output Multiplexing Controller (IOMUXC), more commonly called the Pin Multiplexer (PINMUX). PINMUX is a built-in hardware-block, so it has to be turned on (clocked) too, like the other hardware-blocks. From the hardware point of view, PINMUX simply routes signals between the SoC's pads and the inputs/outputs of the specified hardware-block: the I2C bus controller or the UART controller in the example above. In other words, PINMUX switches signals between the hardware-blocks inside the SoC on one end and the SoC's balls (or pads), which we can see on the chip package, on the other. You can see the alternative function multiplexing scheme below.

Alternative functions multiplexing scheme

The second thing is the configuration of the hardware-block. This is done by code. After a hardware-block is clocked (enabled), it stays in some default condition. We need to write specific values to specific registers to make it function according to our needs.

Clocks for hardware-blocks and the corresponding PLLs are enabled by code. Hardware-blocks are configured by code too. But no program can run on a non-clocked CPU. So who starts the main PLLs and clocks that drive the CPU core, and how? This functionality is hardcoded into the so-called BootROM, or BROM. BROM is the firmware of the SoC itself; it is located inside the SoC. BROM initialises the essential hardware minimum required to start the user's machine code. It starts up a minimal number of hardware-blocks and after that acts as the FPL (see below). BROMs differ a lot in functionality: some of them just start code from built-in memory without even loading it into RAM (as BROMs of MCUs do), some of them initialise a bunch of hardware-blocks and can even work with filesystems (as the BROM of Raspberry's Broadcom does). The most common condition a BROM leaves a SoC in is: one CPU core is started, SRAM (or OCRAM, On-Chip RAM) is initialised, and one of the boot sources is initialised. Initialisation of the rest of the functionality is left to user code.

Any SoC uses its own, proprietary FPL (First Platform Loader). The FPL is part of the BROM we've discussed earlier. To run our code on a SoC we need a so-called «boot image», which is made out of our code. We need to say a few words about the contents of a boot image. It is not the regular file we get from a compiler, and architecture mismatch is not the only reason. Usually a compiler builds an ELF (Executable and Linkable Format) file. An ELF file contains a lot of extra information which is used by the OS or some other environment. An ELF file can also include debug information, symbol names, etc. But when working on bare metal we have to get rid of all OS-specific, debug and other environmental information. This is because the SoC executes raw code only and treats any binary data as a straight, linear stream of code (with some data mixed in). If we try to load a whole ELF file as a bare-metal application, we will most probably get some kind of illegal instruction exception, because the extra information in the ELF file does not form valid instruction encodings. The process of making raw code, or a raw binary, from an ELF file is called stripping. Now, let's proceed with the FPL. Any FPL expects the boot image in a specific format and in a specific place. Thus, after we've compiled our source code into (a set of) object files, linked those objects into a single binary and stripped it down to pure machine code, we have to pack it and make it comply with the specific requirements of a particular SoC and its BROM. This is done by a tool usually named «mkimage», sometimes something like «mkimage_XXX», where XXX is replaced by the name of the SoC or of a family of SoCs. Usually mkimage adds some headers, which BROM reads, and sometimes CRCs, to the machine code. BROM uses this information to verify the boot image. Then we have to put this image in a particular place, usually on an SD card or eMMC at some offset from zero. The offset is needed to make the bootable media usable as storage media too: it leaves space for a partition table and filesystems.
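
To see why an unprocessed ELF file cannot be executed as raw code, remember that every ELF file begins with the four magic bytes 7Fh 'E' 'L' 'F' followed by header fields, none of which are instructions. A small host-side check (the function name is made up for this illustration) can tell whether a file is still an ELF or has already been reduced to raw code:

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if the buffer starts with the ELF magic: 0x7F 'E' 'L' 'F'. */
static int looks_like_elf(const uint8_t *buf, size_t len)
{
    return len >= 4 &&
           buf[0] == 0x7F && buf[1] == 'E' && buf[2] == 'L' && buf[3] == 'F';
}
```

A check like this is a cheap sanity test in a host tool before a proprietary header is put in front of the binary.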

This is what we have about hardware. Let's proceed to software. There's not much to do and discuss here. We could work in assembly language forever; this is interesting and may be useful. By limiting ourselves to assembly language only, we can avoid using a stack. But at some point we want to break out into C code anyway. At that point we'll need a stack, because the C compiler uses it intensively. Thus, we have to initialise it, and that's all we need to know at the moment. The stack is initialised by setting the stack pointer register to a value representing a valid memory address. The stack (most often) grows down, so we have to choose its address accordingly: far enough from the bottom of RAM that it never reaches it, and such that it does not destroy our application if it is placed above it. And, of course, we can't set it higher than the top of accessible RAM.
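
These constraints can be written down as a small check. The structure, the function name and the idea of reserving a fixed stack_size are assumptions of this sketch; the 16-byte alignment is what AArch64 expects of SP.

```c
#include <stdint.h>

/* Hypothetical description of the memory we may use. ram_end and
 * app_end are exclusive (one past the last byte). */
struct mem_layout {
    uint64_t ram_base;
    uint64_t ram_end;
    uint64_t app_start;
    uint64_t app_end;
};

/* Return 1 if a descending stack of stack_size bytes starting at sp
 * is aligned, stays inside RAM and does not overlap the application. */
static int stack_pointer_ok(const struct mem_layout *m,
                            uint64_t sp, uint64_t stack_size)
{
    if (sp % 16 != 0)
        return 0;                   /* SP must be 16-byte aligned */
    if (sp > m->ram_end || sp < m->ram_base)
        return 0;                   /* outside accessible RAM */
    if (sp - m->ram_base < stack_size)
        return 0;                   /* would run below the bottom of RAM */

    uint64_t stack_low = sp - stack_size;
    /* The stack occupies [stack_low, sp); reject overlap with the app. */
    return sp <= m->app_start || stack_low >= m->app_end;
}
```

The check treats the stack as the half-open range below sp, so a stack pointer placed exactly at the start of the application is still acceptable: the stack grows away from it.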

Now let's proceed to our specific task. Let's say we want to make our SBC (Single-Board Computer) output a «Hello World». How can we do that? Let's admit that outputting (drawing) strings on a display is a rather complicated task for a bare-metal beginner. So we'll output our «Hello World» via the most common debug interface in the world: UART. UART functionality is provided by a particular hardware-block. So we need to start clocking that hardware-block, set up its pads and configure it.

First, to enable a clock for UART we have to know exactly which clock we need to enable on the exact SoC. As mentioned earlier, every clock is derived from a particular PLL. Thus, we need to find out which PLL provides the clock we need. All this information is given in the datasheet and is rigidly tied to a certain SoC. The second part is to set the AF, that is, to configure the pads for UART, its TX and RX. Here we need to turn on PINMUX (PLL, clock) and configure the pads we need for the functions we need. And the last part is configuration: it is done by writing values to memory locations mapped to the addresses of the hardware-blocks, PINMUX (pads) and UART (baud rate, parity, etc.) in our case. What configuration must be written is described in the datasheet and, again, is rigidly tied to a certain SoC.
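
The order of the three steps can be sketched against a made-up register block. Everything here (the structure, the register names, the bit positions and the mux value) is hypothetical and only shows the sequence; the real registers and values are those of the chosen SoC's reference manual.

```c
#include <stdint.h>

/* Hypothetical register block standing in for the real hardware. */
struct fake_soc {
    volatile uint32_t clk_gate;    /* clock gating: one enable bit per block */
    volatile uint32_t pad_mux_tx;  /* mux mode of the pad used as UART TX */
    volatile uint32_t pad_mux_rx;  /* mux mode of the pad used as UART RX */
    volatile uint32_t uart_ctrl;   /* UART control register */
};

#define CLK_UART_EN   (1u << 0)   /* illustrative bit positions and values */
#define PAD_MUX_MASK  0x7u
#define PAD_MUX_UART  0x2u
#define UART_CTRL_EN  (1u << 0)

/* Replace only the bits selected by mask, keep the rest of the register. */
static void field_write(volatile uint32_t *reg, uint32_t mask, uint32_t value)
{
    *reg = (*reg & ~mask) | (value & mask);
}

static void uart_bring_up(struct fake_soc *soc)
{
    soc->clk_gate |= CLK_UART_EN;                               /* 1. clock */
    field_write(&soc->pad_mux_tx, PAD_MUX_MASK, PAD_MUX_UART);  /* 2. pads  */
    field_write(&soc->pad_mux_rx, PAD_MUX_MASK, PAD_MUX_UART);
    soc->uart_ctrl |= UART_CTRL_EN;                             /* 3. block */
}
```

The clock comes first because a hardware-block that is not clocked generally cannot be configured at all.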

After writing this code we have to compile, link and strip the resulting file down to raw machine code, then make the boot image by forming the BROM header and adding it to the machine code we got at the previous stage. Then we put our boot image in a specific place on specific media and power on the SBC.

Theory ends at this point. In the next part, we'll choose a specific SoC and gather the information needed to design the «Hello World» for it. With this information we will form the plan of steps for the third part.

01/06/2020

A small assembly insert into my blog (1): optimising the setting of state variables

Watching the algorithms that mechanical devices follow, I started to wonder: what if program logic is itself redundant? For example, a printer always makes some movements (it shuffles the carriage back and forth, turns the drum to and fro) and eventually brings itself into the position required for work, ignoring its initial state and, most importantly, not depending on that state. This certainly makes the user wait longer, but it makes the device cheaper, because it reduces the number of elements that provide feedback (or eliminates them altogether): position sensors, revolution counters, mechanism position counters. I also believe it simplifies the algorithm and makes it more reliable: it gives a stronger guarantee that the system reaches the required state, because complex logical operations are excluded (computing the state/position of the carriage, computing the distance it has to travel). I noticed that one can first perform some unconditional actions (settings) and only then modify the state of the system, conditionally, to bring it into a required state that differs from the initial one. That is, one can reduce the amount of logic by increasing the number of unconditional actions. Someone may object that this leads to redundant actions (all the more so since I have just described exactly such redundant actions of a mechanical system). This needs to be worked through and checked.
Let's move to the IT/computing plane, specifically to software development. Imagine an example: we need to set some flag depending on a condition, say a flag for the presence or the number of bytes in an I2C buffer, or one of the states of a GUI element. People often think of (and write) constructs like:

if (condition)
    flag = value_1;
else
    flag = value_2;

or even:

if (condition)
    flag = value_1;
if (!condition)
    flag = value_2;

I have always wanted to check whether the variant:

flag = value_2;
if (condition)
    flag = value_1;

is more efficient for the machine. I assumed that such a construct should compile into code with fewer JMP instructions (and similar ones: JG, JE, JNE). To check this I wrote three variants of the solution and called them «if-then-else», «ternary» and «if-then». if-then-else is the most obvious variant and, it seems to me, the first that comes to mind. ternary is self-explanatory: the flag is set with the ternary operator. if-then is what I called the variant I came up with (or rather, the one I want to test), where the flag is set first, then the condition required for changing the flag is checked, and the flag is changed if the condition is true. Here are these code blocks (written as separate programs):

// if-then-else:
int main ()
{
    int n = 1;
    if (n < 0xC0FFEE)
        n = 2;
    else
        n = 3;
    return 4;
}

// ternary:
int main ()
{
    int n = 1;
    n = (n < 0xC0FFEE) ? 2 : 3;
    return 4;
}

// if-then:
int main ()
{
    int n = 1;
    n = 2;
    if (n < 0xC0FFEE)
        n = 3;
    return 4;
}

In these programs I used the numbers 1, 2, 3 and the return value 4 so that I could later use them as landmarks in the assembly code.
Now let's see what this code looks like in assembly:


    .file "if-then-else.c"
    .text
    .globl main
    .type main, @function
main:
.LFB0:
    .cfi_startproc
    movl $1, -4(%rsp)
    cmpl $12648429, -4(%rsp)
    jg .L2
    movl $2, -4(%rsp)
    jmp .L3
.L2:
    movl $3, -4(%rsp)
.L3:
    movl $4, %eax
    ret
    .cfi_endproc
.LFE0:
    .size main, .-main
    .ident "GCC: (GNU) 9.3.0"
    .section .note.GNU-stack,"",@progbits


    .file "ternary.c"
    .text
    .globl main
    .type main, @function
main:
.LFB0:
    .cfi_startproc
    movl $1, -4(%rsp)
    cmpl $12648429, -4(%rsp)
    jg .L2
    movl $2, %eax
    jmp .L3
.L2:
    movl $3, %eax
.L3:
    movl %eax, -4(%rsp)
    movl $4, %eax
    ret
    .cfi_endproc
.LFE0:
    .size main, .-main
    .ident "GCC: (GNU) 9.3.0"
    .section .note.GNU-stack,"",@progbits


    .file "if-then.c"
    .text
    .globl main
    .type main, @function
main:
.LFB0:
    .cfi_startproc
    movl $1, -4(%rsp)
    movl $2, -4(%rsp)
    cmpl $12648429, -4(%rsp)
    jg .L2
    movl $3, -4(%rsp)
.L2:
    movl $4, %eax
    ret
    .cfi_endproc
.LFE0:
    .size main, .-main
    .ident "GCC: (GNU) 9.3.0"
    .section .note.GNU-stack,"",@progbits

The parts of the code that are identical in all three listings are the directives at the beginning and at the end; the parts that interest us sit in the middle. This way we focus only on what interests us and cut off the sections common to all the programs.
What we see:

        movl $1, -4(%rsp)

Here we initialise the variable with one (I do this precisely to see where the code we are interested in begins).

        movl $4, %eax
        ret

This is our return 4; (I return four precisely to see where the code we are interested in ends).
You can see that everything that happens is quite transparent and confirms the theory I put forward. Now we can rank the methods.
if-then turned out to be the most compact, not only in terms of operations but also because it contains one label fewer, and labels are stored in the ELF file as well, which in our case means saving space. It is worth noting that the saving in space matters much less than the saving in speed: a label is written to the file once and loaded once, whereas each such piece of code in a program may be executed countless times.
In second place is if-then-else. Quite logically, it has one more label and one more jump.
But ternary surprised me personally. It is the worst variant, both in size and in speed (number of operations). In this case GCC generated code which, besides all the operations it performs in the if-then-else variant, for some reason first works through the EAX register and then moves the value into the variable.
Conclusion: yes, the if-then variant is more efficient in terms of execution on the machine. Such a construct may, however, be somewhat harder for a human to take in («How come: first something is assigned, then a condition is checked, and if it does not hold, the flag is simply left alone?»), as if someone forgot to finish a line. But if people often use the ternary form, which is not «for humans» at all, and consider themselves «seniors», then I think we can write in the if-then style and, with good reason, consider ourselves seniors too.
I will add here that we can use this method not only for setting 1/0 or Boolean flags: the if-then method can also be used for setting other initial values. The button colour is green, and if there is an error state, set it to red; the number of bytes in the buffer is zero, and if there is something on the input, set this counter to the required number, and so on.
The if-then method can also save (although not as much in percentage terms) on more complex branching constructs: the button colour is green; if there is an error state, set it to red; if the state is intermediate, set it to orange. This will save not 50%, as in the situations described above, but the number of states less one jump and one label. When using the if-then method, it is worth setting the variable to the most frequently occurring state first (if that can be thought through in advance): then there will often be one MOV-type operation, followed by one or more JMP-type operations, and a minimal number of cases with a repeated MOV, since you have already set the flag to the most frequent state.
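
The button example can be written out as a small C sketch of the if-then style with more than two states. The enum and function names are invented for the illustration.

```c
enum color { GREEN, ORANGE, RED };
enum state { STATE_OK, STATE_INTERMEDIATE, STATE_ERROR };

/* if-then style: set the most frequent value unconditionally,
 * then override it only when a less common state is detected. */
static enum color button_color(enum state s)
{
    enum color c = GREEN;          /* the default, most frequent state */

    if (s == STATE_INTERMEDIATE)
        c = ORANGE;
    if (s == STATE_ERROR)
        c = RED;
    return c;
}
```

There is no else branch anywhere: any state that is not listed explicitly simply keeps the default colour.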
In the following articles we can look at what other standard language constructs turn into and how corners can be cut on them.

07/04/2020

Reflections (1): on C, C++, the link between them, why OOP is overrated and the consequences of that

If you are a fan of any of the languages named in the title, you had better skip this article, because here I will be speaking critically and with arguments (mostly against OOP, C++ and Stroustrup).

To begin with, let's look at how the procedural and object-oriented approaches differ. The procedural paradigm puts functionality first, while the object-oriented one puts information first: the properties of the object being described from the problem domain. In a procedural language, information (object descriptions, parameters, variables) is secondary; it is described separately from the functions (usually as so-called «structures») and passed to functions as parameters. With objects it is the other way around: the information (properties) of an object is gathered in the class description, and its functionality is described and implemented inside the class. So in procedural languages the basic unit is a module with implemented functionality, which in turn consists of procedures. In object-oriented languages the basic unit is a class, which consists of its properties and the functions included in it. In other words, in a procedural language a function works with the data passed to it, while in an object-oriented language a class holds all of its properties together with built-in functions that always have access to those properties.
In fact, all of OOP is built on procedures that, when called, are implicitly passed a reference to the class structure. No other implementation is possible, so object-oriented languages are merely a «paradigm», an idea or an approach, and not a technology (in the systems sense).
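To make the last point concrete, here is a minimal sketch in Plain C of what a «method call» boils down to. The structure `button` and both functions are hypothetical names invented for this illustration; the only thing it is meant to show is that the object reference, which an OOP language passes implicitly, is just an ordinary first parameter.

```c
/* Hypothetical "class": its properties live in an ordinary structure. */
struct button {
    int width;
    int height;
};

/* A "method" is a procedure whose first parameter is the object itself.
   An OOP language would pass this pointer implicitly; here it is explicit. */
static int button_area(const struct button *self)
{
    return self->width * self->height;
}

/* A "setter method": again just a procedure that receives the structure. */
static void button_resize(struct button *self, int width, int height)
{
    self->width = width;
    self->height = height;
}
```

Calling `button_area(&b)` for some `struct button b` plays the role that `b.area()` would play in an object-oriented language.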

Now let's look at where each approach is applied. Originally all software development was at the system level: controlling a punched-card reader, a hard disk, a simple printer, primitive databases, simple file systems. All of this was written in assembly, an imperative language. So everything started with the programming paradigm that stays as close to the hardware as possible. Then programming languages evolved into the likes of C, Pascal and BASIC, which are procedural languages. The number of objects of similar types handled by a single program kept growing. Interface standards appeared for hard disks, printers and floppy drives. And more devices of the same type were being attached to a single computer. (Note that in the description above I deliberately listed all the objects developers worked with in the singular, and here in the plural.) Multitasking appeared and, no less important, graphical interfaces began to appear: entities made up of many objects with recurring characteristics (properties), similar behavior and a few differing properties (for example, the caption and size of a button or a window).
In fact, interfaces were not only graphical. For example, there was Borland TurboVision, a library for building programs with a user interface made of pseudographics (TUI, Text User Interface). The library was implemented in Borland Pascal (Pascal with OOP) and Borland C++.

From the late seventies and throughout the eighties, Bjarne Stroustrup worked on his brainchild: an OOP language (a high-level one, among other things) that was later named C++. For a long time the language was called «C with Classes», and the author did not plan to release it as a public product. When C++ began to take its final shape, Stroustrup decided not to depart from Plain C and built C++ on top of it.
For some time it annoyed me that a person had made a language significantly different from one I respect, yet similar to it, and had even given it a similar name («++» means «a step forward compared with C»). In my view this introduced confusion (which, by the way, has not gone anywhere). I even thought that Stroustrup had decided to use the good, strong image of Plain C to popularize his language, by analogy with what, according to one account, was done with JavaScript: Java was put in the name to profit from the fame of the young and then rapidly growing Java. After all, C++ was not the first OOP language and, at the time, not the only one.

But now, having looked at the history, I have understood (or put forward my own theory) that C++ presented itself as a layer on top of Plain C, rather than choosing a different name and syntax, not in order to exploit the latter's fame. C++ kept compatibility with Plain C in order to work well alongside it. And the point is not only that we can link object files from both languages and easily import functions and even classes (that can be done with any languages that produce binary-compatible object files). The point is that we can use Plain C source code in C++ projects. So Plain C and C++ are quite justifiably considered relatives (at least in POSIX). The huge code base built up by the whole community of POSIX developers could, once C++ appeared, be reused when needed (when moving from the procedural approach to OOP). I realized that this was not an attempt to exploit the fame of Plain C but a way to preserve an enormous amount of existing work. It follows from the fact that controlling a hard disk and a printer turned into controlling hard disks and printers; that is, the development tools logically reflected the situation in the world of computer hardware.

Above I wrote about the confusion that «has not gone anywhere». The confusion is that many people see C++ not as an OOP layer on top of Plain C but as an «improved» Plain C, automatically implying that Plain C is deficient and treating it as a «previous-version language». This is the source of many attempts to shove C++ into every place where it can be made to fit. And it does not always fit well (see below). This has consequences on the job market too: C++ developers are often offered noticeably higher salaries than Plain C vacancies pay. That, in turn, leads beginners (students and everyone who wants to start a career as a developer) to choose C++ (and, accordingly, OOP) as the thing to learn. There are more C++ developers, which leads to a further increase in the corresponding vacancies. So the vicious circle closes and turns into an «OOP epidemic». In part, the higher rate for OOP developers is justified by the fact that OOP languages are somewhat harder to learn at first, especially if you add to C++ the often required (and, by the way, just as often needlessly used) STL and boost. In recent years Qt has also been added to the requirements for C++ jobs. Plain C, meanwhile, seems simpler; even the book by its authors is so small. What is there to pay for? A Plain C developer should be paid not for knowledge of «lofty matters» (which is what OOP developers are paid for) but for knowledge of and skill with systems: hardware (processors, controllers, data buses, standards, hardware protocols, networks), tools (analyzers, an oscilloscope, a signal generator, a multimeter, sometimes even a soldering iron), system APIs (POSIX, kernel space) and, most importantly, for understanding how all of this fits together. But as a rule, even when a company is looking for a systems developer and understands what that person has to know, this is often not taken into account when the salary is set.

The OOP epidemic, or examples of absurdity.
Developers of SoC-based hardware need to attach a display to the device. In discussion, a team made up of OOP developers reasons as follows: «A display is a kind of output device, and an output device is a class of device. Let's make a display class inherited from an output-device class, which in turn is inherited from a device class». They made one class, inherited another from it, repeated that several more times, and then implemented a Singleton of the last one (in real life there is exactly one display, a seven-segment indicator on the front panel of a router). All the parent classes turned out to be unused.
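For contrast, a procedural take on the same task could look roughly like the sketch below. It is not taken from any real project: the function name `display_show_digit` is hypothetical, the segment table assumes the common bit layout (bit 0 = segment a ... bit 6 = segment g, active high), and the output register is passed in as a pointer so that the address of the real hardware register stays an assumption of the caller.

```c
#include <stdint.h>

/* Segment patterns for digits 0..9, assuming bit 0 = segment a ... bit 6 = segment g,
   active high. A real board may be wired differently. */
static const uint8_t seg_digits[10] = {
    0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x6F
};

/* Writes the pattern for one decimal digit to the indicator's output register.
   `reg` is whatever memory-mapped register drives the segments (hypothetical here).
   Returns 0 on success, -1 if the value is not a single decimal digit. */
static int display_show_digit(volatile uint8_t *reg, unsigned digit)
{
    if (digit > 9)
        return -1;            /* leave the register untouched */

    *reg = seg_digits[digit];
    return 0;
}
```

One table and one function cover the single indicator; no hierarchy and no Singleton are involved.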
One more example. In Java (considered the purest OOP language at the moment) everything is a class. To squeeze the OOP paradigm into real life, a static function main() had to be invented, placed in a public main class whose name must match the file name. A static function in Java is a function that can be called without creating an instance of the class. I like Java, but the crutch its designers had to resort to shows how overrated OOP is and how far it is from computing hardware.

At the system level and on hardware (in embedded devices) it is important not to get distracted by the OOP paradigm and not to «have your head in the clouds», but to work as close to the system or the hardware as possible. OOP languages (even C++) heavily «wrap» POSIX and the hardware. Knowledge of POSIX in the modern world is like air: if you have understood POSIX, you have understood life (at least its computing part). Plain C and assembly help in understanding and studying POSIX (and computing hardware as such), because they are «transparent» to the hardware. This is the real problem that follows from the «confusion» I am describing. Overrating the OOP approach leads people to study IT while ignoring computing hardware, which is the foundation of modern technology. The consequence is the emergence of a whole layer of professionals who do not understand computing hardware at all, while people who do understand it are becoming fewer and fewer. And the blame for this lies not with B. Stroustrup, not with E. Schmidt and not with B. Eckel, but with the market as a whole.

The bottom line:

Wherever you can do without OOP, you should do without OOP. There may be fewer such areas than areas suited to OOP, but they exist. In my opinion, these are the entire system level and all embedded devices. Many applications can also be implemented successfully without OOP.
OOP should be applied only after careful thought, once you have decided whether the situation meets the main criterion for introducing it: many objects with similar behavior, of which only a small part of the properties needs to be customized, and fairly complex relationships between the entities of the problem domain. And if you have a Web server that handles many connections, that alone is not necessarily a reason for OOP.

All of this is, of course, useful to those who are able (in terms of experience and technical skill) and in a position (organizationally) to make such decisions. Those who cannot make them, whether because of experience or because of organizational circumstances, can for now only keep learning and observing.

Let everything have its place, but computing hardware should in any case be treated with due respect.

P.S.
I would also like to add a clarification here. It is not always clear to everyone what ANSI C and Plain C are and when each term should be used. ANSI C is a standard of the language; Plain C is a term for the concept, the paradigm of the procedural approach. When ANSI C is mentioned, the context is the standard (usually its restrictions); when people say Plain C, the emphasis is on the programming paradigm (procedural), as opposed to C++. Any ANSI C is Plain C, but not every Plain C is ANSI C.
Before 1989, Plain C code might fail to build on different compilers. The expression ANSI C was popular after the standard was released, in 1989, when compilers began to «stick closer» to it. At that time it made sense, for example, to say at a job interview that you worked in ANSI C. Today most compilers are close to the standard, so in conversation it is mostly appropriate to use the term Plain C (or simply «C»). First, you most likely mean the programming paradigm; second, you hardly know the whole standard well enough to boast of knowing ANSI C (even GCC implements ANSI C with some extensions).

	.file "if-then-else.c"
	.text
	.globl main
	.type main, @function
main:
.LFB0:
	.cfi_startproc
	movl $1, -4(%rsp)		# initialise the local variable
	cmpl $12648429, -4(%rsp)	# compare it with the constant
	jg .L2				# greater: go to the else branch
	movl $2, -4(%rsp)		# then branch
	jmp .L3				# skip the else branch
.L2:
	movl $3, -4(%rsp)		# else branch
.L3:
	movl $4, %eax			# return value
	ret
	.cfi_endproc
.LFE0:
	.size main, .-main
	.ident "GCC: (GNU) 9.3.0"
	.section .note.GNU-stack,"",@progbits

	.file "ternary.c"
	.text
	.globl main
	.type main, @function
main:
.LFB0:
	.cfi_startproc
	movl $1, -4(%rsp)		# initialise the local variable
	cmpl $12648429, -4(%rsp)	# compare it with the constant
	jg .L2				# greater: pick the second alternative
	movl $2, %eax			# first alternative goes into a register
	jmp .L3
.L2:
	movl $3, %eax			# second alternative goes into a register
.L3:
	movl %eax, -4(%rsp)		# single store of the selected value
	movl $4, %eax			# return value
	ret
	.cfi_endproc
.LFE0:
	.size main, .-main
	.ident "GCC: (GNU) 9.3.0"
	.section .note.GNU-stack,"",@progbits

	.file "if-then.c"
	.text
	.globl main
	.type main, @function
main:
.LFB0:
	.cfi_startproc
	movl $1, -4(%rsp)		# initialise the local variable
	movl $2, -4(%rsp)		# assign the default value unconditionally
	cmpl $12648429, -4(%rsp)	# compare it with the constant
	jg .L2				# greater: keep the default
	movl $3, -4(%rsp)		# otherwise override the default
.L2:
	movl $4, %eax			# return value
	ret
	.cfi_endproc
.LFE0:
	.size main, .-main
	.ident "GCC: (GNU) 9.3.0"
	.section .note.GNU-stack,"",@progbits