DaftSoft: embedded


16/12/2025

Barium No-Boot (6): exceptions (II)

Generate Data Abort exception

It's time to play. Let's generate a Data Abort exception. Both write and read aborts are raised in the init_board() function (barium.c), in exactly the same manner as described in the first part of this post:


	/* Raise Data Abort Exceptions (Write and Read) */
	raw_writel(0, INVALID_ADDRESS);
	raw_readl(INVALID_ADDRESS);

And check the output:


	Exception: 04, ALU Core №: 0
	Class: DAF: FFFFFFFFFFFFFFFF
	Memory operation attempt: WR
	Adjusted ELR_EL3 forward (4)

	Exception: 04, ALU Core №: 0
	Class: DAF: FFFFFFFFFFFFFFFF
	Memory operation attempt: RD
	Adjusted ELR_EL3 forward (4)

We see the exception group, the ALU core number, the exception class (DAF, Data Abort Fault), the address the access was attempted at (FFFFFFFFFFFFFFFF) and the operation type (WR or RD).
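For readers who want to check such lines by hand, here is a host-side C sketch that pulls the relevant fields out of an ESR_ELx value: EC in bits [31:26] and, for data aborts, WnR in bit 6 and DFSC in bits [5:0]. The EC values 24h/25h (Data Abort from a lower/the same EL) and the alignment DFSC 21h come from the Arm Architecture Reference Manual. The struct and function names are hypothetical and not taken from the Barium sources, and the syndromes in the checks are built by hand rather than captured on the board. Unlike the bit-mask test in exception_handler() (first part of this post), the sketch compares the whole EC field, so it cannot match other classes that happen to share those bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural ESR_ELx fields (Arm ARM, ESR_EL3 description). */
#define ESR_EC_SHIFT      26u
#define ESR_EC_MASK       0x3Fu
#define EC_DABT_LOWER_EL  0x24u  /* Data Abort from a lower Exception level */
#define EC_DABT_SAME_EL   0x25u  /* Data Abort without a change in EL */
#define ISS_WNR_BIT       6u     /* WnR: 1 = write, 0 = read */
#define ISS_DFSC_MASK     0x3Fu  /* Data Fault Status Code, bits [5:0] */
#define DFSC_ALIGNMENT    0x21u  /* Alignment fault */

static_assert(EC_DABT_SAME_EL <= ESR_EC_MASK, "EC is a 6-bit field");

/* Hypothetical decoded view of a data-abort syndrome. */
struct dabt_info {
	bool is_data_abort;
	bool is_write;
	bool is_alignment;
	uint8_t dfsc;
};

static struct dabt_info decode_dabt(uint64_t esr)
{
	struct dabt_info info = {0};
	uint32_t ec = (uint32_t)((esr >> ESR_EC_SHIFT) & ESR_EC_MASK);

	/* Compare the whole EC field rather than testing a subset of its bits. */
	info.is_data_abort = (ec == EC_DABT_LOWER_EL) || (ec == EC_DABT_SAME_EL);
	if (!info.is_data_abort)
		return info;

	info.dfsc = (uint8_t)(esr & ISS_DFSC_MASK);
	info.is_write = ((esr >> ISS_WNR_BIT) & 1u) != 0;
	info.is_alignment = (info.dfsc == DFSC_ALIGNMENT);
	return info;
}
```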

Generate Data Alignment exception

To do this, we'll write a small routine in assembly language, _fetch_mem (see the newly added file low_level.s), which looks like this:


.globl _fetch_mem
_fetch_mem:
	# According to ABI x0 is the first parameter and 
	# the return value, thus all we have to do is just:
	ldr x0, [x0]
	ret

It is declared in barium.c as:


	extern uint64_t _fetch_mem(uint64_t _Addr);

Let's first test this function with a valid input value; our IMAGE_LOAD_ADDR is a good one:


	lVal = _fetch_mem(IMAGE_LOAD_ADDR);

And see what we get:


	Read Value: 24F239D584007AB2

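We know that this memory should contain ARMv8 opcodes. The 64-bit value holds two consecutive 32-bit instructions, and its bytes are printed in memory (little-endian) order, so the instruction words are D539F224h and B27A0084h. Let's check that by decoding them: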


	mrs x4, s3_1_c15_c2_1
	orr x4, x4, #0x40

This looks very familiar. Let's recall where we've seen it before: these are the very first instructions of our start_64.s file:


.globl _start_64
_start_64:
	# Load CPU Extended Control Register:
	mrs x4, S3_1_C15_C2_1
	# Enable SMP:
	orr x4, x4, #(1 << 6)

This means that our _fetch_mem routine works properly, and we can proceed to raising an Alignment Fault exception by passing it an address that is not 8-byte aligned:


	lVal = _fetch_mem(0xDEAD);

And see what we get:


	Exception: 04, ALU Core №: 0
	Class: DAL: 000000000000DEAD
	Memory operation attempt: RD
	Adjusted ELR_EL3 forward (4)

	Read Value: ADDE000000000000

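We see the exception group, the ALU core number, the exception class (DAL, Data Alignment Fault), the address the access was attempted at (000000000000DEAD) and the operation type (RD). Note the «Read Value» line: the faulting ldr was skipped, and lVal shows that x0 still held the argument 0xDEAD, printed byte by byte in memory order, least significant byte first.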
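A hypothetical host-side helper reproducing this dump format (the bytes of a 64-bit value, lowest address first on a little-endian machine) makes both «Read Value» lines easy to check. It only illustrates the byte order; the actual uart_output_hex() in the repository may be implemented differently.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Format v as 16 hex digits, least significant byte first, i.e. in the
 * order its bytes occupy little-endian memory. out needs 17 chars.
 */
static void hex_le_bytes(uint64_t v, char *out)
{
	static const char digits[] = "0123456789ABCDEF";

	assert(out != 0);
	for (int i = 0; i < 8; i++) {
		uint8_t b = (uint8_t)(v >> (8 * i));   /* byte at address offset i */
		out[2 * i]     = digits[b >> 4];
		out[2 * i + 1] = digits[b & 0x0F];
	}
	out[16] = '\0';
}
```

With this helper, 0xDEAD comes out as ADDE000000000000 and B27A0084D539F224h as 24F239D584007AB2, matching the two dumps above.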

Generate Invalid Instruction exception

For this purpose we have a small routine _udf (low_level.s):


.globl _udf
_udf:
	udf #0
	ret

The output we see is:


	Exception: 04, ALU Core №: 0
	Class: UNHANDLED: EC: 000000
	ELR Values: 0000000000921820
	ESR Values: 0000000002000000
	Adjusted ELR_EL3 forward (4)

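Here we output some information, but it is of little use on its own. EC is 000000 («Unknown reason»), the class the architecture reports for undefined instructions such as UDF; the only other bit set in ESR is IL (bit 25), meaning the faulting instruction is 32 bits long; and ELR_EL3 holds the address of the udf instruction itself, which is why the handler moves it forward by 4.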
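A handler could at least name the class instead of printing it in binary. The sketch below covers only the EC values met in this post; the encodings follow the ESR_ELx description in the Arm Architecture Reference Manual, while the macro and function names are hypothetical. For the ESR value above, 02000000h, it yields EC 0 and IL 1.

```c
#include <assert.h>
#include <stdint.h>

#define ESR_EC(esr)  ((unsigned)(((esr) >> 26) & 0x3Fu))  /* bits [31:26] */
#define ESR_IL(esr)  ((unsigned)(((esr) >> 25) & 1u))     /* bit 25 */

static_assert(ESR_EC(0xFFFFFFFFu) == 0x3Fu, "EC is a 6-bit field");

/* Names for a few Exception Class values (not an exhaustive table). */
static const char *ec_name(unsigned ec)
{
	switch (ec) {
	case 0x00: return "Unknown reason";   /* e.g. UDF, undefined encodings */
	case 0x15: return "SVC (AArch64)";
	case 0x16: return "HVC (AArch64)";
	case 0x17: return "SMC (AArch64)";
	case 0x24: return "Data Abort, lower EL";
	case 0x25: return "Data Abort, same EL";
	default:   return "other";
	}
}
```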

Generate exception by executing SMC instruction

After setting VBAR and implementing a handler for the SMC exception, we can place an SMC instruction in our code: it will generate an exception, and the ALU will select the corresponding vector and run the handler by simply branching to its address. Let's review the SMC instruction itself. Its format is:


	smc #imm16

#imm16 means that the instruction takes a so-called 16-bit «immediate value»: a value known at compile time only, either a literal number or a #define. Therefore we cannot pass a register as its operand. ARMv8 instructions are 32 bits long and consist of an opcode and its operands; when translating assembly into machine code, the assembler encodes the specified immediate value directly into the instruction. That is the nature of the SMC instruction.
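What the assembler does with the immediate can be written in a couple of lines of C: SMC #imm16 is D4000003h with imm16 placed in bits [20:5]. The base opcode and the shift are the ones used by the _smc routine later in this post; the function names are illustrative. Because the immediate field of the base opcode is zero, ORing the shifted value in is the same as adding it.

```c
#include <assert.h>
#include <stdint.h>

#define SMC_BASE_OPCODE  0xD4000003u  /* SMC #0 */
#define SMC_IMM_SHIFT    5u           /* imm16 occupies bits [20:5] */

static_assert((SMC_BASE_OPCODE & (0xFFFFu << SMC_IMM_SHIFT)) == 0u,
              "the immediate field of the base opcode must be zero");

/* Encode SMC #imm: keep 16 bits of the immediate and OR them into place. */
static uint32_t smc_encode(uint32_t imm)
{
	return SMC_BASE_OPCODE | ((imm & 0xFFFFu) << SMC_IMM_SHIFT);
}

/* Recover imm16 from an opcode; returns -1 if it is not an SMC. */
static int32_t smc_decode(uint32_t insn)
{
	/* Clear the immediate field and compare with the base opcode. */
	if ((insn & ~(0xFFFFu << SMC_IMM_SHIFT)) != SMC_BASE_OPCODE)
		return -1;
	return (int32_t)((insn >> SMC_IMM_SHIFT) & 0xFFFFu);
}
```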

We can obtain all 16 bits of this value later in the handler from the exception syndrome; this was shown in our exception_handler() in the first part of this post. But how can we specify this parameter when we need to pass different values, for example, to pass some information to our handler? The parameter looks designed exactly for that purpose, but being an «immediate value», it is not usable that way. Of course, we could implement a big switch/case or if/else block, which would look like:


	if (a)
		smc #1
	else
	if (b)
		smc #2

	...

	else
		smc #0xFFFF

But this would be an enormous block of boring code, and we don't like big blocks of boring code; we are used to doing something exceptional in our posts. So let's not make an exception today. We will present a method to pass a variable argument to the SMC instruction at run time, and it will be a small piece of code.

How can we do that? We'll form the SMC instruction ourselves with the desired immediate value, write it to memory and execute it: we will implement a function that does a part of the assembler's work, but at run time. Whenever we need to perform an SMC instruction, we will branch to a function that forms the instruction and then executes it. This is done in the _smc function in smc.s.

What is actually done there? First, we form the SMC #0 instruction, the base for the one we need; its opcode is D4000003h. Then we keep the low 16 bits of the _smc parameter (x0), shift them left by 5 (the exact offset of #imm16 within the SMC instruction) and OR the result into the base opcode. That's all there is to forming the opcode of an SMC instruction with a given #imm16. After that, we store the opcode at the address of the _smc_instruction label.

That could look like a complete solution (just branch or fall through to _smc_instruction), but it would not work because of caches. We have already turned them on, and our application is so tiny that it fits in the caches entirely, so the code runs completely out of cache. The new opcode is written through the data side, while the instruction side may still hold the old SMC #0. Thus we need to force the ALU to refetch the newly generated instruction: we clean the data cache line that holds it to the point of unification, invalidate the corresponding instruction cache line and synchronise with barriers. That is the last part. After the cache maintenance, we fall through to our newly formed instruction (_smc_instruction) without any branch, because it is located right after the _smc code. Below you can see the code:


.globl _smc
_smc:
	# Form instruction - smc with given immediate value.
	# Form instruction SMC 0 - the base for desired one,
	# its opcode is D4000003h:
	mov x1, 0x0003
	movk x1, 0xD400, lsl 16

	# Ensure we have exact amount of bits we need - immediate is 16 bit long:
	and x0, x0, 0xFFFF

	# Shift the immediate value to position it takes in instruction:
	lsl x0, x0, #5

	# Put the immediate value into instruction code by orring:
	orr x0, x0, x1

	# Obtain the address we want to modify:
	ldr x1, =_smc_instruction

	# Write new instruction to destination address:
	str w0, [x1]

	# As we have caches enabled, we have to make sure that the ALU
	# fetches our newly generated instruction and not a stale copy:
	adr x1, _smc_instruction
	# Clean the D-cache line to the Point of Unification:
	dc cvau, x1
	dsb ish
	# Invalidate the I-cache line:
	ic ivau, x1
	dsb ish
	isb

Now let's review the template of _smc_instruction. Here we have an SMC instruction with the base immediate value, which we took as 0. What we have to keep in mind is that at this moment we are still inside _smc: we didn't branch anywhere, we simply fell through from _smc into _smc_instruction, which _smc has just rewritten. _smc_instruction contains our SMC instruction. After executing it, the ALU will run the exception handler and, after returning from it, will continue with the instruction immediately following SMC. Thus we have to add a ret instruction here, as part of the _smc function.

Every function has some return value. What could _smc return? It could be interesting to return the opcode of the generated instruction, and that's exactly what we'll do. No additional data moves are needed, because the SMC opcode is already in x0, the register that holds a function's return value (according to the ABI). You can see this section below:


	# Fall to newly formed instruction:
_smc_instruction:
	# smc with default immediate value:
	smc 0x0000
	# Keep in mind that we are still in a function (_smc), which is
	# called from C code, thus we have to put ret here. We also use the
	# return value to review the instruction code we've generated.
	# It is already in x0, thus we do not move any values.
	ret

Now we have code that raises an exception: SMC with the given parameter as its immediate value. It's time to call this routine somewhere with some parameter (immediate value). What could be interesting? How can we put it all together to get a nice result? We'll use the characters we receive from UART as immediate values for SMC and output the resulting opcode:


loop_uart:
	/* Read from UART and output received bytes in a loop */
	lVal = uart_get_char(uart_base);
	/* Form exception and raise it */
	lVal = _smc(lVal);

	uart_output_string(uart_base, "Instruction opcode: ");
	uart_output_hex(uart_base, (uint8_t*)&lVal, 0, 4);
	uart_output_string(uart_base, "\r\n\r\n");
	goto loop_uart;

Here's what we have as a result:


	Exception: 04, ALU Core №: 3
	Class: SMC, #imm Value: 0020
	Instruction opcode: 030400D4

The first two lines come from the exception handler: the first outputs the exception group and the number of the core on which the exception occurred, and the second outputs the exception type (class) and its immediate value. The third line comes from barium_main(): it outputs the return value of _smc, which is the opcode of the instruction we've generated.
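Both numbers in this output come from the same character, a space (20h). The sketch below, with hypothetical names and a hand-built syndrome rather than one read from the board, shows how the handler side gets the immediate back: for an SMC executed in AArch64 state, ESR_EL3 carries EC = 17h in bits [31:26] and imm16 in bits [15:0], so a mask is all that is needed. The generated opcode D4000403h is printed as 030400D4 because its bytes are output lowest address first.

```c
#include <assert.h>
#include <stdint.h>

#define EC_SMC64       0x17u      /* SMC instruction execution in AArch64 */
#define ESR_IL_BIT     (1u << 25) /* 32-bit instruction */
#define ESR_SMC_IMM16  0xFFFFu    /* ISS bits [15:0] */

static_assert(EC_SMC64 <= 0x3Fu, "EC is a 6-bit field");

/* Build the syndrome an "SMC #imm16" would report. */
static uint32_t esr_for_smc(uint16_t imm16)
{
	return (EC_SMC64 << 26) | ESR_IL_BIT | imm16;
}

/* Handler side: extract imm16, or return -1 if the syndrome is not an SMC. */
static int32_t smc_imm_from_esr(uint64_t esr)
{
	if (((esr >> 26) & 0x3Fu) != EC_SMC64)
		return -1;
	return (int32_t)(esr & ESR_SMC_IMM16);
}
```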

We have collected initial VBARs from all ALU Cores. Let's see what we have:


	ALU Core №: 0
	Vector BAR: 0000000000000000

	ALU Core №: 1
	Vector BAR: 0000000000000000

	ALU Core №: 2
	Vector BAR: 0000000000000000

	ALU Core №: 3
	Vector BAR: 0000000000000000

What do we see here? Initially, VBAR is set to zero on all cores, which agrees with the datasheet (NXP IMX8MPRM, Figure 6-2. Internal ROM and RAM memory map, p.706): the vectors are located at zero.

That's all for today. We've reviewed the basics of the ARMv8 exception model with templates and some practice. As we mentioned in the first part of this post, there is much more to exceptions, and it's up to you to learn further.

You can clone the final repository from Barium No-Boot (iMX8MP) (see «Stage IV» directory).

26/09/2025

Barium No-Boot (6): exceptions (I)

Preface

Well, today we are going to continue learning the ARMv8 machine.

We started with a «one-dimensional» product: one core executing straight, linear code. In the previous post we added a new, second «dimension» by making the product multi-core: it now has four ALU cores running independent code simultaneously, in parallel. This time we will add another dimension (or at least half of one) to our product. We will review one important concept of all computing machines, and ARMv8 is no exception: exceptions.

Theory

Let's start with the theory. The concept of exceptions consists of three parts: the cause (in ARMv8 terms, the «syndrome»), the handler (code that processes the raised exception) and an entity that binds cause and handler together (in ARMv8 terms, the «Vector Table»).

Although the handlers are divided into groups, from the technical point of view they are all alike: plain code. The Vector Table, the entity that associates exceptions with their corresponding handlers, is just 2kB of code aligned in a certain way and divided into 16 equal-sized blocks. Each block holds up to 32 ARMv8 instructions (128 bytes). As we can see, the Vector Table is a binding entity by its structure (strictly regulated by the ARMv8 architecture) and a set of handlers by its content.

The causes, or generally speaking the exceptions themselves, are of two types: asynchronous and synchronous. Asynchronous exceptions occur in response to some «external» event. An interrupt is a good example: while writing code, you never know when an interrupt will occur. Synchronous exceptions arise as a direct result of executing an instruction, either right after it or, in some cases, somewhere during its execution. In other words, a synchronous exception is a reaction to an instruction.

It can be raised by an instruction that caused an error: a situation where the machine can't continue to function normally without handling it. One example is an illegal instruction, a fetched instruction that could not be decoded (and executed). An attempt to access an invalid memory address is another. «Handling» such errors means analysing the state of the machine and trying to fix it before allowing the code to continue or, in the worst case, preventing the machine from running further code, usually by leaving it in an infinite loop. Finally, there is a set of synchronous exceptions that are designed not to handle errors but to serve regular, normal duties for the developer's purposes. Synchronous exceptions are the topic of today's post: we'll review Data Abort and Data Alignment exceptions as examples of errors/faults, and the so-called Secure Monitor Call (SMC instruction) as an example of an exception for the developer's purposes.

Let's review the theory of the specific exceptions we are about to practise on.

The first is Data Abort. This exception arises on an attempt to access a non-existent memory address. A Data Abort exception can easily be generated with either of the following:


	raw_writel(0, INVALID_ADDRESS);


	raw_readl(INVALID_ADDRESS);

where INVALID_ADDRESS can be defined, for example, as:


	#define INVALID_ADDRESS	(UINT64_MAX)

The ARMv8 architecture allows us to determine whether it was a read or a write attempt, as well as the address the access was attempted at. We'll review this functionality in code later.

A Data Alignment exception can be caused by an attempt to load from or store to an unaligned memory address, for example by executing:


	mov x1, 0xDEAD
	ldr x0, [x1]
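An access is naturally aligned when its address is a multiple of the access size. The helper below, with a made-up name, only illustrates that rule; whether a given unaligned access actually faults also depends on the memory type and on SCTLR_ELx.A, as described in the Arm Architecture Reference Manual. For an 8-byte ldr, 0xDEAD is odd, so the access is unaligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of size; size must be a power of two. */
static bool is_naturally_aligned(uint64_t addr, uint64_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return (addr & (size - 1)) == 0;
}
```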

The simplest way to generate Illegal Instruction exception is:


	udf 0

Alternatively, assemble an illegal opcode and try to execute it. In the practice part of this post we will assemble the DEADBEEFh opcode, which does not represent any valid ARMv8-A instruction.

SMC is another synchronous exception. From the developer's point of view it looks like a regular call (in ARMv8 terms, a branch) to a normal function: its handler is executed immediately after this instruction and before the next one, and nothing needs to be fixed afterwards, because it is not a reaction to any kind of error or fault. This makes it a good exception to learn ARMv8 on and to play with.

Thus, here is our plan for this post:

Construct Vector Table
Configure ALU to use our Vector Table
Design handler for Data Abort exception
Design handler for Data Alignment exception
Design handler for Invalid Instruction exception
Design handler for SMC exception
Generate Data Abort exception
Generate Data Alignment exception
Generate Invalid Instruction exception
Generate exception with SMC instruction

Let's get it started and run through the plan (in two parts).

Construct Vector Table

The Vector Table is a block of regular ARMv8 code, 2kB in size, split into 16 sections of 128 bytes each. The Vector Table itself must also be aligned to a 2kB boundary. We will place our Vector Table in a separate file, vectors_64.s, aligned to 2kB and consisting of 16 sections aligned to 128 bytes. A minimal acceptable Vector Table template for ARMv8 could consist of a header like:


.arch armv8-a

.balign 0x800
.globl _vectors_el3
_vectors_el3:

and 16 equal blocks of handlers that look like:



.balign 0x80
	bl some_exception_handler
	eret
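The order of the sixteen blocks is fixed by the architecture: the block index encodes where the exception came from (four groups) and its type (synchronous, IRQ, FIQ, SError). The sketch below, with hypothetical names, maps a block number to its offset and meaning. Under this layout, the «Exception: 04» seen in the practice part corresponds to a synchronous exception taken from the current EL while using SP_ELx, at offset 200h.

```c
#include <assert.h>
#include <stdint.h>

#define VECTOR_TABLE_SIZE  0x800u  /* 2kB; VBAR_ELx bits [10:0] are RES0 */
#define VECTOR_ENTRY_SIZE  0x80u   /* 128 bytes = 32 instructions */
#define VECTOR_ENTRIES     16u

static_assert(VECTOR_ENTRIES * VECTOR_ENTRY_SIZE == VECTOR_TABLE_SIZE,
              "16 entries of 128 bytes fill the 2kB table");

static const char *const vector_source[4] = {
	"Current EL with SP0",
	"Current EL with SPx",
	"Lower EL using AArch64",
	"Lower EL using AArch32",
};

static const char *const vector_type[4] = {
	"Synchronous", "IRQ", "FIQ", "SError",
};

/* Byte offset of entry n from the table base. */
static uint32_t vector_offset(unsigned n)
{
	assert(n < VECTOR_ENTRIES);
	return n * VECTOR_ENTRY_SIZE;
}

/* Entry n belongs to group n / 4 and has type n % 4. */
static const char *vector_source_name(unsigned n)
{
	assert(n < VECTOR_ENTRIES);
	return vector_source[n / 4];
}

static const char *vector_type_name(unsigned n)
{
	assert(n < VECTOR_ENTRIES);
	return vector_type[n % 4];
}
```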

In our case, we have a default handler, shown below:


	stp x0, x1,   [sp, #-16]!
	stp x2, x3,   [sp, #-16]!
	stp x29, x30, [sp, #-16]!
	bl _exception_entry
	bl exception_handler
	ldp x29, x30, [sp], #16
	ldp x2, x3,   [sp], #16
	ldp x0, x1,   [sp], #16
	eret

Here we preserve the registers we'll use, call the local function _exception_entry and then the global routine exception_handler() (located in exceptions_64.c). _exception_entry is defined as:


_exception_entry:

	# Calculate Exception group No. as first parameter of default handler:
	# Exception Group No. = (instruction address - _vectors_el3) / 80h
	# We get instruction address from the Link Register as it points to 
	# address of instruction next to bl, which led to this function, thus 
	# it represents address inside of an exception group in Vector Table:
	mov x0, lr
	ldr x1, =_vectors_el3
	sub x0, x0, x1
	mov x1, 0x80
	udiv x0, x0, x1

	# Get core number from MPIDR and store it as second parameter:
	mrs x1, MPIDR_EL1
	and x1, x1, 0xFF

	# Pass Exception Link Register as third parameter:
	mrs x2, ELR_EL3

	# Pass Exception Syndrome Register as fourth parameter:
	mrs x3, ESR_EL3

	ret

In this function we prepare data for the real exception handler, gathering information for later review: we calculate the exception group number and get the number of the ALU core on which the exception was raised, the Exception Link Register and the Exception Syndrome Register. exception_handler() will be reviewed later.
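The group arithmetic can be checked on a host. In the default handler, three stp instructions precede bl _exception_entry, so lr points 10h bytes into the taken block; dividing the distance from the table base by 80h rounds that down to the block number. The base address in the checks is arbitrary (just 2kB-aligned), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VECTOR_ENTRY_SIZE  0x80u   /* 128-byte vector table entries */

/*
 * Same arithmetic as _exception_entry: group = (lr - base) / 0x80.
 * lr points just past "bl _exception_entry", i.e. inside the 128-byte
 * entry that was taken, so the division rounds it down to the entry number.
 */
static uint64_t exception_group(uint64_t lr, uint64_t vectors_base)
{
	assert(lr >= vectors_base);
	return (lr - vectors_base) / VECTOR_ENTRY_SIZE;
}

/* Core number as _exception_entry takes it: MPIDR_EL1 bits [7:0] (Aff0). */
static uint64_t core_from_mpidr(uint64_t mpidr)
{
	return mpidr & 0xFFu;
}
```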

Configure ALU to use our Vector Table

On ARMv8, the vector base address is set up by writing the table's address to the VBAR_EL3 register (VBAR stands for Vector Base Address Register). This is done in crt0_64.S, in the newly added function _set_vbar_el3, which is called from _crt0_main_64. Here we also store the initial value of VBAR_EL3 as the fourth parameter of barium_main() for later review. The rest is simple and clear:


_set_vbar_el3:
	# Store initial VBAR_EL3 as fourth parameter:
	mrs x3, VBAR_EL3

	ldr x7, =_vectors_el3
	msr VBAR_EL3, x7
	dsb sy
	isb
	ret

Design handlers for Data Abort, Data Alignment, Invalid Instruction and SMC exceptions

The complete exception_handler() routine is listed below:



void exception_handler(uint64_t _EG, uint64_t _CoreNo, uint64_t _ELR, uint64_t _ESR)
{
	uint64_t lVal;
	uint16_t lEC;
	uint8_t lDFSC;

	uart_output_string(uart_base, "Exception: ");
	uart_output_hex(uart_base, (uint8_t*)&_EG, 0, 1);
	uart_output_string(uart_base, ", ALU Core №: ");
	uart_output_dec(uart_base, _CoreNo);	
	uart_output_string(uart_base, "\r\nClass: ");

	/*
	 * Get Exception Class (EC) field - it is [31:26] bits of ESR_EL3:
	 * EC, bits [31:26], ARM DDI0601 (ID092025), p.693
	 */
	lEC = (_ESR >> EC_SHIFT) & EC_MASK;

	if ((lEC & EC_SMC) == EC_SMC)
	{
		uart_output_string(uart_base, "SMC, #imm Value: ");

		/*
		 * Get immediate value - it is [15:0] bits of ESR_EL3:
		 * imm16, bits [15:0], ARM DDI0601 (ID092025), p.712
		 */
		lVal = bswap_64((_ESR & SMC_IMM_MASK));
		uart_output_hex(uart_base, (uint8_t*)&lVal, 6, 2);
		uart_output_string(uart_base, "\r\n");
	} else
	/* Data Abort & Data Alignment Faults */
	if ((lEC & EC_DATA_ABORT) == EC_DATA_ABORT)
	{
		lDFSC = _ESR & DFSC_MASK;
		if (lDFSC == ISS_DAB_FAULT)
			uart_output_string(uart_base, "DAF: ");
		else
		if (lDFSC == ISS_DAL_FAULT)
			uart_output_string(uart_base, "DAL: ");
		else
		{
			uart_output_hex(uart_base, (uint8_t*)&lDFSC, 0, 1);
			uart_output_string(uart_base, "h: ");
		}
		/* Faulting address, taken from FAR_EL3 (Fault Address Register): */
		lVal = bswap_64(_get_far_el3());
		uart_output_hex(uart_base, (uint8_t*)&lVal, 0, 8);
		/* WnR bit of ISS: 1 - write, 0 - read */
		uart_output_string(uart_base, "\r\nMemory operation attempt: ");
		uart_output_string(uart_base, _ESR & (1 << ISS_DA_WNR_BIT) ? "WR" : "RD");
		uart_output_string(uart_base, "\r\nAdjusted ELR_EL3 forward (4)\r\n");
		/*
		 * Data Abort Exception leaves Exception Link Register pointing to 
		 * instruction that caused Data Abort. For testing purposes we adjust
		 * ELR_EL3 to make it point to the next instruction
		 */
		_adjust_elr_el3(4);
		uart_output_string(uart_base, "\r\n");
	} else
	/* Unhandled Exception */
	{
		uart_output_string(uart_base, "UNHANDLED: EC: ");
		uart_output_bin (uart_base, _ESR, EC_SHIFT, EC_LENGTH);
		uart_output_string(uart_base, "\r\nELR Values: ");
		lVal = bswap_64(_ELR);
		uart_output_hex(uart_base, (uint8_t*)&lVal, 0, 8);
		uart_output_string(uart_base, "\r\nESR Values: ");
		lVal = bswap_64(_ESR);
		uart_output_hex(uart_base, (uint8_t*)&lVal, 0, 8);
		uart_output_string(uart_base, "\r\nAdjusted ELR_EL3 forward (4)\r\n");
		_adjust_elr_el3(4);
		uart_output_string(uart_base, "\r\n");
	}

	return;
}

Let's take a closer look at this routine. First, it outputs the Exception Group: the number of the Vector Table section our exception was taken to. Second, it outputs the number of the ALU core on which the exception was raised. After that, it parses the data gathered by our _exception_entry routine and outputs information about the exception. The information used in this analysis is stored in ESR, the Exception Syndrome Register.

We first get the Exception Class (EC) field of ESR. This is the top-level starting point in handling any exception: all exceptions are divided into classes and distinguished from each other by EC. The first exception processed is SMC, for which we output the immediate value; we'll examine the nature of the SMC instruction a little later. The second one is Data Abort. If you've skimmed through exception_handler(), you may have noticed that there is no Data Alignment branch for EC. That's right: Data Abort and Data Alignment belong to the same EC. We tell an access fault from an alignment fault by another ESR field, DFSC (Data Fault Status Code).

What do we want to see when we get a data access or alignment fault? Two things: the address the access was attempted at and the type of access, read or write. So we read FAR, the Fault Address Register, which contains the exact memory address the software attempted to access, and the WnR (Write not Read) bit tells us which type of operation led to the exception.

Usually, data aborts are used to fix things up, for example to bring a page in from swap. Because of that, the ARMv8 ALU leaves the Exception Link Register (which holds the address where the program continues after the handler exits) pointing to the very instruction that tried to access memory, not to the next one. Normally the handler fixes the problem with memory and returns, and the ALU re-executes the same instruction. For our small learning and research purposes we solve this differently: since ARMv8 instructions have a fixed length, we just move ELR forward by 4 bytes (_adjust_elr_el3 is a simple routine in assembly language). After that, the flow continues with whatever value the destination register happens to hold, whereas without adjusting ELR we would loop forever between the instruction and this exception.

And where is Invalid Instruction? On ARMv8 it ends up in the so-called unhandled branch, and in that case we have little to display and nothing to fix; we again move ELR forward just to let our code make its way. That's all about the exception handler. There is a lot more to explore: the comments cite document titles and pages, and it's up to you to learn and play onward.

That's all for today. In the next post we'll continue learning and practising exceptions on the ARMv8 machine.

12/02/2025

The real «Hello World» from embedder (3): result, play — the birth of Barium No-Boot!

Chapter 3 Result, play — the birth of Barium No-Boot!

Finally...

Well, we've discussed the goals and benefits of bare-metal development, gathered all the information we need to work on a specific SoC, and now we are ready to start writing code. In this post we will practise. We will go through the whole process of bare-metal coding, compiling, linking and stripping, and finally get a real stand-alone application for a real, specific ARMv8-A SoC. Can you imagine: it will be only slightly bigger than 1kB (including all necessary headers)! As a result, this tiny sprout of digital life will perform a minimal task, outputting strings via UART. And even more: it will read from UART and output the bytes it receives! We will also add some functionality to our application, do some interesting stuff and perform some experiments.

The plan

The plan we need to carry out to reach the final result looks like:
1. Organise our code — .s and .c files.
2. CPU start code — assembly code.
3. Initialise the stack pointer — assembly code.
4. Initialise PLLs, clocks, pads, UART hardware-block — C-code.
5. Write some payload code — write to and read from UART.
6. Compile and link our code into binary file.
7. Make our binary acceptable by our SoC as boot-image.
8. Put our boot-block in a proper place.
9. Configure the board for booting from media we need.
10. Power on and play!

Hands-on code

Let's get it started.

1. Organise our code — .s and .c files.

This is bare metal: we don't have a bootloader and/or kernel behind us. Thus, there are some routines we have to write in assembly language before any C code, so we will have assembly files. Actually, two of them. They could be combined into a single .s file, but we will stick to tradition. These files (common names on 64-bit platforms) are start_64.s and crt0_64.s.

But what exactly should be done in assembly on bare metal? The answer is: the things we can't do in C. We can access the registers of hardware blocks via volatile pointers, because they are memory-mapped (so-called memory-mapped input/output, MMIO). But we can't access CPU registers directly from C, so start_64.s contains the very first code that cannot be written in C: CPU start-up code, such as applying errata workarounds, switching modes and exception levels, setting interrupt vector tables, etc.

Now let's go on with crt0. You are probably familiar with it (or have at least heard of it): the so-called «C runtime zero». In user space it is usually an object file that the linker implicitly adds before the file containing the main() function written in a high-level language. There, crt0 prepares argc and argv, which are passed to the developer's main(), sets up pointers to the environment variables the program starts with, etc. As you can guess, this file is unnecessary if we write in assembly language only. Its job is to prepare the environment (be it a user-space program or bare-metal code) for C code; in our case, to set the stack pointer to a valid address. After all the necessary assembly routines are done, we will break out to C code and implement the application payload: reading from and writing to UART. The assembly files hold the startup code and remain our window to the ARMv8-A architecture, our sandbox, which is one of our goals.

2. CPU start code — assembly code.

CPU start code. In this example we don't need to do anything here, but let's save some values for further research (believe me, they'll be interesting to explore). According to AAPCS64 (the Procedure Call Standard for the Arm 64-bit Architecture, the calling convention for 64-bit ARM machines), x0 holds the first parameter of a called function, x1 the second, x2 the third, etc. Let's save the initial program counter (PC), stack pointer (SP) and exception level (EL) while they are still untouched by our code, at the very beginning of it, by storing them in x0, x1 and x2. Later you will see how they get into our main C function and are output to UART. So, start_64.s will look like:


.arch armv8-a

.globl _start_64
_start_64:

	# Store the address of the very first instruction. This will be the
	# address BROM puts our code to. According to AAPCS64 x0 will be the
	# first parameter of called function. So, save current PC to x0:
	adr x0, .

	# Store initial stack pointer address as the second parameter:
	mov x1, sp

	# Save Current EL to x2, and it will get to third parameter:
	mrs x2, CurrentEL

	# Branch to crt0's main:
	b _crt0_main_64
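CurrentEL keeps the exception level in bits [3:2] only, so the raw value saved in x2 needs a shift before it reads as an EL number (a raw value of Ch would mean EL3). A small sketch with a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* CurrentEL.EL occupies bits [3:2]; the remaining bits are RES0. */
static unsigned current_el_from_reg(uint64_t current_el_reg)
{
	assert((current_el_reg & ~UINT64_C(0xC)) == 0);  /* RES0 bits expected clear */
	return (unsigned)((current_el_reg >> 2) & 3u);
}
```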

3. Initialise the stack pointer — assembly code.

In our case, there is only one thing we have to do in crt0 to prepare for C code: set the stack pointer. It is set by simply writing a memory address to the SP register. We want the x0, x1 and x2 registers to pass through this file to the C function, so we avoid using them here. So, crt0_64.s will look like:


  .arch armv8-a

  .globl _crt0_main_64
  _crt0_main_64:
	# Set stack pointer at the top of OCRAM 97FFFFh 
	# Internal ROM and RAM memory map (NXP IMX8MPRM, Figure 6-2, p.706):
	# Move the address to x3 without last Fh (aligned to 16):
	mov x3, 0xFFF0
	movk x3, 0x0097, lsl #16

	# Set stack pointer:
	mov sp, x3

	# Finally, break out to C-code:
	b barium_main
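The two instructions build 0097FFF0h: mov writes the low 16 bits and clears the rest, and movk then patches bits [31:16] while keeping the other bits. The value itself is the top of OCRAM rounded down to the 16-byte alignment that AAPCS64 expects for SP. The same arithmetic in C, with illustrative helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to a multiple of align (a power of two). */
static uint64_t align_down(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return addr & ~(align - 1);
}

/* Effect of "mov x, lo16" followed by "movk x, hi16, lsl #16". */
static uint64_t mov_movk16(uint16_t lo16, uint16_t hi16)
{
	uint64_t x = lo16;                                   /* mov: other bits cleared */
	x = (x & ~UINT64_C(0xFFFF0000)) | ((uint64_t)hi16 << 16); /* movk keeps the rest */
	return x;
}
```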

From here on, you will see links to documents in brackets. The format is as follows: (document name, section, table, figure name or number, page number).

4. Initialise PLLs, clocks, pads, UART hardware-block — C-code.

After that, we have to initialise PLLs, clocks, pads and UART controller.

We begin with the ccm_plls_init() function. The only thing we have to do here is enable the div 5 output of PLL2, because we decided to drive the UART hardware block from the PLL2 div 5 output. Therefore, this function looks like:


  void ccm_plls_init(void)
  {
  	/* Enable div 5 output of PLL2 as we decided to clock UART via this output.
  	 * Actually, this is unneeded - it appears to be enabled. */
  	setbits_le32(CCM_ANALOG_SYS_PLL2_GEN_CTRL, PLL_DIV5_CLKE);

  	return;
  }
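setbits_le32() is a read-modify-write of a 32-bit register, named in the style of U-Boot's register helpers. Below is a minimal sketch, assuming a little-endian CPU (so no byte swapping) and a pointer argument; it is not the implementation from the Barium sources, whose signature may differ (for example, taking an integer address), and on a host the «register» is just a variable:

```c
#include <assert.h>
#include <stdint.h>

/* Read-modify-write helpers for 32-bit registers (little-endian CPU assumed). */
static inline void setbits_le32(volatile uint32_t *reg, uint32_t set)
{
	assert(reg);
	*reg = *reg | set;     /* one read, one write: only the given bits change */
}

static inline void clrbits_le32(volatile uint32_t *reg, uint32_t clear)
{
	assert(reg);
	*reg = *reg & ~clear;
}
```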

After that we proceed with uart_init() function:


	uart_init(uart_base, UART_BAUD_RATE);

The most interesting part of this routine is the initialisation of pads, which looks like:


	...

	/* Set UART2 TXD Alternative function */
	iomux_set_alt(UART2_TXD_ALT, UART2_TXD_MUX);
	/* Set UART2 TXD Pad properties */
	iomux_set_pad(UART2_TXD_PAD, UART2_TXD_MUX + ALT_PAD_STEP);

	/* Set UART2 RXD Alternative function */
	iomux_set_alt(UART2_RXD_ALT, UART2_RXD_MUX);
	/* Set UART2 RXD Pad properties */
	iomux_set_pad(UART2_RXD_PAD, UART2_RXD_MUX + ALT_PAD_STEP);

	/* Select UART2 RXD input (Daisy Chain) */
	iomux_select_input(UART2_RXD_DSY, UART2_RXD_SEL);

	...

As discussed earlier, we set the alternative functions of the pads, their properties and the input selection of the UART2 RXD pad (its so-called «daisy chain»).

The rest of uart_init() initialises the UART hardware block. It is not interesting enough to list here; you can see it in the code.

5. Write some payload code — write to and read from UART.

Payload code. Since we have initialised all the hardware and have some high-level functions, the code becomes self-explanatory. The only things I want to explain are the uart_chars_counter variable and the UART_FIFO_MAX define. On each output we increment a special counter, and when it reaches the FIFO length, we wait for the actual transfer to finish by polling the TXDC bit in the UART Status Register. Without this check, the UART may drop characters being output. This is done in uart.c. You can see the payload functionality in the barium_main() function in barium.c.
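The counting scheme can be modelled on a host. Everything in this sketch is hypothetical: the UART is simulated by a struct, polling TXDC is replaced by a stub, and UART_FIFO_MAX is a placeholder rather than the value used in uart.c.

```c
#include <assert.h>
#include <stdbool.h>

#define UART_FIFO_MAX  32u   /* placeholder, not the real FIFO depth */

static_assert(UART_FIFO_MAX > 0u, "FIFO depth must be positive");

/* Simulated transmitter instead of real MMIO registers. */
struct fake_uart {
	unsigned in_fifo;   /* characters written but not yet transmitted */
	unsigned waits;     /* how many times we had to wait for TXDC */
};

/* Stand-ins for polling TXDC and for time passing on real hardware. */
static bool fake_txdc(const struct fake_uart *u) { return u->in_fifo == 0u; }
static void fake_tick(struct fake_uart *u)       { if (u->in_fifo) u->in_fifo--; }

static unsigned uart_chars_counter;

static void uart_put_char(struct fake_uart *u, char c)
{
	(void)c;                         /* the character itself is irrelevant here */
	if (uart_chars_counter == UART_FIFO_MAX) {
		/* The FIFO may be full: wait until transmission completes. */
		u->waits++;
		while (!fake_txdc(u))
			fake_tick(u);
		uart_chars_counter = 0u;
	}
	u->in_fifo++;                    /* stands for writing the TX data register */
	uart_chars_counter++;
}
```

After UART_FIFO_MAX characters, the next write first waits for the transmitter to drain, and then the count starts over.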

6. Compile and link our code into binary file.

Compile and link. Compilation, and even cross-compilation, is nothing new: as usual, we compile separate source files and link them into a single ELF file, then perform the only extra step, dumping the pure code into a binary file. You'll see a linker script in the repository. Actually, you can build the project without it; I've used it just to remove some unneeded sections that GCC adds to the ELF. But we should bear in mind that this is bare metal: the machine starts from the very first instruction it finds in our code. More precisely, BootROM branches to the address it loads our application to. Thus, our start function has to come first. This is achieved by respecting the order of object files when linking; in our case, it is done via the PRODUCT_SRC variable in our Makefile. Thereby we put the file containing our start code first, and the function we want the SoC to start with as the first function in that file.

7. Make our binary acceptable to our SoC as a boot image.

Making the binary acceptable to the SoC is a very important step, but I won't bother you with all the details: what information is needed, where and how to find it, and which fields have to be filled in for our SoC. I've made a host tool named mkbb_imx8, derived from NXP's mkimage_imx8. It gathers some information, fills in the necessary structures, writes them into a proprietary header and attaches it to the specified binary. The only interesting thing here is the load address. This field tells BootROM where to put the loaded application (and where to jump after loading it). In my tool it is defined as DEFAULT_LOAD_ADDR in the source code, or it can be passed as the third, optional parameter to mkbb_imx8; the command-line parameter takes priority over the define. If you ever need to develop such a tool for a particular SoC, you can follow the same approach. There are no tricks here; all you need is the structure of the proprietary header for the specific SoC. Usually this information can be found in the vendor's BSP build system, in a host tool often named mkimage_XXX. The mkbb_imx8 code is in a separate directory in the repository.
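The priority rule for the load address can be illustrated with a small host-side sketch. It assumes the optional third parameter arrives as argv[3] and uses 0x920000 as a stand-in for DEFAULT_LOAD_ADDR; pick_load_addr() is a hypothetical helper, not code from mkbb_imx8.

```c
#include <stdlib.h>

/* Stand-in value; the real DEFAULT_LOAD_ADDR lives in the mkbb_imx8 source. */
#define DEFAULT_LOAD_ADDR 0x920000UL

/* Returns argv[3] if it is present (command line wins), otherwise the
 * compiled-in default. Returns 0 for a malformed argument; a real tool
 * would report an error instead. */
unsigned long pick_load_addr(int argc, char **argv)
{
    if (argc > 3) {
        char *end;
        unsigned long addr = strtoul(argv[3], &end, 0); /* base 0 accepts 0x... */
        if (*argv[3] == '\0' || *end != '\0')
            return 0;
        return addr;
    }
    return DEFAULT_LOAD_ADDR;
}
```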

8. Put our boot block in the proper place.

The result of mkbb_imx8 is a binary file that the iMX8MP BootROM will accept as a boot image. The last thing to do on the host machine is to put our boot block in the proper place. iMX8MP expects it at an offset of 32 kB on the media, so we write our application to the SD card with the following command:


  dd if=./barium of=/dev/sdX bs=1k seek=32 ; sync
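For reference, bs=1k seek=32 makes dd skip 32 × 1024 = 32768 bytes before writing. A rough C equivalent of that step is sketched below; write_at_offset() is a hypothetical helper, and it is meant to be tried on an ordinary file rather than on /dev/sdX.

```c
#include <stdio.h>
#include <stddef.h>

#define BOOT_OFFSET (32L * 1024L)   /* dd bs=1k seek=32 -> 32768 bytes */

/* Writes len bytes of img into f starting at byte offset off.
 * Returns 0 on success, -1 on any I/O error. */
int write_at_offset(FILE *f, const void *img, size_t len, long off)
{
    if (fseek(f, off, SEEK_SET) != 0)
        return -1;
    if (fwrite(img, 1, len, f) != len)
        return -1;
    /* Flushes stdio buffers only; the kernel cache still needs sync,
     * just as in the dd command line. */
    return fflush(f) == 0 ? 0 : -1;
}
```

On a real card the device would be opened with fopen("/dev/sdX", "r+b"), which usually needs root privileges.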

I said it would be about 1 kB in size. With my compiler (ARM GNU Toolchain 14.3.Rel1) the code with its data is 980 bytes, and 1044 bytes with the iMX headers. Maybe we've made one of the most beautiful kilobytes in the whole world...

9. Configure the board to boot from the media we need.

If you are reading this, perhaps you have some board (proprietary or a kit) with an iMX8MP SoC. If so, you may already have it configured to boot from the SD card, or already know how to do it. If not, you need to configure the board to boot from the SD card. From the iMX8MP SoC perspective, boot device selection is described in section 6.1.5, «Boot devices (internal boot)» (p.713). Refer to your product manual to see how it is done on a particular board.

10. Power on and play!

Insert the SD card into your board, connect your UART converter to the board's UART2 interface, start your favourite serial terminal program and power on (or reset) the board. You'll see:


  Barium No-Boot V0.1 (iMX8MP)
  Build: 18:00:00, Feb 12 2025
  Initial PC: 0000000000920000
  BootROM SP: 0000000000916ED0
  Current EL: 0000000000000003
  Awaiting commands from UART:
  Received: 20h

What about the naming? «Barium» stands for «Bare»: besides sounding like «bare», barium is a metal element. Thus, bare-metal becomes «Barium». «No-Boot» means it does not boot any kernel or anything else at all (but runs when and where a boot loader usually does).

Well, what do we see here? First, the initial PC is 920000h. This is our DEFAULT_LOAD_ADDR (in our Makefile the address is passed as a command-line argument). You can play with it and place the image elsewhere according to the OCRAM memory map (p.707). The datasheet claims that the free OCRAM area starts at 918000h, but the lowest address I've managed to load the application to is 920000h; BootROM does not allow placing it at 918000h.

But what is more interesting is the stack pointer BootROM leaves for us. BootROM does not allow us to place our application in its reserved area, but it leaves SP at 916ED0h. We can use this: leave SP intact and let the stack grow from 916ED0h down to 900000h, which gives 916ED0h − 900000h = 16ED0h bytes (93,904 bytes, about 91.7 KiB). That is plenty of space, and it lies in the area BootROM reserves anyway, so it is unusable for our code in any case.

And we see that we run at the third exception level, EL3, the highest one. We have carte blanche to do whatever we want on this machine.
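On AArch64 the exception level is reported by the CurrentEL system register, which keeps it in bits [3:2], so EL3 reads as 0xC. A sketch of the decoding is below; BARE_METAL is a hypothetical guard macro that lets the decoding part build on a host too, and this is not necessarily how barium.c obtains the value it prints.

```c
#include <stdint.h>

/* CurrentEL holds the exception level in bits [3:2]. */
static inline unsigned el_from_currentel(uint64_t reg)
{
    return (unsigned)((reg >> 2) & 0x3u);
}

#if defined(BARE_METAL) && defined(__aarch64__)
/* Only usable at EL1 or higher: CurrentEL is not accessible from EL0. */
static inline uint64_t read_currentel(void)
{
    uint64_t v;
    __asm__ volatile("mrs %0, CurrentEL" : "=r"(v));
    return v;
}
#endif
```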

What conclusion can be drawn from this? The First Platform Loader (BootROM) of iMX8MP leaves the SoC in a state where we can omit any initialisation usually done in assembly language: as mentioned earlier, the stack pointer is already set to a valid address. Since setting the stack pointer is the only crucial thing we have to do in assembly for such a simple example, we can skip this step and exclude all .s files. Nothing else needs to change; just don't forget to put the main .c file (in our case barium.c) first in your object list, and to make the main function (in our case barium_main()) the first function in that file. Keep in mind that this is bare-metal, and the first function in your code is the entry point regardless of what the ENTRY() linker directive points to. (You won't even find ENTRY() in our .lds file.) Although our goal was the exact opposite, the whole bare-metal project can work without a single .s file on iMX8MP! You can try it. Thus, we could call iMX8MP a very high-level SoC, a kind of «C-SoC».

You can clone the final repository from Barium No-Boot (iMX8MP) (see «Stage I» directory).

That's all. Finally we have a good window for learning ARMv8 (aka ARM64, aka AArch64) on real hardware: we have assembly and C files to play with, a build system for the project (Makefile and linker script), a host tool to make a proper boot image, and a kind of debug interface, UART. That looks like a sufficient setup for further study.

17/12/2024

The real «Hello World» from embedder (1): preface, theory

Chapter 1 Preface, theory

Preface

Any learning process eventually comes to examples and practice. In software development, practice starts with the so-called «Hello World». «Hello World» is the very first code every developer writes on every platform, API, or framework, as their very first step. Embedded software engineers are no exception.

There are a lot of «Hello Worlds» out there on the Internet and in books, in all kinds of programming languages, APIs, and frameworks. But the vast majority of them are pure software: Java, JS, C++, etc. Thus, embedded software developers who want to study machine architecture, get in touch with it and play with it, suffer from a lack of examples; they have no starting point.

You could argue that there are examples in assembly language for all types of architectures, and assembly language should let us study machine architecture and its behaviour. Yes, there are a lot of examples in assembly language, but they target very high (more precisely, the top) levels of the runtime environment: Linux and macOS user space. This approach to learning machine architecture doesn't lead to an understanding of how the machine works and how it is organised.

Actually, it gives us only a very narrow slit into machine architecture. Even in kernel space we are very limited in our ability to learn the machine, because all the hardware initialisation has already been done for (and before) us by the bootloader and the kernel itself. The second problem is the so-called «concepts and abstractions of operating systems». The abstractions hide a lot of machine architecture from us, while the concepts give rise to a lot of software that serves the OS itself and makes it what it is from the user's point of view. Thus, even working in kernel space, we learn more about these concepts and principles than about the machine.

Let's go on and discuss our last chance: MCUs (Micro-Controller Units), which are cheap, popular, easy to start up and let us get as close to the machine as possible. MCUs have always differed crucially from full-fledged CPUs and SoCs (Systems on Chip): they have no MMU (Memory Management Unit) and usually have a single ALU core. Since the beginning of the ARM64 era, when not only phones and tablets but also desktops, laptops and even servers run on ARM64, MCUs have gained another drawback: ARM32 (the most common architecture among MCUs that are worth anything at all) has become outdated even as a single-core platform for study. Today it's almost useless to learn ARM32; it's more of a waste of time, because the difference between the current, almost omnipresent ARM64 and the rapidly obsolescing ARM32 is enormous.

Thus, we come to the point where we need a comfortable, big enough window (better yet, a door or a nice gate) into a machine that represents the modern ARM64 architecture. Usually this is done by programming for an emulator. Developing for an emulator is like a game because of its abstraction: games give you minimal or even zero risk at the price of insignificant wins. To get really valuable practice and results, as well as satisfaction, we need a real «Hello World» on real hardware.

Learning a machine architecture (and its behaviour) is done through so-called «bare-metal» development. «Bare» means that we work on a machine without an OS, without a kernel, and even without a bootloader. Actually, we develop code that works instead of a bootloader, or runs where and when the very first part of the user bootloader (SPL, Second Platform Loader) usually does. «Metal» stands for the hardware, the machine, and means that we work on real hardware, not on a virtual machine or an emulator.

In these posts we'll cut a good window for studying a modern ARM64 machine. This will be the «Hello World» from an embedded engineer and a nice toy for embedded developers.

Theory

Let's start with a short overview of a computing machine, specifically its heart, the CPU or SoC. A SoC provides a set of functions; each function is provided by a corresponding, separate hardware block. Thus, a SoC consists of (or represents) a set of hardware blocks. Each hardware block is driven by a clock; each clock is derived from a PLL (Phase-Locked Loop), a small circuit that generates one or more frequencies from an input clock source, an oscillator. So turning a hardware block on, besides powering it, is just enabling the clock it is connected to. Actually, you will not find an element like a «clock» on your board. A «clock» as an element is just a concept of embedded developers; really, it is one of the outputs of a PLL, which can be enabled or disabled by software. The real clock sources are oscillators and PLL outputs. You can see an overview of the clock generation scheme below.
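To make «enabling a clock» concrete: on many SoCs it boils down to setting a gate field in a clock-controller register. The sketch below is modelled loosely on the 2-bit gate fields of the iMX8MP CCM (bits [1:0], with 0b11 meaning «always on»); treat that layout as an assumption to check in the reference manual. The registers are simulated and the gate numbers are hypothetical.

```c
#include <stdint.h>

/* Simulated clock-gating registers; on hardware these would be MMIO. */
#define NUM_GATES 64
static uint32_t ccgr[NUM_GATES];

#define GATE_MASK    0x3u   /* assumed 2-bit gate field in bits [1:0] */
#define GATE_ALWAYS  0x3u   /* assumed "clock needed all the time" */

/* Read-modify-write so that other bits of the register are preserved. */
void clock_enable(unsigned gate)
{
    ccgr[gate] = (ccgr[gate] & ~GATE_MASK) | GATE_ALWAYS;
}

void clock_disable(unsigned gate)
{
    ccgr[gate] &= ~GATE_MASK;
}

int clock_is_enabled(unsigned gate)
{
    return (ccgr[gate] & GATE_MASK) == GATE_ALWAYS;
}
```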

Clock generating scheme

But enabling the clock is not enough to start using a hardware block's functionality. In most cases two more things are needed. First, SoCs usually provide more functions than they have pins (in embedded software terms we more often say pads). This leads to the so-called «alternative function» (AF) concept: some pads can be configured for a specific (alternative) function, for example I2C SDA or UART TX. AFs are configured by the Input/Output Multiplexing Controller (IOMUXC), more commonly called the Pin Multiplexer (PINMUX). PINMUX is a built-in hardware block, so it has to be turned on (clocked) too, like the other hardware blocks. From the hardware point of view, PINMUX simply routes signals from the SoC's pads to the inputs/outputs of the specified internal hardware block, to the I2C bus controller or the UART controller in the example above. In other words, PINMUX switches signals between the hardware blocks inside the SoC on one end and the pins/balls (or pads) we can see on the chip package on the other. You can see an example of the alternative function multiplexing scheme below.
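Selecting an alternative function typically means writing a small mode field in the pad's mux register. A minimal sketch, assuming (as on i.MX IOMUXC pads) that the mode sits in bits [2:0] and that the rest of the register must be preserved; the register array and pad numbers are simulated and hypothetical.

```c
#include <stdint.h>

/* Simulated pad mux registers (one per pad), zeroed as after reset. */
#define NUM_PADS 128
static uint32_t pad_mux[NUM_PADS];

#define MUX_MODE_MASK 0x7u  /* assumed: ALT0..ALT7 selected by bits [2:0] */

/* Route a pad to its alternative function alt, keeping the other bits. */
void pad_set_alt(unsigned pad, unsigned alt)
{
    pad_mux[pad] = (pad_mux[pad] & ~MUX_MODE_MASK) | (alt & MUX_MODE_MASK);
}

unsigned pad_get_alt(unsigned pad)
{
    return pad_mux[pad] & MUX_MODE_MASK;
}
```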

Alternative functions multiplexing scheme

The second thing is configuring the hardware block, which is done by code. After a hardware block is clocked (enabled), it stays in some default state; we need to write specific values to specific registers to make it work the way we need.

Clocks for hardware blocks and the corresponding PLLs are enabled by code. Hardware blocks are configured by code too. But no program can run on an unclocked CPU. So who starts the main PLLs and the clocks that drive the ALU, and how? This functionality is hardcoded into the so-called BootROM, or BROM. BROM is the firmware of the SoC itself, located inside the SoC. BROM initialises the essential hardware minimum required to start user code: it starts up a minimal set of hardware blocks and then acts as the FPL (see below). BROMs vary widely in functionality: some just start code from built-in memory without even loading it into RAM (as MCU BROMs do), while others initialise a bunch of hardware blocks and can even read filesystems (like the BROM of the Broadcom SoC in the Raspberry Pi). The most common state BROM leaves the SoC in is: one ALU core started, SRAM (or OCRAM, On-Chip RAM) initialised, and one of the boot sources initialised. Initialisation of the remaining functionality is left to user code.

Every SoC uses its own proprietary FPL (First Platform Loader). The FPL is the part of BROM we've just discussed. To run our code on a SoC we need a so-called «boot image» made from our code. A few words about its contents: it is not the regular file we get from the compiler, and architecture mismatch is not the only reason. Usually the compiler produces an ELF (Executable and Linkable Format) file. An ELF file contains a lot of extra information used by the OS or whatever other environment the code runs in; for example, it can include debug information, symbol names, etc. On bare metal we have to get rid of all OS-specific, debug and other environmental information, because the SoC executes raw code only and treats everything as a straight, linear stream of code (with some data mixed in). Bare metal needs raw code to function correctly. If we tried to load the whole ELF file as a bare-metal application, we would most probably get some kind of illegal instruction exception, because the extra information in the ELF file does not correspond to valid instruction codes. The process of turning an ELF file into raw code, a raw binary, is what we call stripping here; in the GNU toolchain it is done with objcopy -O binary.
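One easy way to see the difference between an ELF file and a raw binary is to look at the first bytes: every ELF file starts with the magic 0x7F 'E' 'L' 'F', whereas a raw image starts directly with its first instruction. A small host-side check; looks_like_elf() is a hypothetical helper that a packing tool could use to refuse an unstripped input.

```c
#include <stddef.h>
#include <string.h>

/* Every ELF file starts with the 4-byte magic 0x7F 'E' 'L' 'F'. */
int looks_like_elf(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7F, 'E', 'L', 'F' };
    return len >= sizeof magic && memcmp(buf, magic, sizeof magic) == 0;
}
```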

Now let's return to the FPL. Every FPL expects the boot image in a specific format and in a specific place. Thus, after we've compiled our source code into (a set of) object files, linked them into a single binary and stripped it down to pure machine code, we have to pack it so that it complies with the requirements of the particular SoC and its BROM. This is done by a tool usually named «mkimage», sometimes «mkimage_XXX», where XXX is the name of the SoC or SoC family. Usually mkimage adds to the machine code some headers that BROM reads, and sometimes CRCs; BROM uses this information to verify the boot image. Then we have to put the image in a particular place, usually on an SD card or eMMC at some offset from zero. The offset makes the boot media usable as storage media too: it leaves space for the partition table and filesystems.
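To illustrate what «adding headers and CRCs» means, here is a sketch with a made-up header (magic, load address, size, CRC-32). It is not the iMX8MP header format, and struct boot_hdr, BOOT_MAGIC and make_boot_hdr() are hypothetical; only the CRC-32 itself is the standard IEEE 802.3 variant.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320), bitwise version. */
uint32_t crc32_ieee(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
    }
    return ~crc;
}

/* Hypothetical boot header, NOT the real iMX8MP format. */
struct boot_hdr {
    uint32_t magic;
    uint32_t load_addr;   /* where the loader should copy the payload */
    uint32_t size;        /* payload size in bytes */
    uint32_t crc;         /* CRC-32 of the payload */
};

#define BOOT_MAGIC 0x424F4F54u  /* "BOOT", made up for this sketch */

void make_boot_hdr(struct boot_hdr *hdr, const uint8_t *payload,
                   uint32_t size, uint32_t load_addr)
{
    hdr->magic = BOOT_MAGIC;
    hdr->load_addr = load_addr;
    hdr->size = size;
    hdr->crc = crc32_ieee(payload, size);
}
```

A loader would recompute the CRC over the payload and compare it with hdr->crc before jumping to hdr->load_addr.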

That's it for hardware. Let's proceed to software; there's not much to do or discuss here. We could work in assembly language forever: it is interesting and can be useful, and by limiting ourselves to assembly we can avoid using the stack. But at some point we'll want to move into C code, and then we'll need a stack, because the C compiler uses it intensively. Thus we have to initialise it; that's all we need to know at the moment. The stack is initialised by setting the stack pointer register to a valid memory address. The stack usually grows down, so we must choose its address with that in mind: it must not run past the bottom of RAM, and it must not overwrite our application if it is placed above it. And, of course, we can't set it higher than the top of accessible RAM.
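The placement rules above can be written down as a check. This sketch assumes a descending stack occupying [sp - size, sp) and adds the AAPCS64 rule that SP stays 16-byte aligned; struct range and stack_placement_ok() are hypothetical names. With Barium's numbers, BootROM's SP of 0x916ED0 and a 0x16ED0-byte stack lie entirely below an image loaded at 0x920000.

```c
#include <stdint.h>

/* A memory range [start, end), end exclusive. */
struct range { uintptr_t start, end; };

/* Returns 1 if a descending stack of `size` bytes below `sp` stays inside
 * RAM and does not overlap the application image, 0 otherwise. */
int stack_placement_ok(uintptr_t sp, uintptr_t size,
                       struct range ram, struct range app)
{
    if (sp & 0xFu)                        /* AAPCS64: SP 16-byte aligned */
        return 0;
    if (sp < ram.start || sp > ram.end)   /* SP outside accessible RAM */
        return 0;
    if (size > sp - ram.start)            /* would run past the bottom of RAM */
        return 0;
    uintptr_t lo = sp - size;             /* lowest address the stack may use */
    return sp <= app.start || lo >= app.end;  /* entirely below or above app */
}
```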

Now let's proceed to our specific task. Say we want our SBC (Single-Board Computer) to output «Hello World». How can we do that? Admittedly, drawing strings on a display is a somewhat complicated task for a bare-metal beginner, so we'll output our «Hello World» via the most common debug interface in the world, UART. UART functionality is provided by a particular hardware block, so we need to start clocking that block, set up its pads and configure it.

First, to enable a clock for the UART we have to know exactly which clock needs enabling on the exact SoC. As mentioned earlier, every clock is derived from a particular PLL, so we need to find out which PLL provides the clock we need. All this information is in the datasheet and is rigidly tied to a certain SoC. The second part is setting the AF, i.e. configuring the pads for UART TX and RX. Here we need to turn on PINMUX (PLL, clock) and configure the pads we need for the functions we need. The last part is configuration, done by writing values to the memory locations mapped to the hardware blocks' registers: PINMUX (pads) and UART (baud rate, parity, etc.) in our case. What configuration must be written is described in the datasheet and, again, is rigidly tied to a certain SoC.

After writing this code we have to compile, link and strip the result to raw machine code, then make the boot image by forming the BROM header and adding it to the machine code from the previous stage. Then we put the boot image in the specific place on the specific media and power on the SBC.

Theory ends here. In the next part we'll choose a specific SoC and gather the information needed to design the «Hello World» for it. With this information we'll work out the plan of steps for the third part.

07/04/2020

Reflections (1): on C, C++, the relationship between them, the overestimation of OOP and its consequences

If you are a fan of C++ and/or OOP, you'd better skip this article, because here I will speak critically and with arguments (mostly about OOP, C++ and B. Stroustrup).

Procedural and object-oriented programming

First, let's consider how the procedural and object-oriented approaches differ. In the procedural paradigm functionality comes first, while in the object-oriented one information comes first: the properties of the object being described from the problem domain. In a procedural language, information (object descriptions, parameters, variables) is secondary; it is described separately from functions (usually as so-called «structures», or even as variables external to the function) and passed to functions as parameters. In objects, on the contrary, information (the object's properties) is combined in the class description, and the functionality that operates on and processes this data is described and implemented inside the class. Therefore, in procedural languages the main unit is a module implementing some functionality, which in turn consists of procedures. In object-oriented languages the main unit is the class, which consists of its properties and the functions included in it. That is, in a procedural language a function works with the data passed to it, while in an object-oriented language a class contains all its properties and built-in functions that always have access to those properties.
In fact, all of OOP is built on procedures that are implicitly passed a reference to the class structure when called. There can be no other implementation, so object-oriented languages are just a «paradigm», an idea or an approach, rather than a technology (in the systems sense).
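The claim that methods are procedures with an implicit object reference can be shown in plain C. In this sketch the «method» receives the object pointer explicitly; common C++ ABIs, such as the Itanium C++ ABI used by GCC and Clang, pass this as a hidden first argument to non-virtual member functions, which has essentially the same shape. The counter type is purely illustrative.

```c
/* The "class": plain data. */
struct counter {
    int value;
};

/* The "methods": ordinary procedures that receive the object explicitly,
 * the way a C++ compiler passes the hidden `this` pointer. */
void counter_add(struct counter *self, int n)
{
    self->value += n;
}

int counter_get(const struct counter *self)
{
    return self->value;
}
```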

Areas of application and the history of OOP and C++

Now let's look at where each approach is applied. Originally all software development was system-level: controlling a punched-card reader, a hard disk, a simple printer, primitive databases, uncomplicated file systems. All of this was written in assembly, an imperative language. So everything started with the programming paradigm closest to the hardware. Then programming languages evolved to ones like C, Pascal and BASIC, which are procedural languages. The number of objects of similar types processed by a single program grew. Interface standards appeared for hard disks, printers and floppy drives, and more devices of the same type were connected to a computer. (Note that above I deliberately listed all the objects developers worked with in the singular, and here in the plural.) Multitasking appeared and, importantly, graphical interfaces began to appear: entities consisting of many objects with repeating characteristics (properties), similar behaviour and some differing properties (for example, the caption and size of a button or window).

In fact, interfaces were not only graphical. For example, there was Borland TurboVision, a library for creating programs with a pseudo-graphical user interface (TUI, Text User Interface). It was implemented in Borland Pascal (Pascal with OOP) and Borland C++.

From the late seventies and throughout the eighties, Bjarne Stroustrup worked on his brainchild, an OOP language (among other things, a high-level one) that later got the name C++. For a long time the language was called «C with Classes», and the author did not intend to release it as a public product. When C++ was finally taking shape, Stroustrup decided not to depart from Plain C and built C++ on top of it.
For a while I didn't understand why someone would develop a language significantly different from one of the languages existing at the time, yet similar to it, and even give it a similar name («++» means «a step forward compared to C»). In my view this caused confusion (which has not gone away). I even thought (by analogy with what, according to one version, was done with JavaScript: using Java in the name to ride the fame of the young and rapidly growing Java) that Stroustrup decided to use the good, strong image of Plain C to popularise his language. After all, C++ was neither the first nor, at the time, the only OOP language.

But now, having looked at the history, I have understood (or put forward my own theory) why C++ was presented as an extension of Plain C instead of getting a different name and syntax, and it was not to ride on C's fame. C++ kept compatibility with Plain C in order to work well in tandem with it. And it's not just that we can link object files from both languages and easily import functions and even classes (this can be done with any languages that compile to binary-compatible object files). It's that we can use Plain C source code in C++ projects. So Plain C and C++ are quite justifiably considered relatives (at least in the UNIX/POSIX world). After C++ came out, the enormous code base accumulated by the whole community of POSIX developers could be reused when needed (when switching from a procedural approach to OOP). I realised that this was not an attempt to use the fame of Plain C, but a way to preserve the enormous amount of work done by an enormous number of engineers. It is a consequence of the need to control a hard disk and a printer turning into the need to control hard disks and printers: a logical reflection of the state of computer hardware in the development tools.

The market for professionals

Above I wrote about the confusion that «has not gone away». The confusion is that many people see C++ not as an OOP extension of Plain C but as an «improved» Plain C, automatically implying that Plain C is deficient and treating it as a «previous version» or generation of the language. Hence the many attempts to use C++ wherever possible, and it doesn't always work out well (more on that below). This also affects the labour market: C++ developers are often offered noticeably higher salaries than Plain C vacancies. This, in turn, leads beginners (students and anyone who wants to start a developer career) to choose C++ (and, accordingly, OOP) to study. There are more C++ developers, which leads to even more corresponding vacancies. Thus the vicious circle closes and turns into an «OOP epidemic». The higher pay for OOP developers is partly justified by C++ being somewhat harder to learn in the first place: the language gets more complex with each new standard, and the requirements for a C++ developer often include STL, boost and other «extras» (which, by the way, are just as often used unnecessarily). In recent years Qt has also been added to C++ requirements. Plain C, on the other hand, seems simpler: even the book by its authors is so small, so what is there to pay for? A Plain C developer should be paid not for knowledge of «lofty matters» (which OOP developers are paid for), but for knowledge of and skills in working with systems: hardware (processors, controllers, data buses, standards, hardware protocols, networks), tools (analysers, oscilloscope, signal generator, multimeter, sometimes even a soldering iron), system APIs (POSIX, kernel space) and, most importantly, for understanding how all of this is connected. But as a rule, even when a company is looking for a systems developer and understands what such a developer should know, this is often not reflected in the salary offered.
Before moving on to the main problem of the labour market, let me describe the difference between information technology (IT) and computing hardware. IT is web technologies, databases, information processing, desktop and mobile applications, server software, graphics applications, professional software (text and graphics editors, physics calculation packages, IDEs for developers) and other «application» software. Computing hardware is everything all of IT is built on (including, but not limited to, the software listed above), from large machines to embedded devices. But all computing hardware must also be programmed: a processor won't switch on itself and all its cores, Ethernet packets won't send themselves, and pixels won't draw themselves on the screen. Now to the problem of the professionals market. I don't consider the large number of OOP developers, IT specialists, on the market to be a problem. I consider the consequences to be the problem, namely that the bias towards OOP developers, fuelled by higher pay, leads to a shortage of specialists in computing hardware.

The OOP epidemic, or examples of absurdity

Take an example: developers need to connect a display to a board. In discussion, a team of OOP developers reasons like this: «A display is a type of output device, and an output device is a class of device. Let's make a display class inherited from an output device class, which in turn inherits from a device class.» They made one class, inherited another from it, and so on several more times, and then implemented the last one as a Singleton (in real life only one display is connected: a seven-segment indicator on the router's front panel). All the upper classes turned out to be unused, and the time spent on this work only added extra code executed by the ALU, which in turn reduced the responsiveness of the whole device, wasted energy and possibly even forced the SoC to be replaced with a more powerful and expensive one.

Another example. In Java (currently considered the «purest» OOP language) everything is a class. To squeeze the OOP paradigm into real life, its designers had to invent the static main() method in a public main class whose name must match the file name. A static method in Java is one that can be called without creating an instance of the class. I like Java, but the contortions its developers had to go through show how far OOP is from computing hardware.

At the system level and in embedded devices it is important not to be distracted by the OOP paradigm and not to «have your head in the clouds», but to work as close to the system or the hardware as possible. OOP languages (even C++) heavily «wrap» POSIX and the hardware side of the machine. Plain C and assembly help in understanding and studying POSIX and computing hardware, since they are as close to the hardware as possible. Overestimating the OOP approach leads people to study IT while ignoring computing hardware, which is the foundation of all modern technology. The consequence is a whole layer of professionals who don't understand computing hardware at all, while the number of people who do keeps shrinking. And the blame lies not with B. Stroustrup, E. Schmidt or B. Eckel, but with the whole market and those who shaped it and continue to shape it this way.

Summary

Wherever you can do without OOP, you should do without OOP. There may be fewer such areas than those suited to OOP, but they exist, and there are many of them. In my opinion, this is the whole system level and all embedded devices. Many applications can also be successfully implemented without OOP.
OOP should be applied only after careful thought, once you have decided whether the situation fits the main criterion for adopting OOP: many objects with similar behaviour, the need to customise a small part of their properties, and fairly complex relationships between entities of the problem domain. And even a web server handling many connections is not necessarily, by itself, a reason to use OOP.

All this is useful for those who are able (in terms of experience and technique) and allowed (organisationally) to make such decisions. Those who cannot make such decisions, whether because of experience or organisational constraints, can for now only keep learning and observing.

P.S.

I'd also like to add a clarification here. It is not always clear to everyone what ANSI C and Plain C are and when to use which term. ANSI C is a language standard; Plain C is a term denoting the concept, the paradigm of the procedural approach in a specific language, C. When ANSI C is mentioned, the context is the standard (usually its strict constraints); when people say Plain C, the emphasis is on the programming paradigm (procedural), as opposed to C++. Any ANSI C is Plain C, but not every Plain C is ANSI C.

Before the standard appeared, Plain C code that built with one compiler might not build with another. The term ANSI C became popular after the standard was released at the end of the 1980s (ANSI ratified it in 1989), when compilers began to «stick closer» to it. Back then it made sense, for example, to say in a job interview that you worked in ANSI C. Today most compilers are close to the standard, so in conversation it is usually appropriate to say Plain C (or just «C»). Firstly, you most likely mean the programming paradigm; secondly, you hardly know the whole standard by heart well enough to boast of knowing ANSI C (even GCC implements ANSI C with some extensions).

P.P.S.

Let everything have its place, and let there be enough of everything needed.

06/04/2020

Web (2): Watcher.js, a component for viewing log files

The task

Sometimes, most often when developing embedded systems, you need to view the log file of some application or daemon running on the target system, the «box». Logging into the device via SSH/Telnet/UART every time is not very convenient, especially when the system is already running and there are means of remotely uploading, say, configurations, firmware or the software itself to the device through a management system (which may also be a web application).
One could simply request the file of interest through the web server. But there are two caveats. First, for security reasons the web server will not serve a file located outside DocumentRoot in response to a plain HTTP request. Second, even if the server is configured to serve the file (you can allow access to the log directory or create links to the files of interest in DocumentRoot), lines appended to the log will not show up automatically: you'd have to reload the page each time, wait for the file to load again and scroll down. It might seem that you could simply write a CGI that does

tail -f /path/to/file

but that won't work on the web, because the server returns the CGI response only when the process finishes, and tail -f never finishes. The page on the client side will hang forever in the loading state.

Solution

These two problems are exactly what the component (a web application) I present in this article solves: Watcher, a component for viewing text files through a web interface.

Watcher lets you view any text file on the file system that the CGI scripts run by the web server can read. On my system I was able to view even /etc/fstab and /proc/cpuinfo. (/var/log/messages, however, could not be loaded without changing its permissions; it turned out to be off-limits even for reading.) Lines are loaded in chunks and only once, so you can start reading without waiting for the whole file to load and without reloading it.
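To make "in chunks and only once" concrete, here is a minimal Python 3 sketch of what the server side of such a scheme could look like: one request returns the file size in lines, another returns a bounded slice of lines, and both finish immediately, unlike tail -f. It is not the project's CGI (which is written in Python 2.x); the names count_lines and read_chunk and their parameters are hypothetical.

```python
import itertools
import os
import tempfile


def count_lines(path):
    """Size of the file in lines, counted as newline characters.

    Raises FileNotFoundError or PermissionError, which is how a missing
    or unreadable file would be detected at this step.
    """
    with open(path, "rb") as f:
        return sum(block.count(b"\n") for block in iter(lambda: f.read(65536), b""))


def read_chunk(path, start, count):
    """Return at most `count` lines starting at zero-based line `start`.

    Reading stops right after the requested range, so a CGI built on this
    terminates and the web server can deliver the response.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in itertools.islice(f, start, start + count)]


if __name__ == "__main__":
    # Usage on a throwaway four-line log.
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
        tmp.write("boot\nnet up\nwatch start\nready\n")
    print(count_lines(tmp.name))        # 4
    print(read_chunk(tmp.name, 1, 2))   # ['net up', 'watch start']
    os.remove(tmp.name)
```

Each request is short-lived, so the client pays for many small requests instead of one that never ends.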

How Watcher works

First, Watcher requests the file size expressed as a number of lines. This same step detects an access error or a missing file; if an error is found, a message is shown and work stops. Otherwise Watcher loads the lines in chunks, from line zero up to the size obtained at the start. Then a periodic timer is started that polls the file for a change in size (in lines). If the file has grown, the new content is requested (in chunks) and displayed. The data obtained in the first stage (line count and progress) appear in the left field (Fetch); data loaded afterwards appear in the central field (Watch).
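The three stages can be modelled in a few lines of Python. This is an illustrative sketch, not Watcher's JavaScript: get_size and get_chunk are hypothetical callables standing in for the two CGI requests, fetch_view and watch_view stand in for the Fetch and Watch fields, and the endless browser timer is replaced by a bounded polling loop so that the function returns.

```python
import time


def load_lines(get_chunk, start, end, chunk_size, show):
    """Load lines [start, end) in chunks of at most chunk_size; return the next index."""
    pos = start
    while pos < end:
        lines = get_chunk(pos, min(chunk_size, end - pos))
        if not lines:  # nothing came back: stop instead of looping forever
            break
        show(lines)
        pos += len(lines)
    return pos


def run_watcher(get_size, get_chunk, fetch_view, watch_view,
                chunk_size=100, polls=10, interval=1.0):
    # Stage 1: size in lines; a missing or unreadable file surfaces here.
    try:
        total = get_size()
    except OSError as err:
        fetch_view(["Error: %s" % err])
        return None
    # Stage 2: initial load from line 0 up to that size, shown in "Fetch".
    loaded = load_lines(get_chunk, 0, total, chunk_size, fetch_view)
    # Stage 3: periodic polling (bounded here); only lines past `loaded` go to "Watch".
    for _ in range(polls):
        time.sleep(interval)
        size = get_size()
        if size > loaded:
            loaded = load_lines(get_chunk, loaded, size, chunk_size, watch_view)
    return loaded
```

Passing interval=0 and a small polls value makes it easy to drive the sketch with an in-memory list in place of the file.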

Controlling Watcher

You pass the file name in a URL parameter (?File=): enter an address such as 127.0.0.1/Watcher.html?File=../log and press Enter. That's it; Watcher starts working as soon as the page loads. If you want the component to scroll automatically to newly loaded lines, turn on the Auto Scroll switch.
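The page itself takes File= from its own address. The Python 3 sketch below shows the equivalent parsing as a CGI would do it if the name is forwarded in the query string; that forwarding is an assumption about how the page and its CGI communicate, and requested_file is a hypothetical name.

```python
import os
from urllib.parse import parse_qs


def requested_file(query_string):
    """Return the first File= value from a query string, or None if absent.

    parse_qs also undoes percent-encoding, e.g. File=%2Fetc%2Ffstab.
    No sanitizing is done: like Watcher, this accepts any path, including ../
    """
    values = parse_qs(query_string).get("File")
    return values[0] if values else None


if __name__ == "__main__":
    # Under CGI, the web server passes the part of the URL after "?" in QUERY_STRING.
    print(requested_file(os.environ.get("QUERY_STRING", "File=../log")))
```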

Sometimes you need to reset a log, that is, clear its contents. The Clear Log button does this: it empties the file on the device (provided the permissions allow it) and restarts Watcher.
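On the server side, clearing can be as simple as truncating the file to zero length. A minimal sketch, assuming the CGI user has write permission on the log; clear_log is a hypothetical name, not taken from the project.

```python
import os
import tempfile


def clear_log(path):
    """Empty the log in place; raises PermissionError without write access."""
    # Truncating keeps the same file instead of deleting and recreating it,
    # so a daemon that has the log open in append mode keeps writing to it.
    os.truncate(path, 0)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
        tmp.write("old line\n")
    clear_log(tmp.name)
    print(os.path.getsize(tmp.name))  # 0
    os.remove(tmp.name)
```

After truncation the size in lines drops to zero, which fits with Watcher being restarted afterwards: a poller that only compares line counts would otherwise keep waiting for the file to grow past its old size.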

Limitations and notes

It took me a while to realize that the last, empty line we see in text editors does not actually exist: the newline character is written at the end of the last line, the one that looks to us like the second-to-last. In other words, the newline is part of the last line, not a separate line. So when you see an empty line at the bottom in an editor or in the output of cat, that is only an artifact of how the output is displayed, and Watcher does not show it as a separate, empty line. A consequence follows: if you append a line to the end of the file without a trailing newline, Watcher will not pick it up.
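The rule is easy to state in code: a line counts only once its newline has been written, which is also how wc -l counts lines. A small Python sketch (complete_lines is a hypothetical helper, not part of Watcher):

```python
def complete_lines(data: bytes) -> list:
    """Return the complete lines in `data`, without their newline characters.

    Whatever follows the last b"\n" is an unfinished line and is left out.
    If the data ends with b"\n", that remainder is empty, so no phantom
    empty line appears at the end.
    """
    *lines, _unfinished = data.split(b"\n")
    return lines


print(complete_lines(b"first\nsecond\n"))       # [b'first', b'second']
print(complete_lines(b"first\nsecond\nthird"))  # [b'first', b'second']
```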

Because of the way the algorithm I developed for this component works (only with lines, and line by line), it has some limitations:

you cannot delete lines from the file: Watcher will not react to this and, worse, the algorithm will lose track altogether
there is no point in modifying lines in the file: once Watcher has loaded the file's contents, it does not track changes to them
Watcher's CGI is written in Python 2.x and does not work with Python 3.x yet
my project has its own layout (file locations): you will need to configure your httpd to run the Python script from my directory (CGI), or move the script into your own CGI directory
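Why deletions break things can be shown with a simplified model of the count-based rule. This Python illustration assumes Watcher only asks for lines whose index is at or beyond the number it has already loaded; lines_to_fetch is a hypothetical name.

```python
def lines_to_fetch(already_loaded, current):
    """Lines a count-based watcher would request, given the file's current lines.

    Only indices >= already_loaded are considered; lines shown earlier are
    never re-read, so deletions and edits go unnoticed.
    """
    return current[already_loaded:]


shown = ["a", "b", "c"]  # lines loaded so far
print(lines_to_fetch(len(shown), ["a", "b", "c", "d"]))  # ['d']: normal append
print(lines_to_fetch(len(shown), ["b", "c", "d"]))       # []: "a" deleted, "d" missed
print(lines_to_fetch(len(shown), ["b", "c", "d", "e"]))  # ['e']: "d" skipped
```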

To use it:

git clone https://gitlab.com/daftsoft/watcher.js.git