05/01/2022

System concepts (1): BSS — long forgotten but ever-present

In this post we'll take a look at one interesting concept of modern operating systems — BSS. Maybe some of you have not heard of it at all; some of you may think of it as of some sort of ancient thing and suppose it is not used these days. In first part of this post we will examine the purpose it was ever invented for. In second part we'll show how it is used these days ubiquitously even if you don't know about it.

Historically, BSS stands for "Block Started by Symbol" or "Block Starting Symbol". But we will not deepen in history because these days none of that acronym is meaningful.

Technically, BSS is a section of data in (object or executable) file. If it is a section, you may suppose we can declare it in the assembly language with .section directive. Well, that's right. Let's do it.

        .section .bss
	.lcomm var, 1 #1000

	.global main
	.text
    main:
	xorq %rax, %rax
        retq

By the way, modern assembly translators do not require keyword .section, .bss is enough. 

Let's explain what is done here. We declared .bss section with a variable named var in it with a parameter 1 (which may look like a value, but it is not). Compile and look at the resulting a.out, it's size in my case (Linux/GCC) is 16496 bytes. Now we change the parameter 1 to 1000, for example. Compile and look at size of a.out — it remains the same. "What kind of magic is that?!" It's "white" magic and now it's time to explain the whole thing. BSS was invented to save space on disk (or other storage or network bandwidth). And, yes, it's really that "ancient" invention — it originates to the 1950s. You can use it in cases when you don't need to specify values of storage area (variable), for example — to declare buffers which you will write to later, at runtime. You see that variables in .bss are declared in a different manner. It has no .globl directive — it uses .lcomm/.comm instead to specify it's visibility. .lcomm stands for local module visibility, while .comm — for global (some sort of .globl directive). The parameter (1, 1000 in our case or whatever you want) is the size of the buffer. How it works? Linker writes symbol with variable name and address and count of bytes in resulting module. At runtime .bss variables are expanded in memory to specified size and (while it's not standard, usually) initialized with zeroes.

The opposite way to declare zero-filled array is to use .fill directive on regular variable in .data section — in this case all zeroes will be written to resulting module increasing it's size. I'll omit this example here, but you may check it by yourself:

        .data
    big_var:
	.fill 1000000

At this moment you may think — "This is a good idea. But could I use it to optimize my high level language programs with this knowledge?". I've had same thoughts and have checked it. The answer is "yes", moreover — you already often do this. Here starts the second part of our little research.

The short receipt is — just declare your variables as global arrays and initialize them as { 0 }. Let's prove it:

    char lBSSVar[1] = { 0 };
    int main ()
    {
        return 0;
    }

Compile and check the resulting file size. For example, on my machine a.out is 16496 bytes (same size as a.out I've got from assembly language code). Now change the size of lBSSVar to 1000, recompile and see the size of a.out is not changed.

Let's see if it is BSS or something else by examining assembly code we get of our C-code:

	.globl	lBSSVar
	.bss
	.align 32
	.type	lBSSVar, @object
	.size	lBSSVar, 1000
    lBSSVar:
	.zero	1000

We see in this list (I've posted here the part that we are interested in only) that compiler made .bss section from our C-code. Syntax differs from my raw assembly example but technical idea is the same.

P.S.
The idea to save space in object files I've demonstrated in this post is really old. But as we see in last list syntax may differ. For example, clang uses .zerofill directive to describe uninitialized buffers. Thus different compilers may use different syntax. So, in my my opinion, if you code your application in raw assembly language you can use syntax I've shown in first part of this post — it is a short way to use BSS idea. Second part of this post lets you know how to declare buffers you don't need to initialize in high-level languages saving some extra space on storage devices and a little time loading it.

No comments:

Post a Comment