Preface
It's hard to count how many times each of us has wanted to change the contents of a string declared as:
char* String = "Hello, World!";
This operation looks perfectly ordinary, and we expect a simple, straightforward result from it. Yet when we try something like:
String[1] = 'a';
we get a segmentation fault (SIGSEGV). And usually the whole experiment ends right there.
Don't even dare to do this — your CPU will catch fire immediately and all your unsaved data will be lost!
A segmentation fault occurs when a process attempts to write to (or read from, see below) an area of memory that is inaccessible to it. Which memory area can't we write to? In our example, it's the read-only data section, .rodata. Let's look at the assembly the compiler generates for a program that declares a char* string. From code like this:
int main ()
{
char* lString = "Hello, World!";
}
we get:
.section .rodata
.LC0:
.string "Hello, World!"
Of course, we could replace .section .rodata with .section .data. But I wouldn't suggest interfering with the build every time, and at the least convenient stage at that: translation.
Solution
Let's have a look at the POSIX function mprotect(). It changes the access permissions of a memory area.
There is a subtlety associated with this function that is easy to misread. The documentation, both the man page and the POSIX specification, says that mprotect() changes the protection of the range that starts at the given address and ends at that address plus the length in bytes minus 1. The length really is in bytes, but the MMU works with pages, not bytes, so protection is always applied to whole pages: every page that contains any part of the range is affected. That is why, in the following examples, passing 1 as the length is enough to make the whole page writable, which covers all of our variables. The conclusion: mprotect() takes a length in bytes and effectively rounds it up to a whole number of pages.
The function takes an address, a length, and protection flags as parameters. In our case, we are most interested in the address and the flags. The flag is simple: we need PROT_WRITE. The address is a little more complicated. Since the MMU works with pages, the starting address of the region must be aligned to the page size. To get such an address, we round the address of our variable down to the nearest page boundary. First, we ask the system for the page size. Second, we clear the low-order bits of the variable's address, that is, the bits that hold the offset within the page.
For example: the address of the string is 555555556004h. We get the page size from the system; in my case (and most probably in yours) it is 4096. Subtracting 1 from the page size, since offsets within a page run from 0 to 4095, gives FFFh. So the offset within a page occupies three hex digits (nibbles), or 12 bits. In our example, the page containing the variable starts at 555555556000h. To calculate this address, we clear the low 12 bits of the variable's address. Inverting FFFh gives the mask for the bitwise AND: FFFFFFFFFFFFF000h. Casting the result back to void* gives us the pointer to pass to mprotect(). This leads to the following expression:
void* lPageBoundary = (void*) ((long) lStr1 & ~(getpagesize () - 1));
For our research, we'll change the access to a single page. A whole page gives us room not only to write to different areas, but also to experiment with the boundaries of variables:
mprotect (lPageBoundary, 1, PROT_WRITE);
That's it. From this point on, unprecedented miracles begin. For example, the following program prints «World!» instead of the annoying and tedious «Segmentation fault»:
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main ()
{
    char *lStr1 = "Hello";
    char *lStr2 = " orld!";
    // Round the address of lStr1 down to the start of its page:
    void* lPageBoundary = (void*) ((long) lStr1 & ~(getpagesize () - 1));
    // Comment next line to turn magic off:
    mprotect (lPageBoundary, 1, PROT_WRITE);
    lStr2[0] = 'W';
    printf ("%s\n", lStr2);
    return 0;
}
But I suggest going a little further and playing with addresses some more. As you can see, I've declared two variables, lStr1 and lStr2. A char* string is an ASCIZ or LPSZ (Long Pointer to Zero-Terminated String) variable. As we know, string functions determine where a string ends by this zero. Let's check what happens if we replace the terminating zero of lStr1 with a space and print the string using printf(). This will also demonstrate that we can write anywhere within the memory region whose access was changed by mprotect(). I propose removing the terminating zero of lStr1 by writing to the address lStr2 - 1:
lStr2[-1] = ' ';
printf ("%s\n", lStr1);
Since in this build our strings are located in memory one after the other, we've replaced the terminating zero of lStr1 with a space. Consequently, lStr1 is no longer a proper ASCIZ/LPSZ string on its own, and printf() should «fall through» lStr1 to the first zero it meets, which will be the terminating zero of lStr2. Let's check this. We get the following output:
Hello World!
That's right, printf() reached the first terminating null, which turned out to be the terminating null of string lStr2. We got the output of the «concatenated» string.
In fact, this only looks unusual in C. In assembly language, this kind of data handling is routine: for example, the length of a string (ASCII or ASCIZ) or of an array can be calculated by subtracting its own address from the address of the variable that follows it.
But that's not all. Perhaps at this moment the thought has crossed your mind: «If this is possible, then maybe I can change the values of variables declared as const?» The answer is yes, this long-held dream can be realised. But there's a caveat. As we know, the compiler won't let us write to variables declared with the const modifier. An attempt to do:
const char* lStr3 = "hello world!";
lStr3[0] = 'H';
will lead to:
./mprotect.c:18:12: error: assignment of read-only location '*lStr3'
lStr3[0] = 'H';
^
To resolve this situation, we trick the compiler as follows:
*((char*)lStr3 + 0) = 'H';
We took the pointer, cast away its const qualifier, added an offset, and wrote what we wanted through it. Now the code compiles and works. The offset here is zero and has no technical meaning; I wrote it only to show the full notation for accessing variables declared as const. Using this notation, you can address any character in the string... and not just within the string, and not just forward, since we can write anywhere within the page we've granted access to.
NB
It seems obvious, but I think I should warn you. We've removed the write protection from one memory area, and nothing stops you from removing it from a larger one. This means you can modify that memory without any control from the OS, MMU, or compiler, since we've switched that protection off ourselves. You can write to these memory areas, but if you want your code to keep working properly, you need to be even more careful about the boundaries of the areas you modify!
P.S.
You may have noticed that we used the PROT_WRITE flag without PROT_READ, yet we were still able to read data from these areas. This is because the x86 MMU (this research was done on an x86 machine) has no write-only mode: a writable page is always readable. So you can write PROT_WRITE | PROT_READ, but on x86 it changes nothing. If you want to play with access, try PROT_NONE. In this case, you have no access to the memory page at all, and even an attempt to read it, for example via printf(), results in a segmentation fault. This feature could be put to practical use, but it's hampered by the fact that we can only mark entire pages, and on x86 they come only in sizes of 4 KB, 2 MB, 4 MB, and 1 GB, depending on the architecture and operating mode. 4 KB is rather a lot for using memory protection as a kind of «trap». However, if the program is large enough, we can group the flags whose changes we want to react to into 4 KB blocks and implement the reaction in the handler of the signal raised by the access violation.