Frequently asked questions

Why is the compiler removing my infinite loops?
Here's a really good article on this: Non-Termination Considered Harmful in C and C++

The short version is that in C++, an infinite loop that has no observable side effects (no I/O, no volatile accesses, no atomic or synchronization operations) is undefined behavior, and the compiler is free to assume that such loops terminate. This allows the compiler to emit faster code, so it will typically do this whenever it can.

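In C, the situation is a bit more complicated. The compiler may assume that a loop terminates if its controlling expression is not a constant expression and its body performs no I/O, accesses no volatile objects, and performs no synchronization or atomic operations. So while (1) { } must be kept in C, but a side-effect-free loop with a runtime condition, like while (x) { }, may be removed.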

Generally, if you want an infinite loop, you have to put something inside it that the C and C++ standards don't allow the compiler to elide. The easiest way to do this is typically an empty volatile inline assembly statement. This tells the compiler that an inline assembly block with uncertain side effects should be inserted, which keeps the loop around: the compiler doesn't do deep introspection into assembly fragments, so it can't prove that eliding the loop is safe.

This requires the inline-assembly extension; to do this in standard C/C++, just insert a volatile load or store into the loop. This will similarly force the compiler to keep the loop.
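A minimal sketch of both techniques, assuming a GCC-compatible compiler such as Clang; the function names and the hang_counter variable are hypothetical. Each loop uses a runtime condition, since that is the kind of loop C lets the compiler assume terminates. Passing true hangs forever, and the side effect in the body is what stops the compiler from deleting the loop.

```c
#include <stdbool.h>

/* Hypothetical counter; volatile, so every store to it must be performed. */
volatile unsigned char hang_counter;

/* GCC/Clang extension: the empty volatile asm is opaque to the optimizer,
   so the loop can't be proven side-effect free and must be kept. */
void hang_if_asm(bool hang) {
  while (hang) {
    __asm__ volatile("");
  }
}

/* Standard C: a volatile store is a side effect, which likewise stops
   the compiler from assuming the loop terminates. */
void hang_if_volatile(bool hang) {
  while (hang) {
    hang_counter = 0;
  }
}
```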

The reason for all this is that it allows compilers to assume that code after a loop will eventually be reached, which enables a variety of optimizations. Modern C/C++ compilers like Clang operate by tearing down the code as written and rewriting it, bit by bit, into something more efficient. The only real constraint on this process is the letter of the C standard. Undefined behavior exists to permit generating better code; it just creates unfortunate foot-guns like this one.

Why is the compiler removing my memory accesses?
This is very similar to the above: if a memory access absolutely has to happen at a particular point in the program, it has to be done through a volatile object. Otherwise, the compiler is free to reason its heart out about what purpose that memory access serves, and if no purpose is visible to the compiler, the access may be removed. This is usually very desirable; unnecessary accesses crop up all the time. For example, if you're incrementing a 16-bit counter variable, and some particular path through the program only ever looks at the low byte, the compiler might skip the high part of the increment. But you wouldn't want this to happen if the variable is also read by an OS routine! So volatile is the blessed way to tell the compiler: "this needs to happen now!"
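A small sketch of the counter example, assuming the counter is shared with an interrupt handler or OS routine (the names are hypothetical). Declaring it volatile makes every access written in the source one that the compiler must actually perform, so the full 16-bit increment always happens, even on paths that only read the low byte.

```c
#include <stdint.h>

/* Hypothetical counter that an OS routine or interrupt handler also reads.
   volatile: each read and write below must actually be performed. */
volatile uint16_t tick_count;

/* Increments all 16 bits, even if this program only inspects the low byte. */
void tick(void) {
  tick_count++;
}

/* Reads the counter once and keeps only the low byte. */
uint8_t tick_low_byte(void) {
  return (uint8_t)tick_count;
}
```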

Why isn't the linker placing my sections?
There are two main reasons this can occur; the issue could be due to either or both.

First, the section may not have been marked "allocatable", which unfortunately isn't the default on assemblers compatible with GNU as for sections without a standard name. Allocatable just means that the section should take up space in the program's memory image, as opposed to sections that are merely there for the linker or command-line utilities, like symbol tables. The flag is added in the flags string of the assembler's .section directive.
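A sketch using module-level assembly, assuming a GNU-as-compatible assembler; the section name .my_table and the my_table symbol are hypothetical. In the flags string, "a" marks the section allocatable. The sketch uses .pushsection/.popsection so the compiler's own idea of the current section isn't disturbed; .section takes the same flag string.

```c
/* Module-level asm defining a small table in a custom, allocatable section.
   Without the "a" flag, this unrecognized section name would get no flags,
   and the linker would not place it in memory. */
__asm__(
    ".pushsection .my_table,\"a\"\n"
    ".globl my_table\n"
    "my_table:\n"
    "  .byte 1, 2, 3\n"
    ".popsection\n");

/* C-side declaration of the table defined above. */
extern const unsigned char my_table[3];
```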

Second, if no symbols defined in the section are referenced, the linker may garbage-collect the section away. The linker has a sense of which sections must intrinsically be present (the logic is a bit complex, but it usually does a good job). Any sections that can't be reached from those known roots are removed to save space; this is one of the features that allows uncalled functions to be stripped out of the binary at link time. To prevent this, you can add the "retain" flag to the section declaration. Another way is to wrap the section in KEEP in the linker script where the section is assigned.
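From C, a similar effect can be had with attributes. This sketch assumes a compiler that knows the retain attribute (recent GCC and Clang do); it checks with __has_attribute and falls back to just used otherwise. The .my_config section and my_config symbol are hypothetical. With retain, the compiler should emit the section with the R flag, the equivalent of writing .section .my_config,"aR" in assembly. In a GNU ld compatible linker script, the alternative is wrapping the input section pattern in KEEP, for example KEEP(*(.my_config)).

```c
/* Prefer "retain" (survives linker garbage collection) when the compiler
   knows it; otherwise fall back to "used" (survives compilation only). */
#if defined(__has_attribute)
#  if __has_attribute(retain)
#    define KEEP_IN_LINK __attribute__((used, retain))
#  endif
#endif
#ifndef KEEP_IN_LINK
#  define KEEP_IN_LINK __attribute__((used))
#endif

/* Hypothetical data that no code refers to by name, placed in its own
   section. The compiler marks C-declared sections allocatable itself. */
KEEP_IN_LINK __attribute__((section(".my_config")))
const unsigned char my_config[4] = {1, 2, 3, 4};
```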

See the GNU assembler manual for the section directive and GNU ld manual for KEEP.

Why is inline assembly behaving so strangely?
If you've tried to use inline assembly in C and have seen strange behavior, you're in good company. GCC's (and thus Clang's) inline assembly feature has a lot of rough edges, and it's quite a bit more difficult to use safely than you might think.

The reason for this is, of course, performance. Making the inline assembly feature simpler to reason about would also require the compiler to do less optimization around it. The rules for how it works aren't too terribly complicated, though.

Rule 1: Declare your clobbers.
Consider a function that runs some inline assembly and then returns its single-byte argument unchanged. The calling convention has the argument in the A register on entry, and the return value needs to be in A on exit. The compiler would like to emit this as just the inline assembly followed by an RTS, leaving A untouched in between. But is this safe? It depends entirely on the contents of the inline assembly, and that could JSR or BRK, doing who knows what.

There are two assumptions the compiler could make: that A is clobbered, or that A is preserved. Assuming it's preserved is faster, so naturally, this is the assumption that GCC makes. The burden falls on the author to list all clobbered registers, except for two of the status flags. (These aren't explicitly tracked by the compiler, since they're clobbered by so many instructions.)

So, if you add A to the inline assembly statement's clobber list, the compiler will save and restore A around the inline assembly.
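A sketch of the fixed version, with two assumptions to check against the llvm-mos documentation: that the compiler predefines __mos__, and that it accepts "a" as the clobber name for the A register. some_routine is a hypothetical assembly routine defined elsewhere. On other targets the asm is compiled out, so the function just returns its argument.

```c
/* Runs a hypothetical routine that may modify A, then returns c. */
char call_routine_keep_arg(char c) {
#if defined(__mos__) /* assumption: llvm-mos predefines __mos__ */
  /* Without the "a" clobber, the compiler would assume A (holding c on
     entry) survives the JSR. Listing it makes the compiler preserve c,
     for example by saving and restoring A around the statement.
     Assumption: "a" is llvm-mos's clobber name for the A register. */
  __asm__ volatile("JSR some_routine" : : : "a");
#endif
  return c;
}
```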

Rule 2: Inputs, outputs, and volatile.
Here's a related case that also might not work: inline assembly that stores a variable to address 0x1234, assuming the variable is already sitting in a particular register. The trouble here is that the compiler is under no obligation to keep the variable in any particular register. Actually, the compiler isn't under any obligation to keep the inline assembly around at all. GCC inline assembly statements are considered to consume inputs to produce outputs; if the outputs are unused (or there aren't any), the compiler is free to delete the whole inline assembly statement as an optimization. I haven't personally observed this, but the compiler does currently tend to move inline assembly statements outside of loops, since they "don't do anything that involves the loop".

In this case, we want to consume the variable as an input, and produce the effect of storing to 0x1234. This isn't exactly an output, but we can declare that the inline assembly statement has "other" side effects by declaring it volatile (much like an I/O register is declared volatile).

The correct version passes the variable as an input operand and marks the statement volatile. The input's constraint says "put this in A, X, or Y". The register the compiler picks can be inserted into the text of the assembly, so the store instruction becomes STA, STX, or STY depending on which the compiler ends up choosing. In a simple case like this, it will typically pick whichever register already holds the value, but this gives the compiler flexibility as the code around the inline assembly changes. For an overview of GCC's extended asm syntax, including inputs and outputs, see the Extended Asm section of the GCC manual.
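A sketch of the corrected store, with assumptions to check against the llvm-mos documentation: that the compiler predefines __mos__, and that "R" is its constraint letter for "A, X, or Y". %0 is replaced by the chosen register's name, so the ST prefix becomes STA, STX, or STY. On other targets the asm is compiled out.

```c
/* Stores c to address 0x1234 (e.g., a hypothetical I/O register). */
void store_to_1234(char c) {
#if defined(__mos__) /* assumption: llvm-mos predefines __mos__ */
  /* volatile: the store is a side effect the compiler must keep.
     "R"(c): c is an input placed in A, X, or Y (assumed constraint letter);
     %0 expands to that register's name, giving STA, STX, or STY. */
  __asm__ volatile("ST%0 0x1234" : /* no outputs */ : "R"(c));
#else
  (void)c; /* not a 6502 target: nothing to store */
#endif
}
```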

Rule 3: Consider using module-level asm or an assembly file.
As you may have gathered from the rather extensive list of concerns above, inline assembly in GCC is really designed for the efficient insertion of assembly snippets to be used as part of the expressions and calculations performed by a larger C algorithm. If you find yourself writing all or nearly all of a routine in assembly, then inline assembly probably isn't the right tool for the job.

Instead, consider writing the entire routine in assembly, and having the routine use the C calling convention. Then, it can be declared in a C header and called natively from other C routines.

There are broadly two ways to do this: module-level asm or dedicated assembly files.

To use module-level assembly, just place an asm statement at the top level of a file and define the function in it as you would in the assembler. Alternatively, you can write the routine in a separate assembly file, assemble it, and add it to the link. Either way, you can call the function as if it were written in C, and you keep a clean interface (i.e., the C calling convention) between C and assembly code.
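A sketch of module-level assembly, assuming llvm-mos (detected via __mos__, itself an assumption) and relying on the calling convention described above: a single-byte argument arrives in A and the return value goes back in A, so the routine can simply return. return_arg is a hypothetical name; on other targets a plain C definition stands in so the sketch still builds.

```c
/* C declaration: callers see an ordinary C function. */
char return_arg(char c);

#if defined(__mos__) /* assumption: llvm-mos predefines __mos__ */
/* Module-level asm defining the function directly in assembly.
   The argument is already in A, which is also where the result belongs. */
__asm__(
    ".text\n"
    ".globl return_arg\n"
    "return_arg:\n"
    "  rts\n");
#else
/* Stand-in so the sketch compiles and links on other targets. */
char return_arg(char c) { return c; }
#endif
```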

The downside of these two routes is that you lose the ability for the compiler to inline the function into its callers; fully-assembly functions are opaque in a way that inline assembly isn't. But that's the tradeoff: you can give the compiler a degree of control over the scheduling and semantics of your assembly code, or you can not; the choice is yours.