Frequently asked questions

Why is the compiler removing my infinite loops?

Here's a really good article on this: Non-Termination Considered Harmful in C and C++

The short version is that in C++, an infinite loop with no observable side effects is undefined behavior, and the compiler is free to assume it cannot happen. This allows the compiler to emit faster code, so it will typically do so whenever it can.

In C, the situation is a bit more nuanced: the compiler may assume that a loop whose controlling expression is not a constant expression terminates, as long as its body performs no I/O, accesses no volatile objects, and contains no synchronization or atomic operations.

Generally, if you want an infinite loop, you have to put something inside it that the C and C++ standards don't allow the compiler to elide. The easiest way to do this is typically asm volatile ("");. This tells the compiler that an inline assembly block with unknown side effects is present, which keeps the loop around. The compiler doesn't do deep introspection into assembly fragments, so it can't prove that eliding the loop is safe, and the loop stays.

This requires the inline-assembly extension; to do this in standard C/C++, just insert a volatile load or store into the loop. This will similarly force the compiler to keep the loop.
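For instance, a minimal sketch of both approaches (the function and variable names are illustrative):

void spin_with_asm(void) {
  for (;;) {
    asm volatile ("");  /* opaque to the optimizer, so the loop survives */
  }
}

void spin_with_volatile(void) {
  volatile unsigned char keep_alive = 0;
  while (1) {
    keep_alive = 0;  /* a volatile store must happen every iteration */
  }
}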

The reason for all this is that it lets compilers assume that code after a loop will eventually be reached, which enables a variety of optimizations. Modern C/C++ compilers like Clang operate by tearing down the code as written and rewriting it, bit by bit, into something more efficient. The only real constraint on this process is the letter of the C standard; undefined behavior exists to permit better code generation, but it creates unfortunate foot-guns like this one.

Why is the compiler removing my memory accesses?

This is very similar to the above; if a memory access absolutely has to happen at a particular point in the program, it has to be done through a volatile object. Otherwise, the compiler is free to reason its heart out about what purpose that memory access serves, and if one isn't visible to the compiler, the access may be removed. This is usually very desirable; spurious accesses crop up all the time. For example, if you're incrementing a 16-bit counter variable, and some particular path through the program only ever looks at the low byte, the compiler might skip the high part of the increment. But, you wouldn't want this to happen if this variable was referenced by an OS routine! So, volatile is the blessed way to tell the compiler: "this needs to happen now!".
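For example, a minimal sketch (the address and names are made up for illustration):

#define IO_PORT (*(volatile unsigned char *)0xD012)  /* hypothetical memory-mapped I/O register */

volatile unsigned int frame_counter;  /* also read and updated by an OS routine */

void tick(void) {
  frame_counter++;  /* both bytes of the increment must actually be written */
  IO_PORT = 0x01;   /* this store cannot be elided or moved */
}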

Why isn't the linker placing my sections?

There are two main reasons this can occur; the issue could be due to either or both.

First, the section may not have been marked "allocatable", which unfortunately isn't the default in GNU-compatible toolchains for sections without a standard name (e.g., .data). Allocatable just means that the section should take up space in the final binary, as opposed to sections that exist only for the linker's or other tools' use, like symbol tables. The syntax to add this flag is: .section name,"a"

Second, if no symbols defined in the section are referenced, the linker may garbage-collect the section away. The linker has a sense of which sections must intrinsically be present (the logic is a bit involved, but it usually does a good job), and any sections that can't be reached from those known roots are removed to save space. This is one of the features that allows uncalled functions to be stripped out of the binary at link time. To prevent it, you can add the "retain" flag to the section declaration: .section name,"R". Another way is to wrap the input section in KEEP when assigning it to an output section in the linker script: output_section : { KEEP(*(input_section)) }
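A minimal sketch of the assembler side (the section and symbol names are illustrative):

  .section .mytable,"aR"   ; "a" = allocatable, "R" = retained even if unreferenced
my_table:
  .byte 1, 2, 3

And the equivalent KEEP form in a linker script, assuming an output section named .mytable:

.mytable : { KEEP(*(.mytable)) }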

See the GNU assembler manual for the .section directive and the GNU ld manual for KEEP.

Why is inline assembly behaving so strangely?

If you've tried to use inline assembly in C and have seen strange behavior, you're in good company. GCC's (and thus Clang's) inline assembly feature has a lot of rough edges, and it's quite a bit more difficult to use safely than you might think.

The reason for this is, of course, performance. Making the inline assembly feature simpler to reason about would also require the compiler to do less optimization around it. The rules for how it works aren't too terribly complicated, though.

Rule 1: Declare your clobbers.

Take a look at the following example:

char foo(char value) {
  asm ("some compiler-unknown string of inline assembly goes here");
  return value;
}

The calling convention puts the parameter value in the A register on entry, and the return value needs to be in A on exit. The compiler would like to emit this as:

foo:
  some compiler-unknown string of inline assembly goes here
  rts

But, is this safe? It depends entirely on the contents of the inline assembly, but that could JSR or BRK, doing who knows what.

There are two assumptions the compiler could make: that A is clobbered, or that A is preserved. Assuming it's preserved is faster, so naturally, this is the assumption that GCC makes. The burden falls on the author to list all clobbered registers except for the flags N and Z. (These aren't explicitly tracked by the compiler, since they're clobbered by so many instructions.)

So, if you write this instead as the following, then the compiler will save and restore A around the inline assembly:

char foo(char value) {
  asm ("some compiler-unknown string of inline assembly goes here":::"a");
  return value;
}

which compiles to something like:

foo:
  sta mos8(__rc2)
  some compiler-unknown string of inline assembly goes here
  lda mos8(__rc2)
  rts

Rule 2: Inputs, outputs, and volatile.

Here's a related example that also might not work:

void foo(char value) {
  asm ("sta 0x1234");
}

The trouble here is that the compiler is under no obligation to keep the variable value in the A register. Actually, the compiler isn't under any obligation to keep the inline assembly around at all. GCC inline assembly statements are considered to consume inputs to produce outputs; if the outputs are unused (or there aren't any) the compiler is free to delete the whole inline assembly statement as an optimization. I haven't personally observed this, but the compiler does currently tend to move inline assembly statements outside of loops, since they "don't do anything that involves the loop".

In this case, we want to consume the variable value as an input, and produce the effect of storing to 0x1234. This isn't exactly an output, but we can declare that the inline assembly statement has "other" side-effects by declaring it volatile (much like an I/O register is declared volatile).

The correct version of this snippet is:

void foo(char value) {
  asm volatile ("st%0 0x1234" : /* no output */ : /* input -> */ "R"(value));
}

The R constraint on the value input says "put this in A, X, or Y". The register the compiler picks is substituted into the assembly text, so st%0 becomes sta, stx, or sty accordingly. It should trivially pick sta in this case, but this gives the compiler flexibility as the code around the inline assembly changes. For an overview of the extended inline assembly syntax, including inputs and outputs, see the GCC documentation on extended asm.
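The same machinery works in the other direction. Here's a complementary sketch with an output operand, mirroring the pattern above (the address is again illustrative):

char read_port(void) {
  char result;
  /* "=R" asks for the result in A, X, or Y; %0 expands to the chosen
     register, so the load becomes lda, ldx, or ldy to match. */
  asm volatile ("ld%0 0x1234" : "=R"(result));
  return result;
}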

Rule 3: Consider using module-level asm or an assembly file.

As you may have gathered from the rather extensive list of concerns above, inline assembly in GCC is really designed for the efficient insertion of assembly snippets to be used as part of the expressions and calculations performed by a larger C algorithm. If you find yourself writing all or nearly all of a routine in assembly, then inline assembly probably isn't the right tool for the job.

Instead, consider writing the entire routine in assembly, and having the routine use the C calling convention. Then, it can be declared in a C header and called natively from other C routines.

There are broadly two ways to do this: module-level asm or dedicated assembly files.

To use module-level assembly, place an asm statement at the top level of a file, define the function there just as you would in an assembly file, and give it a matching C declaration:

asm (
  ".text\n"
  ".global foo\n"
  "foo:\n"
  "  sta 0x5678\n"
  "  rts\n"
);

void foo(char c);

void bar(void) {
  foo(42);  /* calls the assembly-defined routine through its C declaration */
}

Alternatively, you can create a foo.s file, assemble it, and add it to the link. This allows you to call the function as if it were written in C, and keeps a clean interface (i.e., the C calling convention) between C and assembly code.
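For example, a foo.s along these lines (the body is illustrative) can be assembled and added to the link:

; foo.s: callable from C; the calling convention passes the char argument in A.
.text
.global foo
foo:
  sta 0x5678
  rts

With a matching declaration void foo(char c); in a header, C code can call it like any other function.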

The downside of these two routes is that the compiler can no longer inline the function into its callers; whole routines written in assembly are opaque in a way that inline assembly isn't. But that's the tradeoff: you can give the compiler a degree of control over the scheduling and semantics of your assembly, or you can not; the choice is yours.

Does llvm-mos support 24-bit integers?

Unofficially, yes, via the C extension _BitInt(n). You can say _BitInt(24) x; or unsigned _BitInt(24) x; and the results will behave more or less how one would expect.
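A minimal sketch (the variable and function names are illustrative):

#include <stdio.h>

unsigned _BitInt(24) count;  /* 24-bit unsigned integer; arithmetic wraps modulo 2^24 */

void demo(void) {
  count = 0xFFFFFF;  /* maximum representable value */
  count += 1;        /* wraps around to 0 */
  /* No PRIu24 macro exists, so convert to a standard type to print. */
  printf("%lu\n", (unsigned long)count);
}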

We can't officially provide an int24_t without also adding matching printf and scanf support to the SDK, since the standard mandates format macros such as PRId24 that must expand to a conversion specifier the library understands. This is something we can do, but the priority is relatively low; it's tracked as an issue against the SDK.