C compiler
The compiler is believed to be C99 compatible, but the quality of generated code is not yet at even a v0.1 level.
The LLVM SingleSource end-to-end test cases pass on a simulated 6502. This is true at -O0, -O3, -Os, and -Oz.
The C compiler is based on LLVM's GlobalISel architecture. As of this writing, it passes the LLVM end-to-end test suite in a variety of optimization modes, although very little specific attention has been paid to the quality of the generated output. The output is still pretty OK, since we benefit greatly from LLVM's high-level optimizations, but there are still lots of low-level optimization opportunities available.
Although changes exist all over the LLVM source base, the backend for MOS architectures mainly lives in llvm/lib/Target/MC/MOS. Using the triple 'mos' will cause clang to use the new MOS backend. By default, this backend targets the 6502, which should work on all CPUs and NMOS implementations that claim 6502 compatibility.
Assembler
The assembler, llvm-mc, understands and assembles all NMOS 6502 opcodes. The assembler correctly understands symbols, and it's possible to use them as branch targets, do pointer math on them, and the like. Fixups work as expected at link time.
The assembler correctly deals with 6502 relative branches. BEQ, BCC, etc., all correctly calculate PC-relative offsets in the unusual 6502 convention, giving a reach of [-126,+129] bytes measured from the address of the branch instruction itself. Since llvm-mc is GNU assembler compatible, you can use all GNU assembler features while writing 65xx code, including macros, ifdefs, and similar.
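The arithmetic behind that range can be sketched in a few lines of Python. This is only an illustration of the 6502 convention, not code taken from llvm-mc, and the function name branch_operand is made up for this example. The CPU adds the signed one-byte operand to the address of the instruction following the two-byte branch, which is why the reach is [-126,+129] rather than [-128,+127].

```python
def branch_operand(branch_addr: int, target_addr: int) -> int:
    """Return the one-byte operand for a 6502 relative branch.

    Hypothetical helper for illustration. The CPU adds the signed
    operand to the address of the instruction that follows the
    two-byte branch, i.e. branch_addr + 2.
    """
    displacement = target_addr - (branch_addr + 2)
    if not -128 <= displacement <= 127:
        distance = target_addr - branch_addr
        raise ValueError(
            f"target is {distance:+d} bytes away; must be within [-126, +129]"
        )
    # Encode the signed displacement as a two's-complement byte.
    return displacement & 0xFF


# A branch to itself (an infinite loop) encodes as $FE, i.e. -2.
print(hex(branch_operand(0x1000, 0x1000)))  # prints 0xfe
```

The two extremes, 126 bytes backward and 129 bytes forward, encode as $80 and $7F respectively.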
The assembler can figure out, at compile time, whether a symbol refers to a zero page location or a 16-bit address. If you place a symbol in a section named .zp, then that symbol is assumed to be located in zero page; otherwise, it is assumed to refer to a 16-bit address. Additionally, if a symbol is placed in a section with the z section flag enabled, then that symbol is assumed to be located in zero page, with addressing calculated accordingly.
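To make the consequence of that rule concrete, here is a toy Python model of the decision and its effect on instruction encoding. It is a sketch under assumptions, not the assembler's actual logic: the helper names are invented, the section flags are modeled as a plain string of flag characters, and the section name is compared for exact equality with .zp. The two LDA opcodes ($A5 for zero page, $AD for absolute) are standard 6502 encodings.

```python
LDA_ZERO_PAGE = 0xA5  # LDA zp: opcode + 1 address byte
LDA_ABSOLUTE = 0xAD   # LDA abs: opcode + 2 address bytes, low byte first


def is_zero_page(section_name: str, section_flags: str = "") -> bool:
    """Toy version of the rule: a .zp section or a 'z' flag means zero page.

    Assumption: flags are given as a string of flag characters.
    """
    return section_name == ".zp" or "z" in section_flags


def encode_lda(address: int, section_name: str, section_flags: str = "") -> bytes:
    """Encode LDA for a symbol at `address` living in the given section."""
    if is_zero_page(section_name, section_flags):
        if not 0 <= address <= 0xFF:
            raise ValueError("zero page symbols must fit in 8 bits")
        return bytes([LDA_ZERO_PAGE, address])
    if not 0 <= address <= 0xFFFF:
        raise ValueError("absolute addresses must fit in 16 bits")
    return bytes([LDA_ABSOLUTE, address & 0xFF, address >> 8])


print(encode_lda(0x10, ".zp").hex())    # prints a510
print(encode_lda(0x10, ".data").hex())  # prints ad1000
```

The same address costs two bytes when the symbol is known to be in zero page and three bytes otherwise, which is why the section placement matters.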
The assembler and linker both understand that $ is a legal prefix for hexadecimal constants. Much existing 6502 assembly code depends on this older convention. See the DollarIsHexPrefix constant in MCAsmInfo.h. The lexer now queries the current MCAsmInfo structure to see whether the target wants the dollar sign to be a hex prefix. So, everything that depends on the lexer (which is almost everything in LLVM) can now recognize 6502 format hexadecimal constants, if the corresponding MCAsmInfo asks for it. The modern 0x prefix works fine as well.
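As a rough illustration of the behavior described here, the following Python sketch parses an integer token with an optional target-controlled $ prefix. It is not the LLVM lexer; parse_integer and its dollar_is_hex_prefix parameter are hypothetical names that merely mirror the idea of the DollarIsHexPrefix setting.

```python
def parse_integer(token: str, dollar_is_hex_prefix: bool = True) -> int:
    """Parse a decimal, 0x-prefixed, or (optionally) $-prefixed integer.

    Toy model only: the flag plays the role of a per-target setting
    that tells the lexer whether '$' introduces a hex constant.
    """
    if token.startswith("$"):
        if not dollar_is_hex_prefix:
            raise ValueError("'$' is not a hex prefix for this target")
        return int(token[1:], 16)
    if token.lower().startswith("0x"):
        return int(token[2:], 16)
    return int(token, 10)


print(parse_integer("$C000"))   # prints 49152
print(parse_integer("0xC000"))  # prints 49152
```

With the flag switched off, the $ form is rejected while the 0x form keeps working, which matches the idea that only targets asking for the older convention get it.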
The assembler understands the names of the 6502 registers, including a, x, y, p, sp, and pc. It understands references to these names to be references to those registers. If your code depends on these names as variable or section names, you can force the assembler to use the prefix of llvm_mos_ on those registers, e.g. llvm_mos_a, llvm_mos_x, etc. To require this prefix on references to those registers, enable the mos-long-register-names compilation feature. For example, with llvm-mc, use the flag -mattr=+mos-long-register-names. Printed assembly output uses this naming scheme to avoid conflicts with existing code.
To target MOS family processors, you will need to use a triple of "mos" (try: -triple mos) as a parameter to any tool.
Linker
The ELF file format for objects and executables has been extended to support 65xx compatible processors. Hello-world type programs have been shown to compile and work as expected on emulated Commodore 64, VIC-20, and Apple II machines.
The linker, lld, can be called with a "-flavor gnu" parameter in order to permit linking of ELF executables.
If you've written an appropriate linker file for your 6502 target, you can season the following overly verbose formula to compile assembly code for your particular target, where %s is the name of your assembly source, %S is the directory of your include files, and %t is the base name of your project:
llvm-mc -triple mos --filetype=obj -o=%t.obj %s
llvm-objdump --all-headers --print-imm-hex -D %t.obj
llvm-readelf --all %t.obj
lld -flavor gnu %t.obj -o %t.elf -L %S your-target.ld
llvm-readelf --all %t.elf
llvm-objdump --all-headers --print-imm-hex -D %t.elf
llvm-objcopy --output-target binary --strip-unneeded %t.elf %t.bin
The llvm-objdump and llvm-readelf programs are not necessary; they're just there to help you debug your own pipeline.
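If you want to drive the three essential steps from a script, the substitution of %s, %S, and %t can be sketched in Python as below. The function name minimal_pipeline and the example file names are invented for illustration. The command-line flags are copied from the formula above, and your-target.ld remains a placeholder for your own linker file.

```python
def minimal_pipeline(source, include_dir, base, linker_script="your-target.ld"):
    """Return the assemble, link, and objcopy commands as argv lists.

    source      stands in for %s (assembly source file)
    include_dir stands in for %S (directory passed to -L)
    base        stands in for %t (base name of the project)
    """
    obj, elf, binary = f"{base}.obj", f"{base}.elf", f"{base}.bin"
    return [
        ["llvm-mc", "-triple", "mos", "--filetype=obj", f"-o={obj}", source],
        ["lld", "-flavor", "gnu", obj, "-o", elf, "-L", include_dir, linker_script],
        ["llvm-objcopy", "--output-target", "binary", "--strip-unneeded", elf, binary],
    ]


# Print the commands instead of running them, so this works without the tools.
for argv in minimal_pipeline("hello.s", "include", "hello"):
    print(" ".join(argv))
```

To actually execute the steps, each argv list could be handed to subprocess.run(argv, check=True), assuming the llvm-mos tools are on your PATH.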
As of this writing, some example linker files and BASIC stubs exist at llvm/test/MC/MOS/Inputs.
ELF
Both the assembler and the linker support the ELF format, for both object files and executables. The ELF format has been extended with a machine type of 6502 (naturally) to permit storing 65xx code in ELF files.
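One quick way to check that a file really carries this machine type is to read the e_machine field from the ELF header. The following Python sketch does that with only the standard library. The names elf_machine and MOS_MACHINE_TYPE are made up for this example; the field layout (16 bytes of e_ident, then a 2-byte e_type, then the 2-byte e_machine) comes from the generic ELF format.

```python
import struct

MOS_MACHINE_TYPE = 6502  # machine type stated in the text above


def elf_machine(header: bytes) -> int:
    """Return the e_machine value from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident[5] (EI_DATA) gives the byte order: 1 = little, 2 = big endian.
    byte_order = {1: "<", 2: ">"}.get(header[5])
    if byte_order is None:
        raise ValueError("unknown ELF data encoding")
    # e_machine is a 16-bit field at offset 18.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return machine


# Synthetic little-endian header, built here so the example needs no input file.
fake_header = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9) + struct.pack("<HH", 2, 6502)
print(elf_machine(fake_header) == MOS_MACHINE_TYPE)  # prints True
```

For a real file, pass the first 20 bytes read in binary mode, for example the result of open(path, "rb").read(20).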
Because the 6502 assembler and linker both work with ELF files, you can use any of your favorite tools to inspect or understand ELF files generated by the LLVM tools. The llvm-readobj, llvm-objdump, llvm-objcopy, llvm-strip, and likely the other command line tools as well, work as expected. This also means that generic tools that work on ELF files, such as this online ELF viewer, can read and dump basic information about MOS executables.
Deployment
If any branch builds and passes the check-all suite of LLVM tests, it's immediately deployed so you can get fresh bits for your favorite platform. Always grab the main build first. We smoke test on Windows, macOS, and Ubuntu, but llvm-mos should build on anything that LLVM can build on. You can download fresh binary bits from the GitHub releases page.
Implementation
TableGen has been taught all native 6502 instructions and formats. llvm-mc can assemble all 6502 instructions.
For some examples of what the backend can do as of this writing, see the llvm/test/MC/MOS directory for some functional assembler tests. Building the check-llvm-mc-mos target runs just these tests for MOS.