Character set

Due to limitations in clang, llvm-mos's execution character set is always UTF-8 as far as the C standard is concerned. However, thanks to the magic of C++20 user-defined literals, we provide compile-time translation from UTF-32 string literals in C++ source (i.e. U"") to target character sets.

With the acceptance of Unicode Symbols For Legacy Computing, there are now official mappings between Unicode code points and some 6502 target platform character sets, and we follow these rigorously whenever applicable. If any character in a string cannot be mapped to the target character set, compilation fails with the error "no matching literal operator for call to ...".

Supported Character Sets
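| Target | Name | Syntax | Notes |
| --- | --- | --- | --- |
| Commodore PET, Commodore 64, Commander X16 | Unshifted PETSCII | U"HELLO"_u | C0 control codes are passed through uninterpreted. |
| Commodore PET, Commodore 64, Commander X16 | Shifted PETSCII | U"hello"_s | C0 control codes are passed through uninterpreted. |
| Commodore PET, Commodore 64, Commander X16 | Unshifted Video | U"HELLO"_uv | C0 control codes are passed through uninterpreted. Corresponds to the character set's layout in ROM. |
| Commodore PET, Commodore 64, Commander X16 | Shifted Video | U"hello"_sv | C0 control codes are passed through uninterpreted. Corresponds to the character set's layout in ROM. |
| Commodore PET, Commodore 64, Commander X16 | Unshifted Reverse Video | U"HELLO"_urv | C0 control codes are passed through uninterpreted. Corresponds to the character set's layout in ROM. Selects the inverse character for each Unicode character. Note that some Unicode characters are already "inverses"; the normal character is selected for these. |
| Commodore PET, Commodore 64, Commander X16 | Shifted Reverse Video | U"HELLO"_srv | C0 control codes are passed through uninterpreted. Corresponds to the character set's layout in ROM. Selects the inverse character for each Unicode character. Note that some Unicode characters are already "inverses"; the normal character is selected for these. |
| Commander X16 | ISO-8859-15 | U"Hello"_i | C0 and C1 control codes are passed through uninterpreted. Corresponds to the "ISO mode". |
| Atari 8-bit | ATASCII | U"Hello"_a | C0 control codes are passed through uninterpreted. |
| Atari 8-bit | International ATASCII | U"Hello"_i | C0 control codes are passed through uninterpreted. |
| Atari 8-bit | ATASCII Video | U"Hello"_av | C0 control codes are passed through uninterpreted. Corresponds to the character set's layout in ROM. |
| Atari 8-bit | ATASCII International Video | U"Hello"_iv | C0 control codes are passed through uninterpreted. Corresponds to the character set's layout in ROM. |
| Atari 8-bit | ATASCII Reverse Video | U"Hello"_arv | C0 control codes are passed through uninterpreted. Corresponds to the character set's layout in ROM. Selects the inverse character for each Unicode character. |
| Atari 8-bit | ATASCII International Reverse Video | U"Hello"_irv | C0 control codes are passed through uninterpreted. Corresponds to the character set's layout in ROM. Selects the inverse character for each Unicode character. |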
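
The relationship between the video and reverse video variants can be sketched as follows. This is an illustration only, under the assumption that the upper half of the ROM character layout (codes 0x80-0xFF) holds the reversed glyphs of the lower half, as on the Commodore 64 and Atari 8-bit character sets. The names kReverseBit and to_reverse_video are hypothetical, and the actual literals may compute this differently.

```cpp
#include <cstdint>

// Assumption: bit 7 of a screen code selects the reversed glyph.
constexpr std::uint8_t kReverseBit = 0x80;

// Hypothetical helper: flips between a glyph and its inverse. A character
// that is already an "inverse" (bit 7 set) maps back to the normal glyph.
constexpr std::uint8_t to_reverse_video(std::uint8_t screen_code) {
    return static_cast<std::uint8_t>(screen_code ^ kReverseBit);
}

// On the Commodore 64, screen code 0x01 is 'A' and 0x81 is reversed 'A'.
static_assert(to_reverse_video(0x01) == 0x81);
static_assert(to_reverse_video(0x81) == 0x01);
```

Using XOR rather than OR is what makes an already inverted character come out as its normal counterpart, matching the note in the table.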