I’ve toyed with the idea of using a jump with a runtime-computed operand to land inside a field of instructions. This would allow random access to the data. For instance, a field of ret instructions with large-constant (two-byte) operands would have an encoding overhead of 50%: each three-byte instruction holds two bytes of data.
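A minimal Python sketch of the ret-field idea follows. It assumes ret with a large-constant operand is encoded as the short-form opcode byte 0x8B followed by a big-endian 16-bit value, and it uses the Z-machine rule that jump lands at "address after the instruction + offset - 2". The helper names (build_ret_field, jump_offset, simulate_lookup) are hypothetical, and simulate_lookup only imitates what an interpreter would do; it does not run real story-file code.

```python
# Sketch: a table of 16-bit values stored as a field of Z-machine "ret"
# instructions, each with a large-constant (2-byte) operand.
RET_LARGE_CONST = 0x8B  # 1OP:139 ret, short form, operand type "large constant"
ENTRY_SIZE = 3          # 1 opcode byte + 2 operand bytes


def build_ret_field(values):
    """Encode each 16-bit value as 'ret #value' (3 bytes per entry)."""
    field = bytearray()
    for v in values:
        if not 0 <= v <= 0xFFFF:
            raise ValueError(f"value {v} does not fit in 16 bits")
        field.append(RET_LARGE_CONST)
        field += v.to_bytes(2, "big")  # Z-machine words are big-endian
    return bytes(field)


def jump_offset(field_addr, index, pc_after_jump):
    """Offset that makes 'jump' land on entry `index` of the field.

    The destination is pc_after_jump + offset - 2, so solve for offset.
    """
    target = field_addr + ENTRY_SIZE * index
    offset = target - pc_after_jump + 2
    if not -0x8000 <= offset <= 0x7FFF:
        raise ValueError("target out of range for a signed 16-bit jump")
    return offset


def simulate_lookup(memory, field_addr, index, pc_after_jump):
    """Follow the computed jump, then 'execute' the ret found there."""
    pc = pc_after_jump + jump_offset(field_addr, index, pc_after_jump) - 2
    if memory[pc] != RET_LARGE_CONST:
        raise RuntimeError("jump did not land on a ret instruction")
    return int.from_bytes(memory[pc + 1:pc + 3], "big")


table = [1000, 42, 0xBEEF]
memory = bytes(0x100) + build_ret_field(table)  # pretend the field starts at 0x100
print(simulate_lookup(memory, 0x100, 2, pc_after_jump=0x40))  # prints 48879
```

Because every entry is exactly ENTRY_SIZE bytes, the index-to-offset mapping is a single multiply-and-add, which is what makes the field randomly accessible.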
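A raw byte in a packed Z-string requires at best 5.33 bits, at worst 21.33 bits, and on average 17.47 bits (assuming a uniform distribution of byte values and no use of abbreviations). Three 5-bit Z-characters share each 16-bit word, so a single Z-character costs 16/3 ≈ 5.33 bits; the worst case is a byte with no direct representation, which needs four Z-characters (a shift to A2, the escape character, and the two 5-bit halves of a 10-bit ZSCII code). Relative to the 8 bits in the byte, that is an average overhead of 118% (worst case 167%). If the data is encoded using only lowercase letters, the overhead drops to about 13%, since each letter carries log2 26 ≈ 4.70 bits of information for its 5.33 bits of cost. However, the letters then have to be converted back to bytes as base-26 numbers, which requires expensive multiplication instructions. A faster option is to use the lowercase letters plus six uppercase letters, giving 32 code points (exactly 5 bits per symbol); since each uppercase letter costs an extra shift character, the average overhead is about 27%.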
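As a sketch of the 32-code-point option, the Python below maps each 5-bit value to Z-characters and packs them three to a word. It assumes a version 3 or later story file, where Z-character 4 is a one-shot shift into the uppercase alphabet A1, Z-characters 6..31 select a..z (A0) or A..Z (A1), and a string is padded with Z-character 5 and terminated by setting the top bit of its last word. Which six uppercase letters to use (here A..F) is an arbitrary choice, and all function names are hypothetical.

```python
# Sketch: "26 lowercase + 6 uppercase" alphabet, one 5-bit value per symbol.
SHIFT_A1 = 4   # v3+: shift the next Z-character into A1 (uppercase)
PAD = 5        # conventional padding Z-character; never emitted for data here


def value_to_zchars(v):
    """Map a 5-bit value to one Z-character (a..z) or two (shift + A..F)."""
    if not 0 <= v < 32:
        raise ValueError(f"{v} is not a 5-bit value")
    if v < 26:
        return [6 + v]
    return [SHIFT_A1, 6 + (v - 26)]


def zchars_to_values(zchars):
    """Inverse of value_to_zchars; skips padding."""
    values, i = [], 0
    while i < len(zchars):
        z = zchars[i]
        if z == PAD:
            i += 1
        elif z == SHIFT_A1:
            values.append(26 + zchars[i + 1] - 6)
            i += 2
        else:
            values.append(z - 6)
            i += 1
    return values


def pack_zchars(zchars):
    """Pack Z-characters three per 16-bit word and set the end bit."""
    zchars = list(zchars) or [PAD]
    while len(zchars) % 3:
        zchars.append(PAD)
    words = [(zchars[i] << 10) | (zchars[i + 1] << 5) | zchars[i + 2]
             for i in range(0, len(zchars), 3)]
    words[-1] |= 0x8000  # marks the last word of the string
    return words


def unpack_words(words):
    """Split words back into Z-characters (the end bit is masked off)."""
    zchars = []
    for w in words:
        zchars += [(w >> 10) & 31, (w >> 5) & 31, w & 31]
    return zchars


def average_overhead():
    """Average cost per 5-bit value, relative to 5 bits (uniform values)."""
    zchars_per_value = sum(len(value_to_zchars(v)) for v in range(32)) / 32
    bits = zchars_per_value * 16 / 3
    return bits / 5 - 1


data = [0, 25, 26, 31, 7]
words = pack_zchars([z for v in data for z in value_to_zchars(v)])
assert zchars_to_values(unpack_words(words)) == data
print(round(average_overhead(), 4))  # prints 0.2667
```

The 32 values cost 38 Z-characters in total, so each 5-bit value averages 19/3 ≈ 6.33 bits, and decoding needs only shifts and masks rather than multiplications.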
To combine efficient bit stuffing with random access, we could use a field of print_ret instructions. By adjusting the length of the string literals, we have full control over the tradeoff between random-access granularity and encoding overhead. Every entry must have the same length so that the jump offset can be computed from the index, so we would have to use a constant-length encoding, e.g. lowercase letters only, where every symbol costs exactly one Z-character. For instance, with 4-byte literals (plus one byte for the opcode), we can encode six lowercase letters for every five bytes. Since 26^6 = 308,915,776 exceeds 2^28 = 268,435,456, that gives us random access to a table of 28-bit values, with an overhead of 43% (40 bits stored per 28 bits of data). With 8-byte literals, twelve letters give us random access to a table of 56-bit values, with an overhead of about 29% (72 bits stored per 56).
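The sketch below builds and decodes a single print_ret table entry in Python. It assumes print_ret is encoded as the one opcode byte 0xB3 followed by the inline Z-string, that a..z are Z-characters 6..31 of alphabet A0, and that values are written in base 26 with the most significant letter first (the digit order is an arbitrary choice for this sketch). decode_entry reads the Z-string straight from the bytes; in a running story file the letters would come out as printed text instead, which the sketch does not model. The function names are hypothetical.

```python
import math

PRINT_RET = 0xB3  # 0OP:179 print_ret, followed by an inline Z-string


def value_to_letters(v, n_letters):
    """Write v in base 26 as Z-characters 6..31 (a..z), most significant first."""
    if not 0 <= v < 26 ** n_letters:
        raise ValueError(f"{v} does not fit in {n_letters} letters")
    digits = []
    for _ in range(n_letters):
        v, d = divmod(v, 26)
        digits.append(6 + d)
    return digits[::-1]


def letters_to_value(zchars):
    """Horner's rule: one multiplication by 26 per letter (the costly part)."""
    v = 0
    for z in zchars:
        v = v * 26 + (z - 6)
    return v


def build_print_ret_entry(v, literal_bytes):
    """Opcode byte plus a literal of `literal_bytes` bytes (3 letters per word)."""
    if literal_bytes <= 0 or literal_bytes % 2:
        raise ValueError("literal length must be a positive even number of bytes")
    n_letters = literal_bytes // 2 * 3
    zchars = value_to_letters(v, n_letters)
    entry = bytearray([PRINT_RET])
    for i in range(0, n_letters, 3):
        word = (zchars[i] << 10) | (zchars[i + 1] << 5) | zchars[i + 2]
        if i + 3 == n_letters:
            word |= 0x8000  # end-of-string bit on the last word
        entry += word.to_bytes(2, "big")
    return bytes(entry)


def decode_entry(entry):
    """Recover the value from an entry's Z-string (end bit masked off)."""
    if entry[0] != PRINT_RET:
        raise ValueError("not a print_ret entry")
    zchars = []
    for i in range(1, len(entry), 2):
        w = int.from_bytes(entry[i:i + 2], "big")
        zchars += [(w >> 10) & 31, (w >> 5) & 31, w & 31]
    return letters_to_value(zchars)


def table_params(literal_bytes):
    """Usable bits per entry and overhead relative to those bits."""
    n_letters = literal_bytes // 2 * 3
    value_bits = math.floor(n_letters * math.log2(26))
    overhead = (literal_bytes + 1) * 8 / value_bits - 1
    return value_bits, overhead


for literal_bytes in (4, 8):
    bits, overhead = table_params(literal_bytes)
    v = 2 ** bits - 1  # largest value that fits in the table
    entry = build_print_ret_entry(v, literal_bytes)
    assert len(entry) == literal_bytes + 1 and decode_entry(entry) == v
    print(literal_bytes, bits, f"{overhead:.1%}")
# prints "4 28 42.9%" then "8 56 28.6%"
```

Since each entry is literal_bytes + 1 bytes long, the jump target for entry i is simply the field address plus i times that size, just as with the ret field.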