A 32-bit Å-machine

Natrium729 · June 5, 2026, 1:14am

Oh, I didn’t mean to put it in the spec! Just use uncompressed text (very probably UTF-8 is the best choice). And if someone ever wants to distribute their .aastory file, they can gzip it themselves. Is there really a point to compressing text on the VM side in most cases nowadays? (For modern machines, at least.)

Off-topic, but having written a Glulx interpreter in fully safe Rust without any issues, I’m a bit curious about what you’re talking about.

Draconis · June 5, 2026, 1:26am

If I remember right, Dannii needed to incorporate a large external library into his Rust interpreter just to normalize dates. It was a while ago, though.

Natrium729 · June 5, 2026, 1:35am

Ah, then that’s Glk, not Glulx, and dates are an optional feature anyway. (I can see how it could be bothersome if your host doesn’t have a built-in function for that, but dates are complicated in general.)

Draconis · June 5, 2026, 1:39am

Ah, I did misremember. My bad!

But yeah, my takeaway from that situation was “don’t assume a common library will always be available to implement your spec”. The Dialog project tries hard not to require any external libraries whatsoever (the compiler has its own database of casing pairs for Unicode characters!) for maximum portability, which is a pain, but generally workable.

Candy64 · June 5, 2026, 1:53am

They really are, and they’re often overlooked, even if your toolchain outputs a static 32-bit libc: that libc’s own Y2K moment, when a signed 32-bit time_t overflows in January 2038, is only about 12 years away.
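For the curious, here is a quick check of where that date comes from. It assumes the common Unix convention of a signed 32-bit count of seconds since 1970-01-01 UTC and says nothing about any particular libc:

```python
from datetime import datetime, timedelta, timezone

# A signed 32-bit time_t counts seconds since the Unix epoch
# and can go no higher than 2**31 - 1.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1

last_moment = EPOCH + timedelta(seconds=INT32_MAX)
print(last_moment.isoformat())  # 2038-01-19T03:14:07+00:00

# One second later, a wrapping signed 32-bit counter becomes -2**31,
# which lands back in December 1901.
wrapped = EPOCH + timedelta(seconds=-(2**31))
print(wrapped.isoformat())  # 1901-12-13T20:45:52+00:00
```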

Dannii · June 5, 2026, 2:05am

To give a little context: the actual date-processing code is small, but the library brings in a huge amount of debug-printing code. I’m hoping that will be fixed in the future, or I could implement it myself.

Gzip is basically ubiquitous now. I use a 4 KB JS library, and other languages would have even smaller options when compiled down to native code. But compressing individual strings wouldn’t be very effective.
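To illustrate that last point, here is a rough comparison using made-up sample strings: gzip each string separately versus gzip all of them together. Compressing per string pays gzip’s fixed header and trailer (about 18 bytes) every time and cannot reuse text from the other strings, so it typically ends up larger than the raw text. Compressing the whole file comes out much smaller.

```python
import gzip

# Hypothetical sample of short game strings, standing in for
# the text an .aastory file might contain.
strings = [
    "You can't go that way.",
    "You see nothing special about the lamp.",
    "You see nothing special about the door.",
    "You see nothing special about the key.",
    "Taken.",
    "Dropped.",
    "You can't see any such thing.",
    "You can't take that.",
] * 4  # repeat so the whole-file case has redundancy to exploit

encoded = [s.encode("utf-8") for s in strings]
raw_total = sum(len(b) for b in encoded)

# Each string on its own: fixed gzip overhead per string, no shared history.
per_string_total = sum(len(gzip.compress(b)) for b in encoded)

# Everything at once: DEFLATE can back-reference repeated phrases.
whole_total = len(gzip.compress(b"\n".join(encoded)))

print(f"raw: {raw_total}, per-string: {per_string_total}, whole: {whole_total}")
```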

jwalrus · June 5, 2026, 7:19am

I feel like it’s worth pointing out here that negative numbers in Dialog aren’t happening without an update to the Å-machine spec, and it’s not realistic to add them to the 16-bit version at this point, so a 32-bit Å-machine probably does help you.

There’s no reason that Å-machine files have to be confined to web play; it’s just that, as @Draconis said, no one has written a non-C64 offline interpreter that people actually want to use.

This idea worries me a bit, because Dialog has some nice output capabilities that I think haven’t been explored before largely because they don’t mesh with Glk’s assumptions. Would we end up with a substandard Å-machine interpreter just for the sake of making it part of e.g. Gargoyle?

Draconis · June 5, 2026, 3:12pm

It would probably end up like how Dialog currently does output on the Z-machine: handling bold, italic, reverse, monospace, color, and vertical margins, but not the full suite of CSS properties. Depending on the Glk library it could also adjust the text size and horizontal margins.

The big thing it can’t do that the web interpreter can is having an inline status bar that disappears when the next one is printed. But it might be possible to fake that with some windowing tricks depending on the Glk library.

sue · June 5, 2026, 3:45pm

It’s perfectly realistic, given a --signed compiler flag to switch all integers from unsigned to signed (also stashed in the resulting .aastory), and (interpreter supports negative numbers) and (largest integer $) predicates to tell whether you’re in 0 to 16383 land or -8192 to 8191 land. There aren’t really released .aastory files out in the wild at the moment, because there aren’t terps that work directly with them, which makes releasing a 1.2 or 2.0 version of the Å-machine less of a big deal.

(Something more elaborate that would let some values be signed and others unsigned would be possible, but that complicates things; just having everything be one way or another is simpler, and easier to implement.)
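As a sketch of what the proposed --signed mode would change, here is how an interpreter might read a 14-bit integer payload in each mode. It assumes the signed range is plain two’s complement, which is what gives -8192 to 8191, and that any Å-machine tag bits have already been stripped. `decode`, `encode` and `value_range` are hypothetical helper names, not part of any existing interpreter.

```python
VALUE_BITS = 14
MODULUS = 1 << VALUE_BITS         # 16384 distinct bit patterns
SIGN_BIT = 1 << (VALUE_BITS - 1)  # 8192

def value_range(signed: bool) -> tuple:
    """Smallest and largest integer a program can see in each mode."""
    if signed:
        return -SIGN_BIT, SIGN_BIT - 1  # -8192 .. 8191
    return 0, MODULUS - 1               # 0 .. 16383

def decode(raw: int, signed: bool) -> int:
    """Interpret a raw 14-bit payload as a Dialog integer (hypothetical)."""
    raw &= MODULUS - 1
    if signed and raw & SIGN_BIT:
        return raw - MODULUS  # two's complement: top bit means negative
    return raw

def encode(value: int, signed: bool) -> int:
    """Turn an integer back into a 14-bit payload, rejecting overflow."""
    lo, hi = value_range(signed)
    if not lo <= value <= hi:
        raise OverflowError(f"{value} is outside {lo}..{hi}")
    return value & (MODULUS - 1)
```

The same bit pattern, 16383, reads as 16383 in unsigned mode and as -1 in signed mode, which is why the mode would need to be recorded in the story file rather than guessed by the interpreter.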

The barrier has been that Dialog itself doesn’t understand negative numbers, but we’re proposing to add those to the language anyway.

Draconis · June 5, 2026, 4:32pm

I still have some reservations about implementation difficulties, but the first step would be adding a field in the header where flags like that can be stored, and that at least seems like a good step to take right now. Then the interpreter-side issues can get figured out later.

Candy64 · June 5, 2026, 4:52pm

I can’t help but wonder about something. Maybe it’s just caveman who do math with stick in dirt. But you already have signed numbers hiding inside your unsigned ones.

0–32767 is the positive range.
32768–65535 is the negative range.

You can build functions to do real math on top of this. For subtraction, always subtract the smaller number from the larger. If the number you’re subtracting is larger than the number you’re subtracting from, you know the result is negative, and the negative result is just 32768 + (the difference between the two numbers). Checking for negative is simple, too: if value >= 32768, it’s negative.

Caveman no understand why tribe need new VM when negative numbers already hiding in plain sight.
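Here is the scheme from this post written out, as a sketch that only covers subtracting two non-negative values as described. It is a sign-magnitude encoding (top bit is the sign, the remaining bits are the magnitude), which differs from the two’s complement most CPUs use. One quirk is that 32768 ends up meaning “negative zero”. Also note that elsewhere the thread puts Dialog’s integers at 14 bits (0 to 16383), whereas this sketch follows the 16-bit ranges in the post. All function names are illustrative.

```python
NEG_FLAG = 1 << 15  # 32768: values at or above this are negative

def is_negative(v: int) -> bool:
    return v >= NEG_FLAG

def to_int(v: int) -> int:
    """Read a sign-magnitude word back as an ordinary integer."""
    return -(v - NEG_FLAG) if is_negative(v) else v

def from_int(n: int) -> int:
    """Store an ordinary integer as a sign-magnitude word."""
    if not -(NEG_FLAG - 1) <= n <= NEG_FLAG - 1:
        raise OverflowError(n)
    return NEG_FLAG + (-n) if n < 0 else n

def subtract(a: int, b: int) -> int:
    """a - b for two non-negative words, following the recipe above."""
    if b > a:
        return NEG_FLAG + (b - a)  # negative: flag plus the difference
    return a - b
```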

Draconis · June 5, 2026, 5:04pm

That’s basically what I recommended above (a software implementation, either of full-on bignums or just signed 14-bit numbers), but if we’re switching to a 32-bit word size for other reasons, there’s also plenty of space to have signed integers built into the actual VM. That way they’ll be significantly faster and use less memory.

Personally, my main reason for wanting a 32-bit word size is the limits of the character type. On the 16-bit Å-machine it’s limited to one byte, and I’ve hit that limit multiple times. I imagine others will just want more heap space! Plenty of reasons to want a larger system.

Draconis · June 6, 2026, 3:50am

Adding a “flags” byte (or word or short) to the header is a bit tricky because of the current format:

HEAD:
	BYTE[2]		version	File format version (major, minor)
	BYTE		wordsz	Word size (currently always 2)
	BYTE		shift	Shift amount for short/long string pointers

	SHORT		release	Story release number
	BYTE[6]		serial	Story serial number (ASCII)

	LONG		crc	Running CRC-32 of the contents of LOOK, LANG,
				MAPS, DICT, INIT, CODE, and WRIT, in that
				specific order

	WORD		heapsz	Size (in words) of heap/env/choice area
	WORD		auxsz	Size (in words) of aux/trail area
	WORD		ramsz	Size (in words) of random access area
				(including long-term heap)

	BYTE[46]	ifid	(optional) "UUID://...//" + null (ASCII)

The IFID at the end is optional, and there’s no more room before it.
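A rough sketch of reading the HEAD layout above. It assumes big-endian byte order (as in IFF generally) and that the WORD fields are `wordsz` bytes wide: 2 today, with 4 standing in for the hypothetical 32-bit machine discussed in this thread. `parse_head` and `Head` are illustrative names.

```python
import struct
from dataclasses import dataclass
from typing import Optional

@dataclass
class Head:
    version: tuple  # (major, minor)
    wordsz: int
    shift: int
    release: int
    serial: str
    crc: int
    heapsz: int
    auxsz: int
    ramsz: int
    ifid: Optional[str]

def parse_head(data: bytes) -> Head:
    # Fixed-size part up to and including the CRC: 2+1+1+2+6+4 = 16 bytes.
    major, minor, wordsz, shift, release, serial, crc = struct.unpack_from(
        ">BBBBH6sI", data, 0)
    if wordsz not in (2, 4):
        raise ValueError(f"unsupported word size {wordsz}")
    word_fmt = ">H" if wordsz == 2 else ">I"  # assumption: WORD = wordsz bytes
    offset = 16
    sizes = []
    for _ in range(3):  # heapsz, auxsz, ramsz
        (n,) = struct.unpack_from(word_fmt, data, offset)
        sizes.append(n)
        offset += wordsz
    # Optional 46-byte IFID: "UUID://...//" followed by a null byte.
    ifid = None
    raw_ifid = data[offset:offset + 46]
    if raw_ifid.startswith(b"UUID://"):
        ifid = raw_ifid.split(b"\0", 1)[0].decode("ascii")
    return Head((major, minor), wordsz, shift, release,
                serial.decode("ascii"), crc, *sizes, ifid)
```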

So if we wanted to add a “flags” field:

We could specify that it’s optional and goes before the IFID (if any); requiring a certain bit to always be 0 (or always 1) would ensure it never looks like an ASCII ‘U’, the first byte of "UUID://".
We could add a separate IFF chunk instead of storing these things in the HEAD; call it OPTS or the like.
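To make the first option concrete, here is a sketch that assumes a one-byte flags field placed right after `ramsz`, with bit 7 required to be set. That is one possible choice of the “always 1” bit, not anything specified. A reader can then tell the flags byte apart from the start of an IFID, because ‘U’ is 0x55 and no ASCII byte has bit 7 set. The flag bit assignment and function name are hypothetical.

```python
FLAGS_MARKER = 0x80  # hypothetical rule: bit 7 is always set in the flags byte
FLAG_SIGNED = 0x01   # hypothetical flag: story was compiled in signed mode

def split_flags_and_ifid(tail: bytes) -> tuple:
    """Split the bytes after ramsz into (flags, remaining IFID bytes).

    An IFID starts with ASCII 'U' (0x55), and no ASCII byte has bit 7
    set, so a leading byte with bit 7 set must be the flags field.
    """
    if tail and tail[0] & FLAGS_MARKER:
        return tail[0] & ~FLAGS_MARKER, tail[1:]
    return 0, tail  # no flags byte: header as currently specified
```

A header with neither field just has an empty tail, and an old-style header with only an IFID still parses unchanged.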

Since there’s not a pressing need for either yet, I don’t think it’s something that has to be decided right now. (If we enshrined it as a new header field in the 32-bit spec, that wouldn’t help on 16-bit platforms, and there’s nothing that really needs a flag yet.)

Draconis · June 6, 2026, 7:42pm

Now, this is an array of up to 256 instruction words. Start with the whole input in the prefix and nothing in the suffix.

$00 00 00 00: Nothing worked. Abort the whole process.
$00 00 00 01: Check if the prefix is a recognized dictionary word. If so, succeed with this prefix and suffix. If not, continue.
$XX XX XX YY: Is the last character of the prefix $XX XX XX? If so, move it from the prefix to the suffix, then jump to word $YY of the decoder. If not, continue.
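A small interpreter for the decoder described above might look like this. It rests on several assumptions that the post does not state: execution starts at word 0, the 24-bit $XX XX XX field is a Unicode code point, and running past the last instruction counts as failure. `run_decoder`, `match` and the example program are hypothetical. The loop always terminates: between jumps the instruction pointer only moves forward, and every jump removes one character from the prefix.

```python
ABORT = 0x00000000  # nothing worked: give up
CHECK = 0x00000001  # succeed if the prefix is a dictionary word

def match(ch: str, target: int) -> int:
    """Encode: if the prefix ends in ch, move it to the suffix and jump."""
    code = ord(ch)
    assert 0 < code < (1 << 24) and 0 <= target < 256
    return (code << 8) | target

def run_decoder(program, word, dictionary):
    assert len(program) <= 256  # jump targets are a single byte
    prefix, suffix = word, ""
    ip = 0
    while ip < len(program):
        insn = program[ip]
        if insn == ABORT:
            return None
        if insn == CHECK:
            if prefix in dictionary:
                return prefix, suffix
            ip += 1
            continue
        char, target = insn >> 8, insn & 0xFF
        if prefix and ord(prefix[-1]) == char:
            prefix, suffix = prefix[:-1], prefix[-1] + suffix
            ip = target
        else:
            ip += 1
    return None  # assumption: falling off the end means failure

# Hypothetical program that strips an English "-s" or "-es".
PLURALS = [
    CHECK,          # 0: try the whole word
    match("s", 3),  # 1: ends in -s? move it to the suffix, go to 3
    ABORT,          # 2
    CHECK,          # 3: try the word without -s
    match("e", 6),  # 4: now ends in -e? move it too, go to 6
    ABORT,          # 5
    CHECK,          # 6: try the word without -es
    ABORT,          # 7
]

print(run_decoder(PLURALS, "boxes", {"box"}))  # ('box', 'es')
```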