A 32-bit Å-machine

The Å-machine was always intended to support a 32-bit word size eventually. The second byte of the header gives the word size in bytes, which is currently always 2. This is good for compatibility with retro machines like the C64 (and with the Z-machine, which borrows a lot from the 16-bit Å-machine architecture). But it’s also very limiting; it would be nice to have a 32-bit option as well, for larger games.

Some parts of this are easy enough: just use 32-bit pointers instead of 16-bit ones, for example. But the Å-machine is a tagged architecture. Unlike on the Z-machine, where a 16-bit word could plausibly be a byte address, a packed address, an integer, an object number, or something else entirely, every 16-bit word on the Å-machine has one and only one meaning.[1]

For example, here is the encoding of literal values:

000vvvvv vvvvvvvv	object (v > 0)
001vvvvv vvvvvvvv	dictionary word (v < $1E00)
00111110 vvvvvvvv	character
00111111 00000000	[] (nil, the empty list)
00111111 00111111	sentinel for unused memory
00111111 vvvvvvvv	(reserved) (all other v)
01vvvvvv vvvvvvvv	integer

All literal values start with 0; 1 means a reference.

Now, we could just keep these tags intact and increase the range of each type of value by a factor of 216. But that seems like a wasted opportunity. This system was designed to fit as many values into 15 bits as possible, which is why it doesn’t have things like signed integers.

The current system allows 256 different characters, for example, which are assigned by a Unicode translation table in the story file. If we keep this particular one-byte tag and expand the one-byte value to three bytes, then we can support all of Unicode!

Currently, a full quarter of the value space is allocated to unsigned integers. What if we gave half of that chunk to signed integers, and the other half to floats? We’d need to figure out a 29-bit floating-point format, but that seems like a sufficient number of bits for your average parser IF.

Routine storage and packed string storage are limited to eight megabytes each (24-bit pointers to individual bytes, with the first bit marking it as a full 24-bit pointer instead of a shorter one). That could trivially be doubled by requiring all pointers to be in “long” form.

The Å-machine currently uses one-byte opcodes, which means there aren’t a lot available. Is that worth changing? We haven’t hit the limit yet.

It has 64 registers and 64 “env slots” (local variables for each routine). Is that worth changing? Right now those registers (unlike the Z-machine ones) are only used for routine arguments and temporaries; global variables are stored as an array in RAM instead.

What limits need to be loosened and which should stay consistent? If we’re increasing the word size to 32 bits, I don’t think we necessarily need to worry about file size any more; but consistency with the 16-bit format makes it easier for interpreters to support both, so we also shouldn’t change things just for the sake of changing them.


  1. This is why it only supports 14-bit unsigned integers instead of the Z-machine’s 16-bit signed integers. ↩︎

3 Likes

How would increasing the range of opcodes work if we’re also continuing to support the 16-bit Å-machine? The new opcodes would have to be ones that are only useful when working with 32-bit values, or else we’d end up having to restrict certain functionality to the 32-bit version based on opcode availability rather than for any fundamental reason.

Pretty much that. If we decided to add a suite of floating-point operations, for example, those wouldn’t be available (or relevant) in the 16-bit version. Or if we made an opcode to cut a 32-bit word into 16-bit halfwords.

1 Like

Yes, this. That suggests that yes, we should increase the available number of opcodes, for just that reason. We don’t want to exhaust the limited pool of 16-bit Å-machine opcodes for 32-bit-only operations.

1 Like

My first thought on this front is, keep the one-byte opcodes from the 16-bit spec, but give one of those bytes ($FF?) the meaning “prefix for 32-bit-only opcode”. Then the 32-bit spec can have another 256 opcodes of its own, and the 16-bit interpreters don’t need to deal with variable-length opcode parsing.

256 more opcodes is probably overkill, but the cost of one extra byte of ROM is negligible on a 32-bit machine.

Dialog can’t currently produce routines with more than 13 parameters: the maximum arity of a predicate is 12, plus one for the (just) pointer. So 64 seems like a very generous limit for those already. Running out of temporaries could be an issue, but the only time that limit has been hit, it was due to a compiler bug.

We could also just make “long” form be four bytes long instead of three.

Floats do create a lot of new problems, since they break the assumption that you can always compare literal values bitwise for equality. Perhaps give half the chunk (29 bits) to signed integers and leave the other half unspecified for the moment? Or even give a quarter of the chunk (28 bits) to signed integers and leave the other 3/4 unspecified. That can handle all eight-digit numbers, just like how the current system (14-bit unsigned) can handle all four-digit numbers.

1 Like

I thought 32-bit floats was sufficient in Glulx, but Graham groused until I added 64-bit floats.

I forget the exact use case.

Anyhow, you already need some equality operators for floats which are not direct literal comparison. (int)0 should equal (float)0.0, and comparing float == float is often (not always) a category error. Putting floats outside the native value space doesn’t cost you that much.

Also, whatever format you choose, it has to be easily converted to 32-bit or 64-bit native floats so that you can pass it to libc sin(), cos(), exp(), and so on.

1 Like

Now there’s a thought! Currently, 100vvvvv vvvvvvvv is a pointer to a word on the heap, 101vvvvv vvvvvvvv is undefined, 110vvvvv vvvvvvvv is a pointer to a pair (cons cell) on the heap, and 111vvvvv vvvvvvvv is a pointer to an extdict on the heap. Floats could also potentially be stored on the heap, which would let them be however long we wanted.

Some other current limits:

The long-term heap’s size is limited to $7FFF bytes, because global variables and object properties use $0000 for “unset”, $0001-$7FFF for a literal value (as above), and $8000-$FFFE for a pointer into the long-term heap. This one is easily changed when we increase the word size; presumably $0000_0000 will be “unset”, $0000_0001-$7FFF_FFFF for literals, and $8000_0000 to $FFFF_FFFE for pointers.

The instruction pointer (and continuation pointer, which holds the same thing) are already 32-bit values, but their effective range is limited to 23 bits by the way jump instructions are encoded. This definitely seems worth increasing to the full 32 bits while we’re at it. But is it worth increasing beyond that? Four gigabytes is an enormous amount of code; even Glulx doesn’t allow that much. (Unlike Glulx, the Å-machine has separate address spaces for code, strings, and RAM. No self-modifying code here.)

The whitespace level is stored in a byte, so you can’t have a div margin larger than 249em. But why would you ever want that?

Rather than raw 16-bit (soon 32-bit) pointers into RAM, the Å-machine uses 13-bit (soon 29-bit) pointers into the individual heaps, so global/object data, the main heap, and the aux heap are each limited to an eighth of the total 16-bit (soon 32-bit) RAM address space. These are word addresses, not byte addresses, and 229 words seems like more than any individual heap would ever need. But, global/object data and the long-term heap currently share their address space. If it becomes a problem, those could be separated out. Maybe that’s worth doing anyway? (Those two specifically can also be addressed in an alternate way that expands the range.)

.aastory files are in IFF format, which means each chunk’s length in bytes is stored as a 32-bit signed int. This means, if we want more than two gigs of code, we’ll need to allow multiple CODE chunks. Is that worth doing? Two gigabytes is still a lot, and the current encoding of jump instructions makes 31-bit absolute pointers easier than 32-bit ones.

Inform has the same problems (storing Glulx chunks in Blorb files) and nobody anticipates it being a problem, ever.

1 Like

I’ve started trying to clean up the existing specification and clarify the types of all the instruction operands. The previous spec uses the WORD operand type to mean two different things, for instance: either a large immediate numeric constant, or an immediate literal value (as seen earlier in this thread). I changed the latter to VWORD.

There’s still a pretty serious ambiguity in how the VALUE type is used: some opcodes, if they see a VALUE above $7FFF, will treat it as a pointer and dereference it. Others will treat it as a large unsigned value. I’d like to change the latter to be called RAW. But this ambiguity runs deep and I want to make sure I get it right.

2 Likes

There we go. I’ve documented all the operand types as thoroughly as I can, and as part of that, I split some of them apart: WORD/VWORD and VALUE/RAW.

Now I can summarize all the different types the Å-machine uses!

Literals

These are the basic 16-bit values that are always encoded the same way, no matter where they appear. I posted them up above, but I’ll include them again.

000vvvvv vvvvvvvv	object (v > 0)
001vvvvv vvvvvvvv	dictionary word (v < $1E00)
00111110 vvvvvvvv	character
00111111 00000000	[] (nil, the empty list)
00111111 00111111	sentinel for unused memory
00111111 vvvvvvvv	(reserved) (all other v)
01vvvvvv vvvvvvvv	integer

These only go from $0001 to $7FFF—zero, and all words with the high bit set, have varying meanings depending on context.

Data in memory

For a “live value”—something that’s currently alive, either on the heap or stored in a register—it works like this:

0vvvvvvv vvvvvvvv	literal (v > 0)
100vvvvv vvvvvvvv	reference (word index into the heap)
101vvvvv vvvvvvvv	(reserved)
110vvvvv vvvvvvvv	pair (cons cell; word index into the heap)
111vvvvv vvvvvvvv	extended dictionary word (word index into the heap)

For a global or object variable (something that persists through clearing the heap), it works like this:

00000000 00000000	unset/null
0vvvvvvv vvvvvvvv	literal (v > 0)
1vvvvvvv vvvvvvvv	word index into the long-term heap

For a serialized data stream (a way of storing data temporarily on the aux heap, used for (collect $) and (collect words), or in long-term storage):

00000000 00000000	end of stream
0vvvvvvv vvvvvvvv	literal (v > 0)
10000000 00000000	unbound variable
10000001 00000000	extdict (two elements) follows
10vvvvvv vvvvvvvv	reserved (all other v)
110vvvvv vvvvvvvv	proper list with v elements follows
111vvvvv vvvvvvvv	improper list with v elements follows

Operands in the bytecode

BYTE: a small constant number, such as the number of local variables to allocate space for.

vvvvvvvv

VBYTE: a small constant literal, which means an object number from 01 to FF.

vvvvvvvv	(v > 0)

WORD: a large constant number, such as a line number for tracing.

vvvvvvvv vvvvvvvv

VWORD: a constant literal.

0vvvvvvv vvvvvvvv	(v > 0)

RAW: an unsigned 16-bit number. If this points to a register or env slot, and that register or env slot holds a value greater than $7FFF, that will be interpreted as a very large number.

0vvvvvvv vvvvvvvv	number 0000-7FFF
10vvvvvv			value of register v (00-3F)
11vvvvvv			value of env slot v (00-3F)

VALUE: a live value. If this points to a register or env slot, and that register or env slot holds a value greater than $7FFF, that will be interpreted as a word pointer to the heap (and dereferenced if necessary).

0vvvvvvv vvvvvvvv	literal (v > 0)
10vvvvvv			value of register v (00-3F)
11vvvvvv			value of env slot v (00-3F)

DEST: where to store the result of some computation.

00vvvvvv	store into register v (00-3F)
01vvvvvv	store into env slot v (00-3F)
10vvvvvv	unify with register v (fail if not possible)
11vvvvvv	unify with env slot v (fail if not possible)

INDEX: a medium-sized constant number, such as a property number or style class index. You’ll notice some redundancy in this one; numbers from $00 to $BF can be represented in two different ways. But $3FFF (16,383) properties and style classes is a lot; allowing $C0 (92) more of them is not worth the extra difficulty in decoding.

0vvvvvvv			v (00-7F)
10vvvvvv			v + $80 (80-BF)
11vvvvvv vvvvvvvv	v (0000-3FFF)

CODE: a byte address in the CODE chunk. Strictly speaking, 00000000 is defined as a jump to absolute address 0, but the first byte of the CODE chunk must always be $01 (FAIL), so the results are exactly the same.

00000000					fail
00vvvvvv					+v bytes (01 - 3F) from the beginning of the operand
01vvvvvv vvvvvvvv			+v bytes (signed -2000 - 1FFF) from " " " " "
1vvvvvvv vvvvvvvv vvvvvvvv	absolute address v (00 0000 - 7F FFFF)

STRING: a byte address in the WRIT chunk. “S” is provided in the game header. I honestly have no idea why “tiny” pointers always shift by 1 instead of shifting by S; I don’t see any benefit to doing it this way. But Dialog always compiles S = 0, so I guess a shift of 1 is more efficient than a shift of 0…?

(Maybe something we should have changed in 1.0. Oh well.)

0vvvvvvv					address v << 1
10vvvvvv vvvvvvvv			address v << S
11vvvvvv vvvvvvvv vvvvvvvv	address v << S

…and that’s it! These are the main things we’d have to consider modifying for a 32-bit Å-machine. I’ve also documented all of this in the spec, so once the PR is merged, it’ll be available in a centralized location.

2 Likes

From my point of view, the limitation that really bothers me is the limited size of integers. Having an option just to expand those to a signed 32 bit (or even a signed 31 bit) value would be sufficient for almost anything I can think of currently. I think adding floats is a bit of a pandora’s box.

1 Like

That makes sense, yeah! I have the bones of a proposal cooking at the moment; I’ll probably pitch it by this weekend.

In the meantime, there’s one other data structure I forgot: the format of packed strings!

The 16-bit Å-machine uses a one-byte character set. $00-1F (and $7F) are control characters, $20-7E are ASCII, and $80-FF are defined by a big table in the story file. This table contains, for each entry:

  • The Unicode codepoint (three bytes)
  • The code for the corresponding lowercase character (one byte)
  • The code for the corresponding uppercase character (one byte)

The story file also defines which of those are “stop” characters (treated as word dividers during parsing) and which inhibit automatic whitespace on the left and/or right.

Strings are encoded with a Huffman-esque prefix tree. This tree is defined by a table of 1 to $7F two-byte nodes.

The program starts at the first bit of the string and the root node of the tree. If that first bit is 0, it looks at the first byte (left branch) of that node for what to do next. If that first bit is 1, it looks at the second byte (right branch).

  • $00-5E: Print character ($20 plus x).
  • $5F: Read N bits from the string, MSB first, where N is the number of bits necessary to identify all the “excess” characters (characters $A0 and above) and all the dictionary words. Print that character or dictionary word.
  • $60-$7F: Print character ($20 plus x).
  • $80: Stop.
  • $81-$FF: Go to tree node (x minus $80), advance to the next bit of the string, and repeat this process.

(Specifically, N is defined as ceil(log2(max(0, n_char - $A0) + n_dict)).)

I believe this “using dictionary words to compress game text” trick comes from Level 9’s A-machine. This tends to make for much more compact strings than Z-machine compression. Dictionary words are stored in a much simpler format: just an array of bytes.

Finally, there’s the removable word endings decoder, which is very simple. Its goal is to take a string (array of characters) and see if any of the removable word endings is a valid suffix. So you start at the beginning of the decoder (which is always $01) and the end of the string, then read each byte in sequence:

  • $00: Nothing worked. Abort the whole process.
  • $01: Check if the prefix is a recognized dictionary word. If so, succeed with this prefix and suffix. If not, continue.
  • $XX $YY: Is the last character of the prefix $XX? If so, move it from the prefix to the suffix, then jump to byte $YY of the decoder. If not, continue.

So if the valid suffixes are “e”, “s”, and " 's", the decoder might look like:

	01				"No suffix at all" is always valid
	`s` found_s
	`e` found_final
	00				Fail
found_s:
	`'` found_final	We look for the maximal-length suffix first
	01				But "s" is also a valid suffix on its own
	00
found_final:
	01				Either this works or it doesn't; there are no more options
	00

Other miscellaneous limits (places where the spec uses an array of multiple BYTEs instead of the WORD type):

  • The chunk of a save file holding the special-purpose registers is fixed at 26 bytes
  • Pointers to subsections in the LANG chunk are fixed at two bytes each
  • Pointers to words in the DICT chunk are fixed at two bytes each
  • The number of wordmaps is fixed at two bytes (cf the INDEX type)
  • Pointers to wordmaps in the MAPS chunk are fixed at two bytes each
  • Wordmap payloads can only hold object numbers up to $1FFF
  • The number of style classes is fixed at two bytes (cf the INDEX type)
  • Pointers to style classes in the LOOK chunk are fixed at two bytes each
  • Pointers to object tags in the TAGS chunk are fixed at two bytes each
  • The number of resources is fixed at two bytes (cf the INDEX type)
  • Pointers to resource descriptors are fixed at two bytes each

As you can see, there are some places where Linus specified a WORD, and other places where he specified a BYTE[2]. The intent seems to be that a WORD would change with the word size, but a BYTE[2] would not. I don’t really understand why he sometimes chose one and sometimes the other, though—the number of object tags is a WORD, but the pointer to each tag is a BYTE[2], and there have to be at least as many distinct pointers as there are tags.

The INDEX operand type is used to index into the MAPS and LOOK chunks, and it maxes out at $3FFF. If we decide not to increase the range of that type, it may not be necessary to change those pointers either.

STRING: a byte address in the WRIT chunk. “S” is provided in the game header. I honestly have no idea why “tiny” pointers always shift by 1 instead of shifting by S; I don’t see any benefit to doing it this way. But Dialog always compiles S = 0, so I guess a shift of 1 is more efficient than a shift of 0…?

Having tiny pointers shifted by 1 has the benefit of addressing strings that start at a multiple of 2 offset (most useful for short strings where you can keep a lot of them within the pointer’s reach, and S>1). I think having S>1 (and S>0 more generally) is only necessary when padding strings with extra space at the end is worth it (e.g. lots of humongous strings), since the returns diminish with the pointer’s granularity decreasing with increments of 2^S bytes.

I guess it’s simpler to have tiny pointers with a fixed shift size of 1 as it allows assembly of story files to opportunistically use tiny pointers, while not having to pad any strings.

Yeah, I’m just not understanding why tiny pointers are shifted by a different amount than the other pointers. When S = 0 it makes some sense, because it means the tiny pointers can cover more of the address space than the other types, and we want to use tiny pointers as much as possible. But for S > 1, it seems like having tiny pointers also use S would be a substantial improvement.

I do have a use for floats, but I might be unusual.

If we’re adding signed numerical values to the language, having that available back in 16-bit land, either through something like (enable negative numbers) or through some syntax like $-Number would be a win.

I’m hesitant to bring negative numbers into the existing 16-bit Å-machine, for a few reasons.

One, there’s no more room in the literal value representation, so adding signed numbers means getting rid of unsigned numbers—they can’t coexist in the same VM. (Unless you want to be limited to 254 negative numbers, I guess.)

A new header flag to choose between them would be an option (like the (enable negative numbers) you mention), but without separate signed and unsigned types, the behavior of everyone’s Dialog code would change significantly depending on whether that flag is enabled. If some library or extension defines a recursive predicate that relies on ($ minus $ into $) failing past zero, it would get stuck in infinite loops once you turned on the flag, with no easy way to find the problem.

This would also make the observable behavior of a Dialog program be different between Z-machine and Å-machine, which isn’t ideal (that infinite loop would only happen on Å-machine). Having behavior change between 16-bit and 32-bit platforms seems fine, but between different 16-bit platforms means sacrificing a lot of portability.

And last but not least, it would mean a significant slowdown in all numerical operations on older platforms like the C64. Right now, converting between Å-machine integers and native platform integers is easy in both directions (it’s just a bit mask), and instructions like CHECK_GT_EQ get the right results by comparing the raw literal values. That won’t work with negative numbers enabled.

Given all these drawbacks, and how hard it’ll be to add support to the C64 with my tenuous grasp of 6502 assembly, I think for the moment it’s better to handle negative numbers in software rather than hardware (so to speak). That way it’ll work instantly across platforms and interpreters with no loss of compatibility. I’m still (slowly) working on my generalized bignum library (division is hard), which will provide a ready-made solution for this, but using a representation like [- 10000] or just literal numbers with a bias should work okay for the time being, I think.

As far as floats go, I think I’m going to leave those for later rather than including them in the initial proposal, but I’m specifically leaving room for them in the data model. I think zarf’s suggestion of storing them on the heap (like extdicts) is the right call, because three heap words can easily hold a 64-bit value with plenty of room left for the tags.

2 Likes

A question for the low-level programmers out there! If the Å-machine has a tagged signed integer type—that is, a signed 28-bit number v is stored as the bit pattern 0100vvvv vvvvvvvv vvvvvvvv vvvvvvvv or something along those lines—how easy or hard is it to convert between that and a native integer type, using the classic C-style bitwise operators?

If v were unsigned, it would be easy (as it currently is for the 16-bit one): just bitwise-or in the top four bits to create an Å-machine integer, and mask them out to create a native integer.

But for signed values, sign extension is needed, and systems usually don’t have native ways of sign-extending 28 bits to 32 bits. So the best I can come up with for the moment is:

  • To convert native to Å-machine, mask out the top four bits, then bitwise-or in 0100.
  • To convert Å-machine to native, mask out the top four bits, then measure the next (fifth) bit. If it’s set, bitwise-or in 1111.

Which is probably not really an issue on any machine that can handle a 32-bit word size, in the end. But the conditional feels clunky, and I’m wondering if there’s a more elegant way to describe this in the specification.

(This is also tricky to do in a language like JavaScript that doesn’t like specifying the exact size of integers. I don’t know how to get around that…)

Similarly, checking if a 32-bit signed integer is outside the range of a 28-bit signed integer. Is it true that (assuming two’s complement) a number is out of range iff the top four bits are not all ones and not all zeroes? If so, is there an easy way to test that? (The Å-machine needs to know when an operation overflows instead of silently continuing.)