I’ve been experimenting with Dialog and the Å-machine, and reading the documentation and the spec. Here are some (lots of!) comments and other remarks. (In no particular order; it’s a bit messy because I wrote them down as I went along. Some might not even be relevant.)
I would tag Linus directly, but since he hasn’t been very active on the forum lately, I don’t know if he’d be OK with that.
About the spec
- The spec uses `..` to denote ranges of integers. It should say that those ranges include the end (i.e. `1..3` denotes `{1, 2, 3}`). While it may be obvious when you look at the values, a lot of programming languages use exclusive ranges, so it doesn’t hurt to be explicit.
- In the “Text” part of section “Runtime data”, the ranges of characters are denoted with a dash instead of two periods like earlier. (Not very important, but inconsistent still.)
- The spec doesn’t say what’s the size of a LONG value. (It would seem it’s 32-bit.)
- It could help to indicate the types of arguments and returned values of the functions in the section “Opcode semantics”.
- It would be nice to have an English description of what the opcodes do (like in the Glulx spec), instead of just a pseudo-code implementation. It makes it easier for implementers to understand how the Å-machine works.
- In the description of the operand types, the spec says that CODE operands can be relative pointers, but it doesn’t say how negative numbers are encoded. It seems to be two’s complement, with the sign bit being the 3rd one? (Since the first 2 are the tag indicating it’s a relative pointer.)
- The spec doesn’t explain the meaning of INDEX operands.
- What is the size and default value of TMP? (Or it might not matter since it’s only for temporary uses.)
- In the explanation of live values at the beginning, there are two ranges of reserved values (`3f01..3fff` and `a000..bfff`), but only one in the explanation of how the tags work just below. What about the other one?
- I found the initialised registers and the RAM contents a bit confusing. Are the main and aux heaps in the RAM, like the long-term one? If not, is the initial state of the main and auxiliary heaps present in the INIT chunk? (I think I managed to understand by looking at the JS implementation, but I still find that part of the spec difficult to understand.)
- How the STRING operands work is not clear in the spec. Where (in what chunk) do they point? What’s the meaning of “shift”? In the end I just looked at the JS implementation.
- The spec says that “characters 80-ff are mapped to unicode glyphs by a game-specific ‘character mapping table’”. The word “glyphs” should be replaced by “code points”, if I’m not mistaken. (A glyph is a graphic representation of a character. A single char can have multiple glyphs. For example, an Arabic letter has different glyphs depending on whether it’s written alone or inside a word.) While I’m at it, “Unicode” is a proper name and should have a capital U.
- The spec should say that failing stops the current opcode execution? It only says “leave recursive unify/push/pop/(de)serialize operation” which is a bit vague.
- In the function `pop_lts` of the JS implementation (called `pop_longterm` in the spec), the case `v == 0x8000` is missing. Who is right?
- In `MAKE_PAIR`, the spec does not consider whether `a1` is a dest (in which case it should be unified) (at `else if(a3.tag == pair)`), whereas the JS implementation does (in the function `makepair`). Also, the JS implementation uses a `destvalue` function that is not present in the spec. (Rereading that point, I’m not sure it makes sense, but I do remember that part caused me problems and that the spec seemed incorrect.)
- In `ENTER_LINK` in the JS implementation, `uppercase` is cached, set to `false`, then restored. Why? (I guess it’s because we don’t want the link’s `href` to consume the uppercase. But why wouldn’t we? If that’s not what the author intended, he could have made the uppercase call after the `enter_link`. Also, it makes it impossible to create links executing a command with an uppercase, I think? But now that I think about it, it could happen in Dialog that the uppercase call comes before.) Also in this opcode in the JS implementation, the values aren’t taken from the heap whereas they are in the spec.
- Related to above, is it possible to have links pointing to commands with uppercase characters in Dialog? I guess not, since the words are represented by a list of dictionary words.
- In inputs, what to do with characters not in the story’s character set? The JS interpreter gives them a value of `0x3F`, i.e. the ASCII question mark. But that makes it impossible to differentiate them from real question marks. Why not use one of the reserved characters to represent unrecognised ones (in the range `00..1F`)? But I suppose it would be a breaking change.
- In `GET_KEY`, how is the key processed? Is it lowercased? In the JS implementation, characters not in the story’s character set are ignored (and we keep waiting for a key) instead of being converted to ‘?’ like in inputs. Also, maybe it could be a good place to recall how special keys like “enter” or the arrows are represented, and to say that the result is converted to a tagged character.
- In the JS implementation at the end of `vm_proceed_with_key`, `SPC` is set to `space`, but it’s not at the end of `GET_KEY` in the spec.
- Maybe clarify the meaning of the arguments of `find_in_wordmap`?
- In `AUX_POP_LIST_MATCH`, it would make more sense if the local `match` were a bool. Same thing in `POP_LIST_CHK` for `flag`. (But it’s not that important either.)
- Why save `NOB`, `LTB` and `LTT` in the game state since they are initialised once and never changed?
- In `SAVE`/`SAVE_UNDO`, remind once more that the code argument is the new inst in case of a restore? (Because it’s only said in the “Savefile” section.)
- In `EXT0` `UNDO`, it’s not indicated that the opcode should fail if we are trying to undo past the first turn, as opposed to doing nothing if we can’t undo for other reasons (e.g. we tried to undo more turns than the interpreter saved).
- In `prepend_chars`, are the chars encoded with the game-specific encoding? Is everything lowercased?
- In `wordlist_to_string`, the returned value is a string in the spec, but maybe it should be mentioned it’s encoded with the game-specific encoding.
- In `IF_WORD`/`IFN_WORD`, it seems to be implied in the condition that we have to dereference arg1 multiple times (once each time the previous condition failed). It doesn’t change anything, of course, but I believe it’s slower than dereferencing once and caching the result.
- Should `ADD_RAW`/`INC_RAW` and other arithmetic operations wrap around if the result is too big? Also, why `& 0xFFFF` since they are already 16-bit (same thing in `RAND_RAW`)? Maybe that’s for the wrapping, actually? But it’s not clear since they are supposed to be 16-bit already.
- In `MUL_NUM`, is the final `& 0x3FFF` necessary since `box_int` will do it anyway? (And it’s the only `_NUM` opcode that does that.)
- When saving the game data, maybe more explanations should be added about the padding with `0x3F3F`? (That we need to check against the bounds `LTT`, `AUX`, `TRL`, `TOP`, `ENV`, as in the JS implementation.) Also, the JS implementation does not xor with the INIT chunk but with the initial saved data (or maybe it’s equivalent?).
- In the extended character table, are the corresponding lower- and uppercase characters the index in the table or the code corresponding to the character in the game-specific encoding?
- Cloak of Darkness (and maybe all Dialog games) does not take `VM_INFO 41` (save/restore support) into account.
- What to do with an invalid argument to `VM_INFO`? Throw a fatal error like in the JS implementation? Or just ignore it like in the spec? (Ignoring it can lead to unspecified behaviour, since then we cannot know what’s in the dest.) The safest way would be to fail, maybe? (But that would be a breaking change, I guess.) The same question for invalid values in `EXT0`, except there’s no undefined behaviour since there is no dest.
- Out of curiosity: why does `CHECK_GT_EQ` work on word/vbyte but `CHECK_GT` work on value/byte?
- Explain what the opcode `TRACEPOINT` should do in more detail.
- What does the `rand()` function generate? A random 16-bit number? A random number between 0 and 16 383? A random tagged number (`0x4000..0x7FFF`)? A float between 0 and 1? (OK, surely not a float between 0 and 1.)
- When decoding reserved chars: what to do? Show a question mark or the Unicode replacement character? Or is it at the interpreter’s discretion?
- In `LEAVE_DIV`, what if the div list is already empty? Granted, it should never happen in well-formed stories, but you never know. I guess the best thing to do is to throw an invalid output state error.
- In the spec, `print_value` with a pair recursively calls itself, which can print too many/too few spaces if some of the pair’s contents are special characters (since they are a special case in the function). In the JS implementation, `print_value` instead delegates to another function, `val2str`, and that’s the function that recursively calls itself. And since that second function also has a case for characters, but prints them without modifying the whitespace, it’s OK. Also, in `ENTER_LINK`, the spec just says “append text of w to end of input”, without specifying what the text of w is. It is in fact the text produced by `val2str`. So my opinion is that the spec should inline the character case of `print_value` in the `PRINT_VAL` opcode, and add `val2str` at the other places, just like the JS implementation (and add a character case to it for when it’s called recursively for pairs). Hope I’m clear.
- The “uppercase the next char” feature is on the frontend side in the spec, but I find it easier to track it on the engine side because the lowercase-uppercase correspondences are stored in the LANG chunk. (And in fact, that’s what the JS implementation does.) What’s the better approach in the end? Should the spec correspond to how the JS does it?
- About `output_clear_links`: does it clear absolutely all the links, or only the command ones? (Because I don’t see why we would want to unlink external resource ones.)
- Would it be better to track the whitespace of status bars separately? (So that we could enter a status bar from inside a paragraph without having an implicit paragraph break.) (I’ve not tested this point, it’s just a guess.) But that would be a breaking change I suppose.
- The way the story file stores the uppercase equivalent of characters in the LANG chunk will not work for every Unicode character, because a single char can become several in uppercase. For example, ‘ß’ in German becomes ‘SS’. (Yes, I’m aware there exists a capital ‘ß’, but I believe it’s not used a lot, and also it’s the first example that came to mind.)
- According to Dialog’s documentation: “`(clear div)` clears, hides, or folds away the current div. Note that if more output is sent to the cleared div, this new output may or may not be visible to the player.” What’s recommended for an interpreter: show or don’t show further output sent to the div? (I think that the JS frontend shows a “plus” button to expand the div back?)
- The spec doesn’t say what should happen when we get an input, interface-wise. Do we append the command to the current paragraph/div/span? Does the next content continue in the same paragraph as before the input? As an example, the JS implementation adds margin to the paragraph containing the input. But shouldn’t the story control this instead? (For example by entering a div before printing the prompt and asking for input.)
- The spec says: “No attempt will be made to clear or clear_all from inside a span or a status area.” Is that also the case with `clear_div` and `clear_old`?
- In Dialog’s documentation: “Spans and inline style changes (bold, italic, reverse, fixed pitch, and roman) are ignored in the top status area.” But nothing is said about that in the spec. In my custom web frontend, I do take spans in the status bar into account and it causes no problems. (At least as long as we leave the spans before leaving the status bar, but Dialog ensures that’s the case.)
- Dialog’s documentation states that things will behave in a certain way when the spec makes no guarantees. For example, the documentation says that progress bars are scaled to fit the width of the current div, but the spec says nothing about it, stating: “The exact meaning of each of the above API calls is not specified here.”
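On the `& 0xFFFF` question in the list above, here is a minimal JavaScript sketch of why a mask can matter even when both operands already fit in 16 bits: the sum itself may need 17 bits. The name `addRaw` is made up for illustration; it is not taken from the spec or from the JS interpreter, and whether the spec really intends wraparound is exactly the open question.

```javascript
// Hypothetical helper, NOT from the spec or the JS interpreter.
// Adds two 16-bit raw values and wraps the result back into 16 bits.
function addRaw(a, b) {
  return (a + b) & 0xffff;
}

// Without the mask, the sum of two 16-bit values can overflow 16 bits.
console.log((0xffff + 1).toString(16));        // 10000
// With the mask, the result wraps around.
console.log(addRaw(0xffff, 1).toString(16));   // 0
// Values that do not overflow are unaffected by the mask.
console.log(addRaw(0x1234, 1).toString(16));   // 1235
```

If that reading is right, the mask is what implements the wrapping, and the spec could simply say so.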
About Dialog's documentation
- In chapter “Manipulating data”, closures are not mentioned in the list of types of value. (Technically closures are just lists so there’s nothing wrong with the documentation, and it’s a bit more advanced so it’s understandable to mention them later, so feel free to ignore this point.)
- In chapter “Input and output”, the sentence “In CSS, em represents the height of a capital M” is not true. In CSS, `1em` is simply equal to the value of the current `font-size` (which admittedly can be approximated by the capital M, but that depends on the font).
- Is there a way to style the Z-machine differently from the Å-machine? By including different files?
- Related: using the same CSS rules for the Å-machine and the Z-machine leads to incorrect behaviour (or at least not an ideal one). For example, if by convention `1em` means “1 line” in the Z-machine, it will lead to a status bar that is too small in the Å-machine web interpreter, since `1em` means “the current `font-size`” in CSS, and some characters can be taller than the current `font-size`. (E.g. if you set a `font-size` of `16px`, there can still be some characters in the used font that will exceed 16 pixels. And we haven’t even taken `line-height` into account yet!) There are also problems if the window is too small: it will cause the lines to wrap, but the height of the status bar will remain locked at `1em` because of the CSS. The same can be said of `ch`: `1ch` is the width of the character `0`, so you can actually fit more characters in a container than its `ch` length (because the average character is narrower than a zero). So my opinion is that it would be better to separate the style definitions for each interpreter into different files in the standard lib. As an example, here is a style for status bars with 2 elements that works well in the web interpreter, using flexbox. (But of course it won’t work well on other interpreters. My point is just that it’s a bit futile to try to unify how the style works in every backend.)
```
(style class @status)
	display: flex;
	justify-content: space-between;

(style class @right-status)
	text-align: right;

(program entry point)
	(status bar @status) {
		(div @left-status) { Your apartment. }
		(div @right-status) { Score: 0 }
	}
```
- The documentation says that inline status areas are only available on the Å-machine. On the Z-machine, why not make them display what Inform calls a boxed quotation, since they are similar (fixed letter spacing, can only have one on the screen at a time)? But it would be a breaking change.
- In the “Access predicates” section, `@($Obj is $Rel $Parent)` is defined with a multi-query (so that we can loop on objects in/on/etc. another one, if I understand correctly), but when using `(now)` with it as shown just after, the multi-query disappears? The documentation should explain why.
- At several places, the documentation mentions the Z-machine without acknowledging the existence of the Å-machine. (For example in the “Runtime errors” section of the “Beyond the program” chapter, it is written: “via the undo facility of the Z-machine”; it could instead say “via the undo facility of the underlying virtual machine” to be more generic.)
- Some metadata predicates are missing compared to Inform 7: story genre, story creation year. Is that deliberate?
- Since the documentation implies `(style class $)` rules are treated specially by the compiler, maybe add a reminder that you still have to escape parentheses inside them? (Since there are CSS values and functions that need them.)
About the JS implementation
- In `space_n`, instead of appending a span with a width in `ch` (which will always be too big since `1ch` is the width of the ‘0’ character and not the space), why not append a span containing the given number of spaces, along with the CSS `white-space: pre-wrap` to prevent the browser from collapsing them? It’ll also behave more correctly if it cannot fit at the end of a line (the line will break at this space like with any run of spaces; the full span will not go to the beginning of the next line).
- There’s a bug in `vm_proceed_with_key`: the condition `code == (e.lang[entry + 2] << 16) | (e.lang[entry + 3] << 8) | e.lang[entry + 4])` does not test what it is meant to. Since `==` has precedence over `|`, only the first term is compared with `code`, and the result is truthy as soon as one of the other two terms is nonzero. To fix: wrap the whole right-hand side of `==` in parentheses.
- Links should be true `<a>`s and not `<span>`s for better accessibility. (For example, you can’t use the tab key to navigate between links if they are spans, and I believe screen readers will announce them incorrectly.) Then when clearing them with `clear_links`, we can remove their `href` attribute to make them unfocusable placeholder links.
- Similarly for the menu, the options should be buttons and not divs.
- For the progress bar, a `<meter>` element instead of a plain div might be semantically better. (It’s more difficult to style, though.)
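The precedence issue in `vm_proceed_with_key` can be reproduced in isolation. This is only a sketch with made-up data: `lang`, `entry` and the two function names are hypothetical stand-ins, not the interpreter’s actual code.

```javascript
// Hypothetical stand-ins: `lang` plays the role of e.lang, and `entry`
// is an offset into it. Bytes 2..4 encode the code point 0x0000E9.
const lang = [0x00, 0x00, 0x00, 0x00, 0xe9];
const entry = 0;

// Buggy form: parsed as ((code == (b2 << 16)) | (b3 << 8)) | b4,
// so only the first byte is compared with `code`.
function buggyMatch(code) {
  return code == (lang[entry + 2] << 16) | (lang[entry + 3] << 8) | lang[entry + 4];
}

// Fixed form: the three bytes are combined first, then compared.
function fixedMatch(code) {
  return code == ((lang[entry + 2] << 16) | (lang[entry + 3] << 8) | lang[entry + 4]);
}

console.log(Boolean(buggyMatch(0x41))); // true, although 0x41 is not 0xE9
console.log(fixedMatch(0x41));          // false
console.log(fixedMatch(0xe9));          // true
```

Note that the buggy form returns a number rather than a boolean, which is why it is wrapped in `Boolean()` above.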
I’ll add more things if/when I encounter them.
And I have a similar list about Glulx and Glk, so stay tuned!