Z-Machine undefined behavior

My understanding is that Unicode characters that are not in the extra characters table are not valid for input. If this is correct, then only the first case is conforming.

I know of nothing in the standard that would dictate one case over the other. It all depends on when the transition to lower case is made.

In the first case, a story file with unmatched upper- and lower-case characters in the extra characters table could lead to case-sensitive input in a way that can’t happen with ASCII characters.

Agreed, but this case sensitivity is already in the standard for read_char.

My thinking was that check_unicode returns bit 1 set for a character “if and only if the interpreter can receive it from the keyboard”, which I took as a requirement for the character to be in the extra characters table. Thinking about it, this is probably stricter than what is mandated.
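
For what it’s worth, that stricter reading would amount to something like this on the interpreter side (the table contents and names are invented for the sketch, and the default extra characters table is ignored for brevity):

```c
/* A minimal sketch of the stricter reading above: bit 1 of the @check_unicode
   result ("can receive it from the keyboard") is set only when the character
   appears in the story's extra characters table.  The values below are just
   examples; a real interpreter reads the Unicode translation table from the
   header extension. */
static const unsigned short extra_input_chars[] = { 0x00e9, 0x00e8, 0x00fc };
#define N_EXTRA_INPUT (sizeof extra_input_chars / sizeof extra_input_chars[0])

static int check_unicode_input_bit(unsigned int c)
{
    if (c >= 32 && c <= 126)
        return 2;                      /* plain ASCII is always typeable */
    for (unsigned i = 0; i < N_EXTRA_INPUT; i++)
        if (extra_input_chars[i] == c)
            return 2;                  /* bit 1: can receive from keyboard */
    return 0;                          /* not in the table: not valid input */
}
```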

Of course read_char is a different beast anyway, accepting characters that read won’t (at least for anything more than line terminators). It seems clear that read was always intended to be case-insensitive.

My interpreters have always done the case conversion first, and it wasn’t until long after my first one that I realized it could be handled differently. I’ve not looked, but I’m curious how the case conversion is handled in other interpreters, especially since the exact ZSCII values are story-dependent (and, although unlikely, perhaps missing entirely!).

Do they do a double lookup like this:

1. Check that the Unicode character is a valid ZSCII extra character.
2. Convert that to lower case.
3. Check that the lower-case value is a valid ZSCII extra character.
4. Use this second ZSCII value.

I’m doing it the other way: https://github.com/borg323/jzip/blob/master/input.c#L333

Edit: looks like frotz converts to lower case first: https://gitlab.com/DavidGriffith/frotz/-/blob/master/src/common/input.c#L210
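
For concreteness, here’s a minimal sketch of the two orderings being discussed, with a made-up stand-in for the extra characters table and invented helper names (towlower() only stands in for whatever Unicode case mapping the interpreter actually uses):

```c
#include <wctype.h>

/* Toy stand-in for the story's extra characters table; a real interpreter
   reads it (or the default table) from the story file. */
static const unsigned short extra_chars[] = { 0x00c9, 0x00e9 };   /* É, é */
#define N_EXTRA (sizeof extra_chars / sizeof extra_chars[0])

/* Hypothetical helper: Unicode -> ZSCII via the extra characters table
   (extra characters occupy ZSCII 155 upward), or 0 if not in the table. */
static int unicode_to_zscii(unsigned int c)
{
    for (unsigned i = 0; i < N_EXTRA; i++)
        if (extra_chars[i] == c)
            return 155 + (int)i;
    return 0;
}

/* Double lookup: check the typed character is a valid extra character,
   then lower-case it and look the lower-case value up a second time. */
static int zscii_then_lower(unsigned int c)
{
    if (unicode_to_zscii(c) == 0)
        return 0;                                    /* not valid input */
    return unicode_to_zscii((unsigned int)towlower((wint_t)c));
}

/* The other order (what frotz appears to do): lower-case the Unicode
   character first, then translate it to ZSCII once. */
static int lower_then_zscii(unsigned int c)
{
    return unicode_to_zscii((unsigned int)towlower((wint_t)c));
}
```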

What happens if an uppercase letter is listed as a terminating character, and typed? Is it returned in uppercase or lowercase?

What happens if a lowercase letter is listed as terminating, and the corresponding uppercase letter is typed?

Related thread.

The standard says only function key codes can be in the terminating characters table, which would remove any ambiguity regarding terminating characters (maybe this is why). But see the other recent thread, which points out that at least one Infocom game contains non-function-key codes in its terminating characters table (although it is a slash, which still doesn’t present a case problem). :thinking:

I think I’m going to stick with lowering the case first, then doing the ZSCII conversion. It doesn’t require a double lookup, it is what frotz does, and it removes any problems with the terminating characters table as well. Presuming we allow arbitrary characters there in contravention of the standard, upper-case terminating characters, just like upper-case dictionary entries, would simply never be matched.


I’ll probably do it this way as well. The code is undergoing a major revision for better Unicode support, and this may help simplify it a bit.

That being said, my current code is a bit different from your first case (I had forgotten about it and only noticed while re-reading the code): if the corresponding lower-case character is not in the extra characters table, then the upper-case character is not converted.
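
Reusing the hypothetical unicode_to_zscii() from the sketch above, that variant would look roughly like this:

```c
/* Only fold case when the lower-case character is itself in the extra
   characters table; otherwise keep the upper-case character unconverted. */
static int zscii_fold_if_present(unsigned int c)
{
    int zc = unicode_to_zscii(c);
    if (zc == 0)
        return 0;                          /* not a valid input character */
    int zlower = unicode_to_zscii((unsigned int)towlower((wint_t)c));
    return zlower ? zlower : zc;           /* unmatched pair: leave it upper case */
}
```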

New rant:

It often bugs me how under-defined the Z-Machine stack is.

Clearly one can craft a story file that uses insane amounts of stack space. At what point is it no longer a valid story? 64KB? 128KB? More? I hate nebulous boundaries. Even fixing a value doesn’t really solve the problem because:

Since there are really two stacks (frame and eval), the answer can of course also vary depending on whether or not both use the same pool of memory, which one is used more heavily, and even how many locals are used per frame.
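
Just to make that concrete, here’s a toy calculation (all numbers are arbitrary, nothing here is mandated by the standard):

```c
#include <stdio.h>

/* Rough illustration: with call frames and the eval stack sharing one pool
   of words, the reachable call depth depends on the per-frame overhead and
   on how many locals each routine declares. */
int main(void)
{
    const int pool_words  = 32768;   /* e.g. a 64 KB pool of 16-bit words */
    const int frame_words = 4;       /* return PC, flags, etc. (made up)  */

    for (int locals = 0; locals <= 15; locals += 5)
        printf("%2d locals per frame -> roughly %d nested calls fit\n",
               locals, pool_words / (frame_words + locals));
    return 0;
}
```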

This has always bugged me.

End of rant.


Here’s one:
Using zero as the second operand (the parent) in a jin (IN?) instruction.

Legal or not?


Since each object’s short name is prefixed by a length byte, that implies a zero-length name is possible. It’s doubtful that many interpreters deal with this possibility.
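
A rough sketch of what handling that looks like, with hypothetical read_byte() and print_zencoded_text() helpers:

```c
/* The property table starts with a text-length byte counting the 2-byte
   words of the encoded short name, so 0 is a legal, empty name and should
   simply print nothing. */
extern unsigned char read_byte(unsigned long addr);
extern void print_zencoded_text(unsigned long addr, unsigned int words);

static void print_object_name(unsigned long prop_table_addr)
{
    unsigned int text_len = read_byte(prop_table_addr);   /* in 2-byte words */
    if (text_len > 0)
        print_zencoded_text(prop_table_addr + 1, text_len);
    /* text_len == 0: nothing to print; don't try to decode zero words */
}
```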

Edit: Frotz seems to handle this just fine. Maybe others do and this is just something I hadn’t considered before. Oops, please disregard.

I’m not sure if this is ambiguous wording in the standard or just me being thickheaded, but do mouse clicks outside of a read_mouse/read/read_char still update the coordinates in the header extension table? 10.3.2 would seem to say yes, but to me 10.3.3 says no. I’ve always implemented it as ‘no’, but it’s not like there are a lot of mouse driven games to test on.

10.3.2

Whenever a mouse click takes place (and provided the header extension table exists and contains at least 2 words) the interpreter should update the coordinates as follows:

Word 1: x coordinate where click took place
Word 2: y coordinate where click took place

10.3.3

The mouse is presumed to have between 0 and 16 buttons. The state of these buttons can be read by the read_mouse opcode in Version 6. Otherwise, mouse clicks are treated as keyboard input codes (see below).

Speaking from a position of complete ignorance, I would imagine that the “right” behaviour is to update it immediately prior to delivering each input code representing a click, such that if the mouse is clicked multiple times before a read starts then the coordinates of each individual click can be read correctly provided that the input codes are read one at a time.

Another possible implementation is to completely ignore mouse clicks whenever there is not a read-in-progress, or to only deliver the latest one, but that seems more user-hostile to me.

By contrast, the read_mouse opcode returns a more “live” state, which requires a more complex realtime input loop but additionally supports tracking hovers and drags (which the standard does say should be possible).
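
One way to realise the “update the coordinates immediately prior to delivering each click” idea is for the interpreter to queue each pending event with its own coordinates. A sketch, with all structure and function names invented for illustration:

```c
#include <stddef.h>

/* Interpreter-side input queue where every pending mouse click carries its
   own coordinates, so clicks that arrive while no read is in progress are
   neither lost nor reported with stale coordinates. */
enum ev_type { EV_KEY, EV_CLICK };

struct input_event {
    enum ev_type type;
    unsigned short code;       /* ZSCII code (254 = single click, etc.) */
    unsigned short x, y;       /* only meaningful for EV_CLICK          */
};

#define QUEUE_SIZE 64
static struct input_event queue[QUEUE_SIZE];
static size_t q_head, q_tail;

/* Called from the OS event handler whenever a click happens, regardless of
   whether the story is currently blocked in @read / @read_char. */
static void enqueue_click(unsigned short x, unsigned short y)
{
    if ((q_tail + 1) % QUEUE_SIZE == q_head)
        return;                                /* queue full: drop it */
    queue[q_tail] = (struct input_event){ EV_CLICK, 254, x, y };
    q_tail = (q_tail + 1) % QUEUE_SIZE;
}
```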

Ignoring read_mouse, I’m not sure how useful updating the click coordinates outside of a read would be. You couldn’t be sure a click actually happened just by looking at the coordinates unless you also record the number of clicks.

I also don’t capture and buffer keystrokes made outside a read or read_char.

The actual coordinate update is not outside of a read. The mouse click itself might be.

The surrounding OS typically has some kind of event queue containing clicks and keypresses. These may occur asynchronously to whatever the VM is doing. The VM’s @read opcode is one of the ways these events get funneled into the VM, but that’s also potentially asynchronous (even though @read is blocking).

A player might type the word hello, but the VM running the story might happen to be reading a single character at a time and then doing some other processing before reading the next character. But it would still expect to receive the characters h e l l o in that order, without missing any just because the player happened to press the e key at an instant that the VM was doing something other than blocking on @read. Mouse clicks should be treated exactly the same way.

Granted, @read usually reads a line at a time, but it can be interrupted by a timer, and the keypress might happen during the intervening time. Or the story might be using @read_char instead, or specify a limited number of characters to @read.

So, in theory, a @read_char would block until there’s a mouse click, then write the coordinates and return the “mouse click” keycode. While some other processing happens, the user clicks again and this is put into some internal queue (or left in the OS’s input queue or something) without updating the coordinates, then when @read_char is issued again the internal queue is popped and it immediately updates the coordinates and returns the “click” key again. The second click shouldn’t have been discarded even though it technically occurred when no read is in progress – that will just frustrate the user. And in all cases it should see the coordinates of the click that it’s processing, not any future coordinates for a click it doesn’t know about yet. Keypresses “outside” of @read_char should work the same way.
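
Continuing the queue sketch from earlier, the delivery side of that @read_char behaviour might look roughly like this (write_word(), hdr_extension_addr and hdr_extension_len are assumed interpreter state):

```c
/* Continues the earlier queue sketch (struct input_event, queue, q_head,
   q_tail, QUEUE_SIZE).  The header extension address comes from header
   byte $36; word 0 of the extension gives its length. */
extern void write_word(unsigned long addr, unsigned short value);
extern unsigned long hdr_extension_addr;
extern unsigned int  hdr_extension_len;

static int deliver_next_event(void)
{
    if (q_head == q_tail)
        return -1;                      /* nothing pending: keep blocking */

    struct input_event ev = queue[q_head];
    q_head = (q_head + 1) % QUEUE_SIZE;

    /* Update the click coordinates immediately before handing the click
       code to the story, so each click is seen with its own coordinates. */
    if (ev.type == EV_CLICK && hdr_extension_addr != 0 && hdr_extension_len >= 2) {
        write_word(hdr_extension_addr + 2, ev.x);   /* word 1: x */
        write_word(hdr_extension_addr + 4, ev.y);   /* word 2: y */
    }
    return ev.code;                     /* e.g. ZSCII 254 for a single click */
}
```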

@read would be similar, it’s just that there are fewer reasons you’d see intermediate clicks. Say the user clicks three times in three separate places before the timer interrupts the read, then the VM would receive three click codes (unless perhaps one or more are converted to a double-click code) but only the coordinates of the final click, since the first two were overwritten before the story saw them.

Again, this is just my non-terp-writing opinion, so you can take it with a grain of salt (and it wouldn’t surprise me if other terps had simpler behaviour); but it’s what seems to make sense to me, based on my understanding of usability and of the Z-Machine Standard.


If you are capturing and buffering all key presses outside of a read, then it certainly would make sense to capture clicks as well. I have never implemented it that way though, and have never seen any frustrating behavior from it. I typically clear any remaining OS buffered input when a read starts. Time between reads is typically short enough that unless you know a game well enough to anticipate commands ahead of time and are playing on a particularly slow machine (or perhaps just mashing the keyboard), the lack of key capture between reads is not really noticeable. It also prevents an auto-repeated enter key from spam-finishing multiple reads.

It is from my perspective of not buffering in between reads that the language of the standard with regard to updating the coordinates seems ambiguous.

Here’s another: Do the coordinates get updated if, during a read, mouse-clicks are not in the terminating characters table? I would say no, since they should be ignored just like any other invalid input in that case.

True, with locally-executing code it can be hard to notice the effect at human-scale if you’re spending 99% of the time blocked in a read and only 1% of the time doing timer processing. But that still means that 1% of the time the player’s key/mouse input would get lost.

Construct a story that has a different balance of processing time (even if less realistic) to magnify the effect, and it becomes more obvious.

To use an analogy, I encountered a web app a while back where every individual keystroke in a form field is round-tripped to the server and then overwrites the text in the field. (Essentially, it was doing server-side validation.) When the app is running on the local machine and relatively idle, everything functions perfectly.

Load the local machine up with other processing work, though, or put it on a remote server instead (so you have network latency), and the cracks start showing. The user will type two characters in the time it takes to roundtrip one, and then their second character disappears. While not identical to the IFVM case, this is a logical extension of that kind of behaviour.

If mouse-clicks are in the terminating characters table, then that solves the “multiple clicks in one read” problem neatly. If they’re not, then things get tricky. Logically I would expect that it would update the coords for the last click made during the read and report the number of clicks (anywhere) as individual characters, as I said before.

This does break the dictionary parsing, but the raw text is still reported to the story, so it could detect the parsing failure, walk through the characters itself, strip out the clicks, then resubmit it for dictionary parsing (or for continued input, if the player didn’t press enter yet). It’s probably not the best way to handle mouse input (not least because there’s inherent latency when not interrupting the read on click) but it should theoretically work.


Perhaps another slightly more realistic case might be a story that wants fully real-time input, and so always uses @read_char and never @read. Perhaps it’s doing the parsing entirely itself, or perhaps it’s not even parser-based at all (although then probably the Z-Machine is not the right format to use).

Such a story would be alternating between sitting in @read_char waiting for input, doing some input processing and updating the world state because the player just pressed a key, or doing some background processing to update the state due to a timeout (perhaps some background animation, or it’s a real-time game and it has to update the enemy’s actions while the player isn’t doing anything).

The first two cases are obviously not problems – the player is very unlikely to press a second key in the time it takes to process the first key. Even fast typists just aren’t that fast. The third case, however, is where the problem occurs. You cannot possibly guarantee that the player won’t just happen to press a key an instant after the @read_char exits due to timer and before it starts the next one. The faster the processing time is, the less likely this is to occur, of course, but it can never be zero probability.

And this means a loss of input. In a parser game, the player might shrug it off and just hit the key again. In a real-time game, that might be the difference between making an attack or jump and not, and the player will swear loudly at the game’s poor input performance and possibly ragequit.


Mouse clicks cannot go into the input buffer of a read instruction. Only characters defined for both input and output can be there. If a click is not the terminating character, then it is ignored and the game has no way of knowing that it even happened, same as if you pressed a function key or any other input-only key that isn’t a terminating character.

Of course read_char can read any key so that does work.


OK, that simplifies @read a bit (although it seems a bit arbitrary, given that the transcript output is supposed to be able to include clicks). But if a click does happen to be in the terminating characters table and the story is using @read with a timer, then it can have the same problem as @read_char, so you still probably shouldn’t be throwing away key or mouse input just because there’s no read in progress at that instant.