Z-Machine undefined behavior

Ah, got it. Thanks.

I’m going to try to stay out of this discussion (I’ve got enough VMs on my plate!) but please let me know when it’s appropriate to update the spec pages at http://inform-fiction.org/zmachine/standards/index.html .

As I’ve said in some other thread, I’m also thinking about setting up an IF spec site on the IF Archive somewhere. Z-machine spec documents would be duplicated there.

After considering the file length in the header, I’m thinking the lower number should be the limit, and that a string or routine beginning on the last possible packed address (for all versions except 6 and 7, which vary because of the offsets) could never be valid because it would be only one byte long. The file sizes listed in 1.1.4 could be made more specific.

Edit: Argh! Wrong, wrong, wrong. Even knowing there is an issue here, I couldn’t get it right.

To illustrate: In V8 the max file length is 8 * 65535 = 524280. But this means the valid addresses are 0 to 524279. The maximum packed address is completely invalid as it would represent 524280.
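
The arithmetic above can be checked with a short sketch. It assumes the usual packed-address multipliers (2 for V1 to V3, 4 for V4 and V5, 8 for V8) and, as in the post, takes the maximum file length to be 65535 times that multiplier. The function names are made up for illustration, and V6 and V7 are left out because their routine and string offsets change the calculation.

```python
# Packed-address multipliers by version (V6/V7 omitted: they add header offsets).
MULTIPLIER = {1: 2, 2: 2, 3: 2, 4: 4, 5: 4, 8: 8}


def max_file_length(version):
    """Largest file length assumed here: 65535 times the version's multiplier."""
    return 0xFFFF * MULTIPLIER[version]


def unpack(version, packed):
    """Convert a packed address to a byte address."""
    return packed * MULTIPLIER[version]


def is_valid_packed(version, packed, file_length):
    """A packed address is usable only if it points inside the file."""
    return unpack(version, packed) < file_length


length = max_file_length(8)
print(length)                              # 524280
print(is_valid_packed(8, 0xFFFF, length))  # False: points one past the last byte
print(is_valid_packed(8, 0xFFFE, length))  # True: byte address 524272
```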

There are a couple of things I noted down while working on an interpreter.

  1. In all versions but 6, startup occurs outside of the context of a function. Several opcodes just assume they’re in a function, though (all the returns, catch, throw, and check_arg_count, at least). At the very least I’d say interpreters should be advised to detect this and halt with an error message.
  2. The behavior if text is written beyond the edge of the upper window: wrap, or cut off/extend beyond the edge? Interpreters differ here. For example, Lectrote (so probably ifvms.js) and Unix Frotz wrap the text; Windows Frotz and Fizmo cut it off. An interesting test case is My Angel.

As a clarification on Lectrote, it wraps when writing to the upper window, but appears to scroll when reading input (e.g. My Angel).

Quick code to test writing:

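[Main i;
  @split_window 10;      ! open a 10-line upper window
  @set_window 1;         ! direct output to the upper window
  ! print 200 digits, more than one line on typical screen widths
  for(i = 0: i < 200: i++) print(i % 10);
  @set_window 0;         ! back to the lower window
  @read_char 1 -> i;     ! wait for a keypress before exiting
];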

In section 8 of the standard it says that scrolling is never applied to the upper window, and the remarks suggest that wrapping and buffering in the upper window are incorrect. Edit: A reading of Infocom’s ezip spec says that window 1 never scrolls and that text printed beyond the right-hand margin is not displayed. Perhaps the clipping of text in the upper window could be made more explicit in the standard, though, instead of appearing only in the remarks.
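
As a rough model of the clipping behaviour described above, here is a minimal sketch of an upper window that drops anything printed past the right-hand margin or below the last row. The UpperWindow class and its methods are hypothetical, and the newline handling is an assumption; it is not taken from any real interpreter.

```python
class UpperWindow:
    """Hypothetical fixed-size character grid that clips instead of wrapping."""

    def __init__(self, width, height):
        self.width = width
        self.rows = [[" "] * width for _ in range(height)]
        self.row = 0
        self.col = 0

    def print_text(self, text):
        for ch in text:
            if ch == "\n":
                # Assumed behaviour: newline moves to the start of the next row.
                self.row += 1
                self.col = 0
            elif self.row < len(self.rows) and self.col < self.width:
                self.rows[self.row][self.col] = ch
                self.col += 1
            # Anything else falls outside the window and is dropped.

    def line(self, n):
        return "".join(self.rows[n])


win = UpperWindow(10, 2)
win.print_text("".join(str(i % 10) for i in range(25)))
print(win.line(0))        # 0123456789
print(repr(win.line(1)))  # '          ' (nothing wrapped onto the second row)
```

An interpreter that wraps would instead show digits on the second row, which is the visible difference between the two groups of interpreters mentioned earlier.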

Returning from the starting routine in V6 or the ‘functionless’ context of all other versions is in fact illegal according to the standard (sections 5.4 and 5.5). Catch and check_arg_count should just return zero, while throwing to the ‘main’ routine via “throw 0” would then also return from that routine because of the behavior of throw (section 15, description of throw opcode) and so is also illegal, which agrees with the standard.
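
The behaviour described above for versions other than 6 could be sketched like this. The CallStack class is hypothetical, and using the frame depth as the value returned by catch is an assumption made for the example, not something the standard requires.

```python
class IllegalReturn(Exception):
    """Raised when a return is attempted outside of any routine."""


class CallStack:
    def __init__(self):
        # Empty while the startup code runs (non-V6 story files).
        self.frames = []

    def call(self, arg_count):
        self.frames.append({"arg_count": arg_count})

    def ret(self):
        if not self.frames:
            raise IllegalReturn("return outside of any routine")
        return self.frames.pop()

    def catch(self):
        # Assumed encoding: the current frame depth, so 0 at startup.
        return len(self.frames)

    def check_arg_count(self, n):
        if not self.frames:
            return False
        return n <= self.frames[-1]["arg_count"]

    def throw(self, frame):
        # Unwind to the frame that executed catch, then return from it.
        del self.frames[frame:]
        return self.ret()


stack = CallStack()
print(stack.catch())             # 0
print(stack.check_arg_count(1))  # False
try:
    stack.throw(0)
except IllegalReturn as err:
    print("halt:", err)          # halt: return outside of any routine
```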

It would also be nice if unit tests could be created for any of these issues. I’ll very happily receive pull requests for Praxix for non-UI/interactive stuff, though we should probably make a new one for UI/interactive stuff. Clipping would be an easy thing to test: just print “should not be visible” after a screen’s width of spaces.

On wrapping/clipping, section 8 says only the following:

In Versions 3 to 5, text buffering is never active in the upper window (even if a game begins printing there without having turned it off).

Some ports of ITF apply buffering (i.e. word-wrapping) and scrolling to the upper window, with unfortunate consequences. This is why the standard Inform status line is one character short of the width of the screen.

I read this as saying that the problem is with word wrapping more than character wrapping, though you’re probably right that clipping is the best option of all.

@cas Yeah I don’t think it would be possible with Glk/HTML to have input wrap.

I was looking over Infocom’s xzip spec and noticed that CATCH and THROW were never intended to work from within interrupts. That would take care of #9 if it were added to the standard.

I remembered this one:

When accepting unicode (extra characters) as line input, it is ambiguous whether the conversion to zscii should happen before or after the input is lower-cased.

This gives us two scenarios:

Before case change: Requires both upper and lower case versions of the unicode character to exist in the table if you want to allow case insensitive input. Possible weirdness if any are missing.

After case change: Only lower case letters needed in extra character table (unless upper case is needed for output). This means an upper case unicode character will always be valid input if the lower case character is. Maybe this is what the author wants, maybe not.
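
To make the two scenarios concrete, here is a minimal sketch. It assumes extra characters are numbered from ZSCII 155 in table order and treats the table as a plain Python list. The function names are hypothetical, and what to do when an upper case character is in the table but its lower case form is not is a guess (the upper case code is kept).

```python
ZSCII_EXTRA_BASE = 155  # first ZSCII code used for extra characters


def to_zscii(ch, table):
    """Return the ZSCII code for ch, or None if it cannot be represented."""
    if 32 <= ord(ch) <= 126:
        return ord(ch)
    if ch in table:
        return ZSCII_EXTRA_BASE + table.index(ch)
    return None


def convert_then_lower(ch, table):
    """Convert first: the typed character itself must be in the table."""
    code = to_zscii(ch, table)
    if code is None:
        return None  # not valid input at all
    lowered = to_zscii(ch.lower(), table)
    # Assumption: keep the upper case code if the lower case form is missing.
    return lowered if lowered is not None else code


def lower_then_convert(ch, table):
    """Lower-case first: only the lower case form needs to be in the table."""
    return to_zscii(ch.lower(), table)


table = ["é"]  # lower case only
print(convert_then_lower("É", table))        # None
print(lower_then_convert("É", table))        # 155
print(convert_then_lower("É", ["é", "É"]))   # 155
```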

My understanding is that unicode characters that are not in the extra character table are not valid for input. If this is correct, then only the first case is conforming.

I know of nothing in the standard that would dictate one case over the other. It all depends on when the transition to lower case is made.

In the first case, a story file with non-matched upper and lower case characters in the extra characters table could lead to case-sensitive input in a way that can’t happen with ascii characters.

Agreed, but this case sensitivity is already in the standard for read_char.

My thinking was that check_unicode returns bit 1 set for a character “if and only if the interpreter can receive it from the keyboard” which I took as a requirement for the character to be in the extra characters table. Thinking about it, this is probably more strict than what is mandated.

Of course read_char is a different beast anyway, accepting characters that read won’t (at least for anything more than line terminators). It seems clear that read was always intended to be case-insensitive.

My interpreters have always done the case conversion first, and it wasn’t until long after my first one that I realized it could be handled differently. I’ve not looked, but I’m curious how the case conversion is handled in other interpreters, especially since the exact zscii values are story dependent (and although unlikely, perhaps missing entirely!)

Do they do a double lookup like this:

Check the unicode char is a valid zscii extra char
Convert that to lower case
Check the lower case value is a valid zscii extra character
Use this second zscii value

I’m doing it the other way: https://github.com/borg323/jzip/blob/master/input.c#L333

Edit: looks like frotz converts to lower case first: https://gitlab.com/DavidGriffith/frotz/-/blob/master/src/common/input.c#L210

What happens if an uppercase letter is listed as a terminating character, and typed? Is it returned in uppercase or lowercase?

What happens if a lowercase letter is listed as terminating, and the corresponding uppercase letter is typed?

Related thread.

The standard says only function key codes can be in the terminating characters table, which would remove any ambiguity regarding terminating characters (maybe this is why), but see the other recent thread which points out that at least one Infocom game contains non-function key codes in the terminating character table (although it is a slash, which still doesn’t present a case problem). :thinking:

I think I’m going to stick with lowering the case first, then doing zscii conversion. It doesn’t require a double lookup, it is what frotz does, and it removes any problems with the terminating characters table as well. Presuming we allow arbitrary characters here in contravention of the standard, then upper case terminating characters, just like upper case dictionary entries, would never be matched.

I’ll probably do it this way as well. The code is undergoing a major revision for better unicode support, this may help simplify it a bit.

That being said, my current code is a bit different to your first case (I forgot about it and I just noticed it re-reading the code): If the corresponding lower case character is not in the extra characters table, then the upper case character is not converted.

New rant:

It often bugs me how under-defined the z-machine stack is.

Clearly one can craft a story file that uses insane amounts of stack space. At what point is it no longer a valid story? 64KB? 128KB? More? I hate nebulous boundaries. Even fixing a value doesn’t really solve the problem because:

Since there are really two stacks (frame and eval), the answer can of course also vary depending on whether or not both use the same pool of memory, which is used more heavily, and even how many locals are used per frame.

This has always bugged me.

End of rant.

Here’s one:
Using zero as the second operand (the parent) in a jin (IN?) instruction.

Legal or not?
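
For what it’s worth, here is a minimal sketch of what a naive implementation would do, assuming jin simply compares the first object’s parent field with the second operand. The dictionary of parents is made up for the example; whether a story file is allowed to rely on this is exactly the open question.

```python
def jin(parents, child, parent):
    """Naive jin: branch if `parent` is the direct parent of `child`."""
    return parents[child] == parent


# Hypothetical object tree: object 1 has no parent, object 2 is inside object 1.
parents = {1: 0, 2: 1}

print(jin(parents, 1, 0))  # True: effectively "object 1 has no parent"
print(jin(parents, 2, 0))  # False
```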

Since each object’s short name is prefixed by a length byte, that implies a zero length name is possible. It’s doubtful that many interpreters deal with this possibility.

Edit: Frotz seems to handle this just fine. Maybe others do and this is just something I hadn’t considered before. Oops, please disregard.