I’ve been working on my zmachine ‘not an interpreter’, i.e. a reusable library that a front-end can use to run zcode stories. I thought it might be nice to expose the main dictionary to the front-end to enable quality-of-life features like noun/verb lists, clickable commands, or hints. This was fairly simple to implement, at least naively. Afterward I remembered that some dictionary words can’t be decoded and printed so easily, because a word can end with an incomplete multi-z-character construction. Simply outputting the truncated word will actually work in some cases, namely those that end with a shift to alphabet 2 (shift-lock 2 in zmachine versions 1 and 2), which happens to be equal to the padding character and is thus ambiguous. But this will not work for all possible cases, e.g. an incomplete escaped zscii character. While this should be exceedingly rare, it seems like something a library API should try to tackle. Eliminating these words from the dictionary list feels like a cop-out, but displaying them in a way a front-end could make use of is awkward because the dictionary truncation has lost information, namely the actual final character that was intended. Any thoughts?
Unfortunately, this isn’t really a solvable problem with how the Z-machine is constructed. Even if the game expects you to refer to SUPERHERO, SUPERHEROINE, and SUPERHEROES in different contexts, the dictionary only has one entry for all three of those.
I’d say removing the broken final character and displaying the rest is the best you can do.
There’s an extra wrinkle beyond the SUPERHEROINE case. Say you have dict words 'abcdef$' and 'abcdef{'. The parser can distinguish these in player input because they are different dict words – even after truncation. But you can’t recover the original dict words for printing. (Or autocomplete or what have you.)
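To make the truncation concrete, here is a rough Python sketch of how a word gets packed into a dictionary entry. It assumes a version 4+ story (9 z-characters per entry), the default alphabet table, and that the ZSCII code of each escaped character equals its ASCII code. The names `encode_dict_word` and `A2_PRINTABLE` are made up for illustration and do not come from any particular interpreter.

```python
from typing import List

A0 = "abcdefghijklmnopqrstuvwxyz"
A1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# Alphabet 2, z-chars 8..31 (z-char 6 is the escape, 7 is newline).
A2_PRINTABLE = "0123456789.,!?_#'\"/\\-:()"


def encode_dict_word(word: str, resolution: int = 9) -> List[int]:
    """Return the z-characters of a dictionary entry (hypothetical helper)."""
    zchars: List[int] = []
    for ch in word:
        if ch == " ":
            zchars.append(0)
        elif ch in A0:
            zchars.append(A0.index(ch) + 6)
        elif ch in A1:
            zchars += [4, A1.index(ch) + 6]            # shift to alphabet 1
        elif ch in A2_PRINTABLE:
            zchars += [5, A2_PRINTABLE.index(ch) + 8]  # shift to alphabet 2
        else:
            code = ord(ch)  # assumption: ZSCII == ASCII for this character
            zchars += [5, 6, code >> 5, code & 0x1F]   # 10-bit escape
    zchars = zchars[:resolution]                       # truncation loses the tail
    zchars += [5] * (resolution - len(zchars))         # pad with z-char 5
    return zchars


print(encode_dict_word("abcdef$"))
print(encode_dict_word("abcdef{"))
print(encode_dict_word("abcdef|"))
```

The first two entries differ only in their final z-character (the top five bits of the escape), which is why the parser can still tell them apart, while the second and third come out identical because the low five bits have been cut off.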
I suppose you could guess. You wouldn’t know whether that last one is 'abcdef{', 'abcdef|', or 'abcdef}'. But if you printed one of those, then the player could type it and it would be recognized!
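For what it's worth, the simplest form of that guess can be sketched in a few lines of Python. This assumes the default alphabet table and that ZSCII matches ASCII in the range 33..126; `escape_candidates` is a hypothetical helper name. It lists every printable character whose code shares the surviving top five bits and which an encoder would have had to escape, since characters present in the alphabet table would never be encoded this way.

```python
from typing import List

A0 = "abcdefghijklmnopqrstuvwxyz"
A1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
A2_PRINTABLE = "0123456789.,!?_#'\"/\\-:()"


def escape_candidates(high_bits: int) -> List[str]:
    """Characters a truncated escape could stand for (hypothetical helper)."""
    in_alphabets = set(A0 + A1 + A2_PRINTABLE)
    candidates = []
    for low in range(32):
        code = (high_bits << 5) | low
        # Skip space (it has its own z-char) and anything outside printable ASCII.
        if 33 <= code <= 126 and chr(code) not in in_alphabets:
            candidates.append(chr(code))
    return candidates


print(escape_candidates(3))
```

Under those assumptions the candidates for the `{` case include `{`, `|`, and `}`, along with the backtick and tilde, which sit in the same block of 32 codes and are also missing from the default alphabet table. A custom alphabet table or extra characters above 126 would change the list.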
I don’t recommend that, though. The guessing algorithm is going to be a giant pain. Stick with what Draconis said.
I suppose I could output the characters that are there along with some indicator that the word is incomplete. That would at least allow a front-end to deal with it as it pleases: guess, ignore, or whatever. I try not to make too many assumptions about what the front-end might want to do.
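One possible shape for that, as a minimal Python sketch rather than a real library API: decode what is there and attach a marker saying what, if anything, was cut off. It assumes version 3 or later (z-chars 4 and 5 are single shifts), the default alphabet table, and escaped codes in the ASCII range. `DecodedWord`, `decode_dict_word`, and the `pending` labels are all invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

A0 = "abcdefghijklmnopqrstuvwxyz"
A1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# Index 0 stands in for the escape (handled separately), index 1 is newline.
A2 = " \n0123456789.,!?_#'\"/\\-:()"
ALPHABETS = (A0, A1, A2)


@dataclass
class DecodedWord:
    text: str                        # the characters that could be decoded
    pending: str = "none"            # "none", "shift_a1", "escape_low", "escape_both"
    high_bits: Optional[int] = None  # surviving top 5 bits, for "escape_low"


def decode_dict_word(zchars: List[int]) -> DecodedWord:
    text: List[str] = []
    shift = 0
    i = 0
    while i < len(zchars):
        z = zchars[i]
        i += 1
        if z in (4, 5):              # single shift to alphabet 1 or 2
            shift = z - 3
            continue
        if z == 0:
            text.append(" ")
        elif z < 4:
            raise ValueError("abbreviation z-character in a dictionary word")
        elif shift == 2 and z == 6:  # start of a 10-bit escape
            rest = zchars[i:i + 2]
            if len(rest) == 2:
                text.append(chr((rest[0] << 5) | rest[1]))
                i += 2
            elif len(rest) == 1:
                return DecodedWord("".join(text), "escape_low", rest[0])
            else:
                return DecodedWord("".join(text), "escape_both")
        else:
            text.append(ALPHABETS[shift][z - 6])
        shift = 0
    # A trailing z-char 5 is indistinguishable from padding, so only a
    # dangling shift to alphabet 1 is reported.
    return DecodedWord("".join(text), "shift_a1" if shift == 1 else "none")


print(decode_dict_word([6, 7, 8, 9, 10, 11, 5, 6, 3]))
```

A front-end could then print `text` as-is, append a marker such as an ellipsis when `pending` is not `"none"`, or use `high_bits` to attempt a guess. Versions 1 and 2, with their shift-lock characters, would need separate handling that this sketch leaves out.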
There are a number of slightly different scenarios that can give rise to this problem. zarf illustrated one where the final z-character of an escaped zscii is missing, leading to a dictionary word which will match anything with the same leading characters followed by an escaped character whose code has the same top three bits (of eight) as the intended character. You could also have two missing z-characters from an escape, leading to matching any escaped character at all. You could also end with a shift to alphabet 1 whose character is missing. That is unlikely because input is lowercased, but it could occur with a custom alphabet table. Zmachine versions 1 and 2 also have shift and shift-lock variants of the issue. All of these would be legal dictionary words. Dealing with invalid words, such as those that contain illegal zscii characters or abbreviation-signalling z-characters, is another matter entirely.