Z-Machine 1.2 Proposal

Marvin · March 1, 2014, 10:28am

I’m not completely opposed to the idea, it just seems weird to me to disallow certain types of text with certain types of print opcodes.

I may be misunderstanding the problem completely here (seems likely, actually), but shouldn’t it be possible to search ahead for the bytes that mean ‘Unicode Escape’? The word before Unicode starts should be $1BFF, $7FE5 or $7CA5. You might have to check back on the previous word to be certain, I guess?

The only complications here are that the ZSCII padding might not be 5s, which we can fix by specifying it must be 5s when padding before Unicode data, and the fact that we might conceivably want to set the stop bit on the word before the Unicode data to signal ‘the ZSCII text ends when the Unicode data does’, which we can fix by saying that you can’t set the stop bit on the word before the Unicode data (if you want to end the text after the Unicode, you set the stop bit on a sequence of padded 5s in the word after the Unicode).

This would obviously be more work than just looking for a word with the stop bit set, but I think it does allow you to find the end of a string of text without decoding everything.
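As a rough illustration of the scan described above: a Z-text word is 16 bits, with the top bit acting as the stop bit and the remaining 15 bits holding three 5-bit Z-characters. The sketch below unpacks the three words named in the post and shows the plain stop-bit scan. The function names are made up for this example, and reading those words as a "Unicode escape" marker is the post's proposal, not part of the existing Standard.

```python
def unpack_word(word):
    """Split one 16-bit Z-text word into its stop bit and three 5-bit Z-characters."""
    stop_bit = (word >> 15) & 1
    zchars = ((word >> 10) & 0x1F, (word >> 5) & 0x1F, word & 0x1F)
    return stop_bit, zchars


def find_end_by_stop_bit(words):
    """Return the index of the first word with the stop bit set, or None."""
    for index, word in enumerate(words):
        if word & 0x8000:
            return index
    return None


# The three candidate words from the post, unpacked into Z-characters.
for word in (0x1BFF, 0x7FE5, 0x7CA5):
    print(hex(word), unpack_word(word))
```

The three words unpack to (6, 31, 31), (31, 31, 5) and (31, 5, 5), all with the stop bit clear. That fits the point about padding with 5s and about sometimes needing the previous word to tell where the sequence started.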

Dannii · March 1, 2014, 11:40am

Because it’s a nightmare for authors!

Remember most of the code directly dealing with these streams will be in libraries/extensions. They are supposed to be modular, they shouldn’t have to know which other extensions are in use. They’ll have to either hardcode the list of streams to check, meaning they’ll need to be updated every time a new stream is proposed, or they’ll have to dynamically check. If they do that they can either be conservative and check identifiers 1-5 and miss number 6 or 23, or else potentially waste a lot of memory just in case every stream in existence is used at once. And they’ll have to do this every time a stream is opened! I really don’t understand why you’re so fixed on the idea that every stream except stream 3 should be shared.

I have a proposal for a completely re-conceptualised stream system, which slightly contradicts the earlier specs, but won’t have any behavioural issues. There would be a stack of streams, storing the stream ID, memory address if provided, a stop bit and an all-at-once bit. When you open a stream an entry is added to the stack. The terp will use the stream ID to determine the continue bit - it can ask plugins or an IO layer if it needs to, but the important thing is that it can do it at that time, rather than waiting until something is printed to it. Now when you print, the text is sent to the topmost stream and any lower streams until the stop bit is set. When you close a stream it finds the topmost stream of the correct ID, removes it from the stack, and processes it. Interpreters could flush streams with the all-at-once bit set whenever they like, though that would depend on the terp. (All streams should be flushed when @read is called.) If you want it to be completely compatible with previous specs, have two stacks, one just for stream 3, and one for all the others. In this proposal the handling of streams by the interpreter is just as general as with tagged stream 5s, but without needing to have a second kind of identifier - the stream ID would be enough. It also allows for layered streams to be handled very simply by both terps and authors/library authors - simply set the stop bit if your stream shouldn’t be duplicated below.
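A minimal sketch of the stack described above, to make the open/print/close behaviour concrete. All class and field names are invented for this example. It assumes that a stream with its stop bit set still receives the text itself and only blocks the streams below it, and it leaves out flushing and the separate stack for stream 3.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StreamEntry:
    stream_id: int
    address: Optional[int] = None  # memory address, if one was provided
    stop: bool = False             # True: do not pass text to streams below
    all_at_once: bool = False      # True: interpreter may buffer until flushed
    buffer: List[str] = field(default_factory=list)


class StreamStack:
    def __init__(self):
        self.entries: List[StreamEntry] = []

    def open(self, stream_id, address=None, stop=False, all_at_once=False):
        # Opening a stream pushes a new entry onto the stack.
        self.entries.append(StreamEntry(stream_id, address, stop, all_at_once))

    def print(self, text):
        # Walk from the topmost entry downwards. The entry with the stop bit
        # set still receives the text (an assumption), but nothing below does.
        for entry in reversed(self.entries):
            entry.buffer.append(text)
            if entry.stop:
                break

    def close(self, stream_id):
        # Remove the topmost entry with a matching ID and hand it back
        # so the caller can process it.
        for index in range(len(self.entries) - 1, -1, -1):
            if self.entries[index].stream_id == stream_id:
                return self.entries.pop(index)
        return None


stack = StreamStack()
stack.open(1)                             # screen
stack.open(3, address=0x4000, stop=True)  # memory stream, not shared below
stack.print("hidden")
closed = stack.close(3)
stack.print("shown")
print(closed.buffer, stack.entries[0].buffer)
```

With the stop bit set on the stream 3 entry, "hidden" never reaches stream 1. Once that entry is closed, printing goes to stream 1 again.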

Marvin · March 1, 2014, 12:09pm

Dannii:

If you don’t want to share the data, you turn off the other streams. Why is that difficult?

Because it’s a nightmare for authors!

Remember most of the code directly dealing with these streams will be in libraries/extensions. They are supposed to be modular, they shouldn’t have to know which other extensions are in use. They’ll have to either hardcode the list of streams to check, meaning they’ll need to be updated every time a new stream is proposed, or they’ll have to dynamically check. If they do that they can either be conservative and check identifiers 1-5 and miss number 6 or 23, or else potentially waste a lot of memory just in case every stream in existence is used at once. And they’ll have to do this every time a stream is opened! I really don’t understand why you’re so fixed on the idea that every stream except stream 3 should be shared.

I’m not fixed on the idea that every stream should be shared. I think that being able to share this stream is useful behaviour, in order to avoid sending the same text out twice, so that two different streams get it.

Now, I’ll admit I was looking at this from the point of view of a game writer actually knowing what the streams are doing, rather than from the idea of many future potential streams being opened and shut by libraries written by different people at different times.

So, yes, checking for every single potential stream to see if it’s open would be something that a library should probably do to be future-proof, I guess. It seems to me that if any future streams share the data with other open streams, we have this problem.

Maybe a library should be responsible for checking the status of any streams with numbers below the one it’s working with? It should surely know what they’re for. Essentially, a library would be responsible for ensuring any lower streams are disabled if they need to be, and for leaving its own stream disabled when it is not currently sending information.

(I haven’t responded to the suggestion for completely altering the system because I need to read it a few more times to understand it)

Marvin · March 1, 2014, 1:52pm

Dannii:

I have a proposal for a completely re-conceptualised stream system, which slightly contradicts the earlier specs, but won’t have any behavioural issues. There would be a stack of streams, storing the stream ID, memory address if provided, a stop bit and an all-at-once bit. When you open a stream an entry is added to the stack. The terp will use the stream ID to determine the continue bit - it can ask plugins or an IO layer if it needs to, but the important thing is that it can do it at that time, rather than waiting until something is printed to it. Now when you print, the text is sent to the topmost stream and any lower streams until the stop bit is set. When you close a stream it finds the topmost stream of the correct ID, removes it from the stack, and processes it. Interpreters could flush streams with the all-at-once bit set whenever they like, though that would depend on the terp. (All streams should be flushed when @read is called.) If you want it to be completely compatible with previous specs, have two stacks, one just for stream 3, and one for all the others. In this proposal the handling of streams by the interpreter is just as general as with tagged stream 5s, but without needing to have a second kind of identifier - the stream ID would be enough. It also allows for layered streams to be handled very simply by both terps and authors/library authors - simply set the stop bit if your stream shouldn’t be duplicated below.

I think I understand this. Although ‘The terp will use the stream ID to determine the continue bit’ is confusing me. Continue bit?

Anyway. This seems to imply you can put multiples of any stream on the stack. What happens when multiple stream 1s are open?

Also, this seems to remove the whole point of (my idea of) stream 5, which is to generalise streams-that-talk-to-the-interpreter so that we don’t need to update the spec every time someone wants to add another bit of similar functionality.

Maybe I just don’t get it, but it seems to add complexity without solving much.

zarf · March 1, 2014, 7:13pm

I agree. But it’s also weird to mandate particular choices in the Z-text encoding algorithm (your suggestion) in order to make life easy for interpreters. It would be easy to slip or forget a case, and then the interpreter is back to doing full decoding just to disassemble the opcode.

Although… doesn’t it make sense for the interpreter to do full decoding at that point anyhow? It’s going to need the decoded character sequence eventually. Maybe ZVM isn’t layered this way now, but conceptually a JIT-style interpreter would just plow through the string and get it over with.

Egon · March 1, 2014, 7:25pm

Very much yes.

Marvin · March 1, 2014, 7:28pm

True enough. One of the mandates needs to be defined one way or the other, though. Do we allow an end bit on the word before Unicode data? The interpreter needs to know if this is likely.

Dannii · March 1, 2014, 10:09pm

Oops, I said continue bit originally, then switched to stop bit. The terp will use the ID to determine whether the stream will share the data down the stack. It could have a list of streams and their settings, or it can ask code outside the core VM for that information.

Probably that should be a nop instead. Or it could be allowed, with any resulting chaos being the author’s fault?

It’s more generalised than introducing another level of stream identification, but you can’t get away with not specifying the new streams. The spec doesn’t have to be updated, it can say to look at another website which the streams manager will keep updated. If you want to put the gestalt stuff on another page you could do that too.

I used to do that, but I changed it to be compatible with asm.js which meant no strings in the generated code. If the consensus is not to restrict it I can work around it, but I thought it worth asking.

Marvin · March 1, 2014, 10:21pm

I really don’t want to start putting stream and gestalt definitions in separate specs. Those are things an interpreter needs to know if it wants to function correctly. If the functionality of those changes, the standard needs to change, with the same ‘submit an idea to the world and see if it flies’ process being used here.

Dannii · March 2, 2014, 12:08am

Yes, but you can also do what Zarf does and assign ranges to people for them to document more thoroughly elsewhere. If they use their range for experiments or for solid specs, so be it.

Though I don’t think it would be a problem for all stream IDs and gestalts to be specified elsewhere, perhaps on the IF Wiki. This approach works well for other things, such as HTTP rel attributes: microformats.org/wiki/existing-rel-values. The IF Wiki is well moderated, so vandalism shouldn’t be a problem, and periodically a backup could be stored in the archive just in case something goes wrong with the wiki.

Marvin · March 2, 2014, 5:23pm

I’m trying to go over the text of my draft again and fix any remaining problems. I’ve already found a few issues, like sections in the wrong place and information missing, but I’m not entirely clear on what problems other people have with the current draft.

I have no intention of trying to push this thing through without addressing concerns people have, but I need to know what concerns remain, so we can at least try to come to a consensus.

The latest draft is still at frobnitz.co.uk/zmachine/1.2/draft4.html

Marvin · March 3, 2014, 8:25pm

I managed to completely miss this comment. I haven’t checked what Infocom’s interpreters do. I might, out of interest, but I’m not sure it matters. Infocom never made use of multiple possible interrupts in their games, so they might have just not thought about it.

Marvin · March 5, 2014, 1:48pm

I’m starting to integrate the new proposal into the full text of the Standard, and I’ve decided to add some paragraphs to the Remarks in the Output Stream section, to clarify intent and sensible use of Output Stream 5. The remarks I’ve written are:

The intended purpose of output stream 5 is to interact with existing systems outside of the
interpreter. An example of this is a web-based interpreter using Javascript to alter the web
page around the interpreter.

While it is possible to use output stream 5 as merely a second way to send instructions
to the interpreter, creating new functionality not currently in the Standard, this
approach is not recommended. Features that do not require the interpreter to interact with an
outside system would be better added in future versions of the Standard.

While the purpose of making output stream 5 generic is to make future expansion of the available
data formats simple, without requiring an update to this Standard, it is clearly counter-productive to
have identical identifiers created by different people for different purposes. To this end, a separate
document will exist, so that game and interpreter writers may register their identifiers, and avoid
such problems. It is not absolutely required that you register your identifier, but be aware that an
unregistered identifier can potentially cause serious problems. [Link to registry will go somewhere here].

Games making use of output stream 5, and output streams defined in future versions of this Standard, will
very likely make use of library code, rather than requiring the game author to manage low-level stream handling
manually. It is vital, then, to allow such libraries to ensure that text is not sent to an output stream that should
not receive it.

To this end, it is recommended that these libraries observe the following guidelines.

Check the state of any streams with numbers lower than the stream you wish to send output to. Turn any of
these streams you don’t wish to send text to off, and record the previous state so that you can turn them
back on when finished.

Turn on the stream you wish to send output to, send the output, and turn it back off. Do not leave the stream
open while giving control of the game back to the game author. This ensures that other code will not accidentally
send bad data to your stream, or turn a lower stream back on, leaving you to send bad data to it.

Set all the lower streams back to the state they were in before you began.
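The three guidelines above could be wrapped up by a library roughly as follows. This is only a sketch under stated assumptions. `FakeInterpreter`, `is_on`, `set_stream` and `exclusive_stream` are hypothetical names, and the ability to query whether a stream is currently on is assumed here rather than taken from the existing Standard.

```python
from contextlib import contextmanager


class FakeInterpreter:
    """Hypothetical stand-in that only tracks which output streams are on."""

    def __init__(self, enabled):
        self.enabled = set(enabled)
        self.log = []

    def is_on(self, stream):
        return stream in self.enabled

    def set_stream(self, stream, on):
        if on:
            self.enabled.add(stream)
        else:
            self.enabled.discard(stream)

    def print(self, text):
        # Every enabled stream receives the text.
        for stream in sorted(self.enabled):
            self.log.append((stream, text))


@contextmanager
def exclusive_stream(terp, target, keep=()):
    # Guideline 1: record the state of every lower stream, then turn off
    # the ones that should not receive this output.
    previous = {s: terp.is_on(s) for s in range(1, target)}
    for stream, was_on in previous.items():
        if was_on and stream not in keep:
            terp.set_stream(stream, False)
    # Guideline 2: turn the target on only for the duration of the output.
    terp.set_stream(target, True)
    try:
        yield
    finally:
        terp.set_stream(target, False)
        # Guideline 3: put the lower streams back as they were.
        for stream, was_on in previous.items():
            terp.set_stream(stream, was_on)


terp = FakeInterpreter({1, 2})
with exclusive_stream(terp, 5):
    terp.print("payload")
terp.print("Hello")
print(terp.log)
```

Using a context manager means the target stream is switched off and the lower streams restored even if the library code raises partway through, which matches the intent of not leaving the stream open when control returns to the game author.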

I’m entirely convinced I’ve missed something important. Comments would be appreciated.

Marvin · March 7, 2014, 7:01pm

While fiddling with some stuff about Unicode strings, I came across a problem with @print_table.

From the Standard:

As defined there, this opcode should work the same in window 0 and window 1. Common sense, however, says that window 0 should not allow for printing nice neat little boxes of text.

The ‘print an array of ZSCII’ is a useful opcode to have, even without the ‘nice neat little box of text’ functionality, so I’m proposing that we:

Update the 1.0 spec to say ‘print_table in window 0 is illegal and undefined’
Update the 1.2 spec to say ‘print_table in window 0 is allowed, but behaviour is undefined if the height is not 1’

Dannii · March 7, 2014, 10:33pm

That’s a pretty big change to make, and I’m not sure it’s justified. The user could turn on fixed width printing if they wanted to. Also, no more unspecified behaviour - if we’re changing the specs we have to decrease unspecified behaviour.

Marvin · March 7, 2014, 11:06pm

Which change? The change to 1.0 or the change to 1.2? As far as I’m aware, the behaviour of print_table in window 0 is inconsistent across Standard interpreters. In Infocom’s own DOS interpreter print_table doesn’t work at all in window 0. I’m just trying to specify what (I believe) the current 1.0 interpreters actually do (unspecified), and what is sensible for 1.2 interpreters to do without redefining the behaviour of the lower window fairly significantly.

The bottom window is not designed for allowing a high level of control over the placing of characters. Are you saying that print_table should work exactly the same in both windows?

Dannii · March 7, 2014, 11:21pm

Hmm, I guess if the Infocom terps never supported it then maybe it should always have been limited to the upper window, and I realise I’ve implemented it incorrectly in the upper window anyway. I’ve just implemented it as printing lines of text separated by line breaks, not caring which column you start with. What do Frotz and Bocfel do? If you think the lower window behaviour should be part of 1.2 rather than 1.0, then okay (but remember all new things should get gestalts). What about doing what I did for the lower window?

Marvin · March 7, 2014, 11:31pm

Okay, I produced a test file, which prints ‘YARG’ to an array using output stream 3, and then prints out ‘fify’ to get the cursor away from the left margin. It then prints the array back using print_table, with ‘width’ set to 2 and ‘height’ set to 2.

In the upper window, this would result in:

fifyYA
    RG

In the lower window, we get

Windows Frotz:
fifyYA
    RG


WinNitfol:
fifyYA
RG

Parchment:
fify
YA
RG

According to zarf:

MacZoom:
fifyYARG

Fizmo-ncurses:
YA
RG

fify

I think we can safely say that in 1.0, behaviour is undefined, and nobody knows what behaviour in the lower window should be. What we want it to do in 1.2 is still somewhat up for grabs.

Marvin · March 7, 2014, 11:38pm

I’m not entirely against giving it a gestalt. I’m not entirely certain we need to. It shouldn’t be optional if you support 1.2. We can give it a gestalt and make it required, but given that the gestalt doesn’t even exist unless we already support 1.2, I’m not sure we need to add a gestalt for required behaviour. Nobody’s going to need to check the gestalt ever.

Marvin · March 9, 2014, 6:01pm

I plan to rewrite the description of print_table in 1.2 as follows:

print_table

VAR:254 1E 5 print_table zscii-text width height skip

In the upper window in Version 5, and every window in Version 6, print a rectangle of text on screen spreading right and down from the current
cursor position, of given width and height, from the table of ZSCII text given. (Height is optional and defaults to 1.) If a skip value is given,
then that many characters of text are skipped over in between each line and the next. (So one could make this display, for instance, a 2 by 3
window onto a giant 40 by 40 character graphics map.)

In the lower window in Version 5, instead print text from the current cursor position as normal, but insert a newline after every ‘width’
characters, except on the last line, so that each new line starts at the left margin of the window, and the cursor is placed to the right
of the last character printed. The skip value still indicates characters of text to skip, and text is wrapped at the right margin as normal.
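To check the proposed wording against the earlier ‘YARG’ test, here is a small simulation of both cases. It is a sketch only. The function names and the `prefix` parameter (text already on the current line) are invented for this example, it works on Python strings rather than ZSCII tables in memory, and it does not model wrapping at the right margin.

```python
def print_table_upper(text, width, height=1, skip=0, prefix=""):
    """Rectangle case: return the screen lines, with every row starting at
    the column where the cursor was (after `prefix` on the first line)."""
    lines = []
    pos = 0
    for row in range(height):
        lead = prefix if row == 0 else " " * len(prefix)
        lines.append(lead + text[pos:pos + width])
        pos += width + skip  # skip characters between one row and the next
    return lines


def print_table_lower(text, width, height=1, skip=0):
    """Proposed lower-window case: return the characters emitted, with a
    newline after every `width` characters except on the last line."""
    out = []
    pos = 0
    for row in range(height):
        out.append(text[pos:pos + width])
        pos += width + skip
        if row != height - 1:
            out.append("\n")
    return "".join(out)


print("\n".join(print_table_upper("YARG", 2, 2, prefix="fify")))
print("fify" + print_table_lower("YARG", 2, 2))
```

Under this reading, the lower-window result for the test file is ‘fifyYA’ followed by ‘RG’ at the left margin, which is the output reported for WinNitfol above. The upper-window result keeps ‘RG’ aligned under ‘YA’.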