What specifically would you do with a REPL? Both the Z-Machine and Glulx allow for mutable executable code, but the difficulty would be more the data structures.
It would essentially cut the iteration cycle to a fraction of the time. Being able to code the game from within the game itself gives you immediate feedback, which is useful because you’re not continually distracted from the task at hand by the need to recompile, fix bugs, replay the sequence leading up to the previous moment, etc. Ideally, you could simply introduce or alter objects or behavior as needed.
Lft, how does it do tail calls in the Z-Machine?
As for the REPL, that’s why I was asking whether it would be possible to interpret the AST directly, i.e. the stage before emitting Z-code (or other output).
Obviously this would not be as efficient, but for development it would be a huge bonus. Also logic errors could be tracked back to source lines and so on, because you would still have the source context.
The first thing to notice is that queries to predicates cannot map directly to subroutine calls, because of backtracking. So instead of using the native stack, Dialog maintains a more complex data structure in a large array called the heap. A non-tail call involves storing an activation record on the heap, and then jumping to the entry point of the callee. Returning involves reading back the activation record, and jumping to a return address specified within. A tail call simply jumps without storing anything.
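To make the call/return bookkeeping concrete, here is a toy Python model of it. This is a sketch under simplifying assumptions, not Dialog’s actual heap layout: addresses are plain integers, an activation record is reduced to just its return address, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Heap:
    """Toy model of call bookkeeping on the heap (not Dialog's real layout)."""
    records: List[int] = field(default_factory=list)  # stored return addresses

    def call(self, callee: int, return_to: int) -> int:
        # Non-tail call: store an activation record, then jump to the callee.
        self.records.append(return_to)
        return callee

    def tail_call(self, callee: int) -> int:
        # Tail call: jump to the callee without storing anything.
        return callee

    def ret(self) -> int:
        # Return: read back the most recent activation record and jump
        # to the return address it specifies.
        return self.records.pop()


heap = Heap()
for _ in range(10_000):
    heap.tail_call(0x40)           # any number of tail calls: the heap never grows

heap.call(0x40, return_to=0x10)    # one record per non-tail call
heap.call(0x80, return_to=0x44)
```

The point of the model is the asymmetry: only non-tail calls leave anything behind, so a loop written as a tail call runs in constant space.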
So what remains is a need to jump to arbitrary locations in the code.
In Z-code, the only way to jump to an arbitrary location (in the full 512 KB address space) is to make a subroutine call. Therefore, in order to jump to an arbitrary address, Dialog makes a subroutine call. This has the unfortunate side-effect of pushing an activation record onto the native stack. To prevent the stack from growing indefinitely and crashing the system, there needs to be one return for every call.
The trick, then, is to have an outer routine like this:
label:
    CALL_1S L00 -> L00
    JUMP label
At first, L00 is the packed address of the program entry point. Whenever a part of the “actual” program code needs to jump somewhere, it returns the packed address of that code. The outer loop takes this address, puts it right back into L00, and loops. There’s some overhead, but we just have to live with that.
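The outer loop is essentially a trampoline. A minimal Python sketch of the same idea follows, assuming routines are Python functions keyed by integer “addresses” and that returning None halts the loop (a simplification for the sketch; the Z-code loop above has no such exit). The program and all names are hypothetical.

```python
from typing import Callable, Dict, Optional

# A routine receives shared state and returns the next address to jump to,
# or None to stop.
Routine = Callable[[dict], Optional[int]]


def run(program: Dict[int, Routine], entry: int, state: dict) -> None:
    pc: Optional[int] = entry  # plays the role of L00
    while pc is not None:
        # One call and one return per iteration, so the native (Python)
        # stack stays flat no matter how many jumps the program makes.
        pc = program[pc](state)


# Hypothetical program: a loop that "jumps" back to itself 100,000 times,
# far beyond Python's default recursion limit of 1000.
LOOP, DONE = 1, 2


def loop(state: dict) -> Optional[int]:
    if state["n"] == 0:
        return DONE
    state["n"] -= 1
    return LOOP


def done(state: dict) -> Optional[int]:
    state["finished"] = True
    return None


state = {"n": 100_000}
run({LOOP: loop, DONE: done}, LOOP, state)
```

Had `loop` called itself directly instead of returning `LOOP`, Python would raise `RecursionError`; returning the next address to the outer loop is what keeps the stack bounded, just as in the Z-code version.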
So that should hopefully answer your question. But here’s an additional twist: a very common thing for compiled Dialog code to do is to trigger a failure condition (which is handled by backtracking). The Z-machine has a special, short opcode for returning zero (RFALSE), but more importantly, returning zero can be encoded as a special branch target. Thus, a very efficient way of representing the failure condition is with a zero return value. The outer loop is modified to handle this:
label:
    CALL_1S L00 -> L00
    JNZ L00 label
    ...handle backtracking...
    JUMP label
Since JNZ has roughly the same overhead as JUMP, this turns out to be a very profitable optimization.
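The same trampoline idea can be extended with the zero-return trick in a small Python sketch: here 0 means failure, and failure resumes at the most recent choice point. The choice-point stack, the HALT sentinel, and the example program are assumptions for illustration; Dialog’s real backtracking state lives in its heap and is more involved.

```python
from typing import Callable, Dict, List

FAIL = 0    # returning 0 signals failure, as with RFALSE in the Z-code loop
HALT = -1   # sketch-only sentinel so the example can terminate

Routine = Callable[[dict], int]


def run(program: Dict[int, Routine], entry: int, state: dict) -> bool:
    """Return True if the program halts normally, False if the query fails."""
    choice_points: List[int] = state.setdefault("choice_points", [])
    pc = entry
    while True:
        pc = program[pc](state)
        if pc == HALT:
            return True
        if pc != FAIL:
            continue                # JNZ L00 label: the common, cheap path
        # ...handle backtracking: resume at the most recent choice point...
        if not choice_points:
            return False            # no alternatives left: the query fails
        pc = choice_points.pop()    # then JUMP label


# Hypothetical generate-and-test program: find the first x in 1..5
# whose square exceeds state["limit"].
GEN, CHECK, FOUND = 1, 2, 3


def gen(state: dict) -> int:
    state["x"] += 1
    if state["x"] > 5:
        return FAIL                     # no candidates left
    state["choice_points"].append(GEN)  # on failure, retry with the next x
    return CHECK


def check(state: dict) -> int:
    return FOUND if state["x"] ** 2 > state["limit"] else FAIL


def found(state: dict) -> int:
    state["result"] = state["x"]
    return HALT


PROGRAM = {GEN: gen, CHECK: check, FOUND: found}
```

For example, `run(PROGRAM, GEN, {"x": 0, "limit": 10})` returns True and stores 4 as the result, while a limit of 100 exhausts all candidates and returns False. Failures never touch the native stack; they only move `pc` back to a saved alternative.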
It’s been so long since I’ve looked at Prolog, and I forgot how weird it all is. Backtracking is part of that.
So if calls aren’t really calls except for tail calls, does that mean you don’t use locals for call parameters? Is everything in the heap, or do you make use of the stack too?
Glulx has real tail calls, but that might not actually help with how Dialog operates. I was thinking of offering to add tail calls to the 1.1 Z-Machine proposal, but it seems like what you’ve got works. But if you think it would help we can still look at that.
The bulk of the code doesn’t use local variables at all. Parameters and temporary values are kept in global registers.
Occasionally, the compiled code calls a bunch of hand-coded routines, the runtime layer, and these do use locals. Unification is handled by such a routine that’s recursive (when unifying lists), so the Z-machine stack does get used, just not directly from the compiled predicates.
If there were a far-jump-instruction, I suppose Dialog could make use of it (while still retaining the outer loop for the RFALSE trick). But it’s not useful enough to justify generating Z-code that would be incompatible with old interpreters.
That would be another advantage for Glulx then, as it has full 32 bit jumps.
The more I understand what Dialog is doing the more impressed I am with it! Well done.
Agreed - it’s almost like a shorthand for I7 which might be more palatable to programmer/coder types.
Dialog release 0c/01 (library 0.14) is available on the website.
In addition to minor improvements and bugfixes, the new version introduces three notable features:
Slash expressions are syntactic sugar for listing alternatives inside rule heads. This is very handy when dealing with synonyms in parser rules, but it’s also useful in story code. For instance, the following rule head:
(#redbook/#bluebook/#greenbook is #in #bookshelf)
is new, shorthand syntax for
(*($ is one of [#redbook #bluebook #greenbook]) is #in #bookshelf)
which (as before) is equivalent to
($X is #in #bookshelf) *($X is one of [#redbook #bluebook #greenbook])
Please refer to this new section of the manual for further details.
The standard library has been updated to make use of the new syntax.
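As an analogy only (Python, not Dialog), the expanded form behaves roughly like the generator below: with an unbound `$X`, the multi-query succeeds once per alternative, and with a bound value it succeeds only if that value is among the alternatives. The function names and the use of None for an unbound variable are assumptions of the sketch.

```python
from typing import Iterator, List, Optional


def alternatives(slash_expr: str) -> List[str]:
    # '#redbook/#bluebook/#greenbook' -> ['#redbook', '#bluebook', '#greenbook']
    return slash_expr.split("/")


def is_in_bookshelf(obj: Optional[str] = None) -> Iterator[str]:
    """Analogue of (#redbook/#bluebook/#greenbook is #in #bookshelf).

    obj=None stands in for an unbound $X: like the multi-query
    *($X is one of [...]), the rule then succeeds once per alternative.
    """
    for candidate in alternatives("#redbook/#bluebook/#greenbook"):
        if obj is None or obj == candidate:
            yield candidate


print(list(is_in_bookshelf()))             # ['#redbook', '#bluebook', '#greenbook']
print(list(is_in_bookshelf("#bluebook")))  # ['#bluebook']
```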
Automatic stemming facilitates authoring in languages other than English. With a special rule definition, it is possible to list removable word endings. It might look like this, for German:
(removable word endings) en es em e s
When the player has typed in an unrecognized word, Dialog attempts to remove any matching word endings, starting with the shortest one, to see if that helps.
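The lookup described above can be sketched in Python, assuming that a single ending is removed per attempt, that the dictionary is a plain set of words, and that an ending may not consume the whole word. The function name and these details are assumptions, not Dialog’s implementation. Python’s `sorted` is stable, so endings of equal length keep their listed order.

```python
from typing import Iterable, Optional, Set


def stem_lookup(word: str, dictionary: Set[str], endings: Iterable[str]) -> Optional[str]:
    """Return a dictionary word matching `word`, trying the shortest endings first."""
    if word in dictionary:
        return word
    for ending in sorted(endings, key=len):  # shortest ending first
        if word.endswith(ending) and len(word) > len(ending):
            stem = word[: -len(ending)]
            if stem in dictionary:
                return stem
    return None  # still unrecognized


GERMAN_ENDINGS = ["en", "es", "em", "e", "s"]  # as in the rule definition above

print(stem_lookup("tisches", {"tisch"}, GERMAN_ENDINGS))  # tisch
```

Trying the shortest ending first matters when several stems are valid words: with both “dinge” and “ding” in the dictionary, “dinges” resolves to “dinge” (by removing “s”) before “es” is ever tried.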
From now on, infinite loops must be implemented with multi-queries to a new built-in predicate, (repeat forever), as backends are no longer required to support tail-call optimization. The Z-machine backend still does, of course, but a future debugging backend might not.
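Because it is a multi-query, (repeat forever) presumably succeeds again every time execution backtracks into it, much like Prolog’s repeat. A rough Python generator analogy follows, in which a failing body sends control back into the generator instead of recursing. The names are hypothetical, and the Python loop is only an analogy for Dialog’s backtracking.

```python
from typing import Callable, Iterator


def repeat_forever() -> Iterator[None]:
    # Analogue of *(repeat forever): every time execution "backtracks"
    # into it, it simply succeeds again.
    while True:
        yield None


def run_until_success(body: Callable[[], bool]) -> int:
    """Re-run body after each failure; return the number of iterations.

    Unlike a self-recursive loop, this uses constant stack space, so it
    does not depend on tail-call optimization.
    """
    iterations = 0
    for _ in repeat_forever():
        iterations += 1
        if body():  # the goals after *(repeat forever) succeeded: stop
            return iterations
    return iterations  # not reached: the generator never ends
```

A body that keeps failing simply cycles back through the generator, so even many thousands of iterations stay within a single stack frame.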
Once again, I’m impressed. I love the slash expressions a lot because they reduce parser code to a minimum.
Thx for the great work!
This looks very cool!
Indeed. The Visual Studio Code extension for ZIL adds several IDE features, including a source-level debugger using ZLR as the backend. The debugger is fairly easy to adapt to other languages, as long as you can generate a debug info file mapping the Z-code addresses back to source lines.
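At its core, such a mapping is a sorted table of code addresses with a “last entry at or below this address” lookup. The Python sketch below assumes a hypothetical line-table format. It is not the actual ZILF/ZLR debug file format, which is not shown here.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LineEntry:
    address: int  # first Z-code address generated for this source line
    file: str
    line: int


class DebugInfo:
    """Hypothetical address-to-source map built from a compiler's debug output."""

    def __init__(self, entries: List[LineEntry]) -> None:
        self.entries = sorted(entries, key=lambda e: e.address)
        self.addresses = [e.address for e in self.entries]

    def lookup(self, pc: int) -> Optional[Tuple[str, int]]:
        # Find the last entry whose start address is <= pc.
        i = bisect.bisect_right(self.addresses, pc) - 1
        if i < 0:
            return None  # pc precedes all mapped code
        entry = self.entries[i]
        return (entry.file, entry.line)


info = DebugInfo([
    LineEntry(0x1000, "story.dg", 10),
    LineEntry(0x1010, "story.dg", 11),
    LineEntry(0x1040, "stdlib.dg", 200),
])
print(info.lookup(0x1012))  # ('story.dg', 11)
```

This simple table has no end addresses, so any address past the last entry maps to that entry. A real format would likely also record routine boundaries.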
Version 0c/02 (download link) fixes a compiler bug in ‘(status bar width $)’. The bug made scored games, including Cloak of Darkness, crash on some interpreters.
Parser tracing (activated by typing (trace on) somewhere in the story file) already gives very good feedback about what’s going on. It traces all triggered rules and writes out the actual line of code for each. This is far better than any other IF debugger I’ve seen so far, because it spans all rules, including parsing.
I implemented this for ZILF also, but I found it stopped being useful once games reached a certain size, because the extra code and text for tracing made them exceed the Z-machine memory limits.
Oh, and by the way (with apologies if this has already been addressed), what are your plans for the language going forward? Is it going to be exclusively Z-machine and/or text only?
For now, I’m going to focus on the Z-machine backend, text-only. The compiler needs some rework, and there are bugs to sort out.
Eventually I hope to add Glulx support, as well as the basic “display picture” stuff.
But I think it makes sense to at least start on the debugging backend before that, since that is going to reveal flaws in the frontend/backend interface, and those flaws are going to be less painful to address if there’s just one regular backend.
Is there a way to have inline assembly of some sort? If so, then multimedia (and a whole lot of other stuff) could just be implemented in libraries. (And indeed, many of the built-in predicates arguably should be too.)
On the contrary, I think it’s important not to go down that path. Inline assembly results in a creole of VM-dependent, low-level code and story-level predicates. Not only does that lead to platform lock-in effects, it’s also more difficult for the compiler to optimize, and less readable.
One of my explicit goals was to make a language where story authors could read, understand, and modify library code. I think many Inform 7 programmers hesitate to even look at the I6T code underpinning their stories, let alone modify it. And who can blame them, when those routines are written in a completely different language? I’d even posit that there are two castes of Inform 7 programmers: those who understand how the system works under the hood, and those who regard it as black magic, and consistently need to turn to the former group for advice. And this can be authors with long-term experience of Inform 7! Due to the mixture of languages at different levels of abstraction, there’s this huge threshold in the middle of the learning curve, pushing people back into the lower caste. And I emphatically want to avoid that.
I hasten to add that there are other aspects of Inform 7 that I think are brilliant. But this particular design choice has always bothered me.
Of course, Dialog also has inaccessible low-level stuff under the hood, but the point is that the low-level stuff doesn’t have anything to do with the story. Both Inform 7 and Dialog have the same overall stack-up of 1. story, 2. library and parser, and 3. low-level machinery. Inform 7 puts the language barrier between levels 1 and 2. Dialog puts the barrier between levels 2 and 3. I argue that the latter is better because the library and parser are so deeply intertwined with the story: the story author must be able to understand and adapt them, and to debug levels 1 and 2 together, at a high level of abstraction. To allow inline assembly is to allow the incomprehensible stuff to creep up through the levels.
So, in addition to the technical reasons I mentioned at the top, this boils down to a personal desire to let the author see what’s going on. And I’ve argued that the way to keep the relevant layers comprehensible and transparent is, perhaps counter-intuitively, to keep the low-level stuff hidden: To maintain a strict separation between high-level (story, library, parser) and low-level (z-code, optimization), and to ensure that this interface is clean and well documented.
Pulling the parser into the high level is great. But I don’t see why that has to mean the author can only access a lowest common denominator model of the VM. Especially when it’s usually very simple, far far simpler than the parser is.
For example, with text formatting, why have separate predicates for bold, italic, etc.? If you don’t want inline assembly, what about predicates that correspond to the operations of the VM, such as a “set text style” predicate that takes a parameter for the style? (Rather than implementing those rules manually in the compiler, I’d use assembly myself, but you do what you want.) The “status bar” predicate is the kind of magic I would’ve thought you wouldn’t want, judging from what you just wrote. It sets the height, clears the window, and resets the cursor… Why should that be one operation in the compiler rather than a library function that can be modified as needed? It’s hiding things from the author, and not just in library files, but in the compiler itself!
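To illustrate the “one predicate with a style parameter” idea, here is a small Python sketch. The style values are those the Z-Machine Standard gives for set_text_style (0 roman, 1 reverse video, 2 bold, 4 italic, 8 fixed pitch). The function is a hypothetical stand-in for what a compiler might emit, and interpreter support for combining styles in a single operand may vary.

```python
from enum import IntFlag


class Style(IntFlag):
    ROMAN = 0    # values as given for set_text_style in the Z-Machine Standard
    REVERSE = 1
    BOLD = 2
    ITALIC = 4
    FIXED = 8


def set_text_style(style: Style) -> str:
    """Hypothetical single style operation: emit one set_text_style instruction."""
    return f"set_text_style {int(style)}"


print(set_text_style(Style.BOLD))                 # set_text_style 2
print(set_text_style(Style.BOLD | Style.ITALIC))  # set_text_style 6
```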