Dialog Wishlist

No C compiler in common use today has much in common with the very first versions from the 1970s. Even C compilers that implement the same version of the ISO standard share a large common subset, but they also differ significantly in implementation-defined behavior (which the standard leaves up to each compiler) and in features the ISO standard doesn’t describe at all, which many real-world C programs rely on. From the start, the Linux kernel was only ever meant to be compiled with GCC. Support for compiling it with Clang was eventually added, but it took many person-years of effort by the Clang developers to implement all the non-standard GCC features Linux relies on.

The Dialog situation is not nearly as difficult, because there is much less Dialog code than C code in the world. But still, a new compiler should ideally handle all existing and work-in-progress Dialog games the same way the existing compiler does, right? To achieve that, it has to match the existing compiler not just in the cases you’re thinking of when you look at your own Dialog code, but also in all the weird things everyone else has done or will do in Dialog. There are going to be dark, under-documented corners that someone has stumbled into (possibly without realizing it), not just in syntax and data types but also in the low-level details of how programs are executed. For logic programming in particular, execution details can make a huge difference (finishing instantly vs. looping forever or overflowing the call stack). Finding these dark corners and matching what the existing compiler does is a challenge that doesn’t exist when you implement your own, new language.

I’m not saying it’s impossible for a single person to create a wholly new Dialog compiler that’s completely compatible. It’s certainly much easier than the Clang/GCC situation mentioned above. But I would expect anyone who sets out to do it to still have to learn a fair amount about the existing compiler in the process, beyond what they would know from using Dialog themselves. That’s why I’m saying it’s not necessarily easier or quicker than learning to read the existing compiler’s source code.

Now, that said, the existing compiler has two phases.

First, it compiles to an intermediate representation that’s fairly close to Å-machine code.

Then, it compiles that intermediate representation into either Z-code or Å-code, or executes it directly (in the debugger).

Basic Glulx support wouldn’t be too difficult (just tedious), since we’d only need to reimplement the second part, and we have the Z-machine implementation to use as a template. There’s not a lot of magic going on here; for the most part, it has a pre-written Z-code routine for each IR instruction, and just calls those with appropriate parameters.

But basic Glulx support wouldn’t improve the situation very much. The IR assumes a 16-bit word size, which puts a hard limit on how much memory can be addressed. I don’t know the proper term for it, but Dialog is designed so that all values are unambiguous: you can look at a 16-bit word and know immediately if it’s a number, a pointer, a character, or something else. That’s why there’s such a painfully tight limit on heap size: the 2¹⁶ (65,536) possible words have to be divided among heap addresses, integers, and everything else.

Proper Glulx support would involve increasing the word size, which on the face of it wouldn’t be too awful, but would have repercussions all throughout the compiler code. And that would mean someone needs to actually understand how the compiler works, which I don’t think anyone except Linus does.

In other words, someone would have to work through the dozens of files of dense, uncommented, undocumented code until they properly grok how it all works, and that’s the big barrier that’s stopping changes like negative numbers, more than 128 input characters, a larger heap, and so on.

ahh. i get it. like i said, i figured i was oversimplifying something about which i know nothing. which i hate when people do to me…

re: new name. i like ‘dialogic’ or ‘diabolic’

Just in case anyone does end up going down this rabbit hole, I believe the term for this is “pointer tagging”.

Hmm. Having experimented a bit more…I don’t think it would actually be too difficult to expand the pointer tagging system to a 32-bit VM.

All the code that checks the pointer tags does it either with arithmetic comparisons or with bitwise operators. If we just put those tags at the top of the word (that is, compare against E0000000 instead of just E000), then the architecture remains basically the same, while we get 2¹⁶ times as much space for everything.

And the Å-machine actually does have a field in the header to specify the word size; it’s just never been set to anything beyond 2 bytes before. Which means going to 32-bit words isn’t necessarily undermining Linus’s intent, either.

For names, I second Dialogic, Diabolic, and Recurse.

Linus is uncommonly gifted. Then again, there are many gifted programmers / writers on Intfiction.

Okay, I think this should be doable! Not easy, and probably not by me alone (in any short period of time), but doable.

The files that would need rewriting (or rather copying and then rewriting) are:

  • zcode.h, the struct representing a Z-machine instruction and the list of opcodes. This one shouldn’t be that hard to assemble (pun not intended) from the Glulx spec.
  • runtime_z.c, the raw Z-code routines implementing various IR instructions. These routines are copied verbatim into the output, so they’d just need to be rewritten by someone familiar with both Z-machine and Glulx assembly. (I probably could, but it would take ages.)
  • backend_z.c, the actual machinery for compiling IR into Z-code. This takes the IR routines, objects, global variables, and dictionary words, and turns them into a Z-code file in memory, before saving it to disk. Since Glulx puts very few constraints on how RAM is organized, this could be translated fairly verbatim, by just aping the Z-machine layout as closely as possible. However, it would again need someone very familiar with the architecture.

Once I have a bit of free time, I’ll put the existing code into a public repo so people can experiment with it. I’m going to go through and comment the bits of this process that I understand.

Oh, I suppose—once I have a bit of free time and a consensus has emerged on what to call it!

Here’s a first-stage poll

Closed this after a couple weeks!

Looks like this came out pretty firmly on the side of “small changes only”, which means I personally feel like rebranding the project would be overkill and possibly net-negative. The Glulx backend does feel a bit bigger than a “tweak”, but even that is pretty in line with the existing project. But I also feel increasingly alone in this opinion… so I’m totally happy to go along with community opinion or, frankly, the whim of whoever’s inspired to start doing the work.

Also, wondering: has anyone tried getting in touch with @lft off of the forum? If not, I’d be happy to send an email over the break and see what comes of it.

To the original question: top of my wishlist is now fixing this weird thing:

(intro)
	Welcome to Dialog!
	(try [look])

(current player #player)
(#player is #in #room)

#room
(room *)
(name *)	Room
(look *)
    ([$a $a] = $b)
    $b

This code just unifies a variable with a two-element list (both elements being the same variable, $a) and prints the list. Can you guess what it prints?

(For your sake, I hope not! It’s not ideal!)

Edit: replaced a borogove snippet, which was having some trouble with file ordering I think.

I can’t work out what the result of printing a list containing two unbound variables should be, which makes me slightly less bothered if it turns out to do something weird (I can’t easily test it on my phone).

But I assume this is a minimal reproduction of something weird which you’ve actually run into in practice in a more complex example?

I think each unbound variable turns into a dollar sign.

So I’m guessing that snippet displays [$ $]

That would make sense. I’ve now tried it, and as @bkirwi says, the actual result makes no sense. In particular, it does not consist of a list containing two copies of the same value.

Actual output

[$ #room]

Now that is bizarre. And unfortunately deep enough magic that I have no confidence in my ability to fix it.

Yeah, that’s it exactly: I figured there were a few possible reasonable behaviours for Dialog given this snippet and was curious which one happened in practice. This one was not on my list!

From a little experimentation, it seems like this is consistent across backends, but small changes to the file can cause different object IDs to magically appear. Perhaps a bug in the translation of list unification into lower-level operations?

I have seen this behavior many times. Printing out unbound variables sometimes results in garbage output in the form of a weird object name, usually the #player object. It is kind of similar to dereferencing a dangling or invalid pointer in C.

Seems like printing an unbound variable has a side-effect of binding something to it, but what gets bound is fairly random (like the first object).

Hm. Isn’t printing an unbound variable supposed to be well-defined behavior (it just prints $)?
