Dialog Wishlist

If we’re discussing changes to the standard library: I’m new to Dialog, but from what I can tell, it seems like it would be pretty trivial to add support for gender-neutral (they/them) pronouns, i.e. a (gender-neutral *) predicate analogous to the existing male and female?

The choice of what to include in the “standard library” sends a message about what is or isn’t considered, y’know, “standard,” and especially given the diversity of the IF community it feels unfortunate that Dialog doesn’t support gender-neutral language out-of-the-box, when it would be such an easy fix.

The trick with this is that singular “they” is a complicated beast, grammatically: you would say “Draconis is” but “they are”, so the is-are predicate needs to know somehow if a pronoun or a noun came before it.

I fully agree it would be nice if the library added it, but I’d need to think a bit about how to implement it. You can sort of get there with (plural $), but that would say *“Draconis are”. Probably the easiest way would be to have the subject pronoun predicates set a “pronominal context” flag for (gender-neutral *) objects, and the name-printing predicates clear that flag; the problem with that is that Dialog generally relies on passing the subject to all those predicates instead of setting flags like Inform does…an interesting puzzle!

Can we edit the first message to add a link to the repo (or make a new topic) so that you don’t have to dig in this topic to find it?

Also, what’s the procedure now to make a feature request? Write it here or open an issue in the repo?

I think either works; right now I’m the main one making pull requests, and I follow both this and the repo, but probably best to put specific issues on the repository, and general discussion here?

I have to wonder to what degree (animate $) could be gender-neutral; I don’t have the stdlib handy to see what it does in terms of articles when only (animate $) is known (without (male $) or (female $)).

Nothing at all. None of the name-printing code checks (animate $), only the parser and some of the action implementations (e.g. you can’t search an animate thing).

By default, something that’s animate but not male or female will be “it”, same as any other thing that’s not male or female.

I don’t think it’s too complicated, at least as far as messages produced by the standard library are concerned.

There’s already an (it $ is) predicate that could be set to “they are” or “they’re” (currently male/female results in “he’s”/“she’s”), and when I search through the stdlib for uses of (it $) I don’t see any messages generated by the standard library that would be difficult to fix. It looks like you’d have to introduce an (it $ doesn't) to use in one place that currently uses (it $) (doesn't $), and presumably you’d want to introduce an (it $ has) and (it $ does) as well for the end user (although neither (has $) nor (does $) is really used by the standard library), but I think that would basically cover everything. Otherwise, you just leave (is $), (has $), etc. in singular form as usual.

I mean, obviously this wouldn’t cover everything that an author might want to write, but neither does the standard library in its current state (e.g. there’s currently no built in check predicate for was/were).

EDIT: Biggest question is how to handle (s $) and (es $). If you really wanted to, you could introduce some sort of (they $ s) and (they $ es), with the caveat that this doesn’t actually print the “they,” so you’d have to write messages like (it $) grab (they $ s) the ball, which is a little awkward, but not the end of the world?
EDIT2: Although I guess it would actually be called (it $ s), for consistency.
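
As a minimal sketch of the predicates described above, written as author-side rules that come before the library in source order. Everything here is an assumption for illustration: (gender-neutral $) and (it $ s) are hypothetical names that do not exist in the standard library, #draconis is a made-up object, and a real patch would probably edit the library's own (it $) and (it $ is) rules rather than shadow them.

```dialog
%% Hypothetical trait; the standard library does not define it.
(gender-neutral #draconis)

%% Shadow the library's subject pronoun for gender-neutral objects.
(it $Obj)
    (gender-neutral $Obj)
    they

%% "they're" instead of "he's" / "she's" / "it's".
(it $Obj is)
    (gender-neutral $Obj)
    they're

%% Hypothetical (it $ s): prints only the verb ending, never the pronoun.
%% Singular "they" takes no ending; everything else defers to (s $).
(it $Obj s)
    (if) ~(gender-neutral $Obj) (then)
        (s $Obj)
    (endif)
```

With rules like these, a message written as (it $Obj) grab (it $Obj s) the ball should come out as “they grab the ball” for a gender-neutral object and “he grabs the ball” for a male one, assuming the library's (s $) keeps its current behavior.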

4 posts were split to a new topic: New built-in for wordsplitting

I think it’s really time to start splitting some of these out into their own thread chains.

So far we haven’t updated the library at all; I’ve mostly been focusing on the compiler, and hlship has been focusing on the documentation. So if you wanted to put together a pull request for this, it would be much appreciated! I’d love to have more people working on this—not least because it reduces the risk that I’ll break something and not notice, as I’ve done twice so far. >_>

As I try to figure out what the behavior of floating divs should be, I’ve been contemplating ways to make them more useful. And one of those ways would be to allow centering and right-justification, for things like quote boxes.

On the Z-machine side, this isn’t too difficult. Print the text to a buffer with output stream 3, look at the length of that buffer, then print the appropriate number of spaces before printing the text itself.

However, this has a problem. If the text is printed back from the buffer, it’ll lose all styling information. Previously that wasn’t a concern, but now I’ve enabled styling in the status bar! It would be a shame to lose that again. But if the text is printed a second time live, the contents might change, due to (select) statements and other randomized output or side effects.

So my proposal is: make this the library’s problem instead! The fundamental ethos of Dialog is that as much as possible should be done in library code, accessible to authors, instead of being deep magic in the compiler (or a separate lower level, like I6 within I7, or assembly within I6). Why don’t we do the same for this?

But Linus has also adamantly refused to add string-processing into the language, and I want to respect that. Adding an entirely new data type for text buffers is also a bigger change than I’m comfortable biting off right now.

So, specifically, I propose a new special syntax, something like (count characters) ... (into $), which measures the length of whatever is printed inside it (without sending any of it to the screen). The library can then use this, along with (space $), to do whatever centering and right-justification it likes. And because this is happening in library code, it’s fully transparent to authors: they can see firsthand why it’s important not to put (select) statements inside their centering, rather than just taking it on faith that it’s illegal for deep magic reasons.

Then, you could do something like:

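%% Right-justify whatever $Closure prints within the current div.
(right justify $Closure)
    (current div width $Width)
    %% Proposed syntax: measure the output without printing it.
    (count characters)
        (query $Closure)
    (into $Chars)
    ($Width minus $Chars into $Spaces)
    %% Pad on the left, then print the text for real.
    (space $Spaces)
    (query $Closure)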

For centering, just divide $Spaces by two first. For a quote box, draw an appropriate number of dashes on either side. And so on!
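
The centering variant could be sketched like this. It is a sketch only: it leans on the proposed (count characters) ... (into $) syntax, which does not exist in any released compiler, and (center $) is a hypothetical name chosen for illustration.

```dialog
%% Sketch only: relies on the proposed (count characters) ... (into $).
(center $Closure)
    (current div width $Width)
    (count characters)
        (query $Closure)
    (into $Chars)
    ($Width minus $Chars into $Spaces)
    %% Half of the leftover space goes on the left; the rest is left unprinted.
    ($Spaces divided by 2 into $Half)
    (space $Half)
    (query $Closure)
```

It would be called with a closure, e.g. (center { Hello, world! }). If the text is wider than the div, the subtraction should fail, since Dialog numbers cannot go negative, so a real version would likely want a fallback rule that just prints the text.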

What do you all think of this idea? For now, the character-counting would be Z-machine only, until I figure out how best to tinker with the Å-machine. But on the Å-machine web interpreter, you can already use text-align: right instead.

(If I go with this proposal, I might as well add (count lines) ... (into $) and (count words) ... (into $) too, for maximum flexibility. It’s not that much harder to add all of those than to add just one. But counting characters is the easiest of those.)

Sure, I’ll try to put that together when I find the time.

Thanks! I’m not sure how long my current focus on this is going to hold out, so more people making pull requests is a good sign for the project’s future if I have to step back.

This bug is now fixed.

The first time a local variable appears in a routine, the compiler allocates some memory for it. The problem is, when it’s compiling a list literal like [$X $X], the list gets compiled from back to front: the last values are compiled first. (This comes down to how lists are represented as LISP-style cons cells. When compiling [$X $X], it first compiles [$X | [] ], and then compiles [$X | that thing].)

As a result, when the same unbound variable appears twice in a row in a list literal, and it hasn’t been used for anything before that list literal, it gets used before the memory is allocated, and the result is undefined! On the Z-machine, it accesses whatever happens to be sitting around in memory. In the debugger, it fails an assert and crashes.

For now, you can work around this in a few different ways. If you use the variable for anything before that, it works fine:

(do nothing $)
(program entry point)
    (do nothing $X)
    ([$X $X] = $Y)
    $Y

Or if you avoid having the same unbound variable twice in a row, that’s also fine:

(program entry point)
    ([$X 0 $X] = $Y)
    $Y

Or, I’ve just submitted a pull request that should fix this once and for all! It also fixes this bug, which has the same root cause.

The limits you were hitting seem to go back to the $8000 bug. I’ve added various new diagnostics about Z-machine limits now, and Miss Gosling, a decently-sized game, is nowhere near them.

Registers used: 113 of 240* (47%)
Properties used: 0 of 63 (0%)
Dynamic flags used: 9 of 48 (18%)
Total flags used: 48 native, 12 extended
Objects used: 174 of 8190 (2%)
Dictionary words used: 1069 of 7679 (13%)
Addressable memory used: 20308 of 65536 bytes (30%)
        Object data 1:    3788
        Object data 2:     348
        Wordmaps:        11186
        Main heap:        2000
        Auxiliary heap:   1000
        Long-term heap:   1000
Total filesize used:    291872 of 524288 bytes (55%)
        Routines:        54632
         Strings:        20320

These don’t correspond directly to Dialog constructions, but registers are broadly global variables and global flags (plus about a hundred used by the compiler for optimization purposes), properties are per-object variables, dynamic flags are per-object flags, and wordmaps are word-to-object mappings to optimize dictionary lookup.

Also, while registers are listed out of 240, that gets an asterisk because it’s not actually a hard limit. When the compiler runs out of registers, it stores extra variables in RAM. Less efficient to read and write, but it’s not going to doom the compilation.

I need to test this a bit more before release, but if you want to see what it says about Forsaken Denizen, you can pull from the z-diagnostics branch and try that. (Or you can send me the source or I can send you a Linux binary.) The main thing I’m taking away from this is: with the $8000 bug fixed, the Z-machine’s memory limits aren’t much of a constraint on Dialog games at all.

Just out of curiosity, have you done any similar examination of the Å-machine limits?

Not in as much depth, I’m afraid. I understand the Z-machine a lot better than the Å-machine, so I can understand e.g. why the dictionary needs to end before position 65535 (that’s the end of addressable memory) even when the source is uncommented, but the Å-machine architecture is more of a mystery.

As best I can tell, two of the limits are imposed by Dialog’s pointer tagging system (no more than $1FFE objects, no more than $1DFF dictionary words), a couple others are imposed by the Å-machine’s architecture (no more than 127 non-ASCII characters), but overall the memory model is much more lenient. It uses different memory maps for different things, so e.g. you can have 65536 bytes of dictionary data and 65536 bytes of object data, instead of needing to share the Z-machine’s 65536 bytes of addressable memory between them. It looks like the limit on overall file size is $FFFFFFFF, same as Glulx, which is a limit nobody’s ever hit before.

But for three of those limits (objects, dictionary words, non-ASCII characters), I can definitely add similar logging. I think those are the three that you can hit in practice—especially the third one, which came up when someone was trying to implement Japanese input.

I’ve added basic Å-machine diagnostics now:

Objects used: 174 of 8190 (2%)
Dictionary words used: 1072 of 7679 (13%)
Non-ASCII characters used: 6 of 128 (4%)

Would it be useful to also report how much of each chunk is used? 65536 bytes each for object data and dictionary data seems like a huge enough limit that it practically doesn’t matter, but it could be reassuring to see that “1% used”.

It might help people to learn which things increase the Å-machine’s memory usage: for example, how adding per-object variables or flags uses much more memory than individual globals.

Oh, right, I should elaborate: the last released version already prints how much memory is used by each chunk, it just doesn’t show percentages or limits. That would be the change.

Chunk HEAD:       68 bytes,    0 kB
Chunk META:      745 bytes,    1 kB
Chunk LOOK:     1633 bytes,    2 kB
Chunk TAGS:     1803 bytes,    2 kB
Chunk LANG:      222 bytes,    0 kB
Chunk MAPS:     5081 bytes,    5 kB
Chunk DICT:     7682 bytes,    8 kB
Read-write memory: 3611 words (7222 bytes)
Chunk INIT:     3222 bytes,    3 kB
Chunk CODE:    77191 bytes,   75 kB
Chunk WRIT:    59505 bytes,   58 kB
Chunk URLS:      105 bytes,    0 kB
Chunk FILE:       13 bytes,    0 kB
Chunk FILE:       10 bytes,    0 kB