Dialog Wishlist

All right, best-of-both-worlds version implemented! The Unicode casing table is now an array of four-byte blocks. Byte 1 of each block is the lowercase ZSCII character; byte 2 is 0 if the uppercase version is ZSCII and 1 if it’s Unicode; bytes 3 and 4 are the uppercase character in the appropriate encoding.

This does consume a few more bytes of addressable memory, but bytes are cheap compared to ZSCII character slots. Addressable memory usually isn’t the limit that Dialog games run into.
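For anyone who wants to picture the layout, here is a minimal Python sketch of a table with that shape. It is not the compiler's actual code: the function names are made up for illustration, and I'm assuming bytes 3 and 4 are stored big-endian (as Z-machine words are). The ą entry comes from the compiler output below; 155 and 158 are ä and Ä in the default Z-machine extra-character table.

```python
import struct

def pack_casing_table(entries):
    """Pack (lower_zscii, upper_is_unicode, upper_code) tuples into 4-byte blocks.

    Hypothetical helper: the layout follows the description in the post,
    with big-endian order for bytes 3-4 as an assumption.
    """
    table = bytearray()
    for lower_zscii, upper_is_unicode, upper_code in entries:
        # byte 1: lowercase ZSCII character
        # byte 2: 0 if the uppercase form is ZSCII, 1 if it is Unicode
        # bytes 3-4: the uppercase character in that encoding
        table += struct.pack(">BBH", lower_zscii, 1 if upper_is_unicode else 0, upper_code)
    return bytes(table)

def lookup_upper(table, lower_zscii):
    """Linear scan of the 4-byte blocks; returns (encoding, code) or None."""
    for offset in range(0, len(table), 4):
        low, flag, upper = struct.unpack_from(">BBH", table, offset)
        if low == lower_zscii:
            return ("unicode" if flag else "zscii", upper)
    return None

table = pack_casing_table([
    (155, False, 158),    # ä -> Ä, which has its own ZSCII slot
    (224, True, 0x0104),  # ą -> Ą, which only exists as Unicode
])
print(lookup_upper(table, 224))
```

The point of the flag byte is that an uppercase form like Ą never needs a ZSCII slot of its own; it only costs four bytes in the table.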

Output when compiling the new_unicasing test case:


Debug: Adding Unicode character U+0105 (ą) at ZSCII codepoint 224
Debug: Adding Unicode character U+20ac (€) at ZSCII codepoint 225
Debug: Uppercase equivalent for U+00e4 is U+00c4 (ZSCII 158)
Debug: Uppercase equivalent for U+00f6 is U+00d6 (ZSCII 159)
Debug: Uppercase equivalent for U+00fc is U+00dc (ZSCII 160)
Debug: Uppercase equivalent for U+00eb is U+00cb (ZSCII 167)
Debug: Uppercase equivalent for U+00ef is U+00cf (ZSCII 168)
Debug: Uppercase equivalent for U+00ff is U+0178
Debug: Uppercase equivalent for U+00e1 is U+00c1 (ZSCII 175)
Debug: Uppercase equivalent for U+00e9 is U+00c9 (ZSCII 176)
Debug: Uppercase equivalent for U+00ed is U+00cd (ZSCII 177)
Debug: Uppercase equivalent for U+00f3 is U+00d3 (ZSCII 178)
Debug: Uppercase equivalent for U+00fa is U+00da (ZSCII 179)
Debug: Uppercase equivalent for U+00fd is U+00dd (ZSCII 180)
Debug: Uppercase equivalent for U+00e0 is U+00c0 (ZSCII 186)
Debug: Uppercase equivalent for U+00e8 is U+00c8 (ZSCII 187)
Debug: Uppercase equivalent for U+00ec is U+00cc (ZSCII 188)
Debug: Uppercase equivalent for U+00f2 is U+00d2 (ZSCII 189)
Debug: Uppercase equivalent for U+00f9 is U+00d9 (ZSCII 190)
Debug: Uppercase equivalent for U+00e2 is U+00c2 (ZSCII 196)
Debug: Uppercase equivalent for U+00ea is U+00ca (ZSCII 197)
Debug: Uppercase equivalent for U+00ee is U+00ce (ZSCII 198)
Debug: Uppercase equivalent for U+00f4 is U+00d4 (ZSCII 199)
Debug: Uppercase equivalent for U+00fb is U+00db (ZSCII 200)
Debug: Uppercase equivalent for U+00e5 is U+00c5 (ZSCII 202)
Debug: Uppercase equivalent for U+00f8 is U+00d8 (ZSCII 204)
Debug: Uppercase equivalent for U+00e3 is U+00c3 (ZSCII 208)
Debug: Uppercase equivalent for U+00f1 is U+00d1 (ZSCII 209)
Debug: Uppercase equivalent for U+00f5 is U+00d5 (ZSCII 210)
Debug: Uppercase equivalent for U+00e6 is U+00c6 (ZSCII 212)
Debug: Uppercase equivalent for U+00e7 is U+00c7 (ZSCII 214)
Debug: Uppercase equivalent for U+00fe is U+00de (ZSCII 217)
Debug: Uppercase equivalent for U+00f0 is U+00d0 (ZSCII 218)
Debug: Uppercase equivalent for U+0153 is U+0152 (ZSCII 221)
Debug: Uppercase equivalent for U+0105 is U+0104

Only the characters ą and € are added to ZSCII, not Ą or Ÿ. Those two are notably missing.


All right! I think I’m going to call an end to this sprint here. (For me personally, that is. Everyone else is welcome to keep developing at whatever pace they like.) I’ve implemented one final feature request, which I’m sure all the typographers in the audience have been clamoring for: a (nbsp) predicate to produce non-breaking spaces!

This was mostly added to support French, where it’s a standard part of everyday typography, but you can also use it for things like 100 (nbsp) km or section (nbsp) 5 if you want. Works on Z-machine, Å-machine, and debugger, and is properly integrated with all the other spacing predicates. And, as a bonus, I documented how all those spacing predicates interact with each other.

(Also, non-breaking spaces are now treated as whitespace in source files, so you won’t get caught off-guard by a random NBSP in the middle of your identifier. This was my real goal, but French developers asked that I not remove their ability to print literal NBSPs until there was a better way built into the language.)

Once all the current PRs are reviewed and merged, that should clear out 7 of the 28 current issues, putting us in a nice place for a 1b/01 release. Plus, the test suite has gotten a lot more thorough for the future.

Now, back to thesis writing!


How soon is 1b/01? I should contribute a manual chapter on testing. (Or part of one, if we want to cover more ways to test than unit testing.)

(And ideally, I’d like to finish (quit $) in time for it as well.)


Definitely not immediately! Finish those things on whatever schedule works for you; we just released 1a/01 a couple weeks ago, so 1b/01 doesn’t need to happen for a long while yet.

I’ve been doing a huge frenzy of development this week because I needed to specifically not think about my thesis for a few days, and this gave me a good outlet. But now that I’m done with this sprint, I’m expecting everything to go at a much more leisurely pace, where everyone can review the pull requests, look over the tests and manual updates, and so on. I absolutely don’t expect anyone on the project to match this week’s pace. Even just personally, I don’t think I could keep it up for long!

Here’s a fun one! When compiling to the Z-machine, Dialog is very efficient in its use of RAM, but less efficient in its use of ROM. Strings and routines take up a lot of space.

So, I’ve improved that by making the compiler use Z-machine abbreviations for its strings. For technical reasons, The Wise-Woman’s Dog wasn’t used to generate the abbreviations (it’s too big and crashes ZAbbrev). So it makes a good test case for how much these abbreviations help!

Building with the current compiler (1b/01-dev, main branch):

Debug: Heap: 2500 words
Debug: Auxiliary heap: 500 words
Debug: Long-term heap: 750 words
Debug: Registers used: 118 of 240 (49%)
Debug:                 61 internal, 16 temp, 41 global
Debug: Properties used: 0 of 63 (0%)
Debug: Dynamic flags used: 7 of 48 (14%)
Debug: Total flags used: 43 native
Debug: Objects used: 340 of 8190 (4%)
Debug: Dictionary words used: 1435 of 7679 (18%)
Debug: Extended ZSCII characters used: 76 of 97 (78%)
Debug: Addressable memory used: 25582 of 65536 bytes (39%)
Debug:         Object table:     7612
Debug:         Object vars:      1360
Debug:         Unicode data:      161
Debug:         Wordmaps:            0
Debug:         Dictionary:       8207
Debug:         Main heap:        5000
Debug:         Auxiliary heap:   1000
Debug:         Long-term heap:   1500
Debug: Total filesize used:    505512 of 524288 bytes (96%)
Debug:         Routines:       283384
Debug:          Strings:       196544

Building with my updated compiler (1b/01-dev, z-abbrev branch):

Debug: Heap: 2500 words
Debug: Auxiliary heap: 500 words
Debug: Long-term heap: 750 words
Debug: Registers used: 118 of 240 (49%)
Debug:                 61 internal, 16 temp, 41 global
Debug: Properties used: 0 of 63 (0%)
Debug: Dynamic flags used: 7 of 48 (14%)
Debug: Total flags used: 43 native
Debug: Objects used: 340 of 8190 (4%)
Debug: Dictionary words used: 1435 of 7679 (18%)
Debug: Extended ZSCII characters used: 76 of 97 (78%)
Debug: Addressable memory used: 25888 of 65536 bytes (39%)
Debug:         Object table:     7558
Debug:         Object vars:      1360
Debug:         Unicode data:      161
Debug:         Abbreviations:     554
Debug:         Wordmaps:            0
Debug:         Dictionary:       8207
Debug:         Main heap:        5000
Debug:         Auxiliary heap:   1000
Debug:         Long-term heap:   1500
Debug: Total filesize used:    471040 of 524288 bytes (89%)
Debug:         Routines:       283368
Debug:          Strings:       161784

As you can see, abbreviations take up a bit of extra space in addressable memory (554 bytes), but more than make up for it by shrinking the strings by 34,760 bytes, which cuts the total file size by 34,472 bytes. With this improvement, The Wise-Woman’s Dog goes from 96% of the Z-machine limit to 89%, giving it a bit of extra room to breathe. Now there’s no risk of new bugfixes or the like hitting the limits!
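The numbers can be checked straight from the two reports above. This is just arithmetic on the printed figures, written out in Python; the variable and function names are mine.

```python
# Figures copied from the two compiler reports above (The Wise-Woman's Dog).
Z_MACHINE_FILE_LIMIT = 524288

main_branch = {"addressable": 25582, "filesize": 505512, "strings": 196544}
z_abbrev_branch = {"addressable": 25888, "filesize": 471040, "strings": 161784}

def saved(key):
    """Bytes saved going from main to z-abbrev (negative means growth)."""
    return main_branch[key] - z_abbrev_branch[key]

def percent_of_limit(filesize):
    # Integer division reproduces the percentages printed in the reports.
    return 100 * filesize // Z_MACHINE_FILE_LIMIT

print("strings saved:     ", saved("strings"))
print("file size saved:   ", saved("filesize"))
print("addressable change:", -saved("addressable"))
print("before:", percent_of_limit(main_branch["filesize"]), "% of limit")
print("after: ", percent_of_limit(z_abbrev_branch["filesize"]), "% of limit")
```

The net growth in addressable memory is smaller than the 554-byte abbreviation table because other parts of addressable memory changed size too (the object table, for one, is 54 bytes smaller in the second report).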


Also, big shoutout to @heasm66, because without unz and ZAbbrev this wouldn’t have been possible. Those tools were used to generate the data files that make this optimization work. Also shoutout to @mathbrush and @improvmonster for letting me feed their code into unz and ZAbbrev to make those files.
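For anyone unfamiliar with why a list of abbreviations helps: Z-machine text is stored as 5-bit Z-characters, and a reference to an abbreviation costs two Z-characters no matter how long the abbreviated string is. Here is a rough Python sketch of the effect. It uses a simple greedy longest-match, which is an assumption on my part and not necessarily how the compiler picks abbreviations; the abbreviation list is invented, and the cost model ignores padding to whole words.

```python
import string

def plain_cost(ch):
    """Approximate Z-character cost of one character in the default alphabet."""
    if ch == " " or ch in string.ascii_lowercase:
        return 1  # space and alphabet A0
    if ch in string.ascii_uppercase or ch in "\n0123456789.,!?_#'\"/\\-:()":
        return 2  # shift character plus an A1 or A2 character
    return 4      # escape sequence for anything else

def cost_with_abbreviations(text, abbreviations):
    """Greedy longest-match: every abbreviation hit costs 2 Z-characters."""
    ordered = sorted((a for a in abbreviations if a), key=len, reverse=True)
    cost = 0
    i = 0
    while i < len(text):
        for abbr in ordered:
            if text.startswith(abbr, i):
                cost += 2
                i += len(abbr)
                break
        else:
            cost += plain_cost(text[i])
            i += 1
    return cost

sample = "the cat sat on the mat"
without = cost_with_abbreviations(sample, [])
with_abbrevs = cost_with_abbreviations(sample, ["the ", " on "])
print(without, with_abbrevs)
```

The hard part, and the reason ZAbbrev matters, is choosing which of the at most 96 abbreviations give the best saving across a whole game's text.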

Next step: alphabet optimization!


Alphabet optimization shaves off a little bit more:

Debug: Heap: 2500 words
Debug: Auxiliary heap: 500 words
Debug: Long-term heap: 750 words
Debug: Registers used: 118 of 240 (49%)
Debug:                 61 internal, 16 temp, 41 global
Debug: Properties used: 0 of 63 (0%)
Debug: Dynamic flags used: 7 of 48 (14%)
Debug: Total flags used: 43 native
Debug: Objects used: 340 of 8190 (4%)
Debug: Dictionary words used: 1435 of 7679 (18%)
Debug: Extended ZSCII characters used: 76 of 97 (78%)
Debug: Addressable memory used: 25968 of 65536 bytes (39%)
Debug:         Object table:     7584
Debug:         Object vars:      1360
Debug:         Unicode data:      161
Debug:         Abbreviations:     614
Debug:         Wordmaps:            0
Debug:         Dictionary:       8201
Debug:         Main heap:        5000
Debug:         Auxiliary heap:   1000
Debug:         Long-term heap:   1500
Debug: Total filesize used:    470760 of 524288 bytes (89%)
Debug:         Routines:       283408
Debug:          Strings:       161384

Not as much of an improvement this time: 400 bytes saved, 60 bytes spent. I’m not convinced this one was worth the effort, honestly, but it’s trivial to change back: just undefine the macro CUSTOM_ALPHABET and it’ll go back to its old behavior.


I think the problem may be with the custom alphabet I selected: it optimizes characters like . and ,, but the abbreviations already handle those in the most common cases. A better custom alphabet might be more useful here.
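To make that concrete, here is a small Python sketch comparing the default Z-machine alphabet with a custom one. The custom alphabet is hypothetical (it swaps . and , into the one-Z-character row in place of q and z) and is not the one the compiler actually uses; the cost model is simplified and ignores abbreviations and word padding.

```python
DEFAULT_A0 = "abcdefghijklmnopqrstuvwxyz"
DEFAULT_A1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
DEFAULT_A2 = "\n0123456789.,!?_#'\"/\\-:()"

# Hypothetical custom alphabet: '.' and ',' trade places with 'q' and 'z'.
CUSTOM_A0 = DEFAULT_A0.replace("q", ".").replace("z", ",")
CUSTOM_A2 = DEFAULT_A2.replace(".", "q").replace(",", "z")

def zchar_cost(text, a0, a1, a2):
    """Approximate number of Z-characters needed to encode text."""
    cost = 0
    for ch in text:
        if ch == " " or ch in a0:
            cost += 1  # space or a character in the unshifted row
        elif ch in a1 or ch in a2:
            cost += 2  # shift character plus the character itself
        else:
            cost += 4  # escape sequence for characters in no row
    return cost

sample = "You see a lamp, a key, and a door."
default_cost = zchar_cost(sample, DEFAULT_A0, DEFAULT_A1, DEFAULT_A2)
custom_cost = zchar_cost(sample, CUSTOM_A0, DEFAULT_A1, CUSTOM_A2)
print(default_cost, custom_cost)
```

The saving is one Z-character per period or comma. But if a sequence like a period followed by a space were already covered by an abbreviation (two Z-characters for both characters), the custom alphabet would encode it in two Z-characters as well, so nothing would be gained there.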

But I requested a PR review from heasm66, who’s the expert in these things, and he can fix it if so. Otherwise I may just change back to the default alphabet for simplicity. The machinery is all there to use a new alphabet in the future if it would be useful.


I’m hardly an expert (on anything) but I’ll gladly review it.
