ZAbbrevMaker 0.10 released

heasm66 · January 12, 2024, 9:43am

Version 0.10 of ZAbbrevMaker has arrived…

Up to twice as fast as the earlier version.
Visual Studio 2022 & .NET 8.0.
New feature that helps identify multiple occurrences of identical strings, or parts of strings, suitable for CONSTANT.
New feature that illustrates how the abbreviations are applied to the strings.
New feature that shows statistical information on where in memory strings are located and how much space is lost to uneven byte alignment.
New feature that auto-detects whether the source code is Inform6 or Zilf.
New feature that can generate abbreviations from output extracted from binaries with TXD and Infodump from ZTools.

Source and binaries for different platforms at GitHub - heasm66/ZAbbrevMaker

Binaries are also available at version_0.10 - Google Drive. The binaries should be self-contained for each platform.

Please test and report errors and/or suggestions here or on the project at GitHub.

heasm66 · January 12, 2024, 1:05pm

I thought it could be interesting to see what is possible to do with optimization with the latest versions of Inform6 (in some examples I’m gonna use new features that currently only are available in pre-release form) with help of ZAbbrevMaker.

I’m gonna use (with some modifications) Dorm: Adventure at the 8-Bit Assembly by @Carrington as an example.

PunyInform also has a document, PunyInform Game Author’s Guide, with a couple of useful tips and tricks that I will use as reference.

Baseline - with debug code and commands

If you compile with the -D switch (or define the constant DEBUG inside the code) we get the maximum size, with all the extra code and error checks that are available.

./inform -D
--> z5, size = 168.584 bytes

Turn off DEBUG

./inform
--> z5, size = 165.860 bytes

Turn off strict error checking

./inform -~S
--> z5, size = 146.000 bytes

Omit unused routines

./inform -~S $OMIT_UNUSED_ROUTINES=1
--> z5, size = 145.816 bytes

Compact dictionary

Every dictionary word in Z-code has four or six bytes to store the word string and then three bytes of data. The last of these data bytes hasn’t been used since ancient times in the standard library (grammar version 1) or in PunyInform and is always 0. This unused byte on each dictionary entry can be removed with the compiler switch $ZCODE_LESS_DICT_DATA=1.

./inform -~S $OMIT_UNUSED_ROUTINES=1 $ZCODE_LESS_DICT_DATA=1
--> z5, size = 144.896 bytes
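
The per-entry arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the entry sizes given in the text (four or six bytes of word text plus three, or two, data bytes); the function names and the word count of 900 are hypothetical, and the real file size also depends on padding elsewhere in the story file.

```python
def dict_entry_bytes(version: int, less_dict_data: bool = False) -> int:
    """Bytes used by one dictionary entry: encoded word text plus data bytes."""
    # Versions 1-3 store the word in 4 bytes, version 4 and later in 6 bytes.
    text_bytes = 4 if version <= 3 else 6
    # $ZCODE_LESS_DICT_DATA=1 drops the unused third data byte.
    data_bytes = 2 if less_dict_data else 3
    return text_bytes + data_bytes


def dict_saving(words: int, version: int) -> int:
    """Total bytes saved across the dictionary when the third data byte is dropped."""
    return words * (dict_entry_bytes(version) - dict_entry_bytes(version, True))


hypothetical_words = 900  # illustrative only, not the word count of the example game
print(dict_entry_bytes(5), dict_entry_bytes(5, True))
print(dict_saving(hypothetical_words, 5))
```

The saving is one byte per dictionary word regardless of version, so games with large vocabularies gain the most.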

Use predefined generic abbreviations

./inform -e~S $OMIT_UNUSED_ROUTINES=1 $ZCODE_LESS_DICT_DATA=1
--> z5, size = 135.536 bytes

Use game specific abbreviations generated by Inform6

Applying these 64 generated abbreviations to the game and recompiling.

./inform -e~S $OMIT_UNUSED_ROUTINES=1 $ZCODE_LESS_DICT_DATA=1
--> z5, size = 133.056 bytes

Generate full set of 96 abbreviations with Inform6

If we set the compiler switch $MAX_ABBREVS=96 Inform6 will generate a full set of abbreviations. Applying these 96 generated abbreviations to the game and recompiling.

./inform -e~S $OMIT_UNUSED_ROUTINES=1 $ZCODE_LESS_DICT_DATA=1 $MAX_ABBREVS=96
--> z5, size = 130.992 bytes

Use all 96 abbreviations generated by ZAbbrevMaker

ZAbbrevMaker is a tool to generate a more optimal set of abbreviations and the z-machine standard allows up to 96 abbreviations. Specifying another set of abbreviations instead of the standard 64 is done with the compiler switch $MAX_ABBREVS=96. Applying these 96 generated abbreviations to the game and recompiling.

./inform -e~S $OMIT_UNUSED_ROUTINES=1 $ZCODE_LESS_DICT_DATA=1 $MAX_ABBREVS=96
--> z5, size = 126.472 bytes

The game is now so small that we can switch to the preferred version z3.

./inform -v3 -e~S $OMIT_UNUSED_ROUTINES=1 $ZCODE_LESS_DICT_DATA=1 $MAX_ABBREVS=96
--> z3, size = 121.756 bytes

Use an optimized alphabet with help of ZAbbrevMaker

The default alphabet is optimized to give the lowest cost for the letters a-z. If you actually count the character frequencies in the text, characters like j, q, x or z are often used less than comma, full stop or T. If we use ZAbbrevMaker with the switch -a we get the following alphabet for this game:

! Custom-made alphabet. Insert at beginning of game.
Zcharacter
    "abcdefghi.klmnop,rstuvw'yT"
    "ABCDEFGHIJKLMNOPzRSjUVWxYZ"
    "012q456789*>!?_<]/[-:()";

The new alphabet should be inserted as early as possible because it is applied from the insertion point and forward. We now recompile with this alphabet and a new set of recalculated abbreviations.

./inform -e~S $OMIT_UNUSED_ROUTINES=1 $ZCODE_LESS_DICT_DATA=1 $MAX_ABBREVS=96
--> z3, size = 120.634 bytes
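
Why the alphabet matters can be sketched with a simplified cost model in Python. It assumes that a space or a character in the first alphabet row costs one z-char, a character in the second or third row costs two (shift plus character), and anything else costs four (a ZSCII escape). It ignores abbreviations, newlines and shift-locks, and it is not ZAbbrevMaker's actual algorithm; the function name and the sample sentence are made up for the illustration.

```python
# Default rows as listed in the Z-Machine Standard (escape and newline slots omitted).
DEFAULT = ("abcdefghijklmnopqrstuvwxyz",
           "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
           "0123456789.,!?_#'\"/\\-:()")

# The custom alphabet generated for this game, copied from the post above.
CUSTOM = ("abcdefghi.klmnop,rstuvw'yT",
          "ABCDEFGHIJKLMNOPzRSjUVWxYZ",
          "012q456789*>!?_<]/[-:()")


def zchar_cost(text, alphabet):
    """Approximate number of z-chars needed to encode text (simplified model)."""
    row0, row1, row2 = alphabet
    total = 0
    for ch in text:
        if ch == " " or ch in row0:
            total += 1   # directly encodable
        elif ch in row1 or ch in row2:
            total += 2   # shift z-char plus the character
        else:
            total += 4   # shift, escape, and two z-chars for the ZSCII code
    return total


sample = "It's the end, Tom."  # hypothetical sample sentence
print(zchar_cost(sample, DEFAULT), zchar_cost(sample, CUSTOM))
```

Under this model the sample costs 23 z-chars with the default alphabet and 19 with the custom one, because the apostrophe, comma, full stop and T have moved into the one-z-char row.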

Make sure to test that the interpreter on the platform you’re aiming for is able to use a custom alphabet. According to the z-machine standards a custom alphabet is only valid for version 5 and later. Even though it might work on modern interpreters, it is not certain that older ones, or ones for retro platforms, will work.

Remove SYMBOL TABLE from compiled game

This feature is not yet available but is coming in version 6.42 of Inform6.

Inform compiles in the names of the symbols in a table. These names are used to give better and more informative error messages. Hopefully this is not necessary in the final released version of the game and can be removed with the compiler switch $OMIT_SYMBOL_TABLE=1.

./inform -e~S $OMIT_UNUSED_ROUTINES=1 $ZCODE_LESS_DICT_DATA=1 $MAX_ABBREVS=96 $OMIT_SYMBOL_TABLE=1
--> z3, size = 118.620 bytes

Move text from high strings to inline text in code.

This feature is not yet available but is coming in version 6.42 of Inform6.

Inform has a cut-off length of 32 characters that decides when a string is stored in the high strings area instead of inline in the z-code. The opcode for printing inline text takes less space, and inline strings waste less memory for versions 4 onward due to packed addressing for high strings. The cut-off length can be modified with the compiler switch $ZCODE_MAX_INLINE_STRING.

Beware that very long inline strings could lead Inform to construct jumps larger than 8192, which causes a “branch out of range” compile error.

./inform -e~S $OMIT_UNUSED_ROUTINES=1 $ZCODE_LESS_DICT_DATA=1 $MAX_ABBREVS=96 $OMIT_SYMBOL_TABLE=1 $ZCODE_MAX_INLINE_STRING=9999
--> z3, size = 116.284 bytes
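
The alignment waste mentioned above can be sketched in Python. The sketch assumes that z-chars are packed three to a two-byte word and that a high string must start on an address divisible by the packed-address divisor (2 for z3, 4 for z4 and z5, 8 for z8); the function names are hypothetical and the model ignores everything else the compiler does.

```python
import math

# Packed-address divisor per Z-machine version (subset, for illustration).
PACKED_DIVISOR = {3: 2, 4: 4, 5: 4, 8: 8}


def zstring_bytes(zchars: int) -> int:
    """Bytes needed for a string of the given number of z-chars (3 per 2-byte word)."""
    return 2 * math.ceil(zchars / 3)


def alignment_padding(size: int, version: int) -> int:
    """Padding bytes needed so the next high string starts on a packed address."""
    return -size % PACKED_DIVISOR[version]


print(zstring_bytes(106))        # a 106 z-char string
print(alignment_padding(70, 5))  # z5: a 70-byte string is followed by padding
print(alignment_padding(70, 3))  # z3: strings are always a whole number of words
```

In z3 every string already ends on a two-byte boundary, so this model gives no padding there, while in z4 and later roughly every other high string loses two bytes. Inline strings avoid that padding entirely.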

Use string constants

ZAbbrevMaker has a switch, --onlyrefactor, that generates a list of all strings, or parts of strings, that appear multiple times in the text. For example, this game produces the following (extract):

Long repeated strings:
  3x106 z-chars (~ 134 bytes), ( end  ) ". Of course, if someone brought me a tasty treat I might be inclined to do them a favour in return.~"
  2x106 z-chars (~  71 bytes), ( full ) " out of your hands, then briefly considers it before dropping it dismissively. ~Is that all ye got?~"
  2x 99 z-chars (~  66 bytes), ( full ) "The safety railing cannot be traversed. It wouldn't be much of a ~safety~ railing otherwise."
  2x 94 z-chars (~  53 bytes), (mixed ) ", although it does feel like if you pushed it then it would return to its original position."
...

If we replace the string ". Of course, if someone brought me a tasty treat I might be inclined to do them a favour in return.~" with a constant, regenerate the abbreviations and recompile, we save 134 bytes.

./inform -e~S $OMIT_UNUSED_ROUTINES=1 $ZCODE_LESS_DICT_DATA=1 $MAX_ABBREVS=96 $OMIT_SYMBOL_TABLE=1 $ZCODE_MAX_INLINE_STRING=9999
--> z3, size = 116.150 bytes

There are a lot of bytes to hunt down this way but at the expense of making your code a bit less readable.
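
As a rough illustration of the idea (not ZAbbrevMaker's actual algorithm, which also finds repeated parts of strings and takes abbreviations into account), whole strings that occur more than once can be found with a simple counter in Python. The function name, the minimum length and the byte estimate (one z-char per character, three z-chars per two bytes) are all assumptions made for this sketch.

```python
import math
from collections import Counter


def repeated_strings(strings, min_length=20):
    """Return (count, approx_bytes_saved, text) for whole strings seen more than once."""
    counts = Counter(s for s in strings if len(s) >= min_length)
    report = []
    for text, n in counts.items():
        if n > 1:
            # Rough size of one copy: one z-char per character, 3 z-chars per 2 bytes.
            approx_bytes = 2 * math.ceil(len(text) / 3)
            # Every copy but one could be replaced by a reference to a constant.
            report.append((n, (n - 1) * approx_bytes, text))
    return sorted(report, key=lambda r: r[1], reverse=True)


# Hypothetical game strings, for illustration only.
game_strings = ["x" * 30, "x" * 30, "short", "y" * 30]
for count, saved, text in repeated_strings(game_strings):
    print(f"{count}x ~{saved} bytes: {text!r}")
```

The estimate ignores the cost of referencing the constant, so the real saving per replaced copy will be somewhat smaller.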

zarf · January 12, 2024, 2:38pm

You really should do a test with the abbreviation list generated by inform -u $MAX_ABBREVS=96. That would let you directly compare Inform’s 96-list with ZAbbrevMaker’s 96-list.

heasm66 · January 12, 2024, 2:43pm

But I really want to show how good my application is!

(Jokes aside, that’s a fair critique. I’ll add it.)

On another subject, is there a release of 6.42 in the cards sometime soon?

EDIT: Tagging @DavidK

zarf · January 12, 2024, 2:46pm

The release schedule is on David Kinder’s plate.

Marvin · January 12, 2024, 5:37pm

In fact, custom alphabets are a v5+ feature, so they shouldn’t work in any v3 interpreters.

heasm66 · January 12, 2024, 6:44pm

Aaah…

I’ll change that.

DavidK · January 13, 2024, 5:23pm

There isn’t a current plan to release Inform 6.42 yet. There isn’t anything in the next version that anyone is desperate for, is there?

Marvin · January 15, 2024, 3:18am

Inform 6 requires abbreviations to fit in a maximum of 64 bytes, and ZAbbrevMaker can produce abbreviations far larger than this.

I discovered this testing SpiritWrak, which reuses identical long strings of text several times in its subway system code.

zarf · January 15, 2024, 5:19am

That is true. (The line is #define MAX_ABBREV_LENGTH 64.) I don’t think the question of that limit has come up before.

I guess the thinking was that the author would notice large shared chunks of text and turn them into string constants or routines. The abbreviation mechanism was (notionally) for finding little pieces of text that were too common or annoying to do that for.

Do you think it’s worth making that a dynamic allocation?

heasm66 · January 15, 2024, 8:21am

That’s correct. The sensible thing would be to refactor and convert them to string constants or routines.

On the other hand, what harm would it do for Inform6 to allow longer abbreviations? As far as I know there are no restrictions on the length in the z-machine standards. One idea could be to allow abbreviations longer than MAX_ABBREV_LENGTH but issue a warning when an abbreviation exceeds it?

I’m gonna limit abbreviations to 64 characters in next version when producing for Inform6, but make it adjustable.

zarf · January 15, 2024, 3:51pm

I filed Clean up MAX_ABBREV_LENGTH · Issue #257 · DavidKinder/Inform6 · GitHub , although I’m not sure when I’ll get to it.

Draconis · January 15, 2024, 4:58pm

Like with a lot of the I6 compiler’s limits, it was designed to save memory on machines with very little RAM. It’s not a problem for the generated Z-code, and nowadays, I don’t think anyone would notice the difference if it was raised from 64 to 1024 or whatever. (But it has to be an error rather than a warning because, if I understand right, it would overflow an internal buffer in the compiler.)

zarf · January 15, 2024, 5:22pm

Changing the hard limit to 1024 would be easy. I’d rather change the internal buffer(s) to be dynamically allocated.

Marvin · January 16, 2024, 1:18am

Note that this is a 64 byte limit, not a 64 character limit.
Both of the below are too long:

! the letter 'é' 23 times
Abbreviate "@'e@'e@'e@'e@'e@'e@'e@'e@'e@'e@'e@'e@'e@'e@'e@'e@'e@'e@'e@'e@'e@'e@'e"; 
! the letter 'a' 64 times
Abbreviate "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";

heasm66 · January 16, 2024, 5:35am

I haven’t tested it myself yet, but based on your example I would say that the limit is in z-chars.

zarf · January 16, 2024, 5:45am

Neither, I’m afraid. It’s source code bytes.

I have a prospective patch for removing the limit from the Abbreviate directive. It doesn’t fix the abbreviation generator, however.
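
A quick way to check an abbreviation against a source-byte limit is sketched below in Python. It assumes the text between the quotes is measured exactly as written in the source (so "@'e" counts as three bytes), that the source is UTF-8 or plain ASCII, and that the usable length is strictly less than MAX_ABBREV_LENGTH, which is an inference from both of Marvin's examples being rejected rather than something confirmed from the compiler source. The function names are hypothetical.

```python
MAX_ABBREV_LENGTH = 64  # value quoted from the Inform 6 source in this thread


def source_bytes(abbrev_source: str) -> int:
    """Length in bytes of the abbreviation text as typed in the source file."""
    return len(abbrev_source.encode("utf-8"))


def fits(abbrev_source: str) -> bool:
    """Assumed rule: the text must be shorter than MAX_ABBREV_LENGTH source bytes."""
    return source_bytes(abbrev_source) < MAX_ABBREV_LENGTH


accented = "@'e" * 23   # 23 printed characters, but three source bytes each
plain = "a" * 64        # 64 printed characters, one source byte each

print(source_bytes(accented), fits(accented))
print(source_bytes(plain), fits(plain))
```

The first example is 69 source bytes for only 23 printed characters, which is why neither a character count nor a z-char count predicts the limit.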

Marvin · January 18, 2024, 3:32am

I understand how the current MAX_ABBREV_LENGTH is/was useful to keep memory usage down while running the compiler, but I’m not sure limiting abbreviations based on memory used while compiling is a useful metric for a user creating a game.

Some sort of (possibly adjustable) limit would be nice (in both Inform and ZAbbrevMaker), at least as a warning, because long repeated strings probably shouldn’t be abbreviations.

heasm66 · January 18, 2024, 7:45am

If there are long repeated strings, they should definitely be converted to constants or refactored into routines. A warning in ZAbbrevMaker when one or more abbreviations exceed a limit is a fair compromise. (I think a hard limit that generates an error is unnecessary, because long abbreviations are perfectly legal according to the standards.)

zarf · January 18, 2024, 2:52pm

Oh, it has no value for users at all. I intend to remove the generator limit. Just not sure when I’ll get to it.

As for very long abbreviations: you could show a warning or just rely on the user to notice them in the list.