Details of the Abbreviate command?

So I was looking at the source code for “Search for the Lost Ark”, one of the latest ParserComp winners. Naturally, it was written in Inform 6. I noticed in the beginning of the file, there are a lot of Abbreviate commands. For example:

Abbreviate "freshly-turned earth";
Abbreviate "Ark of the Covenant";
Abbreviate "Father Alucard";

What does this actually do? I can’t seem to find anything online or in the forums, and the DM4 index only has one entry that leads to a page that mentions Abbreviate in passing but gives no detail. I have gathered that it has something to do with memory management, but that is all.

Can anyone give any details? Is it something I can safely ignore in my own games, or do I ignore it at my peril? The curious mind wants to know…


Abbreviations are a feature of the Z-machine; there are 96 special escape sequences built into the string-encoding system, and when the string-decoder sees one of these, it looks up an entry in a table and prints the string at that address. The Abbreviate directive specifies that a particular piece of text should be put into that table, and any time it appears in a static string, the Inform compiler will replace it with the appropriate escape sequence.
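To make the "96 escape sequences" concrete, here is a minimal sketch of how a decoder could turn an escape pair into a table index. It assumes the version 3+ rule as I understand it from the Z-machine Standard: Z-characters 1, 2 and 3 each select a bank of 32 entries, and the next Z-character picks the entry within that bank. The function name is my own, not part of any interpreter.

```python
def abbreviation_index(z: int, x: int) -> int:
    """Return the abbreviation-table entry selected by the escape pair (z, x).

    z is the escape Z-character (1, 2 or 3); x is the Z-character that
    follows it (0..31). Assumes the version 3+ layout of three banks of 32.
    """
    if z not in (1, 2, 3) or not 0 <= x <= 31:
        raise ValueError("not an abbreviation escape pair")
    return 32 * (z - 1) + x


# Three banks of 32 entries cover the whole table.
all_entries = {abbreviation_index(z, x) for z in (1, 2, 3) for x in range(32)}
print(len(all_entries))  # 96
```

The interpreter then reads the address stored at that table entry and prints the string it finds there.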

In other words, Abbreviate "freshly-turned earth"; means that “freshly-turned earth” now takes only two Z-characters instead of 21 every time it appears in a string (the phrase is 20 characters long, and the hyphen costs an extra shift character).
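If you want to check the arithmetic yourself, the following is a rough sketch of the unabbreviated encoding, assuming the default version 3+ alphabet tables and plain ASCII input. The helper names (to_zchars, pack) are mine; a real compiler also applies abbreviations and handles ZSCII properly, which this does not.

```python
A0 = "abcdefghijklmnopqrstuvwxyz"           # Z-characters 6..31, no shift
A1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"           # Z-characters 6..31 after shift 4
A2_TAIL = "\n0123456789.,!?_#'\"/\\-:()"    # Z-characters 7..31 after shift 5


def to_zchars(text):
    """Translate ASCII text into a list of 5-bit Z-characters (no abbreviations)."""
    out = []
    for ch in text:
        if ch == " ":
            out.append(0)
        elif ch in A0:
            out.append(6 + A0.index(ch))
        elif ch in A1:
            out += [4, 6 + A1.index(ch)]
        elif ch in A2_TAIL:
            out += [5, 7 + A2_TAIL.index(ch)]
        else:
            # Escape: shift 5, Z-character 6, then a 10-bit code in two halves.
            # Assumption: ASCII only, where the ZSCII code equals ord(ch).
            code = ord(ch)
            out += [5, 6, code >> 5, code & 31]
    return out


def pack(zchars):
    """Pack Z-characters three to a 16-bit word; the top bit marks the last word."""
    padded = (zchars + [5] * (-len(zchars) % 3)) or [5, 5, 5]
    words = [(a << 10) | (b << 5) | c
             for a, b, c in zip(padded[0::3], padded[1::3], padded[2::3])]
    words[-1] |= 0x8000
    return b"".join(w.to_bytes(2, "big") for w in words)


phrase = "freshly-turned earth"
zchars = to_zchars(phrase)
print(len(zchars), len(pack(zchars)))  # 21 Z-characters, 14 bytes
```

An abbreviation reference replaces all of those Z-characters with just two (the escape and the entry number).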

The Inform compiler can choose a good set of abbreviations for you, if you pass a certain flag, so the Abbreviate directive usually isn’t used manually any more. But a clever human can still often do better than the compiler can, so if you’re targeting retro machines where ROM is scarce, you might write your own. (Nowadays, ROM usually isn’t the limiting factor of the Z-machine, object slots are. But retro machines have to worry about fitting it onto a disk.)


Ah, I see. Very nice explanation, thank you! For the game in question, I suppose it was particularly useful for the z3 format release.

It has been a long time since I’ve even considered worrying about memory management and file size. I’ll keep this in mind, though, especially if I write anything I might try to squeeze into a z3.

There is a description of how to use abbreviations in the PunyInform Game Author’s Guide (the PunyInform library was used to write “Search for the Lost Ark”): https://github.com/johanberntsson/PunyInform/blob/master/documentation/guides/game-authors-guide.md#use-abbreviations-properly

You can find out the best abbreviations with Henrik Åsman’s tool, for example: ZAbbrevMaker (heasm66/ZAbbrevMaker on GitHub)


I didn’t even know “Ark” was written in PunyInform, which I haven’t really looked at yet. I noticed some other conventions used in the game source were recommended at the site you gave, like constants for repeated strings.

As for the abbreviation tool, I will definitely consider using it for any smaller games I make. Thanks!


It’s actually described in a footnote in DM4 §45. The Index just fails to mention this.

If story file memory does become short, a standard mechanism can save about 8-10% of the total memory, though it will not greatly affect readable memory extent. Inform does not usually trouble with this economy measure, since there’s very seldom any need, and it makes the compiler run about 10% slower. What you need to do is define abbreviations and then run the compiler in its “economy” mode (using the switch -e). For instance, the directive

Abbreviate " the ";

(placed before any text appears) will cause the string “ the ” to be internally stored as a single ‘letter’, saving memory every time it occurs (about 2,500 times in ‘Curses’, for instance). You can have up to 64 abbreviations. When choosing abbreviations, avoid proper nouns and instead pick on short combinations of a space and common two- or three-letter blocks. Good choices include " the ", "The ", ", ", " and ", "you", " a ", "ing ", " to". You can even get Inform to work out by itself what a good stock of abbreviations would be, by setting the -u switch: but be warned, this makes the compiler run about 29,000% slower.


Thanks, Zarf! I haven’t gotten that far in the DM4 (I somewhat recently bought a physical copy) and it didn’t show up in web searches. That being said, I admit I haven’t been giving much attention to most talk of memory management, since I haven’t specifically planned to write anything for retro systems. So I don’t know if it would have jumped out at me during the first read until I started looking at others’ source code anyway.

There are some older IF projects that I wrote in another language a few years back that I have considered converting to I6 for the practice, and they would definitely be candidates for PunyInform. Still, they’re decidedly tiny, and I’m not sure abbreviations would even be necessary except on the most constrained systems. But it might give me something to do as I work my way up to things I plan for a wider release. I doubt I’ll release any reworked older games I wrote, but who knows?

Though note that the limit is now 96 instead of 64! The Z-machine allows for 96 abbreviations, which Inform has traditionally divided into 64 “abbreviations” and 32 “dynamic strings”; as the compiler has continued to evolve post-DM4, it now lets you decide for yourself how many you want to allocate to each category. If you aren’t using dynamic strings (which most people aren’t), this means 96 abbreviations available.

The abbreviation process is somewhat cumbersome, so don’t bother with it unless you want to reduce file size, and don’t do it until you are ready to publish.

In essence, compile using -r $TRANSCRIPT_FORMAT=1. This will create a text file called gametext.txt with all your strings. Feed this into ZAbbrevMaker. This will create a file of abbreviations called abbrevs.h. Copy and paste these into the start of your inf file.

You also need !% $MAX_ABBREVS=96 at the start of the file to get the maximum of 96 abbreviations, and you need to define the PunyInform constant CUSTOM_ABBREVIATIONS (Constant CUSTOM_ABBREVIATIONS;) to prevent PunyInform from applying its own abbreviations.

Finally, compile using the -e option. You can use this all the time. It doesn’t cause any harm if you don’t have any abbreviations.
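If you ever end up with your own list of fragments rather than a ready-made abbreviations file, a small script can format them as directives. This is only a sketch: the function name and the limit parameter are hypothetical, it is not what ZAbbrevMaker itself emits, and it relies on the Inform 6 convention that ~ inside a string stands for a double quote.

```python
def abbreviate_directives(fragments, limit=96):
    """Format text fragments as Inform 6 Abbreviate directives, one per line.

    Double quotes are written as ~ (the Inform 6 string convention).
    Fragments containing other characters that are special inside Inform 6
    strings are rejected rather than guessed at.
    """
    lines = []
    for frag in fragments[:limit]:
        if not frag or any(c in frag for c in "^~@\\"):
            raise ValueError("fragment needs manual escaping: %r" % frag)
        lines.append('Abbreviate "%s";' % frag.replace('"', "~"))
    return "\n".join(lines)


print(abbreviate_directives([" the ", "You ", "ing ", '. "']))
```

The output is meant to be pasted near the start of the source file, before any text appears.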

Given that 8-bit authors are always trying to save a few bytes here and there, Abbreviate saves a couple of kilobytes, so it’s worth the effort if targeting retro machines. I don’t go to extremes to save every last byte. Even so, the combination of Inform 6, PunyInform, Abbreviate and some sensible coding creates tiny files that can be played on 8-bit and 16-bit computers, thus expanding the potential number of players. One of these days, I may actually get around to creating disk images for this audience.

In the meantime, small Z-code files also have the advantage of creating small base-64 files for use in Parchment, thus reducing download times. All my games are now playable online.

To elaborate on this point: starting with the Infocom days, authors (and even some abbreviation-finding algorithms, like Inform’s) have tended to assume that the most useful abbreviations would be long repeated strings like “Lord Nittlewaters”, or at least whole words. Instead, you get a lot more bang for your buck by abbreviating shorter but more-frequently-used word fragments like "ing " or “ould”.

For the interested, here are the results (in ZILF format) of running about 4 MB of text from various classic Z-code games through Henrik Åsman’s abbreviation-finder, the current benchmark-winner. In my tests, I found this dictionary was close to optimal for games with a reasonably large amount of text, although generating a game-specific dictionary may save another kilobyte or so.

        .FSTR FSTR?1,"You can't "         ;  539x, saved 5378
        .FSTR FSTR?2," doesn't "          ;  342x, saved 2724
        .FSTR FSTR?3," in the "           ; 1426x, saved 8547
        .FSTR FSTR?4," at the "           ;  612x, saved 3663
        .FSTR FSTR?5," on the "           ; 1051x, saved 6297
        .FSTR FSTR?6," of the "           ; 2375x, saved 14241
        .FSTR FSTR?7," through"           ;  720x, saved 4311
        .FSTR FSTR?8,"to the "            ; 2060x, saved 10291
        .FSTR FSTR?9," little"            ;  398x, saved 1981
        .FSTR FSTR?10," which "           ;  635x, saved 3166
        .FSTR FSTR?11," your "            ; 2490x, saved 9954
        .FSTR FSTR?12," about"            ;  721x, saved 2878
        .FSTR FSTR?13,", but "            ; 1261x, saved 6296
        .FSTR FSTR?14," is a "            ;  872x, saved 3482
        .FSTR FSTR?15," that "            ; 1708x, saved 6826
        .FSTR FSTR?16,", and "            ; 1476x, saved 7371
        .FSTR FSTR?17,", you "            ;  783x, saved 3906
        .FSTR FSTR?18,"There"             ; 1104x, saved 4410
        .FSTR FSTR?19,"ould "             ; 1085x, saved 3249
        .FSTR FSTR?20," some"             ; 1264x, saved 3786
        .FSTR FSTR?21,"have "             ; 1243x, saved 3723
        .FSTR FSTR?22,"It's "             ;  680x, saved 3391
        .FSTR FSTR?23," down"             ;  777x, saved 2325
        .FSTR FSTR?24,"thing"             ; 2034x, saved 6096
        .FSTR FSTR?25," are "             ; 1453x, saved 4353
        .FSTR FSTR?26," the "             ; 10856x, saved 32562
        .FSTR FSTR?27," from"             ; 1659x, saved 4971
        .FSTR FSTR?28," like"             ;  708x, saved 2118
        .FSTR FSTR?29," with"             ; 2283x, saved 6843
        .FSTR FSTR?30,"Your "             ;  434x, saved 1730
        .FSTR FSTR?31," you "             ; 3533x, saved 10593
        .FSTR FSTR?32," and "             ; 2954x, saved 8856
        .FSTR FSTR?33," to "              ; 5919x, saved 11832
        .FSTR FSTR?34,"The "              ; 7005x, saved 21009
        .FSTR FSTR?35," for"              ; 1709x, saved 3412
        .FSTR FSTR?36," you"              ; 2622x, saved 5238
        .FSTR FSTR?37,"ound"              ; 1487x, saved 2968
        .FSTR FSTR?38,"You "              ; 4850x, saved 14544
        .FSTR FSTR?39," out"              ; 1316x, saved 2626
        .FSTR FSTR?40,"ing "              ; 4897x, saved 9788
        .FSTR FSTR?41,"not "              ; 1464x, saved 2922
        .FSTR FSTR?42," is "              ; 4457x, saved 8908
        .FSTR FSTR?43,"tion"              ; 1669x, saved 3332
        .FSTR FSTR?44,"ough"              ; 1076x, saved 2146
        .FSTR FSTR?45," of "              ; 4520x, saved 9034
        .FSTR FSTR?46,"n't "              ; 2044x, saved 6126
        .FSTR FSTR?47,"ight"              ; 2038x, saved 4070
        .FSTR FSTR?48,"here"              ; 2168x, saved 4330
        .FSTR FSTR?49,"his "              ; 1855x, saved 3704
        .FSTR FSTR?50,"You'"              ;  670x, saved 2674
        .FSTR FSTR?51,"side"              ; 1049x, saved 2092
        .FSTR FSTR?52,"look"              ; 1208x, saved 2410
        .FSTR FSTR?53,"door"              ; 1205x, saved 2404
        .FSTR FSTR?54,"ard"               ; 1561x, saved 1558
        .FSTR FSTR?55,"'s "               ; 2335x, saved 4664
        .FSTR FSTR?56,". """              ;  964x, saved 2886
        .FSTR FSTR?57,"ain"               ; 1884x, saved 1881
        .FSTR FSTR?58,"see"               ; 1863x, saved 1860
        .FSTR FSTR?59," a "               ; 5046x, saved 5043
        .FSTR FSTR?60," in"               ; 4612x, saved 4609
        .FSTR FSTR?61," be"               ; 2476x, saved 2473
        .FSTR FSTR?62," st"               ; 2439x, saved 2436
        .FSTR FSTR?63,". I"               ;  876x, saved 2622
        .FSTR FSTR?64,"re "               ; 1800x, saved 1797
        .FSTR FSTR?65," th"               ; 2666x, saved 2663
        .FSTR FSTR?66,"hat"               ; 2270x, saved 2267
        .FSTR FSTR?67,"way"               ; 1628x, saved 1625
        .FSTR FSTR?68,"ess"               ; 1602x, saved 1599
        .FSTR FSTR?69,"one"               ; 1885x, saved 1882
        .FSTR FSTR?70,"ack"               ; 1536x, saved 1533
        .FSTR FSTR?71," on"               ; 1864x, saved 1861
        .FSTR FSTR?72,"en "               ; 2012x, saved 2009
        .FSTR FSTR?73,"ing"               ; 4987x, saved 4984
        .FSTR FSTR?74,"the"               ; 3080x, saved 3077
        .FSTR FSTR?75,"es "               ; 2373x, saved 2370
        .FSTR FSTR?76,"ent"               ; 2603x, saved 2600
        .FSTR FSTR?77,"and"               ; 2842x, saved 2839
        .FSTR FSTR?78,"an "               ; 2397x, saved 2394
        .FSTR FSTR?79," it"               ; 2912x, saved 2909
        .FSTR FSTR?80,"ear"               ; 1959x, saved 1956
        .FSTR FSTR?81,"ver"               ; 2265x, saved 2262
        .FSTR FSTR?82,"rea"               ; 2358x, saved 2355
        .FSTR FSTR?83,"all"               ; 3370x, saved 3367
        .FSTR FSTR?84,"ter"               ; 2457x, saved 2454
        .FSTR FSTR?85,"..."               ;  598x, saved 2386
        .FSTR FSTR?86,"ed "               ; 3256x, saved 3253
        .FSTR FSTR?87,"er "               ; 3487x, saved 3484
        .FSTR FSTR?88,"ly "               ; 3606x, saved 3603
        .FSTR FSTR?89,"st "               ; 1924x, saved 1921
        .FSTR FSTR?90,". "                ; 5144x, saved 5141
        .FSTR FSTR?91,", "                ; 8405x, saved 8402
        .FSTR FSTR?92,"e."                ; 1512x, saved 1509
        .FSTR FSTR?93,"--"                ;  765x, saved 1524
        .FSTR FSTR?94,"I "                ; 1526x, saved 1523
        .FSTR FSTR?95,"A "                ;    0x, saved -3
        .FSTR FSTR?96,";"                 ;  761x, saved 1516
WORDS::
        FSTR?1
        FSTR?2
        FSTR?3
        FSTR?4
        FSTR?5
        FSTR?6
        FSTR?7
        FSTR?8
        FSTR?9
        FSTR?10
        FSTR?11
        FSTR?12
        FSTR?13
        FSTR?14
        FSTR?15
        FSTR?16
        FSTR?17
        FSTR?18
        FSTR?19
        FSTR?20
        FSTR?21
        FSTR?22
        FSTR?23
        FSTR?24
        FSTR?25
        FSTR?26
        FSTR?27
        FSTR?28
        FSTR?29
        FSTR?30
        FSTR?31
        FSTR?32
        FSTR?33
        FSTR?34
        FSTR?35
        FSTR?36
        FSTR?37
        FSTR?38
        FSTR?39
        FSTR?40
        FSTR?41
        FSTR?42
        FSTR?43
        FSTR?44
        FSTR?45
        FSTR?46
        FSTR?47
        FSTR?48
        FSTR?49
        FSTR?50
        FSTR?51
        FSTR?52
        FSTR?53
        FSTR?54
        FSTR?55
        FSTR?56
        FSTR?57
        FSTR?58
        FSTR?59
        FSTR?60
        FSTR?61
        FSTR?62
        FSTR?63
        FSTR?64
        FSTR?65
        FSTR?66
        FSTR?67
        FSTR?68
        FSTR?69
        FSTR?70
        FSTR?71
        FSTR?72
        FSTR?73
        FSTR?74
        FSTR?75
        FSTR?76
        FSTR?77
        FSTR?78
        FSTR?79
        FSTR?80
        FSTR?81
        FSTR?82
        FSTR?83
        FSTR?84
        FSTR?85
        FSTR?86
        FSTR?87
        FSTR?88
        FSTR?89
        FSTR?90
        FSTR?91
        FSTR?92
        FSTR?93
        FSTR?94
        FSTR?95
        FSTR?96

        .ENDI
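For anyone wondering how to read the "saved" column: the figures appear to be in Z-characters, and every row I spot-checked matches the formula below. That formula is my own inference from the listing, not documented behaviour of the tool. Each use replaces the fragment with a two-Z-character reference, and the fragment itself has to be stored once, padded to a whole number of three-Z-character words.

```python
A2_PUNCT = "\n0123456789.,!?_#'\"/\\-:()"


def zchar_cost(text):
    """Z-characters needed for text under the default v3+ alphabets (ASCII only)."""
    cost = 0
    for ch in text:
        if ch == " " or "a" <= ch <= "z":
            cost += 1          # space or lowercase letter
        elif "A" <= ch <= "Z" or ch in A2_PUNCT:
            cost += 2          # shift plus character
        else:
            cost += 4          # shift, escape, two halves of a 10-bit code
    return cost


def zchars_saved(fragment, uses):
    """Net Z-characters saved by abbreviating fragment, given its use count."""
    cost = zchar_cost(fragment)
    stored = -(-cost // 3) * 3         # the table copy, rounded up to whole words
    return uses * (cost - 2) - stored  # each use becomes a 2-Z-character reference


for fragment, uses in [(" the ", 10856), ("You can't ", 539), (";", 761), ("A ", 0)]:
    print(repr(fragment), zchars_saved(fragment, uses))
```

Since three Z-characters pack into two bytes, multiply by two thirds for a rough byte count. This also explains why a three-Z-character fragment like "ard" only saves one Z-character per use, and why the unused "A " entry shows a loss of 3.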

Incidentally, the design of ZSCII is one of my only persistent peeves with the otherwise nearly-flawless design of the Z-machine. The compression approach (based on packing three five-bit characters into two eight-bit bytes of data) is just not very good, and adds a lot of complexity for how little space it actually saves. Glulx uses Huffman coding, which is better, but there’s a blindingly superior option available for the type of 8-bit platforms the Z-machine originally targeted: just encode the text with a 256-entry dictionary consisting of the most common English character sequences, the most common individual characters, and a codepoint for “read another byte and look it up in a second dictionary of less-common individual characters”. This is how algorithms like smaz work, and it can easily achieve ~50% text compression with near-instantaneous decoding. /rant
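For the homebrewers, a toy version of that idea might look like the sketch below. To be clear about the assumptions: this is not smaz itself, the codebook is a handful of made-up entries rather than a tuned 256-entry dictionary, and the "second dictionary" is simplified to an escape byte followed by the raw character.

```python
ESCAPE = 255  # code meaning "the next byte is a literal character"

# Hypothetical codebook: a few common fragments plus single characters.
CODEBOOK = [b" the ", b"ing ", b"You ", b" of "] + [
    bytes([c]) for c in b"abcdefghijklmnopqrstuvwxyz ."
]


def encode(data, codebook=CODEBOOK):
    """Greedy longest-match encoding: one output byte per codebook hit."""
    order = sorted(range(len(codebook)), key=lambda i: len(codebook[i]), reverse=True)
    out = bytearray()
    i = 0
    while i < len(data):
        for code in order:
            if data.startswith(codebook[code], i):
                out.append(code)
                i += len(codebook[code])
                break
        else:
            out += bytes([ESCAPE, data[i]])  # not in the codebook: escape it
            i += 1
    return bytes(out)


def decode(packed, codebook=CODEBOOK):
    """Reverse of encode: table lookup per byte, literal after an escape."""
    out = bytearray()
    it = iter(packed)
    for code in it:
        out += bytes([next(it)]) if code == ESCAPE else codebook[code]
    return bytes(out)


text = b"You see the door of the ring."
packed = encode(text)
print(len(text), len(packed), decode(packed) == text)
```

Decoding is a single table lookup per byte, which is the attraction on an 8-bit machine; the compression you actually get depends entirely on how well the codebook fits the text.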

Prev, personally I wonder why a 6-bit character set (Fieldata, early DEC PDPs) was never considered (you can pack 4 characters into 3 bytes), given that not only is a bit wasted in the Baudot-derived 5-bit code, but the code also has to shift constantly between letters and figures/symbols. (This is why Z-code abbreviations had to be chosen by counting frequencies: an abbreviation is reached through a shift and then a shift back to letters, so its binary representation is actually several Z-characters long…)

Best regards from Italy,
dott. Piergiorgio.

Not wasted, necessarily. The extra bit is used for the terminator so that doesn’t take an extra character.

Whether this is efficient or not depends how long your strings tend to be.
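A quick way to see the trade-off is to compare the Z-machine layout with a hypothetical alternative that packs 5-bit characters back to back and spends one extra character on an explicit terminator. The second layout is purely an assumption for comparison; nothing in the Z-machine works that way.

```python
def zmachine_bytes(n_zchars):
    """Bytes used when three Z-characters share a 16-bit word with an end flag."""
    return 2 * ((n_zchars + 2) // 3)


def dense_bytes(n_zchars):
    """Bytes for a hypothetical dense 5-bit stream plus one terminator character."""
    return (5 * (n_zchars + 1) + 7) // 8


for n in (3, 10, 30, 60):
    print(n, zmachine_bytes(n), dense_bytes(n))
```

The flag bit costs one bit per three Z-characters, while the explicit terminator is a flat five bits, so the built-in flag looks best on short strings and loses its edge as strings get longer. Padding and byte alignment blur the picture for individual lengths.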

For comparison, I tried compressing the high-strings portion of Curses! using a variety of schemes (bpc = bits per character):

ASCII - 8.0 bpc
Fieldata - 6.0 bpc
ZSCII w/no abbreviations - 5.7 bpc
ZSCII w/word abbreviations - 5.1 bpc
ZSCII w/optimal abbreviations - 4.6 bpc
Huffman - 4.6 bpc
smaz - 4.5 bpc
LZSA2 - 3.6 bpc
DEFLATE - 3.2 bpc
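To put those figures in the more familiar "percent smaller" terms, here is the conversion against 8-bit ASCII. The numbers are simply the ones listed above; only the helper name is mine.

```python
BITS_PER_CHAR = {
    "ASCII": 8.0,
    "Fieldata": 6.0,
    "ZSCII w/no abbreviations": 5.7,
    "ZSCII w/word abbreviations": 5.1,
    "ZSCII w/optimal abbreviations": 4.6,
    "Huffman": 4.6,
    "smaz": 4.5,
    "LZSA2": 3.6,
    "DEFLATE": 3.2,
}


def percent_saved(bpc, baseline=8.0):
    """Size reduction relative to a baseline encoding, as a percentage."""
    return 100.0 * (1.0 - bpc / baseline)


for name, bpc in BITS_PER_CHAR.items():
    print(f"{name:30s} {bpc:.1f} bpc  {percent_saved(bpc):5.1f}% smaller than ASCII")
```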

ZSCII isn’t actually that bad when used correctly, but that’s mostly down to the compression dictionary, so why not dispense with all the bit-twiddling and just use a compression dictionary (like smaz)?

Anyway, a moot point after 44 years, but maybe useful to homebrewers.
