To elaborate on this point: since the Infocom days, authors (and even some abbreviation-finding algorithms, like Inform’s) have tended to assume that the most useful abbreviations would be long repeated strings like “Lord Nittlewaters”, or at least whole words. In practice, you get a lot more bang for your buck by abbreviating shorter but far more frequently used word fragments like “ing ” or “ould”.

For the interested, here are the results (in ZILF format) of running about 4 MB of text from various classic Z-code games through Henrik Åsman’s abbreviation-finder, the current benchmark-winner. In my tests, I found this dictionary was close to optimal for games with a reasonably large amount of text, although generating a game-specific dictionary may save another kilobyte or so.

.FSTR FSTR?1,"You can't " ; 539x, saved 5378
.FSTR FSTR?2," doesn't " ; 342x, saved 2724
.FSTR FSTR?3," in the " ; 1426x, saved 8547
.FSTR FSTR?4," at the " ; 612x, saved 3663
.FSTR FSTR?5," on the " ; 1051x, saved 6297
.FSTR FSTR?6," of the " ; 2375x, saved 14241
.FSTR FSTR?7," through" ; 720x, saved 4311
.FSTR FSTR?8,"to the " ; 2060x, saved 10291
.FSTR FSTR?9," little" ; 398x, saved 1981
.FSTR FSTR?10," which " ; 635x, saved 3166
.FSTR FSTR?11," your " ; 2490x, saved 9954
.FSTR FSTR?12," about" ; 721x, saved 2878
.FSTR FSTR?13,", but " ; 1261x, saved 6296
.FSTR FSTR?14," is a " ; 872x, saved 3482
.FSTR FSTR?15," that " ; 1708x, saved 6826
.FSTR FSTR?16,", and " ; 1476x, saved 7371
.FSTR FSTR?17,", you " ; 783x, saved 3906
.FSTR FSTR?18,"There" ; 1104x, saved 4410
.FSTR FSTR?19,"ould " ; 1085x, saved 3249
.FSTR FSTR?20," some" ; 1264x, saved 3786
.FSTR FSTR?21,"have " ; 1243x, saved 3723
.FSTR FSTR?22,"It's " ; 680x, saved 3391
.FSTR FSTR?23," down" ; 777x, saved 2325
.FSTR FSTR?24,"thing" ; 2034x, saved 6096
.FSTR FSTR?25," are " ; 1453x, saved 4353
.FSTR FSTR?26," the " ; 10856x, saved 32562
.FSTR FSTR?27," from" ; 1659x, saved 4971
.FSTR FSTR?28," like" ; 708x, saved 2118
.FSTR FSTR?29," with" ; 2283x, saved 6843
.FSTR FSTR?30,"Your " ; 434x, saved 1730
.FSTR FSTR?31," you " ; 3533x, saved 10593
.FSTR FSTR?32," and " ; 2954x, saved 8856
.FSTR FSTR?33," to " ; 5919x, saved 11832
.FSTR FSTR?34,"The " ; 7005x, saved 21009
.FSTR FSTR?35," for" ; 1709x, saved 3412
.FSTR FSTR?36," you" ; 2622x, saved 5238
.FSTR FSTR?37,"ound" ; 1487x, saved 2968
.FSTR FSTR?38,"You " ; 4850x, saved 14544
.FSTR FSTR?39," out" ; 1316x, saved 2626
.FSTR FSTR?40,"ing " ; 4897x, saved 9788
.FSTR FSTR?41,"not " ; 1464x, saved 2922
.FSTR FSTR?42," is " ; 4457x, saved 8908
.FSTR FSTR?43,"tion" ; 1669x, saved 3332
.FSTR FSTR?44,"ough" ; 1076x, saved 2146
.FSTR FSTR?45," of " ; 4520x, saved 9034
.FSTR FSTR?46,"n't " ; 2044x, saved 6126
.FSTR FSTR?47,"ight" ; 2038x, saved 4070
.FSTR FSTR?48,"here" ; 2168x, saved 4330
.FSTR FSTR?49,"his " ; 1855x, saved 3704
.FSTR FSTR?50,"You'" ; 670x, saved 2674
.FSTR FSTR?51,"side" ; 1049x, saved 2092
.FSTR FSTR?52,"look" ; 1208x, saved 2410
.FSTR FSTR?53,"door" ; 1205x, saved 2404
.FSTR FSTR?54,"ard" ; 1561x, saved 1558
.FSTR FSTR?55,"'s " ; 2335x, saved 4664
.FSTR FSTR?56,". """ ; 964x, saved 2886
.FSTR FSTR?57,"ain" ; 1884x, saved 1881
.FSTR FSTR?58,"see" ; 1863x, saved 1860
.FSTR FSTR?59," a " ; 5046x, saved 5043
.FSTR FSTR?60," in" ; 4612x, saved 4609
.FSTR FSTR?61," be" ; 2476x, saved 2473
.FSTR FSTR?62," st" ; 2439x, saved 2436
.FSTR FSTR?63,". I" ; 876x, saved 2622
.FSTR FSTR?64,"re " ; 1800x, saved 1797
.FSTR FSTR?65," th" ; 2666x, saved 2663
.FSTR FSTR?66,"hat" ; 2270x, saved 2267
.FSTR FSTR?67,"way" ; 1628x, saved 1625
.FSTR FSTR?68,"ess" ; 1602x, saved 1599
.FSTR FSTR?69,"one" ; 1885x, saved 1882
.FSTR FSTR?70,"ack" ; 1536x, saved 1533
.FSTR FSTR?71," on" ; 1864x, saved 1861
.FSTR FSTR?72,"en " ; 2012x, saved 2009
.FSTR FSTR?73,"ing" ; 4987x, saved 4984
.FSTR FSTR?74,"the" ; 3080x, saved 3077
.FSTR FSTR?75,"es " ; 2373x, saved 2370
.FSTR FSTR?76,"ent" ; 2603x, saved 2600
.FSTR FSTR?77,"and" ; 2842x, saved 2839
.FSTR FSTR?78,"an " ; 2397x, saved 2394
.FSTR FSTR?79," it" ; 2912x, saved 2909
.FSTR FSTR?80,"ear" ; 1959x, saved 1956
.FSTR FSTR?81,"ver" ; 2265x, saved 2262
.FSTR FSTR?82,"rea" ; 2358x, saved 2355
.FSTR FSTR?83,"all" ; 3370x, saved 3367
.FSTR FSTR?84,"ter" ; 2457x, saved 2454
.FSTR FSTR?85,"..." ; 598x, saved 2386
.FSTR FSTR?86,"ed " ; 3256x, saved 3253
.FSTR FSTR?87,"er " ; 3487x, saved 3484
.FSTR FSTR?88,"ly " ; 3606x, saved 3603
.FSTR FSTR?89,"st " ; 1924x, saved 1921
.FSTR FSTR?90,". " ; 5144x, saved 5141
.FSTR FSTR?91,", " ; 8405x, saved 8402
.FSTR FSTR?92,"e." ; 1512x, saved 1509
.FSTR FSTR?93,"--" ; 765x, saved 1524
.FSTR FSTR?94,"I " ; 1526x, saved 1523
.FSTR FSTR?95,"A " ; 0x, saved -3
.FSTR FSTR?96,";" ; 761x, saved 1516
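
The “saved” figures in the list above appear to be counted in 5-bit Z-characters. A minimal sketch that reproduces them, assuming the default Z-machine alphabet tables, a cost of two Z-characters per abbreviation reference, and storage of the abbreviation itself padded to whole two-byte words (the function names are made up for illustration and are not taken from the abbreviation-finder's source):

```python
import math

# Default Z-machine alphabets (version 2 and later).
A0 = "abcdefghijklmnopqrstuvwxyz"
A1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
A2 = "\n0123456789.,!?_#'\"/\\-:()"


def zchar_cost(text):
    """Number of 5-bit Z-characters needed to encode text without abbreviations."""
    cost = 0
    for ch in text:
        if ch == " " or ch in A0:
            cost += 1  # space and lowercase letters need no shift
        elif ch in A1 or ch in A2:
            cost += 2  # one shift Z-character plus the character itself
        else:
            cost += 4  # shift, escape code, then a 10-bit ZSCII value in two Z-characters
    return cost


def abbreviation_saving(text, count):
    """Z-characters saved by abbreviating text that occurs count times (assumed model)."""
    cost = zchar_cost(text)
    stored = math.ceil(cost / 3) * 3  # the abbreviation string is padded to whole words
    return count * (cost - 2) - stored


print(abbreviation_saving(" the ", 10856))
print(abbreviation_saving("ing ", 4897))
# Hypothetical count: a long name that appears only 40 times in the whole game.
print(abbreviation_saving("Lord Nittlewaters", 40))
```

Under this model a long string saves a lot per occurrence, but a short fragment that occurs thousands of times still wins by a wide margin.
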
Incidentally, the ZSCII text encoding is one of my few persistent peeves with the otherwise nearly flawless design of the Z-machine. The compression approach (packing three five-bit characters into two eight-bit bytes) is just not very good, and adds a lot of complexity for how little space it actually saves. Glulx uses Huffman coding, which is better, but there’s a blindingly superior option available for the type of 8-bit platforms the Z-machine originally targeted: just encode the text with a 256-entry dictionary consisting of the most common English character sequences, the most common individual characters, and a code for “read another byte and look it up in a second dictionary of less-common individual characters”. This is how algorithms like smaz work, and it can easily achieve ~50% text compression with near-instantaneous decoding. /rant
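
To make the dictionary idea concrete, here is a minimal sketch of a byte-oriented coder along those lines. The codebook is a tiny hypothetical one chosen for illustration (a real one would fill all the available codes using frequency analysis), and smaz's actual codebook and escape scheme differ in detail:

```python
# Hypothetical codebook: a few common sequences followed by common single characters.
SEQUENCES = [" the ", "You ", "ing ", " you", " to ", " of ", " is ", "n't ", ", ", ". "]
COMMON_CHARS = " abcdefghijklmnopqrstuvwxyz.,'"
PRIMARY = SEQUENCES + list(COMMON_CHARS)

# Second dictionary of less-common characters, reached through the escape code.
SECONDARY = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!?\"-:;()\n"
ESCAPE = 255

assert len(PRIMARY) <= ESCAPE and len(SECONDARY) <= 256

CODE = {entry: i for i, entry in enumerate(PRIMARY)}
BY_LENGTH = sorted(PRIMARY, key=len, reverse=True)  # try the longest entries first


def encode(text):
    """Greedy encoder: one byte per codebook entry, two bytes per escaped character."""
    out = bytearray()
    i = 0
    while i < len(text):
        for entry in BY_LENGTH:
            if text.startswith(entry, i):
                out.append(CODE[entry])
                i += len(entry)
                break
        else:
            # Not in the primary codebook; str.index raises ValueError if the
            # character is missing from the secondary dictionary as well.
            out += bytes([ESCAPE, SECONDARY.index(text[i])])
            i += 1
    return bytes(out)


def decode(data):
    """Decoding is a plain table lookup per byte, plus one extra read after an escape."""
    out = []
    it = iter(data)
    for code in it:
        if code == ESCAPE:
            out.append(SECONDARY[next(it)])
        else:
            out.append(PRIMARY[code])
    return "".join(out)


sample = "You can't go in the kitchen."
packed = encode(sample)
print(len(sample), len(packed), decode(packed) == sample)
```

The decoder needs no bit shifting or alphabet state, which is the appeal on an 8-bit machine: every byte is either a direct index into a string table or a signal to read one more byte.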