Highly optimized abbreviations computed efficiently

heasm66 · February 13, 2021, 11:38am

Over in the thread about Mini-Zork II I noticed that the abbreviation files that @russotto saved ~300 bytes more than the other. Why?

One difference was that the beginning of his files looked like this:

        .FSTR FSTR?1," in the "		; 35x, saved 202
        .FSTR FSTR?2,"n the "		; 38x, saved 146
        .FSTR FSTR?3,"to the "		; 47x, saved 228
        .FSTR FSTR?4,"o the "		; 13x, saved 46
        .FSTR FSTR?5," of the "		; 38x, saved 220
        .FSTR FSTR?6,"s the "		; 51x, saved 198
        .FSTR FSTR?7," the "		; 154x, saved 457

and the other like this:

        .FSTR FSTR?1," the "		; 376x, saved 1123
        .FSTR FSTR?2,"The "		; 243x, saved 724
        .FSTR FSTR?3," is "		; 198x, saved 392
        .FSTR FSTR?4,"You "		; 117x, saved 346
        .FSTR FSTR?5," you"		; 174x, saved 344

The glaring difference is that the first file waits a while to define the clear winner " the " and list a couple of phrases that have the winner as a part. This is a good strategy because if " the " is the first abbreviation it ‘ruins’ tho other ones and they will never get used!

I wrote a little test (in VB.NET, hope that doesn’t bug too many!) where I try to implement this logic. The basic structure of the code is largely based on the Python-script by @mulehollandaise and doesn’t use suffix-tree and is therefore a bit slower (30 s on my computer for Mini-Zork II).

Identify the first winner.
Pick a threshold based on this winner. I have experimented with different values, but for Mini-Zork II 8% gave the best result.
Instead of using the winner pick the longest abbreviation that clears the threshold and have the winner inside it.
Recalculate index.
Pick a new winner
repeat from 3 until 96 abbreviations are found.

Below is the code if anyone wants to use it or improve it. For Visual Studio you need a Form with a Button and a multiline Textbox and then paste just paste the code in the Form’s class.

Beware! The code isn’t very polished yet.

Form1.vb.txt (11.8 KB)