Customized Infocom game won't fit on version 3 Z-machine

Thank you for testing, @heasm66! If the savings are actually much better in practice than what my algorithm says, there seems to be a problem in my gametext.txt, so I wonder if there could be even better abbreviations. Still, finding an extra 800 bytes compared to ZILF is nice!

I got these abbreviations by looking at substrings of length up to 17, I think. (The algorithm really isn’t hard: look at all substrings in the source, score them, pick an abbreviation, replace it, and so on, recalculating the scores as you go.) Any divergence from reality should be because of gametext.txt, I think.
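For anyone following along, the loop described above can be sketched roughly like this in Python. This is only an illustration, not the actual script: the names `zchar_cost`, `score` and `find_abbreviations` are made up, and the cost model is a simplifying assumption (lowercase letters and spaces count as 1 z-character, everything else as 2). The one fixed point is that a reference to an abbreviation takes 2 z-characters, so the score is measured in z-characters saved.

```python
def zchar_cost(text):
    # Assumed, simplified cost model: lowercase letters and space take
    # 1 z-character, anything else takes 2 (a shift plus the character).
    return sum(1 if (c == " " or "a" <= c <= "z") else 2 for c in text)


def score(candidate, strings):
    # Each occurrence shrinks to a 2 z-character reference; the abbreviation
    # text itself still has to be stored once.
    count = sum(s.count(candidate) for s in strings)
    cost = zchar_cost(candidate)
    return count * (cost - 2) - cost


def find_abbreviations(strings, how_many=96, max_len=17):
    strings = list(strings)
    chosen = []
    for _ in range(how_many):
        # Collect every substring of 2..max_len characters that does not
        # overlap a region already replaced (marked with "\0").
        candidates = set()
        for s in strings:
            for i in range(len(s)):
                for j in range(i + 2, min(len(s), i + max_len) + 1):
                    sub = s[i:j]
                    if "\0" not in sub:
                        candidates.add(sub)
        if not candidates:
            break
        # Ties are broken alphabetically so the result is deterministic.
        best = max(candidates, key=lambda c: (score(c, strings), c))
        if score(best, strings) <= 0:
            break
        chosen.append(best)
        # Replace the chosen abbreviation, so later scores are recalculated
        # against the remaining text.
        strings = [s.replace(best, "\0") for s in strings]
    return chosen


print(find_abbreviations(["open the door", "close the door", "lock the door"]))
```

This rescans everything on every round, so it is slow on a whole game's text; it is only meant to show the score, pick, replace, rescore cycle.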

Thanks for the discussion! This has inspired me to make some changes to ZAPF’s abbreviation finder.

First, here’s a frequent words file that gets Zork II down to 89,454 bytes:
5_phrase.zap.txt (6.1 KB)

I tested a few different sets of parameters, which will likely make it into a future version of ZAPF as varying search strengths. Here’s how they stack up:

Phrase Length  Partial?  Story File Size (bytes)  Bytes Saved  Run Time (sec)  Bytes Saved/Sec
No abbreviations         103,462                  0            N/A             N/A
1              No        90,358                   13,104       7.2             1,820.0
3              No        90,322                   13,140       10.6            1,239.6
5              No        90,322                   13,140       11.6            1,132.8
1              Word      89,792                   13,670       25.8            529.8
3              Word      89,760                   13,702       29.0            472.5
5              Word      89,760                   13,702       30.2            453.7
3              Phrase    89,454                   14,008       63.7            219.9
5              Phrase    89,454                   14,008       72.7            192.7

Notes:

  • Phrase Length limits the number of consecutive words we combine into one abbreviation. Given the string "the quick brown fox", a phrase length of 1 will pick out "the", "quick", "brown", and "fox" as candidates, whereas a phrase length of 2 will also find "the quick", "quick brown", and "brown fox".
  • Partial? indicates whether we look for abbreviations that don’t start or end at word boundaries.
    • “No” means they have to start and end at word boundaries, although they can include the punctuation on either side: "quick", " quick", "quick ", " quick "
    • “Word” means they can start or end in the middle of a word, but can’t span multiple words: " qui", "uic", "ick ", etc.
    • “Phrase” means they can start or end anywhere in the phrase: "he qui", " quick bro", etc.
  • ZAPF’s parallel search algorithm uses all available cores. I ran this on a Ryzen 3700X (8 cores/16 threads), where 1/No is almost fast enough to run on every compile, and 5/Phrase is still fast enough to run regularly. On a single-threaded system*, even 1/No would be painful to run that often.
  • The results can vary depending on which abbreviations are chosen at earlier stages, even when they seem equally valuable at the time. For example, I’m pretty sure it’s possible to get down to 89,452 bytes – because I did it once, by accident, using a nondeterministic version of the code where races between the threads could affect the order of results.

* ZILF for WebAssembly, anyone?
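To make the Phrase Length and Partial? settings from the notes concrete, here is a small Python sketch of how candidates could be enumerated. It is not ZAPF's code: the function names `spans` and `candidates` are hypothetical, words are assumed to be runs of letters and apostrophes, and "Word" is read as "whole phrases plus any substring of a single word and its neighbouring separators".

```python
import re


def spans(text, phrase_length):
    """Yield (start, end, n_words) for each run of 1..phrase_length words."""
    words = [m.span() for m in re.finditer(r"[A-Za-z']+", text)]
    for i in range(len(words)):
        for n in range(1, phrase_length + 1):
            if i + n <= len(words):
                yield words[i][0], words[i + n - 1][1], n


def candidates(text, phrase_length=1, partial="No"):
    found = set()
    for start, end, n in spans(text, phrase_length):
        # Allow one separator character on either side of the phrase.
        lo = max(start - 1, 0)
        hi = min(end + 1, len(text))
        # "No": the phrase itself, with or without the adjacent separators.
        for a in (start, lo):
            for b in (end, hi):
                found.add(text[a:b])
        # "Word": partial substrings, but only within a single word.
        # "Phrase": partial substrings anywhere within the phrase.
        if partial == "Phrase" or (partial == "Word" and n == 1):
            for a in range(lo, hi):
                for b in range(a + 1, hi + 1):
                    found.add(text[a:b])
    return found


text = "the quick brown fox"
print(sorted(candidates(text, phrase_length=1, partial="No")))
```

With this reading, `" qui"` only shows up once `partial` is "Word" or "Phrase", and `"he qui"` only with "Phrase" and a phrase length of at least 2, which matches the examples in the notes.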

Side note:

ZAPF abbreviates “afety deposit box” because “Safety deposit box” appears in a couple places with the first letter capitalized. Exactly two places, in fact: the descriptions of the Bank of Zork’s East Viewing Room and West Viewing Room.

Those descriptions are almost identical, but one of them has a colon introducing the block of quoted text, and the other is missing the colon. Adding the missing colon lets ZILF merge the two strings, bringing the file size down to 89,282 bytes.

That’s really cool! Very nice find on “afety deposit box”! (Strings 97 and 98, right? That’s an odd mistake!)

I found what was wrong with my gametext.txt: some strings run over multiple lines, which my “grep” cannot find… For now I fixed the multi-line strings in zork2str.zap; I really need to find a better way to grab the game text going forward.
I let my script run for 60 seconds, which allowed me to check strings up to 60 characters. Here is my abbreviation file (I have no idea what the resulting filesize is, I really ought to download ZIL…).
zork2_freq.txt (6.5 KB)

I suspect it is not as good as @vaporware’s, judging by the frequencies I get, and the fact that my script didn’t see that abbreviating spaces helps. But hey, it found “afety deposit box” at last :slight_smile:

@vaporware I have no idea what your code does, but I need to mention that my code’s performance improved a lot when I added a “pruning threshold”. After computing each score, I discard all abbreviations with a score less than 10 bytes / 30 units; then it’s time to sort them and pick 96 abbreviations. Would that help with performance? (I have no idea what your code does.)
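In case it helps, the pruning step amounts to something like the following Python sketch. The function name `prune_and_pick` and the example scores are made up for illustration, and the threshold is assumed to be in whatever unit the scores use.

```python
def prune_and_pick(scores, threshold=10, how_many=96):
    # Drop low-scoring candidates first, so the sort only has to handle
    # the ones that could plausibly be chosen.
    kept = {abbr: s for abbr, s in scores.items() if s >= threshold}
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(kept, key=lambda a: (-kept[a], a))[:how_many]


example_scores = {" the ": 120, "afety deposit box": 22, "zz": 3}
print(prune_and_pick(example_scores))
```

The gain comes from the fact that the vast majority of substrings score very low, so the list that gets sorted and rescanned is much shorter.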

… We drifted off-topic, didn’t we? I mean, @mmazeika is going to be able to use a new abbreviation-finder that’ll help them compile “The Lurking Horror”, I suppose :smiley:

@mulehollandaise
I tried to compile Zork 2 with your new abbreviation file and the result is 89,452 bytes!

GAME ON?

(And yes, you should download ZILF. It’s fun!)

@mulehollandaise, I’m able to save 490 bytes on lurkinghorror with your abbreviations! Thanks for the help.

Now I’m curious what results would come from reabbreviating Trinity :slight_smile:

Best regards from Italy,
dott. Piergiorgio.

I’ll gladly run the benchmark test on Trinity but I need help from @mulehollandaise or @vaporware to generate an abbreviation file with the improved algorithm.

@mulehollandaise made a new abbreviation-file for Trinity and here are the data:

Trinity without abbreviations                281,580 bytes
Trinity with the ZILF abbrev-file            257,408 bytes (saved 24,172 bytes)
Trinity with the new abbrev-file             257,068 bytes (saved 24,512 bytes)

Not as much as expected. I don’t know why that is. Could it be that Trinity is more “literary”, i.e. uses more varied language?

It could be, or it could be that I did a bad job at getting rid of multi-line strings for my gametext. I really ought to do something about that.

However, just reading through the code, I think Infocom was also more aggressive in reusing chunks of text. In the other games, most sentences were plainly written, whereas here many sentences are broken into several operands so that fixed strings can be reused. I had to rely heavily on that trick for “Tristam Island” (things that popped up in more than one place, like " there is no electricity on the island.", got defined as string constants), so I think Infocom must have struggled to make the game fit in 256 KB too. Still, 24k saved is on the low side given the size of the game.

Not directly applicable to The Lurking Horror, but another large Z3 game – Moonmist – was actually written to allow it to be compiled as a Z4 game, by setting a compile-time constant PLUS-MODE. This does require some modification to ZILF to be able to use the V3-style parsing objects (adjectives recorded in the dictionary) in V4 mode, but once that’s done and an UPDATE-STATUS-LINE routine is written, it works fine.

TLH doesn’t have PLUS-MODE but the changes look pretty straightforward.

Version 5 would be harder, as there are more changes.

A move from Z3 to Z4 would certainly give more than enough room for the changes to TLH that the OP is talking about.

That sounds like a good option! Thanks, I didn’t know about PLUS-MODE.

Back to abbreviations for a second; if you build the generalized suffix tree of every string in the program (in z-characters), the possible abbreviations are, I think, represented by the internal nodes in the tree (with a few exceptions) and the multiplicity of the abbreviation is the number of leaves whose ancestor is that node. This doesn’t lead directly to an optimal algorithm but I think it can tractably get you the best abbreviation at any given time.
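As a rough illustration of the idea, here is a Python sketch that uses an uncompressed suffix trie instead of a real generalized suffix tree. That is a deliberate simplification: it takes quadratic space and works on plain characters rather than z-characters, so it is only suitable for toy inputs. The branching nodes of the trie are the internal nodes the suffix tree would have, and each node's count is the number of leaves below it. The function names are made up.

```python
def build_suffix_trie(strings):
    # Each node records how many suffixes pass through it, which equals
    # the number of leaves in its subtree.
    root = {"count": 0, "children": {}}
    for index, s in enumerate(strings):
        # A unique terminator per string keeps every suffix ending in a leaf.
        terminated = list(s) + [("$", index)]
        for start in range(len(terminated)):
            node = root
            node["count"] += 1
            for symbol in terminated[start:]:
                node = node["children"].setdefault(
                    symbol, {"count": 0, "children": {}}
                )
                node["count"] += 1
    return root


def repeated_substrings(root):
    """Yield (substring, multiplicity) for every branching node below the root."""
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        if prefix and len(node["children"]) >= 2:
            yield prefix, node["count"]
        for symbol, child in node["children"].items():
            if isinstance(symbol, str):  # do not descend into terminators
                stack.append((child, prefix + symbol))


print(dict(repeated_substrings(build_suffix_trie(["banana"]))))
```

For "banana" this reports "a" three times and "ana" and "na" twice each. Note that the two occurrences of "ana" overlap, so the leaf count can overstate how many replacements are actually possible; any scoring built on top of this would need to account for that.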

Huh, interesting. Do you have a patch?

Sure, here are the patches for V4 Moonmist and the ZILF patches. The Moonmist patches are based on Alex Proudfoot’s patch at https://github.com/the-infocom-files/moonmist

mm4patch.txt (5.5 KB)

zilpluspatch.txt (7.6 KB)

I recently turned NEW-VOC? on, so if it’s broken it’s possibly that rather than PLUS-MODE; NEW-VOC? was on in the Infocom source though.

I looked through a few other games and LGOP V5 has PLUS-MODE but it seems to be less complete. The “m5.chart” document for Moonmist shows some compilations greater than the Z3 limit, so it’s possible there were Z4 Moonmist files during development.

Additionally, the source to Hitchhiker’s Guide R31 (invisiclues) contains a file called “zip-to-xzip.txt”, which has the promising header “HOW TO TURN A ZIP GAME INTO AN XZIP GAME”, with code and some rather terse instructions for doing so. This would be V5; V4 is EZIP.

ZAPF stuff aside for a second, @mmazeika I think the Solid Gold Versions already are Z5 in some cases. If this is true for Lurking Horror (it’s 5am so bear with me if I don’t check for you!) then it’s ready and waiting for your modification efforts*

*possibly… It may be that they already used up the extra space, but you won’t know until you try I guess.

I’d be keen to see the results anyway! Keep us posted please!

Adam

Only a couple of games were released as “Solid Gold”. I always tend to forget at least one when listing them, but it should be The Hitchhiker’s Guide to the Galaxy, Zork I, Leather Goddesses of Phobos, Planetfall and Wishbringer. As mentioned earlier in the thread, Dave Lebling has said that he would have liked to make Lurking Horror a large game because some things had to be cut for size, but alas…

While preparing for the Eaten by a Grue podcast, co-host Kay Savetz noted that the game isn’t always as fleshed-out as you might expect, and it still seems to have been quite a squeeze to get it all to fit!

I have mixed feelings about the Solid Gold releases. Some of them are, I think, not up to Infocom’s usual standard. I guess quality control suffered towards the end and so some strange bugs snuck in. Like this one in Planetfall:

>SOUTH
Lower Elevator
This is a medium-sized room with a door to the north which is open. A control
panel contains an Up button, a Down button, and a narrow slot.

>SLIDE ACCESS CARD THROUGH SLOT
A recorded voice chimes "Elevator enabled."

>PUSH DOWN BUTTON
Pushing down the little button isn't notably helpful.

That is because the game thinks you want to push the button on the diary, an object that was added to replace one of the feelies in the original release. I guess the testers all dropped the diary before they got that far into the game.

Then there’s this in the Solid Gold version of Wishbringer:

>TIME
[It's 5:02 pm. You have -1 hours and 58 minutes to complete your delivery.]

It turns out that unlike the original release, here it’s almost, but not quite, impossible to run out of time.

There’s smaller stuff too, like how Leather Goddesses of Phobos still refers to the 3-D comic book in your package, even though that version came with a regular 2-D version of the comic. And you can probably make that game unwinnable on the first move simply by typing “PUSH ORANGE BUTTON”, even though you’re literally millions of kilometers away from where that button is.

On the other hand, there seems to have been some ambition to add value, e.g. for Zork I they restored some text from the guidebook that was previously cut in the transition from MDL to ZIL.