Announcing a “New” Infocom Story File

russotto · October 21, 2020, 11:29pm

The culmination of a project over 15 years in the making, Alessandro Giassi and Matthew Russotto announce the recovery of the Moonmist v65 (‘A’) beta story file. This is the earliest known version of Moonmist, and is one of the few extant beta version Infocom games, including Infocom’s debugging code.

Background: In 2004, Matthew Russotto set out to catalog and collect all versions of the Infocom games available on the net. This resulted in the (long-since-outdated) Infocom Version List and the Bad Games List. Of all the bad games, most were only slightly damaged and all but one had good copies available. The Moonmist beta was significantly damaged, clearly unplayable, and all copies on the net stemmed from the same source, an Apple II disk image. Examination of the damage revealed that it was done before or during the decompression of the game from an Apple II compressed format called DDD; while some information was lost, most information was intact and, through a tedious process of comparison with Moonmist v4, the lost information could be deduced. Russotto got through about 3 Apple II disk tracks (of 17), but started to suspect that the errors would compound and too much data would be lost. He put the project aside, and picked it up only occasionally in the next few years. Despite a search of the dark corners of the net, he never found the compressed file the damaged file came from.

In 2020, Alessandro Giassi contacted Mr. Russotto and asked him about the DDD file. Giassi had discovered the Moonmist beta and the Bad Games List and was working backwards through the disk image. This told Russotto that the file was indeed recoverable to the end, and rekindled his interest in the project. Giassi had also come up with methods to recover and verify parts of the file that Russotto had only guessed at, making recovery more reliable. Both worked on the tedious comparison, Russotto working forwards and Giassi working backwards. The two versions were compared and errors fixed. Finally, neither could discover any more errors, and the checksum of the file matched the checksum in the header.

The Moonmist beta is generally similar to Version 4, though often more verbose. For instance, in v4 we’re told about a war club "that once belonged to the Zulu king Dingaan -- and it's studded with large diamonds!” In v65 that club is "
"a war club or ~knobkerrie~ that once belonged to the dreaded Zulu king known as Dingaan the Vulture -- and it's studded with large, brilliant diamonds". The aerosol device Bolitho gives you is described in v65 as spraying “ammonia or Mace” rather than “something foul”; perhaps Infocom Legal was worried after the whole issue with the New Zork Times. A "sarcastic look" in v65 has become an angry one by v4. There are no doubt many small difference in gameplay as well, though these have not been cataloged.

The file’s true serial number has been effaced (replaced with “Broken” with the high bit set on each character, as is Apple II tradition). However, many Infocom games have a second serial number; in Moonmist’s case, this can be retrieved with “$verify 105”; Moonmist v13 has 67, v9 has 58, v4 has 47, and this beta has 7. So this is an early build indeed. Furthermore, the Infocom source code contains a table of builds for Moonmist (m5.chart), which dates this file to August 20th or 21st, 1986, for a serial number of 860820 or 860821.

(Incidentally, 105 is also the opcode of the Bitcoin Script verify instruction. Whether this says anything about the identity of Satoshi Nakamoto is outside the scope of this project.)

(For now, the recovered file may be obtained at the Bad Games List page – MTR)

zarf · October 22, 2020, 3:02am

Whoof. I’m impressed!

Adding to my catalog.

russotto · October 22, 2020, 3:21am

You’re fast! I was looking up something else on your page and noticed you’d already added it.

eriktorbjorn · October 22, 2020, 11:22am

I assume it’s similar to the debug mode in the release version of The Witness, seeing how Stu Galley was involved with both? From what I could figure out from the The Witness source code:

You use the $DBG command to toggle it on/off. When it’s on:

You get a message every time a character walks from one room to another.
Objects that usually aren’t explicitly listed are, with an “[ndescbit]” prefix.
There’s an “[invisible]” prefix to the message you get if you try to walk through a closed door, if that door is inivisible. I’m not sure what that’s about.
The FIND-NOT-HERE routine prints debug information about how many objects it found, and - if possible - which one it picked. This is apparently used for commands that can refer to objects that aren’t present. (E.g. the nonsense command “CLIMB UP WHISKEY” will print “[Moby-found 2 objects]”.)
The PERFORM routine prints a lot of debug information about what it’s doing:
- The action (as a number), and the name of the direct and indirect objects.
- The actor who’s performing the action.
- The things that are given an opportunity to handle the action, and the result of that.

You also get access to the following commands:

$TANDY toggles the Tandy bit.
$WHR prints the location of all the characters.
$WHR room moves you to that room.
$WHR destination object moves an object.

russotto · October 22, 2020, 11:53pm

It’s not nearly that extensive; you turn it on with #dbg and it prints a lot of the PERFORM stuff. But there’s no extra commands (there’s a $CHEAT in the source but it was apparently added later). You don’t get lost in the dark. And the game doesn’t end; the FINISH routine exits true instead. The source code shows a lot more than that (including the FIND-NOT-HERE data and some additional commands, some specific to Moonmist like changing color and gender) , but apparently all that was added later.

pdxiv · October 23, 2020, 5:23am

This sounds very impressive! I’d love to learn more details about how the recovery and verification was done.

russotto · October 24, 2020, 4:04am

I can give my half, though some of it is lost to faded memory. As I recall, after trying to play the game and having it crash pretty quick, I used ‘txd’ to disassemble it. It was too badly damaged for that, so I hacked up ‘txd’ to basically ignore errors as much as possible and plough on. If you try this, the end of the disassembly before it fails entirely contains a bunch of this

13006:  0d 0d 0d                store           local12 #0d
13009:  0d 0d 0d                store           local12 #0d
1300c:  0d 0d 0d                store           local12 #0d
...
13075:  0d 0d 0d                store           local12 #0d
13078:  12 80 02 62             get_prop_addr   "tower" #02 -> g52

That’s not good, but it pointed me in more or less the right direction, and had me guessing some kind of run-length-encoding thing. I think I tried just removing the '0d’s and getting some glimpses of text beyond that point. Eventually I looked further back in the disassembly and noticed this section:

12954:  09 09 09 09             and             #09 #09 -> local8
12958:  09 09 09 09             and             #09 #09 -> local8
1295c:  09 09 09 09             and             #09 #09 -> local8
12960:  09 09 09 09             and             #09 #09 -> local8
12964:  09 09 09 09             and             #09 #09 -> local8

Somewhere around this time I realized I needed to look in the .dsk file, not the .z3 file, because of the effects of interleave. I found that removing some of the '09’s made the interleave work right (for track $15 – an Apple II track is 4096 bytes, and the first three tracks aren’t used for the game, so track $15 is 0x12000-0x12FFF in the Z-code). This confirmed the idea that I was looking at a disk image that had been compressed and then uncompressed incorrectly. I removed the ‘09s’, shifted a section by a byte, matched some other damaged parts up against v4 to deduce what they were, and I had a good track $16.

I removed the '0d’s… but while track $16 certainly seemed to have snippets of good text and code, it was extremely damaged. By comparing it to v4, I was able to figure out some patterns

Except at the beginning and end of the track, some byte values were completely fine.
Other byte values could represent 1,2, or 3 different values, with no indication of which is which.
The beginning and end of the track were a mess.

I think I worked through track $16 and part of $17, putting in patches by modifying a copy of the “ap2inf” program to apply them as it extracted the .dsk file, before I realized what was going on. Some sort of table-based compression scheme had been used on each track separately, and the tables had been corrupted. Since DDD was far and away the most common compressor in the Apple ][ pirate scene (and this file proudly presented itself as “Cracked by Coast to Coast”), I thought of it immediately. I tracked down an annotated copy of DDD (the internet is amazing), and things became clear.

The DDD compression scheme is simple. The whole disk is recorded as a single bit stream. Each track begins with 20(decimal) octets which are the 20 most common bytes in the uncompressed track. These 20 octets map to prefix codes which are 4,5,6, or 7 bits long. There are 2 8 bit prefix codes, one of which is used to indicate run length encoding; it’s followed by two more 8 bit codes with the byte to repeat and the repeat count. The other is never written but if read, is also treated as the RLE code. All other bytes are encoded with 9 bits.

So what had happened is there was some decoding error on track $15, which had resulted in two erroneous RLE codes being read. This caused DDD to fill its track buffer before the end of the compressed track. It then wrote out the damaged track and read the next 20*8 = 160 (decimal) bits into its compression table. Then it started merrily reading along using a completely wrong compression table… and then it started reading the real compression table for the next track as if it were ordinary data. And then it started reading the next track, still with the bogus compression table. Because it started early, it would end early, and the same thing would happen with every subsequent track, with usually minor shifts – though towards the end there’s another RLE code error which shifted it more.

This explains the substitutions. The prefix code ‘1100’ (which as it turns out always represents ‘00’, because that’s the most common byte in every track) would decode as whatever the first byte in that nonsense table was – hex ‘62’, for track $16. But the 9-bit code for hex ‘62’ would ALSO decode as hex ‘62’. And to make it worse, while the bytes in the real tables were always unique, the bogus tables were under no such constraint. On track $16, another prefix code, which should properly represent ‘02’, also was in the table as ‘62’. So the correct ‘00’, ‘02’, and ‘62’ would be ‘62’ in the damaged file.

It also explains the mess at the beginning and end of the track. Basically each track can be divided into several regions

Garbage caused by “decoding” the frequency table
Decoded data where the decoder is not aligned with the start of the codes
The bulk of the track, aligned data decoded with the broken table that’s really some previous track data,
“Lost” data, where the decoder was reading track data as a new frequency table.
More misaligned data
Aligned data decoded with the new frequency table.

What I did is use comparison with v4 to guess the alignment of section 3 and discard section 1. If I got this wrong I could tell because the interleave would be wrong and things would break at address $xx100, so I could adjust it. I would go through and, by comparing with v4 (repeatedly running my hacked txd), fill in section 2 and substitute the proper values in for section 3. Then when I hit section 4 (substitutions would stop making sense), I would guess the alignment of section 6, fill in the “hole” where 4 and 5 were, and do substitutions for section 6.

Originally I was doing this by patching a copy of ap2inf by hand for each substitution. This was not going to get done before the heat death of the universe. After Alessandro contacted me I started making new tools. First was modifying ap2inf to take a patch list. The patch list could contain ‘default’ substitutions; this is useful because it turns out the literal substitution is by definition the least likely value for a byte to be. Also if you can fill in opcode bytes (especially PRINT and PRINT_RET) automatically, the disassembly works a lot better. Next was a program that would go through and find the next uncertain byte in a track, and suggest the possible substitutions (based on the ones I did already). Then some emacs fun – I wrote some elisp to hilight the byte I was working on in a side-by-side view of the v65 and v4 disassemblies. And an automatic “string rectifier” which would fix identical strings automatically when I reached them. All this sped things up enormously.

Alessandro figured out how to recover Section 4 and the misaligned sections; one of those sections falls on a difference between v4 and v65.

We each independently did a full repair, and checked the files against each other. I don’t know his exact technique; I didn’t ask, as I figured keeping them different would mean errors would be less likely to be identical between our files. I also melted my eyeballs checking my own work, especially the early stuff. And I wrote tools to tell me of calls to routines which didn’t exist, jumps to bad destinations, two-byte immediates that were neither strings nor routines (there are some legit ones), and finally two-byte immediates and routines which could have been ambiguous (these were very annoying).

There was some other damage to the file, sections of bytes which had been XORed with various “almost” constants. These were aligned to the sectors in the .dsk file and were probably either unrelated later read errors, or happened when writing out the uncompressed data.

I made an attempt to figure out what the damaged compressed data was. What I determined (through the ridiculous speed of modern computers, checking out 2^28 or so possibilities) is there’s no possible pattern of damaged compressed data for track $15, that is the right length, which could have caused the trouble. There is one pattern that’s one bit short (though missing a bit is not the ONLY problem, which was a theory at one point, as the very first error is a shifted byte), and many that are longer. So either I’ve made an error, or the problem was not damage to the compressed file but some sort of actual processor or memory malfunction while decoding.

BadParser · October 24, 2020, 6:40am

Well, when you put it like that – it seems pretty obvious.