Fizmo cannot restore Tethered?

Angstsmurf · May 4, 2019, 5:06pm

Right, another Fizmo/Dialog incompatibilty, I guess:

When I try to restore a save made with Fizmo, Tethered gives the error:

Game state restored successfully.

(Technical trouble: Invalid error code! Attempting to recover with UNDO.)

Undoing the last turn.

(Technical trouble: Invalid error code! Attempting to recover with UNDO.)

Undoing the last turn.

(Technical trouble: Invalid error code! Attempting to recover with UNDO.)

Undoing the last turn.

(Technical trouble: Invalid error code!)

I think these are messages from Tethered/Dialog rather than the Fizmo interpreter. Save and restore works fine in Fizmo in zcode games made with Inform.

EDIT: I’ve reproduced this with the Ncurses, Sdl and console versions of Fizmo. Does anyone have an idea?

borg323 · May 5, 2019, 7:09pm

I get a similar issue with jzip (both my modified version and the original 2.1). I also tried with a save game from frotz with the same results, while frotz works with either save game. I dumped the memory and everything seems fine, so this is probably tethered depending on some frotz implementation detail.
Maybe @lft has can suggest some additional tests?

lft · May 6, 2019, 7:45am

I have looked briefly into this, and I think the problem is with Fizmo.

Frotz, Gargoyle, and Fizmo all produce slightly different quetzal save files. Frotz and Gargoyle can restore savefiles produced by either interpreter, but Fizmo cannot restore from Frotz, Gargoyle, or itself. My minimal test program (see below) doesn’t crash immediately like Tethered, but if you press return after restoring, the program jumps into the error handler, which is wrong.

There’s something strange going on in Fizmo’s save files. Looking at the Stks chunk of a Tethered savefile, I see a frame with 56 words on the eval stack (all zero), whereas I know that my routines push at most one word, and then immediately pop it. And when saving from the small test program, I get the following as part of Stks:

0012d6 00 01 01 0000

That’s a routine with zero local variables, but argument ‘a’ was passed. This appears to be a violation of the spec. Also, the call instruction just before 0012d6 does not pass an argument.

But for the particular crash we’re looking at, my best guess is that something is still wrong with catch/throw. Both runtime errors and regular failures can be invoked using throw, and their handlers are (naturally) installed at different stack levels. So if Fizmo somehow rewinds the stack one time too much, this would happen. But it’s confusing, because I thought we just fixed that error, and I don’t see why it would work differently across a save/restore. So perhaps the answer is in the weird Stks frames after all.

I’m not inclined to dig deeper into this, because I’m reasonably confident that it’s a Fizmo bug, and getting on top of Fizmo’s internal stack handling seems like a lot of work. If something turns up to suggest that Dialog is at fault, I’ll have another look of course.

Small Dialog test program

(program entry point)
	*(repeat forever)
	> (get input $X)
	(handle $X)
	(fail)

(handle [save])
	(if) (save $ComingBack) (then)
		(if) ($ComingBack = 1) (then)
			coming back
		(else)
			saved ok
		(endif)
	(else)
		save error
	(endif)

(handle [restore])
	(restore)

(error $Val entry point)
	Got $Val.
	(quit)

lft · May 6, 2019, 8:34am

To elaborate: If Fizmo restores the stack improperly, perhaps by omitting the oldest frame or creating an extra frame, then the throw/catch cookies would be off by one, and this would potentially lead to the observed error. Such a bug would not be a problem for Inform stories, because they don’t use throw/catch.

Dannii · May 6, 2019, 11:43am

@Angstsmurf You are testing this with a version of Fizmo that has the recent patch for @throw?

Angstsmurf · May 6, 2019, 11:44am

Thanks for looking into this! I’m no good with the internals of Fizmo (or Jzip) either, but if there is any testing or similar I can do to help, just let me know.

Angstsmurf · May 6, 2019, 11:46am

@Dannii Yes, without that patch the game won’t run far enough to make a save, let alone restore it.

Dannii · May 6, 2019, 12:06pm

I assumed so, just checking.

If Fizmo can’t restore savefiles from other interpreters that does imply there are more bugs in its code, but where could be hard to find. The 56 words on the stack seems like it could be a good place for someone to start looking.

borg323 · May 6, 2019, 2:06pm

The problem I’m seeing with jzip is a bit different. The above test code runs fine, but the tethered save file is not restored correctly:

> restore
Enter a file name.
(Default is "tethered.sav"):
Game state restored successfully.
Below the cliff

> look
Come again?

> quit
Come again?

The save file is identical to the one from frotz (with the exception of default interpreter number and version in the header). Saving right after the restore (with a modified executable), the only differences are the incremented pc and the success code stored in global register 0. Anything else I can check?

lft · May 6, 2019, 6:53pm

How intriguing! My compliments for a very clever test method. I can’t think of anything else to check at the moment, but it’s all very interesting.

I can reproduce what you describe with jzip 2.1 by overwriting the instruction following the save with a jump back to the save. I also tried to invoke restore from a different stack level, but this did not change the behaviour.

Since frotz and jzip produce identical files, it seems reasonable to assume that save is implemented correctly. But if a restore is executed from another part of the code, and that cause the program to fast-forward to the saved state and resume after the save instruction, and the next instruction saves again, and the file is still identical, then it would seem that restore also works correctly. And that’s really mysterious! How can the program behave differently in these two cases?

Furthermore, undo seems to work, which suggests that the code that deals with executing instructions and storing results is correct too. And that seems to narrow it down to the quetzal routines, which I just argued seem to be fine.

Such a magnificent debugging puzzle, presumably with a perfectly mundane solution.

Angstsmurf · May 6, 2019, 7:36pm

I tried compiling your small Dialog test program, and unless I’m missing something, it seems to work fine while saving and restoring in Fizmo. I guess that means it is something specific about Tethered that is causing the error.

lft · May 6, 2019, 7:39pm

@Angstsmurf, try pressing return at the prompt after restoring. That should trigger the error.

Angstsmurf · May 6, 2019, 7:46pm

Right, I was missing something! I see the message “Got 0.” now after restoring and pressing return in Fizmo, while ZVM and Bocfel just give a blank prompt.

lft · May 6, 2019, 9:06pm

The Mysterious Case of Jzip Failing to Restore Tethered

Chapter the Last, in which a most curious scheme is revealed.

Ahem. Ladies, gentlemen. I have gathered you here in the drawing room to present a perfectly simple solution to an impossible puzzle. Inspector, have you locked the doors as I instructed? Very good. Now, then.

A game state is saved. After saving, the game still works. The savefile is confirmed to be correct, because it is identical to savefiles produced by another interpreter.

Next, the saved state is restored, and the game no longer works. Did the code for restoring a savefile misbehave? At a first glance, it would appear so.

But! If the story file is deviously modified, such that after restoring, the game state is immediately saved again, then the exact same, perfectly valid savefile is produced.

How, then, could the restore functionality be guilty? Put another way, how is it possible that the state of the game after restoring is both valid (because it can be saved correctly) and invalid (because the game crashes)?

(Dramatic puffing of cigar)

The answer to the conundrum is as baffling as it is simple. You see, inside this particular interpreter, there happens to be not one, but two copies of the game state. The copies are identical, with one exception: Every time the restore operation attempts to reconstruct them from a savefile, it performs a slight miscalculation. The error only affects one of the copies, but it’s the one that is involved in the actual gameplay. The other copy, still immaculate, is only used when saving the game.

And that, ladies and gentlemen, is the entire secret of the Mysterious Case of Jzip Failing to Restore Tethered: Identical twins.

Inspector, be a good lad and escort this particularly nefarious bug to the station.

The curtain falls. Of course, I have exaggerated somewhat. There aren’t two full copies of the game state, but there’s one single detail that is represented in two different ways: The number of frames on the call stack. A variable called frame_count is incremented whenever a stack frame is created, and decremented whenever one is destroyed. But during a save operation, the frame_count variable isn’t used. Instead, the number of stack frames that are saved depends on the actual contents of the call stack, as deduced by following frame link pointers. When restoring a game, as I suggested, frame_count is miscalculated. But this does not affect subsequent saves.

But if frame_count isn’t used during saving, when does it get used at all? Elementary: It is used as the magic cookie returned by catch, and supplied as a parameter to throw. When a game has been played through from the beginning, catch produces cookies based on the correct frame_count, which is why the savefile is identical to one produced by another interpreter. After a restore, the contents of the stack are correct, but frame_count is wrong. Throw computes the difference between its parameter and the old frame_count, gets the wrong answer, and quietly explodes. And, as we’ve seen before, Inform games do not use throw and catch, which is why the bug has remained undetected until now.

The fix is quite simple:

quetzal.c:
403c403
<             for ( fp = STACK_SIZE - 1, frame_count = 0; currlen > 0; currlen -= 8 )
---
>             for ( fp = STACK_SIZE - 1, frame_count = 0; currlen > 0; currlen -= 8, frame_count++ )

borg323 · May 6, 2019, 9:36pm

Excellent work. I was just coming back with exactly the same finding - you beat me by half an hour. But if I count the time it must have taken you to write the description it was probably more than an hour…

PS. Since you seem to like debugging hacks, to track this down I modified z_save to return 2 instead of 1, so after saving the restore patch was taken, and to my surprise (but no-one else’s any more) everything worked fine. So I dumped all subsequent instructions and compared with a dump after a normal restore. The divergence happened after a throw, so a quick look at the code showed that frame_count is set to zero but not updated in the for loop and that was that.

PPS. Fizmo seems to avoid this bug - the frame counter is increased when loading a saved game (https://github.com/chrender/libfizmo/blob/master/src/interpreter/savegame.c#L1486).

Petrosilius · May 7, 2019, 9:39pm

Hi, thanks for looking into this. As it turned out, the number of stack frames was incorrectly evaluated by fizmo when a game was restored, so any invocation of throw would rewind to the wrong stack frame. This should now be fixed in fizmo’s github version.

borg323 · May 7, 2019, 10:21pm

If I understand correctly the frame counter was one more than it should be?

jholder · May 8, 2019, 1:54am

Great work there Linus! The quetzal code I accepted as a contribution to Jzip was authored by Martin Frost himself and it just goes to show that even the author of a standard can occasionally make small errors.

Petrosilius · May 8, 2019, 4:49am

That’s correct – while the line you’ve already referenced increases the counter for every restored stack frame, the creation of the first stack frame implicitly increased the counter once more.