I’m also working with vim and makefiles (but Dialog rather than I6). My automated testing consists of a directory of test cases, which are text files with the suffixes .in and .ok. The in-files contain player input. Nothing is saved or restored; each test case starts from the beginning of the game.

The makefile runs each in-file through the game, and writes the output to an .out file (which is kept around, but not stored in source control). Then it compares each out-file to the corresponding ok-file. If they are not identical, it shows the diff and prints a command that I can copy-paste in order to view the diff graphically; the command would be meld testcases/foo.{out,ok} for a test case called “foo”. After reviewing the differences, I can easily cursor-up and replace “meld” with “cp” to approve the new version. There’s also a make bless rule to approve all test cases in one go.
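The compare-and-report step can be sketched as a small shell loop. This is an illustrative sketch rather than my actual makefile, and the test case created here is a stand-in for real game output:

```shell
#!/bin/sh
# Create a tiny stand-in test case so the sketch is self-contained.
mkdir -p testcases
printf 'You are in a dark room.\n' > testcases/foo.ok
printf 'You are in a dark cave.\n' > testcases/foo.out

# For every .ok file, compare it against the matching .out file.
for ok in testcases/*.ok; do
    base=${ok%.ok}
    if ! cmp -s "$base.out" "$ok"; then
        diff -u "$ok" "$base.out"
        # Print a copy-paste-able command for graphical review;
        # cursor-up and change "meld" to "cp" to approve instead.
        echo "meld $base.{out,ok}"
    fi
done
```

The bless rule is then just the "cp" half of this applied to every test case.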
During development, once I reach a point where it’s possible, I play through the game from start to finish, fiddling with all the complex parts, and save a transcript of my input. I grep out the prompt-lines and sed away the prompts themselves, and this becomes the first test case. I review its output and bless it. Later, when I get transcripts from beta testers, I convert those in the same way, gradually obtaining a collection of test cases.
When something changes in the code—perhaps an object starts in a different location—some of the test cases will break. Then I have to modify the test cases until they manage to reach the end of the game again. Based on the diff, it’s often straightforward to see where the input-file needs to be updated.
This works very well for me. It’s very reassuring—especially when making small changes close to the comp deadline—to be able to run the full set of tests and see exactly what changed.
One thing is problematic: Any output that depends on the random number generator is fragile. Of course I use a fixed random seed, but here’s the thing: Every time you add or remove a call to the RNG somewhere in the code, this will offset the sequence of random numbers past that point, so you’ll have lots of differences to review. If the randomness affects more than just flavour messages, you’ll often break the test cases too. For instance, an NPC who walks around at random will not appear in the location where the tester tries to interact with it. Updating the test cases ends up being a lot of work. On the flip side, by constantly reviewing and updating the test cases, you get a feeling for how robust the game is against variations in the random sequence.
I have an idea for how to alleviate this problem, but I haven’t tried it yet. The idea is to patch the interpreter to reset the random seed to a known value ahead of each line of input. It doesn’t have to be the same value every time; it could be a function of the current move number, for instance. That would still offset the random sequence when a test case is modified (to make it work after something changed in the game), but it would eliminate a lot of flavour text variations that you’d only skim through anyway.
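The effect can be demonstrated with a toy model — plain awk, not Dialog or any real interpreter. Each “move” reseeds the RNG from the move number, we then add extra RNG calls on move 1 only (simulating a code change), and check that moves 2 and 3 are unaffected. The seed formula 1000 + move is an arbitrary stand-in for “a function of the current move number”:

```shell
#!/bin/sh
# Toy model: each "move" reseeds the RNG from the move number, then draws
# one visible roll. The "extra" parameter simulates a code change that
# adds RNG calls during move 1.
simulate() {
    awk -v extra="$1" 'BEGIN {
        for (move = 1; move <= 3; move++) {
            srand(1000 + move)                      # seed depends on move number
            if (move == 1)
                for (i = 0; i < extra; i++) rand()  # calls added by a code change
            printf "move %d roll %d\n", move, int(rand() * 100)
        }
    }'
}

simulate 0 > before.txt   # original code
simulate 5 > after.txt    # code with extra RNG calls on move 1

# Moves 2 and 3 should be byte-identical; only move 1 may differ.
tail -n 2 before.txt > before-tail.txt
tail -n 2 after.txt  > after-tail.txt
cmp -s before-tail.txt after-tail.txt && echo "moves 2-3 unaffected"
```

Only move 1’s line can differ between the two runs: the reseed fences each move off from RNG-call changes in earlier moves, which is exactly the property that would keep most of a test case’s output stable.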