I’m also working with vim and makefiles (but Dialog rather than I6). My automated testing consists of a directory of test cases, which are text files with the suffixes .in and .ok. The in-files contain player input. Nothing is saved or restored; each test case starts from the beginning of the game.

The makefile runs each in-file through the game, and writes the output to an .out file (which is kept around, but not stored in source control). Then it compares each out-file to the corresponding ok-file. If they are not identical, it shows the diff and prints a command that I can copy-paste in order to view the diff graphically; the command would be meld testcases/foo.{out,ok} for a test case called “foo”. After reviewing the differences, I can easily cursor-up and replace “meld” with “cp” to approve the new version. There’s also a make bless rule to approve all test cases in one go.
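The compare-and-report step can be sketched as a small shell loop. This is an illustrative sketch rather than my actual makefile, and the test case created here is a stand-in for real game output:

```shell
#!/bin/sh
# Create a tiny stand-in test case so the sketch is self-contained.
mkdir -p testcases
printf 'You are in a dark room.\n' > testcases/foo.ok
printf 'You are in a dark cave.\n' > testcases/foo.out

# For every .ok file, compare it against the matching .out file.
for ok in testcases/*.ok; do
    base=${ok%.ok}
    if ! cmp -s "$base.out" "$ok"; then
        diff -u "$ok" "$base.out"
        # Print a copy-paste-able command for graphical review;
        # cursor-up and change "meld" to "cp" to approve instead.
        echo "meld $base.{out,ok}"
    fi
done
```

The bless rule is then just the "cp" half of this applied to every test case.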
During development, once I reach a point where it’s possible, I play through the game from start to finish, fiddling with all the complex parts, and save a transcript of my input. I grep out the prompt-lines and sed away the prompts themselves, and this becomes the first test case. I review its output and bless it. Later, when I get transcripts from beta testers, I convert those in the same way, gradually obtaining a collection of test cases.
When something changes in the code—perhaps an object starts in a different location—some of the test cases will break. Then I have to modify the test cases until they manage to reach the end of the game again. Based on the diff, it’s often straightforward to see where the input-file needs to be updated.
This works very well for me. It’s very reassuring—especially when making small changes close to the comp deadline—to be able to run the full set of tests and see exactly what changed.
One thing is problematic: Any output that depends on the random number generator is fragile. Of course I use a fixed random seed, but here’s the thing: Every time you add or remove a call to the RNG somewhere in the code, this will offset the sequence of random numbers past that point, so you’ll have lots of differences to review. If the randomness affects more than just flavour messages, you’ll often break the test cases too. For instance, an NPC who walks around at random will not appear in the location where the tester tries to interact with it. Updating the test cases ends up being a lot of work. On the flip side, by constantly reviewing and updating the test cases, you get a feeling for how robust the game is against variations in the random sequence.
I have an idea for how to alleviate this problem, but I haven’t tried it yet. The idea is to patch the interpreter to reset the random seed to a known value ahead of each line of input. It doesn’t have to be the same value every time; it could be a function of the current move number, for instance. That would still offset the random sequence when a test case is modified (to make it work after something changed in the game), but it would eliminate a lot of flavour text variations that you’d only skim through anyway.
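The effect can be demonstrated with a toy model — plain awk, not Dialog or any real interpreter. Each “move” reseeds the RNG from the move number, we then add extra RNG calls on move 1 only (simulating a code change), and check that moves 2 and 3 are unaffected. The seed formula 1000 + move is an arbitrary stand-in for “a function of the current move number”:

```shell
#!/bin/sh
# Toy model: each "move" reseeds the RNG from the move number, then draws
# one visible roll. The "extra" parameter simulates a code change that
# adds RNG calls during move 1.
simulate() {
    awk -v extra="$1" 'BEGIN {
        for (move = 1; move <= 3; move++) {
            srand(1000 + move)                      # seed depends on move number
            if (move == 1)
                for (i = 0; i < extra; i++) rand()  # calls added by a code change
            printf "move %d roll %d\n", move, int(rand() * 100)
        }
    }'
}

simulate 0 > before.txt   # original code
simulate 5 > after.txt    # code with extra RNG calls on move 1

# Moves 2 and 3 should be byte-identical; only move 1 may differ.
tail -n 2 before.txt > before-tail.txt
tail -n 2 after.txt  > after-tail.txt
cmp -s before-tail.txt after-tail.txt && echo "moves 2-3 unaffected"
```

Only move 1’s line can differ between the two runs: the reseed fences each move off from RNG-call changes in earlier moves, which is exactly the property that would keep most of a test case’s output stable.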