A simple framework for implementing regression tests in TADS3/adv3

jbg · August 10, 2023, 12:31am

As part of a separate effort to write a modular replacement for adv3’s executeCommand() (in turn part of my effort to implement noun-as-verb actions in adv3), I’ve written a very rudimentary framework for bolting regression testing onto existing games. It is available from my regressionTest git repo.

The approach is slightly less black box than most regression tests, because I’m not interested in testing the interpreter, but rather changes made to the builtin parser by individual games/modules. So instead of implementing a purely “external” testing method, the gimmick here is that the module provides its own self-contained gameworld and player object, and runs all of its tests using them. So it is (in theory) easy to drop the module into an existing project with virtually no modifications.

I’m not sure how much use most authors would get out of such a thing, but I’m in the specific situation where I often want to do a/b testing to verify that some tinkering I’m doing with parser internals isn’t interacting with/breaking something I don’t realize I’m affecting (for example, I recently discovered my first pass at a modular executeCommand() replacement broke Quit, because QuitAction uses a special exception inside of yesOrNo(), and my bespoke parser loop was trying to handle the exception itself).

The basic workflow is something like:

1. Build the demo game using the provided source and makefile.
2. Create a command file using >RECORD while playing the demo via frob (or presumably some other interpreters, but I’ve only tested with frob). The command file should end with >QUIT and >Y so that the test scripts automagically exit the interpreter during replay as well.
3. Copy the command file into the ./data directory.
4. From the ./scripts directory, run the generate_transcript.sh script. This will re-build the game (ensuring the right flags are set during compilation) and then run it with the supplied command file, saving the transcript to ./data/transcript.txt.

Then you can include the module by adding -lib [path to module]/regressionTest to an existing project.

Having done that, then from the top of the source tree of the game to be tested you can run the regression_test.sh script, which will re-build the project, run the game with the command file, save the transcript, and then diff the test transcript and the reference transcript.
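
For illustration, the final comparison step might look something like the sketch below. This is Python rather than the shell used by the module's own scripts, and it only covers the diff itself, not rebuilding the game or replaying the command file. The function name `diff_transcripts` and the sample transcript text are made up for this example and are not taken from the repo.

```python
import difflib


def diff_transcripts(reference_text, test_text):
    """Return unified-diff lines between a reference and a test transcript.

    An empty list means the two transcripts match.
    (Hypothetical helper for illustration; not part of the regressionTest module.)
    """
    reference_lines = reference_text.splitlines(keepends=True)
    test_lines = test_text.splitlines(keepends=True)
    return list(
        difflib.unified_diff(
            reference_lines,
            test_lines,
            fromfile="reference transcript",
            tofile="test transcript",
        )
    )


if __name__ == "__main__":
    # Made-up sample transcripts; a real run would read the saved files instead.
    reference = ">take pebble\nTaken.\n>quit\n"
    broken = ">take pebble\nYou see no pebble here.\n>quit\n"
    diff = diff_transcripts(reference, broken)
    print("".join(diff) if diff else "Transcripts match.")
```

An empty result means the change under test did not alter any output for the recorded commands; any non-empty diff points at the first command whose response changed.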

All of the “stuff” in the module is wrapped in preprocessor conditionals, so you can enable or disable it all by toggling the -D REGRESSION_TEST flag when compiling the project.

The scripts presume the layout of the project to be tested will follow the conventions I use in my T3 modules, but everything is controlled by variables at the top of the script so they should be easy to tweak if your projects use a different code layout. Optional command line arguments are also documented in the comments at the top of the scripts, and can be displayed by running the scripts with the -h option.

Feedback welcome. I’d particularly like to expand the scope of the toy gameworld used by the testing process, and to implement more “differentiating” test behaviors in the command file. That is, to identify more actions that exercise different parser bits to verify they’re working: so >TAKE PEBBLE and >TAKE ALL involve slightly different bits of parser code even if the game effects are identical; >BOB, TAKE PEBBLE uses a bunch of different code than when the player does a >TAKE PEBBLE, and so on.

inventor200 · August 10, 2023, 1:12am

Did I miss something in the TADS 3 docs that allowed me to pass custom flags to the compiler, and read those flags with the preprocessor??

jbg · August 10, 2023, 1:52am

Maybe? I honestly don’t remember where I learned about how to do it.

But yeah. You can always pass flags via -D SOME_FLAG_NAME to t3make (either on the command line or in the makefile).

In source you can then do something like:

#ifdef SOME_FLAG_NAME
    [cool stuff]
#endif

And the [cool stuff] will only be compiled if SOME_FLAG_NAME is defined.

Edit:

And you can use -D SOME_FLAG_NAME=some_value to define a specific value.

jjmcc · August 10, 2023, 9:58pm

I interpret what you are doing here to be ‘Test to ensure deep infrastructure hacks don’t break TADS fundamentals.’

A cursory search for the opposite ‘Test to ensure your GAME is not broken by newly modified Verbs/Classes’ did not yield anything. Has anyone implemented anything more sophisticated than the braindead two-step:

1. A “Testbench” build that reads, autoplays, and dumps transcripts for a Golden Testsuite, then
2. A command-line script to compare (diff) test transcripts to Golden ones?

EDIT: Mea culpa. I posted without fully digesting the OP. When the synapses finally closed, I saw that what jbg describes is so close to what I posted that the differences aren’t worth talking about! Also, I would not have called it ‘braindead’ if I thought I was talking about anyone but me… As penance, will def check it out!

inventor200 · August 10, 2023, 10:05pm

WHY HAVE I NEVER USED THIS BEFORE. OH MY GAWD.

Dannii · August 10, 2023, 10:15pm

RegTest works with TADS. Not to diminish the advantages of an in-system test framework.

jbg · August 10, 2023, 11:28pm

Yeah, like I said I think most people (or at least most people in the intersection of the sets “people who write TADS3 code” and “people who care about regression testing”) will probably want a simpler solution.

I started out with a little test game project that I’d use for testing parser changes by tweaking the project whenever I wanted to test against some new module/change. And occasionally (if I was writing a module and wanted to include the test cases as demo code in the module) I’d just cut and paste the game into another project to test “in place”.

And that works fine if you’ve got one project you want to test. But when I was looking at simultaneously wanting to do testing on three modules in various combinations/configurations…suddenly it seemed to make more sense to make the test cases a module that gets imported into the project instead of the other way around.

If you’re not looking at wanting to do testing…specifically a bunch of identical or very similar testing…against a bunch of different codebases…then this probably isn’t a particularly interesting solution to the problem.