Latest on unit testing in Inform 7

jeremydouglass · March 16, 2020, 11:11pm

I’m trying to come up to speed on current resources available for Inform 7 testing. I looked at some past discussions: [1] [2] [3].

Beyond using the skein from the Inform IDE, I am interested specifically in unit tests.

What isn’t working:

It looks like Simple Unit Tests by @Dannii Willis stopped working sometime after 6G60 due to internal changes in how rulebooks work.
Automated Testing by Kerkerkruip by Mike Ciul includes Simple Unit Tests, so it is also non-functional.

What I’ve found that seems to be working (although I haven’t experimented yet):

Unit Testing by @Natrium729 (unstable so not recommended for use, reference only)
Unit Testing and Checkpoints by @peterorme
Object Response Tests by @Juhana Leinonen
Command Unit Testing by @xavid
i7Spec (for Glulx only) by @Jeff_Nyman (inspired by Command Unit Testing)

My questions:

Is anyone currently using one or more of these testing frameworks in authoring, or teaching with them?
Does anyone have information on what is / isn’t working with the latest Inform 7?
Are there any extensions / tools that I’m missing?

Natrium729 · March 17, 2020, 12:46am

I’ve written a unit test extension for my own use (available at https://gitlab.com/Natrium729/extensions-inform-7). But it’s quite barebones and I’m modifying it regularly based on my own needs. Its documentation is very out of date too, if I recall correctly.

So I don’t consider it stable at all and I don’t recommend using it, but it’s there anyway if you want to take a look.

xavid · March 17, 2020, 1:21am

I use Command Unit Testing for my own needs, for my other extensions and some games I’m working on. It works fine except for some weird edge cases like disambiguation questions. Unless I missed some Inform 7 activity recently, it’s only been used with the latest Inform 7.

Dannii · March 17, 2020, 2:18am

You can also use Zarf’s RegTest. It’s perhaps better for larger integration tests than small unit tests, but for a lot of testing it’s a simple solution.

https://eblong.com/zarf/plotex/regtest.html

Jeff_Nyman · March 17, 2020, 9:59am

To your question: “Is anyone currently using one or more of these testing frameworks in authoring, or teaching with them?”

I still do use i7Spec in some of my classes. But …

The challenge I’ve found is that a unit testing tool needs setup and teardown to make sure state doesn’t leak between tests. I was looking for ways to add that in i7Spec but eventually gave up. You could try to hook up an undo-style mechanism but then that depends on how many actions you took in the test.

The challenge is also that unit testing isn’t really what’s needed for interactive fiction. What you need is integration testing that shades into acceptance testing. It is acceptance testing because you want to test the behavior the game generates, which is what the player will experience. But doing that requires integration: all the parts working together, changing state as needed.

I did a class with testers and developers on testing interactive fiction (which is where the i7Spec you referenced came from). And the challenge was this…

An acceptance test for Trinity might be, say, Solve Umbrella Puzzle.

That then requires the player going to Flower Walk, getting the soccer ball, going to Lancaster Gate, waiting for the umbrella to be blown into the tree and then throwing the soccer ball at the umbrella. Along the way, of course, you want to make sure all the text that should be displayed is displayed, including any variations.

That is the kind of testing you ultimately need. You want a test that runs through that scenario, perhaps performs multiple assertions, and then provides one verification at the end, indicating success or failure. (Note: acceptance tests also require no leakage of state. Good practice is also not to rely so much on teardown but rather have each test provide its context, or setup.)
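As a rough illustration of that shape of test (one scenario, several assertions, a single verdict, and setup supplied by the test itself), here is a minimal Python sketch. Everything in it is an assumption made for illustration: `ToyGame`, its command strings, and its response text are hypothetical placeholders, not Trinity's actual text and not an Inform or i7Spec API.

```python
from dataclasses import dataclass


@dataclass
class ToyGame:
    """Hypothetical stand-in for a story file; not an Inform or i7Spec API."""
    location: str = "Start"
    has_ball: bool = False
    umbrella_in_tree: bool = False

    def do(self, command: str) -> str:
        """Apply one command and return placeholder response text."""
        if command == "go to flower walk":
            self.location = "Flower Walk"
            return "Flower Walk"
        if command == "get soccer ball" and self.location == "Flower Walk":
            self.has_ball = True
            return "Taken."
        if command == "go to lancaster gate":
            self.location = "Lancaster Gate"
            return "Lancaster Gate"
        if command == "wait" and self.location == "Lancaster Gate":
            self.umbrella_in_tree = True
            return "The wind blows an umbrella into the tree."
        if command == "throw soccer ball at umbrella":
            if self.has_ball and self.umbrella_in_tree:
                self.has_ball = False
                self.umbrella_in_tree = False
                return "The umbrella falls out of the tree."
        return "You can't do that."


def run_scenario(steps):
    """Run (command, expected substring) pairs against a fresh game.

    Every step is checked, but the caller gets one overall verdict:
    returns (passed, failures).
    """
    game = ToyGame()  # each scenario builds its own context, so no state leaks
    failures = []
    for command, expected in steps:
        output = game.do(command)
        if expected not in output:
            failures.append((command, expected, output))
    return (not failures, failures)


SOLVE_UMBRELLA_PUZZLE = [
    ("go to flower walk", "Flower Walk"),
    ("get soccer ball", "Taken."),
    ("go to lancaster gate", "Lancaster Gate"),
    ("wait", "umbrella into the tree"),
    ("throw soccer ball at umbrella", "falls out of the tree"),
]

passed, failures = run_scenario(SOLVE_UMBRELLA_PUZZLE)
print("PASS" if passed else f"FAIL: {failures}")
```

Because `run_scenario` creates its own `ToyGame` each time, running the same scenario twice gives the same result, which is the "each test provides its context" idea rather than a reliance on teardown.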

I was going to work on i7Spec more but I’m now wondering if a form of Zarf’s RegTest might be a better approach, ultimately. I need to look at that.

zarf · March 17, 2020, 2:42pm

I wrote RegTest because I decided that no test framework written in game code could be reliable. Inform isn’t built for software modules to be isolated. Including the test framework has too high a risk of changing the game behavior. Particularly, as you say, over multiple test runs.

The I7 skein is the right approach – it runs the game and observes responses from the outside. However, I found it unwieldy to work with – I wanted to work in text files, not a GUI. Also, the IDE I was using (Mac 6G60) was slow on large skeins and I think it once corrupted the skein data. So that was out.

What I found in practice was that I needed three kinds of tests:

When you customize game grammar, you want regression tests on that. For example, if you spend a lot of time fiddling with Understand lines for a particular object or verb, you want to make sure the results are stable. (It’s easy to break one phrasing when you’re adding another one.)
When you design a puzzle, you want to test the solution(s) and all the major failure paths. This was a big one for Hadean Lands – I have a test for every way you can screw up a ritual. (Remember that puzzle failures are supposed to be enlightening for the player! So if one stops working, your game becomes harder.)
You want a beginning-to-end game run. (You feel really stupid releasing a game that’s unwinnable.) It’s handy to include SAVE commands in the run, so that when you release, you generate a collection of save files at each major plot point. (This makes it easier to test player bug reports.)
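The three kinds of tests above can all be written as plain-text command runs that are checked from outside the game. The Python sketch below shows one way that might look. The file layout (`* name` starts a test, `> command` sends a command, any other line must appear in the reply) is an assumption for illustration, and RegTest's own documentation is the authority on its real format. `parse_tests`, `run_test`, `fake_send`, and the sample responses are all hypothetical.

```python
def parse_tests(text):
    """Parse a plain-text test file into {name: [(command, [expected lines])]}.

    Assumed layout (hypothetical, for illustration only):
      * name      starts a new test
      > command   sends a command
      other text  must appear in the reply to the preceding command
    """
    tests = {}
    steps = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("* "):
            steps = tests.setdefault(line[2:].strip(), [])
        elif line.startswith("> "):
            if steps is None:
                raise ValueError("command before any test header")
            steps.append((line[2:].strip(), []))
        else:
            if not steps:
                raise ValueError("expected text before any command")
            steps[-1][1].append(line)
    return tests


def run_test(steps, send):
    """Return (command, missing text) pairs; an empty list means the test passed.

    `send` is any callable that takes a command and returns the game's reply.
    """
    problems = []
    for command, expected in steps:
        reply = send(command)
        problems.extend((command, want) for want in expected if want not in reply)
    return problems


SAMPLE = """
# Grammar regression: two phrasings must give the same result.
* grammar-lamp
> light lamp
The lamp is now lit.
> turn on lamp
The lamp is now lit.

# Puzzle failure path: the failure message should stay informative.
* puzzle-door-failure
> open door
The door is locked.
"""

# Canned replies standing in for a real interpreter (hypothetical).
CANNED = {
    "light lamp": "The lamp is now lit.",
    "turn on lamp": "The lamp is now lit.",
    "open door": "The door is locked.",
}


def fake_send(command):
    return CANNED.get(command, "That's not a verb I recognise.")


for name, steps in parse_tests(SAMPLE).items():
    problems = run_test(steps, fake_send)
    print(name, "ok" if not problems else problems)
```

In real use, `send` would talk to an interpreter process running the compiled game, restarted for each test so that nothing from one run affects the next. Keeping the runner outside the game means the test code cannot change the game's behavior.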

zarf · March 17, 2020, 2:53pm

Oh, and make sure your beginning-to-end run works on the release version of the game. The others will generally work on the debug version – you need debug commands to jump into puzzles or purloin objects for a given setup.

Jeff_Nyman · March 17, 2020, 3:12pm

Yes, I like that breakdown. In general, I found similar things. Essentially in the testing world you break things down into “edge-to-edge” tests and “end-to-end” tests.

The end-to-end would certainly be a path through the whole game. Not just “happy path” but the possible “unhappy paths.” Here I don’t mean “deaths” but any ending paths through the game where the story is brought to some form of conclusion. This is no different than basis path testing in the testing world.
End-to-end can also encompass all things that might occur in the context of, say, a given scene. In this context, you need to know all the things that can play out during that scene and thus the basis paths through the scene. This allows you to test the gating criteria for scenes to begin and end.

Then you get into lots of various edges, puzzles being one aspect of those – especially interconnected puzzles, where the solution to one puzzle requires solving another puzzle. Sometimes there can be multiple solutions, all of which have to work. Each puzzle thus serves as a type of edge and thus an interface. And that allows you to think about contract testing in the context of interactive fiction.

Another aspect, and a key component for my classes, was varying descriptions. For example, we used my own decaying descriptions ideas ([I7] Room Descriptions That Slowly "Decay") plus distantly viewable/visible extensions. And that meant we had a lot of text that was presented quite situationally. Authors wanted to make sure that text appeared as expected. So simple tests that would trigger the “description decay” and then assert on the text returned were crucial for them.
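A test of that kind might be sketched in Python as below. This is only a toy model under stated assumptions: `DecayingRoom` and its rule (each look gives the next, terser description until the last one repeats) are hypothetical and do not reproduce how the extension mentioned above actually works. The room text is made up.

```python
class DecayingRoom:
    """Hypothetical model: the description gets terser on repeated looks."""

    def __init__(self, descriptions):
        self.descriptions = descriptions  # ordered from fullest to tersest
        self.looks = 0

    def look(self):
        # Clamp the index so the last description repeats once decay finishes.
        index = min(self.looks, len(self.descriptions) - 1)
        self.looks += 1
        return self.descriptions[index]


def check_decay(make_room, expected):
    """Look len(expected) times at a fresh room.

    Returns mismatches as (look number, expected, actual) tuples.
    """
    room = make_room()  # fresh room per check, so earlier looks don't leak in
    mismatches = []
    for n, want in enumerate(expected, start=1):
        got = room.look()
        if got != want:
            mismatches.append((n, want, got))
    return mismatches


def make_cellar():
    return DecayingRoom([
        "A damp cellar. Shelves of dusty jars line every wall.",
        "A damp cellar, lined with jars.",
        "The cellar.",
    ])


EXPECTED = [
    "A damp cellar. Shelves of dusty jars line every wall.",
    "A damp cellar, lined with jars.",
    "The cellar.",
    "The cellar.",  # fully decayed: the shortest text repeats
]

print(check_decay(make_cellar, EXPECTED))  # prints []
```

The point of the sketch is the test shape: trigger the decay a known number of times, then assert on the exact text returned at each step.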

ChrisC · March 21, 2020, 10:13pm

Also, if there are multiple endings, you need to test that they’re all reachable as well.
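One cheap first check, before writing a command run for each ending, is a reachability search over a hand-maintained map of the story's states. The Python sketch below assumes such a map exists. `STORY`, the node names, and `unreachable_endings` are hypothetical, and the check only validates the map, not the compiled game.

```python
from collections import deque


def reachable(graph, start):
    """Breadth-first search: return every node reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def unreachable_endings(graph, start, endings):
    """Return the endings that no path from `start` can reach, sorted."""
    return sorted(set(endings) - reachable(graph, start))


# Hypothetical story map: each state lists the states it can lead to.
STORY = {
    "opening": ["garden", "cellar"],
    "garden": ["ending: escape"],
    "cellar": ["garden"],
    "attic": ["ending: secret"],  # nothing leads to the attic
}
ENDINGS = ["ending: escape", "ending: secret"]

print(unreachable_endings(STORY, "opening", ENDINGS))  # prints ['ending: secret']
```

Each ending that passes this check would still need its own beginning-to-end command run against the real game, since the map can drift from the implementation.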

jeremydouglass · March 21, 2020, 10:27pm

@zarf – how would you contrast using regtest with your plotex tool – did you end up moving from one to the other, or did you keep using each in different ways on the same project(s)?

zarf · March 22, 2020, 4:06am

I used plotex for planning the game design and regtest for testing the implementation. Different stages of the project.

Zed · May 8, 2023, 4:15pm

I have gotten some good use out of my test extension:

I recommend regtest for testing gameplay, but if what you’re doing is the detail work of testing individual phrases, I think my extension works pretty well.