Actually, it is quite easy to do with a few simple Unix commands. My testPlay.sh script:
frob -i plain -k utf8 --no-pause -S -c -e 1000 -R testCommands.txt asteroid.t3 > testOutput.txt
diff -Nur testTranscript.orig testTranscript.txt > testTranscript.diff
The file testCommands.txt starts with “>script on” and “>testTranscript.txt” and continues with commands that walk through the game. The file testTranscript.orig contains the last copy of the game output that I considered correct, and I copy testTranscript.txt to testTranscript.orig any time I’m satisfied with new changes. Although very simple in nature, it is quite a powerful technique. It allows me to see changes in the transcript really quickly and react to them immediately. Combined with the ability to undo a move, I can try several possibilities in every situation.
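The "copy when satisfied" step can be wrapped in two small helpers. This is only a sketch: the function names transcript_changed and approve_transcript are made up for illustration, and the file names are the ones from the script above.

```shell
#!/bin/sh
# Hypothetical helpers for the approve step; not part of the original testPlay.sh.

# transcript_changed NEW ORIG
# Succeeds (exit status 0) when NEW differs from ORIG or ORIG is missing.
transcript_changed() {
    ! cmp -s "$1" "$2"
}

# approve_transcript NEW ORIG
# Accepts NEW as the new known-good baseline by copying it over ORIG.
approve_transcript() {
    cp "$1" "$2"
}
```

After reading testTranscript.diff, something like `transcript_changed testTranscript.txt testTranscript.orig && approve_transcript testTranscript.txt testTranscript.orig` would promote the new transcript to the baseline.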
However, there are some caveats to consider. Games typically have randomized output to some degree (shuffled event lists and the like), and this produces changes in the transcript every time the game is extended. One must learn to ignore these changes, which show up constantly in the resulting diff. A few times I scrolled through the diff so quickly that I overlooked important changes :-/
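One possible way to cut down this noise is to drop lines known to be random from both transcripts before diffing them. The sketch below assumes a hand-maintained file of regular expressions, one per line, matching those random messages; the function name filtered_diff and the ignore file are assumptions, not something the workflow above already has.

```shell
#!/bin/sh
# Hypothetical helper: diff two transcripts while ignoring known-random lines.

# filtered_diff ORIG NEW IGNORE
# IGNORE holds one regular expression per line; every transcript line that
# matches any of them is dropped from both files before they are compared.
# Returns diff's exit status: 0 if nothing else changed, 1 otherwise.
filtered_diff() {
    orig_tmp=$(mktemp)
    new_tmp=$(mktemp)
    # grep exits with 1 when it selects no lines; that is not an error here.
    grep -v -f "$3" "$1" > "$orig_tmp" || true
    grep -v -f "$3" "$2" > "$new_tmp" || true
    status=0
    diff -u "$orig_tmp" "$new_tmp" || status=$?
    rm -f "$orig_tmp" "$new_tmp"
    return "$status"
}
```

The trade-off is that a real change on a line matching one of the patterns is hidden too, so the pattern list is best kept short and specific.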
Still, if I had to name the one thing that made me successfully complete my game (and the TADS translation into my native language in the first place), it is definitely test-driven development. Text adventures are such a good match for this technique!
In Windows Workbench this is built in; it automatically records scripts as you play and you can tell it to replay any scripts it’s recorded and saved. I have a number of test scripts for adv3Lite, which helped catch a number of bugs before the last release. Unfortunately, even if you’re developing a game you can’t anticipate everything a player might do; there are just some things that won’t occur to you (which is why there’s no substitute for beta-testers). That’s even more the case when you’re developing a library.
I don’t exactly mean transcript testing; I mean something more like TADS code that simply tests other units of TADS code by calling into it and checking the direct output: a suite of small routines that test specific functions/classes independently.
There’s nothing to stop anyone writing such a suite of routines if they seem useful, and no doubt they could be used to test certain mechanical aspects of a game or library, but I suspect this approach may be of limited usefulness in IF where problems can arise not just from coding errors in individual routines, but from unexpected interactions between different parts of game code and/or library code and, even more at the game level, from unexpected player input. A test suite may catch some coding errors but it won’t tell you if your game will make sense to players from the point of view of world-model or narrative logic, for example, and in practice it’s often these aspects of a work of IF, rather than the purely mechanical ones, that it’s hardest to get right. Writing IF is really not much like writing a stock control, payroll, invoicing or other such routine business-type system where both the data inputs and data outputs are fairly routine and predictable in nature (even if they can get quite complex in practice). Thus, for example, ensuring that various subsystems work properly in a piece of IF is no guarantee that the piece will work as a coherent whole - artistically as well as technically.
Coding for IF is also unlike coding for most applications in that so much game code is likely to be ad hoc, because a skilled IF author isn’t just bolting together ready-made bits and pieces from whichever system or library s/he’s using, but creatively adapting them to do new, different and unexpected things to entertain or puzzle the player, and what usually catches game authors out in such situations isn’t the player inputs they anticipated and catered for but the player inputs they didn’t anticipate. You really have to have something you’ve written played with by a bunch of good beta-testers to see what I mean.
So, while in a sense you could do the kind of test-driven TADS development you describe, I doubt it would suffice to produce what players would perceive as a bug-free game.
I don’t think the intention would be for IF authors to write absolutely bug-free, or as you put it, perceptibly coherent games; but instead to ensure the engineering coherence of specific units of library code. To be certain that the library’s constituent parts behave as advertised in the hopes that when they come together for more complex interactions the chances of errors like the one outlined in this thread are reduced. And that’s not even a dig on what you’re doing, just pointing out there is a technique to mitigate these kinds of bugs specifically.
But this is precisely what I’m not so sure of. The bug report which started this thread is precisely due to an interaction of different parts of the system under unexpected circumstances, and dependent on previous states of the game. I suspect it would only ever occur if the player entered a CONTINUE command without having previously entered a valid GO TO command in the course of the game. This wouldn’t be caught by feeding different test inputs into Continue.execAction() without also varying other aspects of the game state (in this case, the value of a property on some other object). In fact, what Continue.execAction(cmd) does is hardly, if at all, dependent on the value of the cmd parameter but almost entirely on the state of the game resulting from previously entered commands, so this particular bug would never (except by pure fluke) be caught by testing one routine in isolation. Again, to the extent that I should have anticipated the particular set of circumstances that triggered this bug, it would have been quicker and simpler to test for it by entering a CONTINUE command on the first turn of a game than by writing a special test routine.
So I’m still unconvinced that a testing technique that works well for payroll systems and the like is necessarily all that transferable to IF, or, at least, that it would do much to trap most of the kinds of bugs that are coming to light at this stage of adv3Lite’s development (which are certainly no more numerous than they were for adv3 or Inform nine months after their initial beta release).
I don’t know how you can make the argument that a technique that tests the actual functionality of units of code isn’t an applicable measure of the integrity of those constituent parts, or how this concept doesn’t apply to all software regardless of the use case. Sure, IF library code may interact in interesting ways with user code and other parts of the library, and it may be hard to test wide high-level interactions among those parts, but those high-level interactions working is still predicated on the constituent parts working as expected. I mean, are you suggesting that game companies don’t unit test their game-engine code because at some point high-level interactions become hard to test for?
As someone who has programmed both IF and payroll systems, and unit tested both, I can say that they are surprisingly similar.
One of the basic theses of TDD is that each unit test should be reduced to test only one thing. That doesn’t mean that the setup or the steps leading to that test can’t be as complex as necessary. In a complex system like payroll calculations the bugs are usually similarly convoluted: for example, you might get wrong results only when the employee has been employed for at least 3 years and has been on vacation for at most a month this year and the latest vacation started on Wednesday.
Also, it seems that your assumption is that unit testing/test-driven development is meant to catch bugs before they occur, or that you should write tests for everything you can do with the software. That is certainly one use case, but most of the time it’s used as a regression testing tool: after you find a bug, you write a unit test for it to prevent the bug from ever coming back when you fix something else. I can heartily recommend unit testing for libraries precisely because after a certain point the bugs tend to occur in very specific circumstances, and by creating unit tests you can be sure they won’t come back later. In fact, I believe I7 has an extensive unit testing suite for the system itself and the standard library.
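For a game or library driven by command scripts, the same regression idea can be approximated at the transcript level: one small command file per fixed bug, each paired with the output it should produce. The following is a rough sketch under assumptions, not an existing tool; the function name run_regressions, the .cmd/.expected naming, and the "runner" argument (any command that takes a command file and prints the resulting game output) are all invented for illustration.

```shell
#!/bin/sh
# Hypothetical regression runner: one NAME.cmd / NAME.expected pair per bug.

# run_regressions RUNNER DIR
# RUNNER is a command or shell function that takes a command file as its
# only argument and writes the resulting game output to stdout.
# Prints PASS or FAIL per case (plus a diff on failure) and returns 1 if
# any case failed.
run_regressions() {
    runner=$1
    dir=$2
    failed=0
    for cmds in "$dir"/*.cmd; do
        [ -e "$cmds" ] || continue          # directory has no cases
        expected="${cmds%.cmd}.expected"
        if "$runner" "$cmds" | diff -u "$expected" -; then
            echo "PASS $(basename "$cmds")"
        else
            echo "FAIL $(basename "$cmds")"
            failed=1
        fi
    done
    return "$failed"
}
```

A runner for the game from the first post might be a function such as `play() { frob -i plain -k utf8 --no-pause -S -c -e 1000 -R "$1" asteroid.t3; }`, reusing the flags shown there. A case for the bug discussed in this thread would then be a command file whose first command is CONTINUE, with no earlier GO TO.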