I’ve tested a demo with 4 official testers (and received testing-related feedback from a few other people who were not trying to test my game at the time). I would not advise having fewer than 2 unless there are serious recruitment problems. While the demo wasn’t large, the full game will be, and I expect the full game’s testing to follow the same approach on a bigger scale, except where specified.
I’d recommend having an iterative test system: start the first cycle as soon as you have something that doesn’t embarrass you (the technical term for this is a Minimum Viable Product). Even if you’ve only done part of the game, what’s there can be tested so you’re working on solid foundations. Then have plenty of cycles (be prepared to commit at least a month at the end of the development process, so you have time for at least 2 consecutive/near-consecutive cycles*, plus allow time during development if you have significant changes in the game). Each cycle, implement changes from the previous cycle’s results (you can program some of these in from early respondents or your own discoveries while waiting for other respondents to finish their testing). Bonus: this means the early part of the game (likely to be seen the most) will have been tested a lot by the time the entire game is finished and tested.
* You want 2 consecutive/near-consecutive cycles at the end because it means people who are late getting results in from cycle 1 can do so and still influence your build. This also helps make the end feel a little less crunchy, if only because your main coding sprint now has to happen 2 weeks earlier and you’ve given yourself a bit more time to breathe and enjoy the fact you have a complete new segment.
If anyone hasn’t sent me responses of their current test build by the point the next one’s ready, I immediately point them to the new build so they can test the latest version if they wish (but continue to accept responses about the old one). So far, everyone in that situation has accepted.
For the 3 demo test cycles, I simply asked the people I wanted to test it, all of whom had already expressed interest in playing my game when it was ready. Further cycles will probably involve looking for testers in a broader field for a variety of reasons.
So far, my in-cycle retention percentage is 100%, partly because I don’t require anyone to commit to more than 2 weeks at a time. It means anyone who’s busy in the moment can easily know if they are too busy, and means they know it’ll be a fairly bite-size task. Nobody needs to do every cycle of a game’s testing, except the author (if there’s only one author - people involved in writing multi-author works can share the duty of co-ordinating testers). I also take the approach that if the feedback comes later, that’s better than not at all (although if I have a deadline, I also inform the testers of this).
I made no especial attempt to screen for particular lenses, but expect that will occur later on. However, I did try to have people with different levels of experience with the type of game I was making, ranging from someone who was well-versed in the game engine and subject matter of the game, to someone for whom this was their first computer game in 40 years. (NB: you don’t need to have someone who hasn’t played a computer game in decades on your team, but if you happen to find one, make sure they have appropriate support in the testing process. This may include being with them when they do the test “for technical support” and assuring them that their feedback is valuable by showing where their input improved the game.)
I always ask testers to tell me if there are any bugs or confusing parts of the game. While I know my writing tends to be good from a technical standpoint, I’m also good at leaving out necessary fragments of code, which means my code can break. The fact that Ren’Py (my development engine) doesn’t have an automated test facility (as far as I know), combined with my inexperience, means I’m prone to serious bugs entering alpha testing and sometimes beyond. I also don’t always pitch explanations well enough without assistance - the tester who hadn’t played a computer game for 40 years is the reason there are instructions on the first screen of my game after pressing “Play”.
In addition, I like to have one more focus for each cycle. For the Budacanta demo, Cycles 1 and 2 were “look for things that are terrible and let me know (though note I am already aware the art is not this game’s strong point)” and cycle 3 was “music” (because I coded that in late on and every other change from cycle 2 was a bugfix). I don’t worry too much about phrasing in general, although if I know a specific tester benefits from careful wording of a desired outcome, I will mention it.
I didn’t provide a walkthrough because the nature of the demo is that there are several valid paths and very few “invalid” paths. The next phase gets more complicated and I imagine some sort of guidance on good paths will be necessary.
I also like to be present for one of the test runs in each cycle (while ensuring I am absent for other test runs) because presence and absence can change what one discovers. (Also, some testers really benefit from the author’s presence, simply from a confidence perspective.)
I’ve so far had to rely on narrative feedback (verbal and email/direct messaging) because Ren’Py has a history function that I don’t know how to use yet. (A properly-working Ren’Py history works much like a parser transcript.) I list comments in a text file and sort them according to perceived importance and how feasible they are to fix while respecting the rest of the game. As items are fixed, I move them to the “info” text file, which among other things lists the game’s version history. I don’t manage everything (and if anyone does happen to know of an easy-to-implement Linux installer, please let me know) but I like lists. If I decide I can’t fix something, it stays on the list. If I decided I wouldn’t fix something, I would take a second look to see if the complaint is actually a manifestation of another issue I was willing to fix. Some problems are only problems in context.
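For anyone who would rather keep that list in a script than a plain text file, here is a minimal Python sketch of the same idea. Everything in it is an assumption for illustration: the `Comment` fields, the 1-3 scoring scale and the status names are hypothetical, not something my own text files actually use.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    text: str
    importance: int       # hypothetical scale: 1 = minor, 3 = serious
    fixability: int       # hypothetical scale: 1 = hard to fix safely, 3 = easy
    status: str = "open"  # "open", "cannot fix" or "fixed"


def todo_list(comments):
    """Everything not yet fixed, most important and most fixable first.

    Items marked "cannot fix" deliberately stay on the list.
    """
    pending = [c for c in comments if c.status != "fixed"]
    return sorted(pending, key=lambda c: (-c.importance, -c.fixability))


def info_entries(comments):
    """Lines for the "info" file: one entry per fixed item."""
    return ["- " + c.text for c in comments if c.status == "fixed"]
```

Calling `todo_list(...)` after each batch of feedback gives the order in which to tackle things, and `info_entries(...)` gives the lines to paste into the version history once a build goes out.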
Finally, and this is a fairly niche point, make sure you know your engine’s error reporting system well enough to understand error reports in any language in which your game is currently written. It’s a lot easier, for example, to parse an Indonesian tester’s error report if you can tell from the layout of a screenshot/error output what “Bengeculian telah terjadi”* means. If you can’t do this, it’s probably best to hold off on testing that language until you do know how your engine presents errors, so you know which bit’s the error type, which bit’s the place in the code where the error happened, et cetera.
* For the curious, “Bengeculian telah terjadi” means “A runtime error has occurred”.
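As a rough illustration of reading a report by its shape rather than its language, here is a small Python sketch. It assumes the pasted report still contains Python-style traceback lines (`File "...", line N` and a final `SomeError: detail` line) even when the surrounding text is translated; I have not checked that this holds for every engine or language, and the function name `summarise_report` is made up for the example.

```python
import re

# Python-style traceback frame, e.g.: File "game/script.rpy", line 42, in script
FRAME = re.compile(r'File "(?P<path>[^"]+)", line (?P<line>\d+)')

# Final exception line, e.g.: NameError: name 'score' is not defined
EXC = re.compile(
    r"^(?P<type>[A-Za-z_][\w.]*(?:Error|Exception)): (?P<detail>.*)$",
    re.MULTILINE,
)


def summarise_report(report):
    """Pull out the last file/line mentioned and the last exception line."""
    frames = [(m["path"], int(m["line"])) for m in FRAME.finditer(report)]
    excs = list(EXC.finditer(report))
    last = excs[-1] if excs else None
    return {
        "where": frames[-1] if frames else None,
        "error_type": last["type"] if last else None,
        "detail": last["detail"] if last else None,
    }
```

If either pattern finds nothing, the corresponding values come back as `None`, which is itself a useful hint that the report is in a format you don’t yet know how to read.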