Best practices for authors in search of testers

Note: This thread is about games that, for whatever reason, require more time/energy/organization to completely test. Most games probably don’t require that, and this thread isn’t about experiences with not needing it.

As I have been in and out of testing my game for almost a year now, I feel that I am more familiar with the author side of playtesting than I once was, but still more clueless than I would like to be. As my game has grown, it has become increasingly difficult for me to do what I feel would be adequate QA.

I hope that we can identify the kind of best practices and guidance that Mike laid out in his thread about testers.

While Mike offers copious advice for authors, I haven’t been able to find much advice about entering the playtesting phase of an IF project. What are the elements that lead to a successful beta? Here is a non-exhaustive handful of things that I imagine as requirements for QA on a large IF game:

  • A team suited to the game in terms of interest, available time, and so forth.
  • A clear idea of areas of concern, if any.
  • A timeline, I think.
  • Depending on where the project is, guidance for completing the game or performing requested testing.

Meanwhile, some questions about practices and expectations:

  • On average, how many of the testers who sign up complete the testing process? Since many authors tend to close a request for testers after reaching a certain threshold, this is something to factor in. Attrition is a natural part of the process, since people have lives outside of IF. This is a hobby; things come up, priorities change.
  • When requesting the testing of certain features, how do you phrase your requirements?
  • How long do you usually leave testing open? In my case, after a month of active development the story file isn’t terribly useful (I’ve been fixing the stuff that has already come in).
  • Do you take special steps to find testers (not just making a general thread on this forum)?
  • Do you track issues in an organized way?

Finally: what other suggestions do you have for managing the testing of a complex or large IF project? I’m not prepared to quantify “complex” or “large,” but I think that if you’ve had to do this, you’ll know it.

In my case, I’m led to ask these questions because my story file is quite large, and I feel that I have not done a good job of managing my quality processes.

10 Likes

I thought I could share what my experience has been with testing. I generally aim for an ideal of 20 testers for each game (which I have never reached; it’s usually between 8 and 14) and for spending roughly equal time writing and testing (sometimes testing really is 50% of the time; and I’m talking about calendar days rather than author-days, since the testing period is often a time to relax from the project).

Here is a non-exhaustive handful of things that I imagine as requirements for QA on a large IF game:

  • A team suited to the game in terms of interest, available time, and so forth.

I’ve found that the genre tends to matter surprisingly little in terms of who signs up and completes testing. The system matters, I think (some don’t feel comfortable testing parser games due to unfamiliarity).
Time definitely matters. It’s much harder to find people to test exceptionally long games (3+ hours). I’d say a 15-minute game can get twice as many testers as a 2-hour game.

  • A clear idea of areas of concern, if any.

I’ve found it helpful to say what I’m focusing on. I get a lot of typos in my writing, even with spellcheck and pasting output into Grammarly, so a lot of my first-run testers tend to focus on grammar. This is great for me (because no one wants to play a game with typos), but if I wanted more general feedback and didn’t want the focus on the text yet (because, say, I might rewrite it later), I’d have to specify that.

  • A timeline, I think.

Whenever people ask for a time frame, anything less than a week tends to seem stressful and anything over 2 weeks often leads to people forgetting about it.

  • Depending on where the project is, guidance for completing the game or performing requested testing.

Does this mean including a walkthrough, or just explaining how to download and run the game? Because both are good. Including a walkthrough is really important (if you’re not doing randomized stuff), because if you don’t, it’s very likely the tester will get stuck early on, on something you never anticipated (like in a game of mine where I implemented ‘cup’ but not ‘glass’, so they thought the glass wasn’t there). Then testing screeches to a halt and they have to email you. If you have a walkthrough you know works (through Inform testing commands or similar), it makes for better feedback.
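
In Inform 7, for instance, the walkthrough can live in the source as a test script and be re-verified mechanically after every change. A minimal sketch (the commands and object names here are invented):

    Test me with "take cup / fill cup with water / drink water".

Typing TEST ME in the running game then replays those commands in order, so a broken walkthrough shows up immediately.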

  • On average, how many of the testers who sign up complete the testing process? Since many authors tend to close a request for testers after reaching a certain threshold, this is something to factor in. Attrition is a natural part of the process, since people have lives outside of IF. This is a hobby; things come up, priorities change.

If I ask for a test with feedback within 1-2 weeks, usually 80% of people respond in that timeframe, in my experience (and the ones who don’t respond always have a good excuse!)

  • When requesting the testing of certain features, how do you phrase your requirements?

Here is an example I sent out a year or two ago:

The game is somewhat polished but has a bunch of placeholder text and missing synonyms. It’s my first game in Dialog, so I’ve probably made several mistakes.

But I’m looking more for overall structure comments. What does it feel like the game is missing overall? Should I add or take away puzzles or people? That kind of thing. What moments worked well, what felt flat? Then I can make bigger changes based on your feedback.

There are no hints and alternate puzzle solutions aren’t coded yet, so I’ve included a walkthrough. Thanks!

  • How long do you usually leave testing open? In my case, after a month of active development the story file isn’t terribly useful (I’ve been fixing the stuff that has already come in).

I like to do around 3 phases of testing for a small project.

  1. I like to make a very basic prototype and have someone try it to see if the overall concept works (so like 30% of the game is complete, it’s just a skeleton sketch). For instance, I sent Chandler Groover a very early version of Grooverland where you get trapped in a kind of frozen stasis hell forever, and he recommended not doing that.
  2. Then I like to do a beta test once content is complete to see if any individual puzzle needs changing. For instance, in Impossible Stairs, several players wanted cooking to be less boring and Grandma to be more interactive, so I added a grandma-based puzzle.
  3. Then I do a final run to catch bugs and typos.

  • Do you take special steps to find testers (not just making a general thread on this forum)?

In 2016 I was obsessed with winning IFComp. As part of that, I messaged every IFComp winner from 2011 to 2015 asking them to test my game. Some were unavailable (like Ryan Veeder), but Sean Shore/Mr Patient was phenomenal, as was Marco Innocenti. I wouldn’t recommend this approach (because being a great author and being a great tester aren’t strongly correlated), but messaging people directly did work in that instance. (If anyone is thinking of messaging me: I used to be a good tester, but I now have so many projects that I usually end up disappointing people.)

I’ve occasionally gotten testers that have no IF experience to test out newbie-friendly things like tutorials.

  • Do you track issues in an organized way?

No.

Finally: what other suggestions do you have for managing the testing of a complex or large IF project? I’m not prepared to quantify “complex” or “large,” but I think that if you’ve had to do this, you’ll know it.

I’d recommend having only a couple of testers at a time rather than many at once, since they will tend to find the same errors and be redundant if they work in parallel. I also recommend prioritizing new testers over repeated old testers so you can get new viewpoints on the material (unless you added new sections, then it makes sense for old testers to check it out).

Those are just my thoughts!

5 Likes

Thanks Brian, this is all super helpful. I think game length has been an issue for me. I’ve said that it’s long, but I should probably be more specific.

That makes sense. I’ve been giving people a month. It’s hard to gauge due to length.

I should have been doing this all along, but I’ve got one available now.

That’s much better than my batting average! I think it probably comes down to me needing to be more clear about play time. It doesn’t help that the story can be dark.

That makes sense! I’ve been doing it in bigger chunks and I think it’s less efficient.

I appreciate these suggestions!

2 Likes

Regarding walkthroughs, I sometimes find that testers look at the walkthrough too early if I supply one. Only when testers get stuck do they start to get creative. But they can always ask for help. However, if the deadline is fast approaching, this might not be a good idea.

5 Likes

I guess it depends on what quality you’re trying to test. Ability to complete is one thing. Whether a puzzle is adequately clued or appropriately difficult is another. Which matters more depends on the game, I think.

My project is late enough in the process to have in-game hints. I think I would have given them hints in early testing, too, if I had thought of it. Just because of the kind of game it is.

2 Likes

Just wanted to repeat this for emphasis. Use automated playthroughs to make sure that the game is at least winnable (for example, with zarf’s RegTest tool, Inform’s “TEST ME” facility, etc.).
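
For RegTest, the test script is essentially a list of commands plus snippets of expected output. Roughly like this, from memory (the game file, commands, and text are all invented, so check the tool’s documentation):

    ** game: mygame.ulx

    * opening
    > look
    end of a road

    * get-lamp
    > take lamp
    Taken.

Each bare line under a command is a snippet that must appear in the game’s response, so the script fails loudly as soon as a change breaks the playthrough.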

This ties into the general idea that the beta test will be most useful when one doesn’t use up the testers’ time, attention, and energy for things which one could have found oneself.

I made a simple spreadsheet with the columns:

  • “Issue Number”: an ID for reference, e.g. in git commit logs
  • “reported for”: the game version in which the issue appeared
  • “fixed in”: the game version in which it was fixed (if so)
  • “description”: summary or quote from the tester’s report
  • “status”: fixed or not (this might be thought redundant, due to the existence of the “fixed in” column; but one could also want to record a status like “not a bug” or “won’t fix” here)
  • “priority”: just one of a few numbers (in fact, 1 and 2 were enough) so I could order the table accordingly
  • “reported by”: by whom the issue was reported (useful if one wants to ask followup questions, or especially notify that particular person that the issue was fixed)

I could also have used a private GitHub repo with the accompanying issue tracker, but it wasn’t really necessary; a local repo plus the spreadsheet sufficed.


Emily Short has some advice which I think pertains to games of all sizes:
Suggestions for Testing – Emily Short's Interactive Storytelling and Preparing a game for testing – Emily Short's Interactive Storytelling.

6 Likes

Yeah, using “test” in Inform has been very important/valuable for me. I even made “test” commands for major game milestones and handed them to the testers. Seemed like a nice convenience! I commented out all of the “press any key to continue” prompts just to keep things moving. It felt reasonable to help testers get from area to area once they had already played through.
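
Such a milestone test can also start the player partway through the game. A sketch (the test name, commands, and room are invented):

    Test tower with "pull lever / up / look" in the Clock Tower.

Typing TEST TOWER moves the player to that room before running the commands, which saves testers the trek back to the interesting part.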

This spreadsheet seems like the best approach. Thanks for the example! I’ll definitely check out those Emily Short posts. I appreciate the links.

1 Like

I put a “DEBUG is true” line in the Not For Release section, and surrounded any “press any key”s with “if DEBUG is false”. This is super handy when using the Skein as well.
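
The shape of it might be something like this (a sketch rather than the poster’s actual code, assuming Basic Screen Effects for the key-press):

    Include Basic Screen Effects by Emily Short.

    Section 1 - Debug flag (not for release)

    DEBUG is a truth state that varies. DEBUG is true.

    Section 2 - Release flag (for release only)

    DEBUG is a truth state that varies. DEBUG is false.

    To pause the game:
        if DEBUG is true, stop; [skip pauses entirely while testing]
        say "Press any key to continue.";
        wait for any key.

Only one of the two sections is compiled into any given build, so the pauses vanish during testing and come back automatically on release.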

4 Likes

Perhaps I’m just a sadist who likes being cruel to my testers, but I feel like not giving a walkthrough is helpful in finding under-clued areas. If a player gets stuck, peeks at the walkthrough, and thinks “Oh, that’s fair in retrospect,” they might not mention it, whereas if I get a message asking for a hint, I know that I probably need to add more cluing.

6 Likes

I am going to drop a quick thought and bow out.

Identifying and snagging backup testers to have in the wings beyond what you need (as opposed to turning them away as unneeded) is probably a good idea. This allows you to feed folks in as other folks drop out for various reasons. Also, let your active crop of testers know two things: that you have backup testers, and that trying to cram in this commitment when the number of plates they’re balancing changes is neither necessary nor desirable.

[Exits stage right.]

3 Likes

I’m glad to have your thoughts!

3 Likes

I’ll make a couple of comments related to ongoing topics here, then summarise my own process.

People who need people

I think you have to modulate everything by discovering how each tester actually tests, which you only start to find out once they give you their first transcript or email. This is truer for a big game like yours than a shorter one, because the overall process will be longer.

With a small game, you can hurl a bunch of different-profiled darts (your testers!) at it and get a good result, hitting most of the board. With a big game, you may realise some people can only hit certain areas (in fact, perhaps they can ‘destroy’ those areas) but some parts of the larger board aren’t being hit. You may need to find more testers to try to address these areas – but that’s only if you find you can’t cover the board in total.

Some people are great at giving clear first impressions. Some are great at attacking all possible holes. Some are good at many things, and can put on whichever hat you ask – if you remember to ask. And some are great at a certain thing every time, but uninterested in other angles. And you don’t know who’s who until they start testing.

People who need walkthroughs

I’ve never given anyone a walkthrough of any IF of mine. Ever, actually? Each game or bit of game is only new once, so I want to get testers’ fresh reactions to it. This works okay for me because I usually have a pretty solid iteration before I share it. That said, I can see why this may not work for everyone, and I also don’t have such puzzly games as, say, @mathbrush.

My own testing summary

I’ve usually got between 4 and 6 testers on a game, meaning @mathbrush’s abundance makes me embarrassed :slight_smile:

Once I have a major iteration of the game, I send it to everyone who’s on board. On their first trip, I don’t say much except to ask them to record a transcript and note their reactions in it. If the game has got some weight somewhere (whether positive, buggy, negative, or nonsensical), I want them to tell or show me where it is. I don’t want to point at anything until I’ve had first impressions.

I then go through all the transcripts line by line and address everything that’s come up. These could be bugs at specific points in the transcript (or things that the player doesn’t even know are bugs, or that I simply don’t like but need to fix anyway). I copy-paste each such line, then add a summary of my action or reaction below it. Once I’ve acted on it all, I send each person a summary of what I fixed from their transcript, with maybe some commonalities listed at the bottom.
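
(An invented illustration of the shape of those notes:

    > OPEN CABINET
    It seems to be locked.
    -- The cabinet was never meant to be lockable. Now fixed and openable.

…and so on down the transcript.)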

I may not go this detailed on every single round, but if I do it on the first round, people can see how everything they do is of value, and how they’ve actually changed the game.

With all the fixes in and some more test playing from myself, I go to the next round. If there’s a particular system that seems troublesome, I might have everyone, and/or someone really experienced (@aschultz), bash at it. If there’s some emotional thing or idea nobody commented on that I wanted comment on, now I’ll start asking.

Otherwise, I just keep iterating, sending, reading transcripts, responding, iterating, etc. There’s no definite schedule. When each round is done, I send out the next one. Some people may come or go, and that’s fine.

Edit - Oh yes, do I track anything? These documents I make after each iteration are a kind of tracking. Otherwise, if something’s for later, I drop a note at the top of the game source as a comment, which I use as an important-things-to-add/fix list.

-Wade

5 Likes

Someone already said it, but I recommend just two new people at each iteration so there is not too much repetition of effort. Also, testers may only be prepared to play the game once, so keep some back to ensure there are still testers for your last version.

With regards to tracking, I just keep a text file that I copy and paste testers’ comments into, and delete them when I have resolved the issue. It is very simple, but it does the job.

3 Likes

I really endorse this too!

As a sort of side note, I also wrote a modified stub to allow for answering yes/no questions automatically. The TEST command doesn’t do this.

to decide whether the player direct-consents: [a drop-in replacement for "if the player consents"]
    if yes-no-val is 2, decide yes; [2 = automatically answer yes]
    if yes-no-val is 0, decide no; [0 = automatically answer no]
    if the player consents, decide yes; [any other value: actually ask the player]
    decide no;

Where yes-no-val can be changed with a command, e.g. “yesno 1” or “yesno 2”.
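
For completeness, a sketch of how such a command can be defined (not necessarily the original code; the action name is invented):

    yes-no-val is a number that varies. yes-no-val is 1. [1 = ask normally]

    Setting the yes-no value is an action out of world applying to one number.
    Understand "yesno [number]" as setting the yes-no value.

    Carry out setting the yes-no value:
        now yes-no-val is the number understood;
        say "yes-no-val is now [yes-no-val]."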

Also, I recommend a GitHub repository for tracking issues even if you don’t use it for source control. Just having them there is super handy.

2 Likes

I’ve tested a demo with 4 official testers (and received testing-related feedback from a few other people who were not trying to test my game at the time). Certainly I would not advise fewer than 2 unless there are serious recruitment problems. While the demo wasn’t large, the full game will be, and my philosophy means that testing the full game is likely to be the same process, just bigger, except where I note otherwise below.

I’d recommend having an iterative test system: start the first cycle as soon as you have something that doesn’t embarrass you (the technical term for this is a Minimum Viable Product). Even if you’ve only done part of the game, what’s there can be tested, so you’re working on solid foundations. Then have plenty of cycles (be prepared to commit at least a month at the end of the development process, so you have time for at least 2 consecutive/near-consecutive cycles*, and allow time during development if you make significant changes to the game). Each cycle, implement changes from the previous cycle’s results (you can program some of these in from early respondents, or from your own discoveries, while waiting for other respondents to finish their testing). Bonus: this means the early part of the game (likely to be seen the most) will have been tested a lot by the time the entire game is finished and tested.

  * You want 2 consecutive/near-consecutive cycles at the end because it means people who are late getting results in from cycle 1 can do so and still influence your build. This also helps make the end feel a little less crunchy, if only because your main coding sprint now has to happen 2 weeks earlier and you’ve given yourself a bit more time to breathe and enjoy the fact that you have a complete new segment.

If anyone hasn’t sent me responses for their current test build by the point the next one’s ready, I immediately point them to the new build so they can test the latest version if they wish (though I continue to accept responses about the old one). So far, everyone in that situation has accepted.

For the 3 demo test cycles, I simply asked the people I wanted to test it, all of whom had already expressed interest in playing my game when it was ready. Further cycles will probably involve looking for testers in a broader field for a variety of reasons.

So far, my in-cycle retention percentage is 100%, partly because I don’t require anyone to commit to more than 2 weeks at a time. That means anyone who’s busy in the moment can easily tell whether they’re too busy, and they know it’ll be a fairly bite-size task. Nobody needs to do every cycle of a game’s testing, except the author (if there’s only one author - people involved in writing multi-author works can share the duty of co-ordinating testers). I also take the approach that feedback which comes late is better than no feedback at all (although if I have a deadline, I inform the testers of this too).

I made no especial attempt to screen for particular lenses, but expect that will occur later on. However, I did try to have people with different levels of experience with the type of game I was making - ranging from someone who was well-versed in the game engine and subject matter of the game, to someone for whom this was their first computer game in 40 years. (NB: you don’t need to have someone who hasn’t played a computer game in decades on your team, but if you happen to find one, make sure they have appropriate support in the testing process. This may include being with them when they do the test “for technical support” and assuring them that their feedback is valuable by showing where their input improved the game).

I always emphasise to testers to tell me if there are any bugs or confusing parts of the game. While I know my writing tends to be good from a technical standpoint, I’m also good at missing needed fragments of code, which means my code can break. The fact that Ren’Py (my development engine) doesn’t have an automated test facility (as far as I know), plus my inexperience, means I’m prone to serious bugs entering alpha testing and sometimes surviving beyond it. I also don’t always pitch explanations well enough without assistance - the tester who hadn’t played a computer game for 40 years is the reason there are instructions on the first screen of my game after pressing “Play”.

In addition, I like to have one more focus for each cycle. For the Budacanta demo, Cycles 1 and 2 were “look for things that are terrible and let me know (though note I am already aware the art is not this game’s strong point)” and cycle 3 was “music” (because I coded that in late on and every other change from cycle 2 was a bugfix). I don’t worry too much about phrasing in general, although if I know a specific tester benefits from careful wording of a desired outcome, I will mention it.

I didn’t provide a walkthrough because the nature of the demo is that there are several valid paths and very few “invalid” paths. The next phase gets more complicated and I imagine some sort of guidance on good paths will be necessary.

In addition, I like to be present for one of the test runs in each cycle (while ensuring I am absent for other test runs) because presence and absence can change what one discovers. (Also, some testers really benefit from the author’s presence, simply from a confidence perspective).

I’ve so far had to rely on narrative feedback (verbal and email/direct messaging) because Ren’Py has a history function that I don’t know how to use yet (a properly working Ren’Py history works much like a parser transcript). I list comments in a text file and sort them according to perceived importance and my ability to fix them while respecting the rest of the game. As items are fixed, I move them to the “info” text file, which among other things lists the game’s version history. I don’t manage everything (and if anyone does happen to know of an easy-to-implement Linux installer, please let me know :wink: ), but I like lists. If I decide I can’t fix something, it stays on the list. If I decide I won’t fix something, I take a second look to see whether the complaint is actually a manifestation of another issue I am willing to fix. Some problems are only problems in context.

Finally, and this is a fairly niche point: make sure you know your engine’s error reporting system well enough to understand error reports in any language in which your game is currently written. It’s a lot easier, for example, to parse an Indonesian tester’s error report if you can tell from how the screenshot/error output is presented what “Bengeculian telah terjadi”* means. If you can’t do this, it’s probably best to hold off on testing in that language until you know how your engine presents errors - which bit is the error type, which bit details where the error happened, et cetera.

  * For the curious, “Bengeculian telah terjadi” means “A runtime error has occurred”.

4 Likes

Aaaare the luckiest people…in the woooooorllld… :notes:

[sorry, this poster has been sacked, we return you to the topic]

I had the most difficult time getting testers, especially for parser games right before a Comp, and that was my fault due to the way I work. Parser games, especially long ones or ones with complicated mechanics, really require hours and hours (ideally, months) of testing due to the open-world, physics-engine nature of the game’s world model. And in a comp, authors are often scrambling to finish up, so testing the complete or almost-complete game often gets short shrift.

Part of the problem is that right before a comp deadline, everyone who wants to test is already busy testing. Everyone who is an author is busy polishing their own game. Some good testers may be sitting out because they want to play and rate the games rather than test.

I found that one of the best pools of testers is other comp entrants, but you have to set this up early, before the near-deadline crunch. If you can latch onto one or two other entrants early and make a mutual beta-testing pact to help each other, that seems to work best when you’re under a deadline and need an extensive amount of playtesting.

7 Likes

I just wish I could get as many testers for my games as other people seem to! I post requests for testers, but I am lucky if I get one or two replies. I wonder if it is the fact that I use ADRIFT 5 to create my games and not one of the more popular systems such as Inform 7, etc.?

What annoys me is that when I enter my games in competitions, the reviewers immediately complain when they find bugs! I have considered offering to pay testers, but I cannot afford to do that in the current climate.

What am I doing wrong???

2 Likes

Good to know. Thanks.

1 Like

I suppose long-form + old-school + fantasy is probably a tougher-than-average combo to ask people to test, in general.

I can provide one data point from myself as a Mac user: I got burnt repeatedly over the years trying to play ADRIFT games, finding them crashing halfway through in the online player or seeming to have lock-up bugs. In that context, I played them less and less, and would certainly never invest in testing one. I have seen your test requests over time.

I assume this isn’t a factor on the Windows side, but the long, unvarying history of unreliability on my platform just made me give up. It would take a decent chunk of evidence that the games can be played offline with reliability and no crashing on the Mac to bring me back.

So for me, it has been the ADRIFT platform, but only in a technical sense.

-Wade

2 Likes

Hi Wade,

Are you aware of FrankenDrift? This program has been designed for running ADRIFT games on Mac machines. It can be downloaded at https://github.com/awlck/frankendrift/releases/tag/v0.6.1

Give it a try!

1 Like