Best practices for testers

The first piece of advice on the IF Comp guidance for authors page is to playtest your games. But with Comp season coming up – and the concomitant testing requests – I realized that though there’s copious advice for authors (not to mention prospective judges), there’s not really much out there to help folks be good testers. So I thought it might be helpful to open up a thread where folks can share how they approach testing, or for authors, things testers have done that have been especially useful (or things to avoid), and hopefully create a useful resource moving forward.

(My other motivation here is that while I feel like I’ve got a good handle on how to be a good tester of parser games, I’m less confident about how to approach choice-based games. So I’m hoping to get some advice here, especially on stuff like learning how, or whether, folks communicate the particular path they take through a choice-based game in the absence of built-in transcripts.)

To get things started, here are some things that have worked for me (at least I think they have; if folks whose games I’ve tested have feedback for me, I’d love to hear it!):

  • Per the above, if a game has a transcript function, make sure to use it! This is a no-duh, of course, but I’ve personally messed this up in a few ways, usually by forgetting to re-enable transcript tracking after reloading or restarting the game (in Gargoyle at least, I think the transcript will continue after typing RESTART for a Glulx game but not a z8 one…). So these days I try to open up the file after a couple moves at the beginning of each session to make sure everything’s working.

  • When I have time, I usually like to do an initial playthrough more or less straight – mostly approaching things as a regular player, albeit one who likes to examine everything mentioned in room descriptions and try out lots of synonyms for parser games – but then a follow-up one or two that’s more focused on trying to break stuff: taking everything even if it seems nailed down, cramming stuff into containers, stacking things onto each other (I had a lot of fun with this testing The Impossible Bottle), making wildly suboptimal choices, or ignoring clear prompts to do X, Y, or Z. The theory behind separating these two passes is that a sense of pacing and how the game flows can get really lost in break-everything mode.

  • Speaking of that, I try to provide feedback on big-picture issues like pacing! Usually in addition to sharing a transcript or detailed list of feedback (for a choice game), I write 3-5 bullet points summarizing my overall take on the game, and flagging areas where I think the author could consider making more significant changes. Often this is where things like structure, voice, etc. come in since it’s hard to address those in a granular way.

  • And then the flip side of the previous one is that it is of course helpful to provide that super granular feedback – I like to drop lots of comments, earmarked by *'s, in parser games, to flag typos, missing scenery items, buggy responses, etc., but also positive feedback where a joke lands or a puzzle clicks so authors know not to mess with something that’s already working. I also like to provide a bit of running commentary, like what I understand my current goal to be or whether I’m frustrated by some busywork or finding it no big deal to work through, since sometimes it can be hard to assess that kind of stuff just from looking at the transcript. For choice games, as I mentioned, this is often harder – I try to keep a text file where I paste in whatever chunk of the game I’m on and add comments from there, but honestly that can feel clunky.

  • This is a suggestion for authors, but I think it’s helpful when they provide specific prompts for where they’re looking for feedback beyond just “look for bugs and typos”; it helps focus my attention as I’m playing. The flip side is that sometimes alerting a tester to an issue makes it harder for them to assess it the way a player who comes to it “cold” would, of course – but I think the tradeoff usually cuts in favor of asking for the feedback that will be useful.

  • For puzzle games I try, to the greatest possible extent, not to rely on hints – this isn’t always possible, but I find a good middle ground when stuck is to send a status update to the author lightly fishing for clues (“I made it to the inner cloister but now can’t get past this one ornery monk – I think I need to get him to inadvertently violate his vow of silence so he’ll give up the habit and let me through”), since that lets the author step in if there’s a bug or I’m wildly off base, or lets me trundle along on my merry way if things are basically fine.

  • Lastly, I think it’s important to communicate clearly with the author on your timeline. Sometimes testers can’t get to a game for a couple days, or even weeks, which is totally fine – we’re all doing this for free (or so I assume!) – but it can be rough on authors not to know when to expect feedback, or whether they should get the tester an updated version reflecting bug-fixing they’re doing in the meantime.

  • For parser games specifically: examine anything mentioned in a location description, including using any adjectives mentioned. Then examine anything mentioned in those descriptions, until you run out. LISTEN and SMELL whenever it seems even slightly interesting to do so. Try any custom verbs on any object you can, especially inappropriate ones (in Inform at least, it’s easy to write an action that applies to more kinds of things than you’ve written responsive logic for). TAKE ALL whenever you can. Always X ME. Try to drop plot-critical items and leave them behind in inaccessible places. Try all the different potential conversation verbs (TALK TO, ASK ABOUT, TELL ABOUT, SAY…)

I’m sure there are lots of others – and lots of things wrong with the stuff I’ve listed above – but I’ll stop there since I’m curious what works well for y’all!

25 Likes

Also: be honest. I appreciate people trying to be nice in their criticism, but it isn’t kind to let big bad game design flaws go unchallenged. It’s one thing if the game just isn’t your cup of tea, but it’s another thing entirely if the gameplay or the narrative has big problems.

I always appreciate my testers pushing me to be better and telling me if I’ve got issues. Some of them (definitely looking at you here, Mike) will even help you brainstorm a little to fix the problems. So I think it’s important to say what you like and what you don’t like, and to be kind but honest about it.

10 Likes

I think one thing you and Amanda both touched on is that you want to be positive without being deceitful, so I use a lot of stock phrases such as “this worked well, but this can be improved”. I keep reminding myself: I’m not here to pass judgement. That’s for the judges! I’m here to help give them something more likable to judge.

It sounds a bit unemotional, but on the other hand, emotion can get in the way of fixing bugs. I avoid overreaction. I know as an author it’s disappointing to let stuff I thought I’d fixed, or knew I should’ve checked, slip through. My task as a tester, as I see it, is to remember someone has shared something they suspect has faults, and then help improve that work as efficiently as possible, because testing is about how much the author wants to improve, not how clever I am at noting what is wrong, or how persuasively I can sigh “it’s been done.” Also, if someone makes the same sort of mistakes I might have made, I say so, and I express confidence the author will straighten it out. This feels like a Golden Rule.

One other thing I like to do is have a game surprise me. I’ve made a critical comment, and suddenly I realize I overlooked something! I realize I may not have been careful looking at certain things. Or I get through part of the game and realize I wish I had spent more time in another area, and the author should know that. To this I’d add: every time you make a criticism, note when it might only be worth changing if other testers flag the same thing. Note if it’s a pet peeve, so the author knows that if you’re the only one making the criticism, they can safely let it slide. (For obvious technical errors, of course, there are no two ways about it: report them and move on.) When a tester admits they were wrong, it flips the script and can make the author feel smart.

I’ve also come to grips with this: I can’t have a universal style. I try to be thorough, and I want to do more than just plow through, but there are things I am just not going to cover, whether due to lack of interest in the subject matter or whatever. So I try to leave notes for the author saying “Hey, maybe a different style of tester would enjoy checking this” or “this might be good to engineering-test.” In this vein I’ve also found open-ended questions to be useful. Maybe only 2 of 10 hit, but the ones that do are often useful. And I know as a programmer, I need to be asked some open-ended questions more than once before I give a good answer. Sometimes the questions testers ask are about things I wondered if I should bother with, and then I say: aha, yes, it was worth it.

And a lot of times for technical stuff I’ll provide 2 suggestions: one, a quick and dirty solution, and two, something more detailed if the author has more time.

I know it stinks to forget to make a transcript, but here’s a trick for Inform authors that helps.

After reading a command (this is the ignore beta-comments rule):
    if the player's command matches the regular expression "^\p":
        [\p matches any punctuation character, so this catches tester comments beginning with * or another symbol]
        say "(Noted.)";
        reject the player's command.

I suspect other IF development systems have similar catchalls. In the case of Inform, you can also easily put this (and other code, maybe describing what you want from beta testers) in an extension, and then stop including that extension a week (or whenever) before release.

One of the big things I do is to have an email ready at the start of testing. Every time I make a comment I want the author to see, it goes into the email. I roughly try to sort things by expected boost to the game for the time taken. As an author it feels good to have a flying start by fixing a big bug quickly. Ideally, of course, there are none, but fixing them always feels like a win, and I want to give an author as many quick wins as possible so they can get on a roll. I also try to provide a roadmap of what can wait for a post-comp release; if the author doesn’t get to it, no problem, and if they get ahead of schedule and slip it in, it feels good. But I want that sort of thing to be something you add so a post-comp release feels substantial.

So typos don’t get a mention (though I may say “search the transcript for the word TYPO,”) but a runtime error would be at the top of the list, especially if I know it’s the sort I’ve seen in my own works due to one careless line of code. Or being locked out of a room, or feeling locked out, or noticing intended hinting pushes the player away.

Logic/continuity errors where I try specific things, or internal contradictions in the story, are lower. I also try to look for big-picture stuff where the author probably needs to reorganize huge chunks of code and let them know. Maybe it’s something they’re aware of, but I also want to have it on record how complex I think it will be and how critical it is to me. I try not to mention typos in my email, because I’m not morally attached to them, but I do tend to find them.

Yes… speaking of big-picture, this sort of stuff is important for me to sit and think out. I’m good at finding the technical side quickly, but a lot of times I just have a sense something could be better, and I don’t know how or why, and it’s hard to express, and then I have an idea 2 days later. So I let authors know I may have technical stuff immediately and subjective stuff later. Also, as you alluded to, it’s important to communicate with the authors how you work. If I know I’m going to get busy, I like to give them immediate feedback on stuff that might be low-risk, high-reward – not to say “HA I FOUND A BUG RIGHT AWAY,” but so they have something to do and to show I’m thinking of stuff in good faith. I often try to add something that worked for me. Then I follow up with more detailed stuff later.

I also try and look for pieces with moving parts that might conflict and try to test them. Certainly I do what I can to get the game in an unwinnable state, and if a game survives my stress testing, I let the author know.

As an author I find getting too large a transcript at once has a risk of making me wait until I have a huge time chunk, and I actually appreciate them being broken into small chunks, even/especially for a long game.

On that note, if you feel you’re just plowing through, also maybe take a break and come back later. I get cranky and start complaining to myself about irrelevant stuff when I’m tired. Don’t feel you have to tackle the game at once. The author should have given enough time in advance.

I agree here. I try to let my thoughts leak out, and often I’ll find myself correcting myself later. I sort of mentioned above that it’s cool when the penny drops. That does feel clunky for choice games, but it’s not too bad to hit ctrl-a and paste into a running commentary.

I agree on this (more info is generally good) and generally try to balance each suggestion to the author with one thing I think would be tricky for me to code, so there’s a range of things to work on. My wheelhouse is 1) changing “You can’t go that way” to something that provides atmosphere and 2) being more helpful with parser errors or standard Inform rejects (e.g. YES and NO feel a bit snarky.)

As someone who knows Inform pretty well, I also think I have a good handle on when an author just doesn’t know certain code would be easier to write than they think, so I look out for that. Stuff like having a “not for release” section that does

every turn when hint-flag is on:
    try hinting;

So they can test the hinting code, if it seems to misfire more than once.

Also, sometimes the author’s suggestions about what to test provide clues to a blind spot. If they say “Well, I had someone test this thoroughly, so don’t waste your time there,” I’ll listen. If I see something odd in an area they didn’t mention, I look at it more.

One other thing I’d add – it’s a nice touch to check what time of day the author tends to be able to work on the game. I’ve had a lot of fun waking up to an email saying “New build!” from someone in Australia and knowing I can wait a few hours and still give what seems like immediate turnaround – and of course they will wake up to a hopefully helpful transcript. Sometimes when I get a game I think “when should I test it,” and narrowing it down to just before the author could actually use the feedback is a big help.

As for the bad side? I remember one tester who told me “This game suffers from what I call AGI-itis” and the thing was, I knew it was a dry goods game, and I was willing to accept that as a weakness, but that wasn’t the focus. This wasn’t a huge slap in the face, but it felt unnecessary and obtuse. A vernacular way of putting it is, stick to the facts when you can, but don’t be all Mr. (or Ms.) Actual-Factual.

5 Likes

I’ve been unable to figure out what “AGI-itis” and “a dry goods game” might mean. It’s clear that you found the tester’s comment unnecessarily insulting, but I don’t get it.
I don’t feel I’ve ever been insulting as a tester, but my advice is to err on the side of honesty. What an author really needs is information. If something doesn’t work for you, it may not work for others. The author wants to know that before release.

7 Likes

Well, it was part of a much longer screed, and phrasing like “Your game suffers” is sort of a red flag. More simply, stuff like “I cannot possibly see what possessed you to write this” could, well, be condensed. Let’s not be that guy!

But the evaluation sort of meant the game was something you’d see in the early Sierra days, where you traded items with NPCs until you got what you wanted.

“Dry goods game” basically means find item A, give item A to person B, get item C, give C to person D, and so forth. Which has its limitations, and I was willing to accept that. But I wasn’t so willing to accept hearing that I apparently had no clue of just how limited my game was!

I agree that honesty is a good thing. If I think a fix would be too risky to implement before release (whether due to the author’s technical skill or the complexity of the fix), or the author would turn off some people with certain content, I let them know. I always ask myself if I am being honest or just being blunt. But I think it’s useful to use stock phrases like “this seems relatively weak” or “this seems relatively strong” or “this seems worth a roll of the dice to try for an a-ha moment.” I try to avoid full-on punditry mode when giving advice, because 1) general principles and 2) I wasn’t asked to be a pundit, and the author probably doesn’t want that sort of drama.

6 Likes

I started beta testing after a lot of experience finding bugs in Infocom games. I’ve also found spectacular and interesting bugs in The Pawn, Eric the Unready, Not Just an Ordinary Ballerina, and other games no one can ever fix. So my focus has always been technical–I’m looking for the bugs. That’s a good kind of tester to have, as long as you have other styles in the mix as well. And here are a few of the things I do, although @DeusIrae already covered a lot of it.

  • Learn the “right” way to play the game. Going through and trying to figure it out like a “regular” player should generate a lot of helpful feedback.

  • When you’re comfortable playing through the game the way it’s intended to be played, stop doing that! Try to break everything. Use objects the wrong way. Use special-purpose verbs on the wrong objects – I’ve had great success with verbs like EMPTY and THROW doing things they shouldn’t, even when they’re not important to the game. Leave things where they shouldn’t be. Go places you shouldn’t be. Try things that shouldn’t work. Act on knowledge the PC shouldn’t have yet. Try to violate every assumption the game might be making.

  • When you find a vulnerability, try to exploit it, to leverage it like some hacker trying to take over a system. If a bug is letting you get away with something you shouldn’t be able to do, use it to make the biggest cheat you can. Comments are good, but it can be more effective when your transcript shows the author how you hacked the game.

10 Likes
Background

I’m posting this as a resource for people who want to help test Inform 7 games but don’t really know what is helpful as feedback. My motivation is that several people have volunteered to test a game I’m working on, others might in the future, and it will be nice to have a resource to point to. Given that background, I’m going to write from my perspective, but it’d be interesting to hear the viewpoints of other authors.

Getting started
Testing an Inform 7 game generally involves getting a file such as a .gblorb and possibly a walkthrough or other supplementary material. The file requires an interpreter to play; Lectrote is probably the simplest option for most platforms.

Transcripts
Inform 7 allows people to record their games as they play. When you begin playing, you can type TRANSCRIPT; the game will ask you for the name of a text file, and then will start recording your moves into that file. It can be useful to open the file at some point, make sure it’s recording, and then close it. You can type TRANSCRIPT OFF when you’re done. Since the transcript won’t catch the opening text of the game, you can type RESTART immediately after starting the transcript to record the opening (if you feel it matters).

It’s common to annotate your transcript with some easily searchable symbol, with asterisks * being pretty common. The game will throw an error message (unless the author coded in a better response), but it’s fine. So you can make comments in the transcript like

>*I thought I’d be able to pick up the purple slug, but the game said ‘That’s hardly portable’.

Playing through
The more of the game you see, the better. Try to beat the whole game or as much of the game as is finished; if you get stuck, ask the author for hints or consult the walkthrough. Make notes in the transcript of where you got stuck or needed help.

What to focus on?
This depends on how polished the game already is. If you’re one of the first testers, there are probably tons of bugs everywhere, so just make notes every time something dumb happens. (For instance, in one of my games, a tester put the entire contents of a train car on top of a hat.)

If the game is nearly complete, try weird actions that no one would expect, to see if the game can handle them.

For all stages of the game, it’s useful to share what you liked, what you didn’t like, and what you thought was missing.

This sounds like a lot!

Yeah, most people don’t do all of this every time. But any feedback you provide at all is useful! So even if you forgot a transcript and only got through the first room of the game, you can still say what you thought.

9 Likes

This is a really helpful, nicely laid-out resource!

One wrinkle I’ve run afoul of here is that while this will work for Glulx games, if you’re playing a Z-code game (whose file extension will be .z5 or something like that), RESTART will actually terminate the transcript! So that’s one reason why your advice of checking to make sure the transcript is actually recording is very good.

We had a more general thread about how to be a good tester a couple months ago - it’s not Inform specific and it’s largely pitched at more experienced testers, but figured I’d link it in case it’s useful for some folks:

5 Likes

Hmmm, that thread is perfect! If any mods see this, can my thread be merged into that one?

3 Likes

The author could have the beta version start the transcript automatically with:

First when play begins: try switching the story transcript on.

This is in Beta Test by Zed Lopez, along with some other niceties like polite responses to input beginning with ‘*’, and Use options to turn on beta testing or to turn off auto-starting the transcript when beta testing is on.

2 Likes

As you wish…

Is this specific to Inform 7, or is this an interpreter function?

2 Likes

This is just a little addition. Last month I released a Spanish-language paper where I discuss betatesting; it can also be heard on the “Tormenta de plomo” podcast, which covers it for about an hour.
The main limiting factor is the time a betatester has to spend testing a game. I try to test any game at least 10 times, but life happens.

Adventuron also creates transcript files. You type TSTART to begin and TSTOP to finish the transcript, then upload or email the transcript file to the author.

  • Jade.
3 Likes

For the author, it’s nice to have a variety of testers who have different styles and priorities.

I’m not a tech-wizard. I don’t know where to look for bugs. (I have found that they are disproportionately attracted to ropes, though…) I can’t intuit how mechanisms work under the hood. I have no idea whether a certain strange response is a one-off or a symptom of a systemic fault.

I enjoy reading, playing and following a game/story, letting myself be swept along by the author’s drive. Not with my tester’s cap on though. Playing along with the author would ruin much of the fun I find in testing.

When testing, I play against the flow. Or I ignore the flow.

  • Play the game out of order.
    If the game nudges you to go West, go SouthEast. There’s bound to be someone there the author didn’t want you to meet yet.
    In a tense and high-paced scene, stop, stall, and SMELL the roses, LICK the chandelier, and FOLD the tablecloth. Great for teasing out timed bugs in a zen-like manner.
    After finishing a task set by the game, retrace your steps and try to fool the game into retriggering events or scenes that now make no sense.
  • Stretch the parser.
    Try a bunch of different words for your commands. Many different nouns will alert the author to the need for synonyms. A variety of verbs may make the author think of alternate puzzle-solutions.
    Also, play with silly or inappropriate commands. It’s fun for you and the author may get a flash of creativity.
  • Be straightforward and demanding.
    If something doesn’t work like you suspected it would, complain about it to the author. If you think your solution is better than the one in the game, nag about it. If you want to PICK a flower to give to the librarian and the game won’t let you, write a critical essay to voice your disappointment and send it to the author.
    Do all this in a respectful, perhaps joking, but always clear and straightforward manner.
  • Check for typos.
    Authors read their own text hundreds of times. Typos and language errors stop jumping out at them at some point. Point out the errors in the transcript. The author will be grateful.
    Typos in a text-game are bugs. I make it a game to spot and terminate them.
  • Divide your comments.
    Short remarks about typos, one-off small bugs (like a missing space between a game-response and the next prompt), or inexaminable scenery go right in the transcript.
    Notifications about a game-breaking bug or longer big-picture remarks about characterization, pacing, tension-building, plot-holes, or unnecessary cheese-references go in PMs (preferably with a pointer in the transcript. A la “Love/hate the use of space here. I’ll tell you why in a PM.”)
  • Resurface once in a while.
    Remember it’s a game, not a chore. Have fun with it. Stay playful. Chances are you’ll stumble upon a great opportunity to improve the game or a game-breaking bug no-one would have noticed if you weren’t bumbling around naked and blindfolded and crashing into things.

This describes how I approach testing. But as I said in my opening comment: authors should ideally have a variety of testers with different approaches to cover all bases.

6 Likes

Great advice. I hope someone will chime in about choice game testing. I’m working on a Twine game right now that might require more than the usual amount of testing due to some pseudo-parser like features, but I’m not even sure how one records a transcript in Twine.

2 Likes

I’ve never tested a choice game. I’ve sent remarks to authors during a Comp but never a systematic playthrough with my tester’s cap on.

I would love to hear what choice-game authors have to say about testing.

Transcripts (or something similar) are not possible in Twine. (I think?)
I imagine a combination of a listing of clicked links and screenshots would be possible (but a lot of work for the tester).

2 Likes

I could imagine inserting beta test commands in a Twine game: at the least, a text field that would save to an array (maybe with some other gamestate data saved when the tester logged their comment), which I could then review when a playtester sent me their save file.

For that matter, the save file may include a complete history of the player’s choices (revealed via ‘undo’). I’ll have to check.
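
A minimal sketch of what such a comment log could look like in plain JavaScript. All the names here (betaLog, logBetaComment, exportBetaLog, and the shape of the state snapshot) are hypothetical illustrations, not part of any Twine story format’s actual API; in a real story you’d presumably hang the array off the story state so it rides along in the save file.

```javascript
// Hypothetical beta-comment log for a Twine-like game.
// Names and the gameState shape are illustrative assumptions.
const betaLog = [];

function logBetaComment(commentText, gameState) {
  // Store the comment alongside a snapshot of whatever state matters,
  // e.g. the current passage name, key story variables, and turn count.
  betaLog.push({
    comment: commentText,
    passage: gameState.passage,
    variables: { ...gameState.variables },
    turn: gameState.turn,
  });
}

// Serialize the log so the tester can paste or email it to the author.
function exportBetaLog() {
  return JSON.stringify(betaLog, null, 2);
}

// Example: a tester flags a dead-end choice from the "Cellar" passage.
logBetaComment("This option loops back with no new text", {
  passage: "Cellar",
  variables: { hasLantern: true },
  turn: 12,
});
```

The author could then read the exported JSON and see not just the comment but where in the story, and in what state, the tester was when they made it.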

2 Likes

It’s part of the core Inform code (I included it in 77 verbs), but I’m sure you could write an interpreter that doesn’t support it, due to Inform’s virtual machine structure.

1 Like

If it’s a short enough twine game, I might be tempted to use the Twitch model. Simply make a recording of the playthrough with audible comments throughout. Obviously confirm that sort of format is accessible to the author first.

2 Likes

I talked a couple months ago to a student of @Joey Jones who was developing a system for automatically analyzing a Twine game, including running some random playthroughs. I can’t find it online right now, but it might be available soon.
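
As a rough illustration of what random playthroughs over a choice graph might look like (this is my own sketch in plain JavaScript, not that student’s actual tool; the passage-map data shape is an assumption):

```javascript
// Sketch: random walks over a hypothetical Twine-like story graph,
// represented as a map from passage name to its outgoing links.
function randomPlaythrough(passages, start, maxSteps) {
  const visited = [start];
  let current = start;
  for (let i = 0; i < maxSteps; i++) {
    const links = passages[current] || [];
    if (links.length === 0) break; // an ending (or a dead end)
    current = links[Math.floor(Math.random() * links.length)];
    visited.push(current);
  }
  return visited;
}

// Run many walks and report passages never reached: likely orphaned
// content, or content gated behind state this naive walker can't model.
function findUnvisited(passages, start, runs, maxSteps) {
  const seen = new Set();
  for (let i = 0; i < runs; i++) {
    for (const p of randomPlaythrough(passages, start, maxSteps)) seen.add(p);
  }
  return Object.keys(passages).filter((p) => !seen.has(p));
}

// Toy example: "Orphan" has no inbound link, so it should be flagged.
const story = {
  Start: ["Cellar", "Garden"],
  Cellar: ["Start"],
  Garden: ["Ending"],
  Ending: [],
  Orphan: ["Ending"],
};
```

This obviously can’t follow variable-gated links the way a real analyzer would, but even a walker this naive can surface unreachable passages and dead ends.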

2 Likes

Something to consider is that testing a game can be emotionally fraught for the author. I make a point of telling testers that I am not sensitive and that I can handle criticism, and that I actively seek criticism at all levels, even of the big picture stuff. My games have been better by orders of magnitude for this.

But games are personal, and contain people’s blood, sweat, and tears, and testers should consider actively asking how much criticism an author wants. If you find giant plot holes that need substantial rewriting, does the author want to know? If there is insensitive language toward women that may tick off the audience, will the author take this information well?

There’s a thread going here right now where the author wanted testers, but then has reacted very poorly and publicly to testers’ comments and advice. Anyone thinking of testing a game should read through that for a model of how testers should respond to a sulky author.

12 Likes