Color the Truth

mathbrush · November 18, 2016, 5:16pm

Due to a change in jobs and priorities, I no longer participate actively in the IF community, but I’ve been asked to write a post-mortem of Color the Truth, which I’m happy to do.

For most of its development cycle, Color the Truth was made with one goal in mind: to win the Interactive Fiction Competition (IFComp). But it didn’t actually start out that way.

==Halloween Dance==

In Ectocomp 2015, I developed a game titled Halloween Dance with a new conversation system. After extensive research, I had decided that conversation (a perennial weak spot for parsers) could be improved by making it more like the physical-object system (the brightest spot for parsers). Topics would be taken, stored, examined, etc. To avoid one classic problem of the ‘ask/tell’ system, I attached a ‘preview’ to each topic, so the player would know what they were about to say.
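For readers who think in code, here is a minimal Python sketch of the idea of treating topics like physical objects. It is not the game's actual code; the class names, fields, and the sample topic are all hypothetical and only illustrate "take, store, examine" plus a preview.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Topic:
    """A conversation topic handled like a physical object (hypothetical model)."""
    name: str
    description: str  # text shown when the player examines the topic
    preview: str      # what the player character would say if the topic is used


@dataclass
class TopicInventory:
    """Stores the topics the player has taken so far."""
    topics: dict[str, Topic] = field(default_factory=dict)

    def take(self, topic: Topic) -> None:
        # Taking a topic stores it, just like picking up an object.
        self.topics[topic.name] = topic

    def examine(self, name: str) -> str:
        # Examining shows the description together with the preview,
        # so the player knows what will be said before saying it.
        topic = self.topics[name]
        return f'{topic.description} (You would say: "{topic.preview}")'
```

With this shape, a topic can be listed or examined before it is used, and the preview is what removes the guesswork of a plain ‘ask/tell’ command.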

The game attracted no attention at all as a tech demo, though a little bit as just a game. I realized that I needed to put a bigger, more successful game out there, probably in IFComp, to attract attention to the conversation system.

Around this time, Chandler Groover and Emily Short did in fact discuss the system, but Chandler said that the system would be more interesting if topics could interact with each other. This ended up shaping the main mechanic of LINKing.

==Rashomon==

I also had been brainstorming new game ideas, and one game idea I had was a Rashomon-like story, where there was a robbery at a country house, and you would play as the various characters involved in the robbery. Each story would influence the others, so that if one person saw a green scarf and another saw a yellow scarf, you could tell the witness about the contradiction, and they would change their story.

This matched up well with the Ectocomp idea, so I decided to combine the two. I had a year to make the game.

==The plan to win IFComp==

During production, I was playing all the old IFComp winners, and looking at trends and patterns. I started to realize that almost all IFComp games were either buggy, unfinished, short, or highly repetitive/sparse. Typically, only 2-5 games were complete, long, mostly-bug-free, and not extraordinarily frustrating. These were always the top games. Writing quality didn’t seem to matter, except for the order of the top 2 or 3 games.

I posted about this here on intfiction with my ideas. My thesis was:

Any game, regardless of the concept, can win IFComp if it is long enough (about 150 moves in a minimal walkthrough), bug-free, and not frustrating.

This could be accomplished, I thought, by taking any game, padding it out to 150 moves, and beta testing it to discover any frustrating parts.

==Production==

I took my Rashomon game, wrote a skeleton version, then padded it out to 150 moves. I had some fun insights while I made the game (like making two characters have mirror-version houses), but generally just wrote the first thing that came to mind. The conversation system was a beast to work with, because every topic had to have a response from each character and a summary. I could have let each character only respond to selected topics, with a generic ‘not interested’ response to other topics, and I think this would be better for future games. But I wanted this one to be polished.
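The trade-off above can be sketched in Python. This is only an illustration under assumptions: the lookup table, the function names, the fallback wording, and the sample topic are hypothetical, not taken from the game's source.

```python
# Fallback line for topics a character has no written response for (wording is assumed).
GENERIC_RESPONSE = "They don't seem interested in that."


def respond(responses: dict[tuple[str, str], str], character: str, topic: str) -> str:
    """Return the written response for (character, topic), or the generic fallback."""
    return responses.get((character, topic), GENERIC_RESPONSE)


def missing_responses(
    responses: dict[tuple[str, str], str],
    characters: list[str],
    topics: list[str],
) -> list[tuple[str, str]]:
    """List every (character, topic) pair that still lacks a written response.

    Full coverage, the approach described above, means this list is empty.
    """
    return [(c, t) for c in characters for t in topics if (c, t) not in responses]
```

The writing cost of full coverage grows with characters times topics, which is why the fallback approach is tempting for future games.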

I wrote a complete, tested game before sending it to anyone. I have attached the original, pre-beta version to this post. At this point, I had spent between 80 and 120 hours on the game.

==Beta testing==

My plan was to beta-test months before competition, and to test in 3 waves. The first wave would test major concepts, the second wave would test scenes and characters, and the third would polish off bugs and typos.

Ade McTavish carried the heaviest testing burden. His comments and those of other early testers told me that my writing was weak and my setting and characters were completely unmemorable. So weak writing, which I had ignored, turned out to be frustrating to readers.

So I had to teach myself better writing. I thought about my favorite mystery game setting, Ballyhoo’s old circus, and updated it. The circus was an aging institution in the 1980s, having largely died out in the ’50s. The only entertainment institution that was analogous for my generation was/is radio, thus inspiring the new setting. I completely rewrote the entire text of the game over a month.

I made the characters outrageous on Brendan Hennessy’s (indirect) advice. I read through all of the old IFWiki craft posts on room descriptions and NPCs (Mike Berlyn’s room descriptions advice directly led to the ‘Old School Style’ descriptions of my game). I read dialogue advice, telling me to allow people to talk over each other and to have silence. I read advice on having NPCs do random actions, but not too random. And so on.

I had almost 20 beta testers. New testers began to be mad when they saw the credits, wondering why I needed so many, but they didn’t realize that every tester changed the game fundamentally. I began to write past IFComp winners to get their advice, because Sean Shore had helped so much with Ether. Marco Innocenti and Sean Shore gave great advice. Ryan Veeder was unavailable, and Lynnea Glasser volunteered but ran out of time. Many others helped, including Robin Johnson, this year’s winner.

I read all of the old SPAG interviews of Comp winners; Aotearoa’s author mentioned ‘cheap points’ you could get by having a nice help system, listing exits, mapping ENTER with no text to LOOK, and using non-standard responses. Because of continued comments on the weak story and characters, I studied all of the old XYZZY winners, in long public posts, to learn what made them tick.

==The Outcome==

My theories had only suggested I could make the Top 3 with my ‘write, enlarge, test’ strategy, but by the 3rd wave of testing, I felt that I could win. I had played Detectiveland and several other beta games, and I recognized Detectiveland as my biggest threat, but I felt that I could win through extreme implementation (i.e. dense descriptions, recognizing many commands, etc.).

In the end, I took 2nd. I felt very grateful for this placement, realizing how far I had come from 10th the year before, and recognizing (from a poll on Euphoria earlier this year) that 2nd place games had entered public consciousness about the same amount as 1st place games. I was still disappointed, though, because my experiment was in part to vindicate ‘the little man’. Much of the literary community and the IF community puts people in boxes based on their ability. So and So is a ‘good writer’, So and So is a ‘bad writer’, and neither will ever change. I wanted to prove that anyone could win, just based on hard work. Sure, I had interesting concepts to begin with, but IFComp is littered with dead games that had good concepts but bad execution. I wanted to prove that ‘write, enlarge, test’ could let anyone win, given enough time (for me, between 200 and 300 hours). But in the end, the results didn’t show that; the winning game still had that ineffable touch of genius TOGETHER with hours of labor.

I’m very pleased with the response the game has received. If I had known how long it would take, how many hours of work, I probably wouldn’t have done it. I hope that some others try out a similar conversational system; I’ve released my code, but it’s littered with some messy leftovers (I still have code to allow temporary responses like Yes/No that disappear after any action, and which are listed every turn until gone).

(Note: the attached version is the pre-beta version. The only way to exit Erin’s story is to Scream).

Also, as a final note, I tried to make the characters based on real-life individuals that I know. Most people didn’t realize that the detective is African American, or that Chuck Lee is Chinese. Cindy is Eastern European, while the Moraleses are Hispanic. Danny is Greek. I wanted to have diverse characters without making diversity the focus. [Edit: for people in the future reading this, in the time since I posted this I’ve heard from BIPOC that this type of representation isn’t necessarily helpful.]
Color the Truth.gblorb (851 KB)

joshg · November 18, 2016, 6:04pm

Thanks for the post-mortem! What with your intensive reviews, surveying and analyzing past winners, a number of us were curious as to how your game all came together.

CMG · November 18, 2016, 6:11pm

Thanks for writing this postmortem! I find the whole process fascinating, especially how small details (like having an empty command redirect to LOOK) can tip the balance. The recipe won’t work for every game of course, but I consider Color the Truth a valuable experiment.

It’s also neat that this is yet another game whose roots trace back to EctoComp. I love that little EctoComp.

Lucea · November 18, 2016, 6:14pm

Thanks for the postmortem. I do, however, want to push back on something:

What you are really saying here is “any parser game.” The concept of “moves” obviously does not apply to choice-based work, and pure volume of words/links does not correspond to placement – look at SPY INTRIGUE, or 500 Apocalypses, both of which are quite large indeed, and both of which placed far lower than that would suggest.

mathbrush · November 18, 2016, 6:56pm

@joshg, @cmg I’m glad it was of interest!

Lucea, these are excellent points. My current conjecture is that great web-based games similar to Twine need about 200-250 ‘clicks’ to hit the sweet spot, with ChoiceScript somewhere lower. Going higher than 150 in parser or 250 in Twine isn’t necessarily good in my model. I predict that someone going through Cactus Blue Motel would need somewhere in the low 200s of clicks to win it (I haven’t played any games since before the competition).

*Edit
I later played Cactus Blue Motel, and on the second playthrough, visiting everyone and using every conversation (but without backtracking), it took 206 clicks.

Spy Intrigue was a major thorn in my model’s side, and is why I developed the ‘not frustrating’ tenet. Many people were frustrated by the caps, and the length of the game with insufficient progression markers. Everyone who looked beyond those loved the game. I beta-tested ‘A Time of Tungsten’, and I really enjoyed it. But I could see one pitfall for the competition. I told the author:

[rant]Now here’s one really big thing: your game itself! It’s huge! This is not bad; however, you have to be careful. The large size combined with switching between multiple perspectives reminds me a lot of Spy Intrigue. Spy Intrigue was the largest Twine game of all time, and eventually went on to win a nomination for Best Game of the Year in the XYZZY’s. But it did bad in IFComp.

One reason Spy Intrigue did bad, and I think your game has the same difficulty, is that it’s so hard to set up player expectations for a big game like this. In a game like Birdland or Cape, the gameplay is split up into several days, with each day being pretty much like the rest in terms of how many nodes and what kind of nodes and what amount of text you can expect. It makes it easier for players to get a feel for how they’re progressing.

In your game and Spy Intrigue, it’s hard to know what’s going to happen next.[/rant]

The author decided to stick with their current model, and several reviewers just loved it, loved it, loved it. In the end, the game scored about the same as Spy Intrigue, and earned the Golden Banana.

500 Apocalypses also hits the ‘frustrating’ mark, because one of the most frequent things people mention in any IFComp review is how little ‘interactivity’ it has. Many reviewers were frustrated that they had no effect on the story/game whatsoever.

Finally, games like Midnight, Swordfight and Slouching Towards Bedlam are much shorter than my model predicts, but both are specifically designed to encourage replay, so that a typical play session remains about the same length as the longer games.

All of this needs a huge grain of salt, though; first, my model was not completely successful; second, none of this matters for XYZZY’s. In fact, I predict that Color the Truth will not even reach the nomination stage for Best Game. XYZZY’s are much more focused on artistry, and Take and CMG’s games will certainly be nominated for several awards. Also, the model does not work for IFDB popularity or long-term placement in the IF Canon. These remain ineffable, at least to me.

DougOrleans · November 19, 2016, 1:45am

So, given that Toiletworld seemed to be specifically designed to break all these rules: Were you also Chet Rocketfrak?

mathbrush · November 19, 2016, 1:59am

Ha, no, although I wish I had that honor.

Hannes · November 19, 2016, 9:45am

Very keen observation, especially about the frustration part. I would add being completely non-offensive to the mix – though maybe that is indeed a sub-point of frustration.

Aquillion · November 22, 2016, 12:39am

I think weird / experimental games frustrate some people who feel that they don’t “get it” or something along those lines… and a few people seem to really hate them for cultural reasons.

This does partially penalize JavaScript / Twine / non-parser games (looking at the results, it still seems like there’s one or two voters who go down the line automatically giving a 1 to everything that doesn’t use a parser), but that seems to have faded a bit over time as people have come to accept them. For a game like SPY INTRIGUE, though, there’s always going to be some people who come away feeling cold… it’s simply not a game that strikes me as having been written with “appeal to everyone” in mind, especially given the visceral and personal nature of some of the post-death scenes.

For better or worse, the top spots in a competition with public voting like this are, most of the time, going to go to relatively inoffensive games.