ShuffleComp Postmortem Thread

Cross-post from WordPress; I’ve got a few posts brewing about this, and it seemed they’d get more discussion here.

One of the big objectives of ShuffleComp was to try out experimental approaches to comp-running. Minicomps, being one-offs with less pressure to serve as community pillars, seem like good places to try things out. So did I learn anything whatsoever?

Games released under pseudonyms

For a start, this was fun. As for other effects: I know that at least one first-time author felt that the rule added to how comfortable they felt releasing a game. It also allowed participating authors to promote the comp in general without automatically directing votes to their game in particular. On the other hand, it does mean a little less attention for individual authors while the comp is ongoing - which may be a more important consideration for pro or semi-pro authors, for whom reputation-management means substantially more than idle vanity. On the whole, though, I think that this was very much a Good Thing, and I would suggest that future minicomp organisers at least consider it.

No gag on author discussion

These were very much only-in-low-stress-minicomp rules. A high proportion of reviewers were also authors, so there wouldn’t have been much reviewing without it.

The effects of the no-discussion rule were to some extent nerfed by the pseudonym and positive-review rules; the risks of trading good reviews, of being affronted by a bad review from a fellow author, or of authors making fools of themselves by grouchily defending their games in comment threads were all greatly reduced. I don’t think we can say much about the broad effects of this one until it gets tested in a more neutral environment. (My position is that the restriction on discussion remains a Very Good Idea for major comps.)

Personally, I elected not to review games at all, partly because of organiser neutrality but mostly because I was tired. And this was kind of crazymaking, honestly, because writing about stuff is a big component of how I think about stuff, and some of these games did thought-provoking things. Later. Later.

Reviews count as votes for games

The internet has different cultures for criticism and rating. The IF community has traditionally had a pretty tough critical culture: we expect that everybody who makes a game is dedicated to the rocky road of artistic growth, and feel that cotton-wool is a poor growth medium. (And also we have a certain number of people whose only joy is to grumble about shit.) This comes from a number of places - a dissatisfaction with the publisher-driven mainstream games media (which has often been little more than advertising), one or two cultural imports from academia, a certain amount of defensive pride about the high standards of our amateurism. If you come from a place where a score lower than 8/10 means ‘don’t play this’, or where most commentary is either unqualified praise or outright hatred, we can seem awfully mean.

And people respond differently to different approaches. Some people really need to have their first effort systematically torn into tiny shreds in order to do better next time; that is how they will best flourish. Some people do better with other approaches. (And not everyone even wants to do better next time. That is profoundly, deeply weird to me, but I dunno that it’s therefore invalid.) There is not any good way to tell who will actually respond best to which critical environment - I sure as hell don’t think that the authors themselves would reliably know - but I think having more than one option is, at least, hopeful.

At the same time, I believe very, very strongly in the responsibility of the reviewer to be honest and clear about their experience of a game. So the goal of the rule was to carve out a bit of space in which reviewers were encouraged to write reviews of games they did consider worthy, were reminded that they didn’t have to review every game, and were perhaps nudged to delay negative reviews until after the voting period - all without actually muzzling anyone. If you want to write reviews of games you’re not keen on, you have a number of options - wait until after the voting period to post them, write reviews for every game and thus make your review votes moot, or just cancel out your own No vote. (Yes, I’m aware that submitting one Yes and one No vote would not have the same effect as submitting no votes at all.) Combine that with the fact that the precise vote a game gets doesn’t matter all that much, and it adds up to some pretty mild motivations. Which was the idea.

(To be clear, I really wouldn’t want this premise applied to the more serious affairs of IF Comp and Spring Thing, say; but I thought it’d be a good fit for lighter, lower-pressure minicomps.)

So how did this work out? Obviously it’s impossible to tell what the reviews would have looked like without this rule. Since a good chunk of the reviews were written by game authors, it seems plausible that they’d have tended towards a more convivial, Miss Congeniality-ish tone anyway.

That said, it was very clear that this rule - mild as it was - bothered more people than any of the other experiments. Some people pushed back against it by reviewing every game. Others told me that it felt weird to be adjusting their reviewing approach. That’s valuable, I think; it’s important to re-appraise stuff every now and then, and if it doesn’t feel weird then you’re not really re-appraising. There were still a number of sharp-toned reviews, or reviews that concluded No. Great! If this rule had resulted in an unbroken stream of sappy positivity, it’d have been a clear signal that it was too strong.

In general, it seemed to me that the rule - or, at least, the fact that there was a voting process - did result in more reviews than we might otherwise have seen. I’d strongly encourage future minicomp organisers to think about how to motivate reviews, and to regard voting as a key component of that.

No archive-unfriendly games

This rule was spurred by a couple of experiences. On the one hand, I’ve been writing a sequel to Joey Jones’ goofy meta-IF romp IFDB Spelunking, and in the process discovered a significant number of games that have vanished entirely - and not just little SpeedIF-level games or things from the distant past, either. On the other, I’ve talked with Emily Short about how Bee, my favourite CYOA ever, is reliant on the continued existence of the Varytale site: the text is safe, and the work could, in theory, be ported to another platform, but it’d essentially involve rebuilding the mechanics of Varytale as well as the game itself.

So partly there’s an issue about platform creators making game platforms that can’t (or aren’t meant to) survive in the wild, and partly there’s an issue about authors not archiving their work even when they could do so. This rule was obviously just about the first.

One problem is that archive-impossible and archive-awkward platforms do exist, and a lot of them are pretty cool in other respects, and authors are going to use 'em. (‘Robust archiving sensibilities’ is never going to be a killer feature.) So I think that fixing this by putting pressure on authors is probably not an ideal route: the platforms are there, and authors are looking for the platform best-suited to their work’s mechanical and aesthetic requirements, which have nothing to do with preservation. Is there a way to apply this pressure to platform creators instead? In the case of platforms which were designed primarily with commercial uses in mind, I kind of doubt it; if they’ve decided that archive-friendliness and/or playable-offline are in conflict with their ability to make money from games, there’s no real counterargument.

Another part of the issue is the nature of the IF Archive: it is really built as a vault, more concerned with preservation than access, while authors are more immediately interested in a distribution platform. My feeling is that in order to function well at either of these it is necessary to function well at both, but I don’t have (or want) the job of actually running a thing like that.

Authors can vote

I didn’t see the actual vote results, but I’d expect that a significant proportion of voters were also entrants. So at a practical level, this seems sort of a necessary feature in a voting minicomp, particularly if it gets relatively high participation.

Yes/No voting rather than a 10-point score; no ranking stats released

Again, by shutting myself out of access to the actual scoring totals, I’m unable to judge the effects of this too closely. On the whole, though, the top-ten results don’t seem very divergent from what you’d have expected with more graduated voting. That said, I think avoiding a precise ranking of games - particularly for games outside the Commended grade - contributed strongly to the moderate-pressure tone of the event. Introcomp, which partly inspired this approach, releases rankings for the top-placed games but not the lower-placed ones; that could also have worked, but I felt that given the wackiness of the reviews-as-votes rule and the relatively low voting in minicomps, it was more honest to honour top-placed games as a whole rather than encouraging a focus on hair-splitting stats.

I expected a lot more complaints about this, given how much the IF community loves its comp statistics, and given grumbles I’ve heard regarding the lack of transparency in Introcomp and XYZZY voting; but as it turned out, very few. My feeling is that people accepted this as part and parcel of the moderate-pressure pitch of the comp, and I’d encourage similar approaches in future minicomps.

This was basically why I insisted on hosting the games; if z-machine/glulx/webpage games are hosted (on the Archive or elsewhere) as unzipped files, then people can play them online, so it’s a bit of a distribution platform given the proper links. Of course my aversion to the “download this massive zip of all the games” led to my having to download several massive zips of all the games, but I WILL JUST MAKE THAT SACRIFICE FOR THE IF COMMUNITY. BECAUSE I LOVE YOU SO.


I really like the yes/no instead of the 10-point score. Occasionally, major competitions see things like people giving favored-platform games a 10 and other games a 0. It feels like spite judging does a lot less damage under the yes/no model.

I found it sort of intuitively iffy, because I think that strong approval should carry more weight than mild. But honestly, a game’s placing in the IF Comp generally has more to do with the centre of its bell-curve than its most passionate adherents, so I don’t know that it’s hugely different.

Big picture: I think this was a success with a high overall quality of work. I know the last couple recommendation rejects were TOUGH for me, and I know some who didn’t write games got something cool done that’ll show up as something. I got ideas for stuff myself.

I think the randomized testing worked well. I was glad to make a more formal acquaintance of people I knew by name, and I also enjoyed being able to help people I knew by testing. I also liked being able to do something constructive even though I had no time to write a game.

The pseudonyms were fun. I try not to be biased, but it’s hard. Anything that encourages objectivity is good, and in this case, I liked having one more bunker to say: would I say this to a very good friend? Would I want a stranger saying this about me–and would it actually help? It helped me not worry If I Liked This Writer or if I was overcompensating for Liking This Writer.

Because it’s impossible to be objective, and every nudge helps. Sort of like talking about Black and White when analyzing a chess game, not “me vs. them.” I realize time spent guessing who wrote what was wasted. So pseudonyms were more than a neat thought experiment or a chance to toss out silly names.

The mechanics of scoring are trickier. It felt relaxed to me, and it should.

A problem with this voting system is that it can penalize longer, more ambitious works. Sequitur and Invisible Parties come to mind. Probably, fewer people review longer games. I know I can put them off.

I’m being a hypocrite here with the toughness of some of my games–I like writing tough games myself but I put these off. I suspect some people did not review these, opting to wait. So 10 people review an awesome short game, 5 review a long one–but someone universally grouchy/negative (or any negative vote where someone doesn’t get it–that happens with any game) can still spike the results since 10/1 > 5/1.

I imagine you’re pretty aware that casting a cancelling Yes/No pair won’t change things except under extreme circumstances.

Game A: 1/4 yes/no
Game B: 0/1 yes/no

Casting a Yes and a No on each puts them at 2/5 and 1/2 votes, so there’s no skew unless one game is hit LOTS harder with votes than another. This is nontrivial enough that people may need some reassurance that the Yes/No pair is a better option than doing nothing. I think they do, largely.

In that vein I also think people don’t have to feel too bad about not getting to games they wind up liking. Emily Short missing Monkey and Bear comes to mind. Because moving from, say, 6/8 to 7/9 is not a big jump.
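For what it’s worth, that arithmetic checks out under the simplest possible reading of the scores - I’m assuming here that a game’s score is just its fraction of Yes votes, which is my guess rather than anything the comp published:

```python
# Assumed scoring: fraction of Yes votes, i.e. yes / (yes + no).
before = 6 / (6 + 2)   # 6 Yes, 2 No -> 0.75
after = 7 / (7 + 2)    # one more Yes vote -> ~0.778

# One missed Yes vote moves a well-reviewed game by under three hundredths.
print(f"{before:.3f} -> {after:.3f}: a jump of {after - before:.3f}")
```

So a reviewer skipping one good game shifts its score by a rounding error, not a placing.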

One solution would be to weight likes and dislikes, e.g. you could privately rate a game from -2 to +2 (this would equate to 1-5 stars on IFDB, and it might allow the organizer to add IFDB ratings too), with a review adding a point. That way your opinion can matter more than just a review, but not too much–and it is still private.
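A minimal sketch of that weighting idea, as I read it (the range check, the single bonus point for a review, and the function name are all my own guesses at the proposal, not anything implemented):

```python
def weighted_vote(rating, wrote_review):
    """Combine a private rating (-2 to +2, i.e. 1-5 stars on IFDB)
    with a one-point bonus for having published a review."""
    if not -2 <= rating <= 2:
        raise ValueError("rating must be between -2 and +2")
    return rating + (1 if wrote_review else 0)

# A strong private opinion can outweigh a lukewarm review, but not by much:
print(weighted_vote(0, True))    # lukewarm rating + review -> 1
print(weighted_vote(2, False))   # enthusiastic rating, no review -> 2
print(weighted_vote(-2, True))   # a pan plus a review still counts against -> -1
```

The nice property is the cap: no single voter can swing a game by more than three points in either direction.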

I can’t think of a better solution–giving extra weight to people who review everything has its own pitfalls (they’ll rush the last few).

One other idea is to include a review as unfavorable or favorable. (Or favorable/unfavorable/neutral etc.) Perhaps this could be submitted to the organizer in private, or someone could summarize them in a blog entry. But it does mean more work for the organizer.

It’s not a big jump, but with thirty-three games involved, I’d expect the gaps between games to be quite a lot closer than your example. I don’t think it’s impossible that a couple of Yes/No splits could take a game out of the top ten - since adding more split votes will always drag the game’s score towards 0.5, and I suspect that the cutoff point for Commendation would be somewhat above that.
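The drag towards 0.5 is easy to see with numbers. Again assuming the score is simply the fraction of Yes votes (my assumption, not a published formula), each matched Yes+No pair pulls any score towards one half:

```python
def approval(yes, no):
    # Assumed scoring: fraction of Yes votes.
    return yes / (yes + no)

# Start with a strong game (9 Yes, 1 No) and keep adding split Yes/No pairs:
yes, no = 9, 1
for _ in range(4):
    print(f"{yes} Yes / {no} No -> {approval(yes, no):.3f}")
    yes, no = yes + 1, no + 1
# prints 0.900, 0.833, 0.786, 0.750 - monotonically towards 0.5
```

So if the Commendation cutoff sits well above 0.5, enough split votes really could nudge a borderline game out.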

That’s okay–if a game is flawed, then “no” votes are to be expected, and enough good stuff will help it overcome that.

My point was more that if someone misses a really good game and doesn’t vote on it, they aren’t necessarily damaging its chances, especially not for the very best games. I mean, very few reviewers feel guilty about not getting around to giving a game a “no” vote–which benefits games with >50% scores more than games with <50% scores–but there’s a natural fear of missing a good game or not giving it its due credit. I think it’s important to give reviewers that nudge to say, yeah, it’s okay to miss stuff.

The way I see it, if every game gets approximately the same number of votes, random people missing this random game or that won’t be too bad–it balances. My worry is the fear of not being fair may prevent people from voting at all.

Also, I said 6/8 and meant +6 -2, and the example of A and B was not intended to state my expected scores–I should’ve gone with +3 -1 and +1 -0. This extreme example seems to prove the rule, though–roughly the same number of people really should vote on all games, so the scoring at most needs only moderate tweaking. (I assume we all have, or can find, a list randomizer to shuffle the games these days.)

I don’t feel guilty about not getting to all the games because I didn’t expect to, but I acknowledge I could have had a better strategy for picking that one-third. I reasoned rather half-@$$edly that people might be more likely to start at the top of the .zip and go down, so I started from the bottom of the list and went up. Unless people were playing the online versions, which might have been even more likely–I don’t even know. I probably should have made my husband write me a Python script to put them in a randomized order for me…

Although someone mentioned reviewing by opening them all and picking the openings they liked the best and playing in the order that the games caught their attention, more or less, which also strikes me as a fun way to do it. Tempted to try that sometime.

I like the random-list IFComp feature a lot.

That was probably me; that’s what I did, anyway. It’s a method that works particularly well with games released under pseudonyms: starting each one with zero information about the author (positive or negative) and equal hopes of finding a hidden gem.

I’ve gone here and think it’s a great resource:

That seems great in a smaller competition, because strong openings are important anyway, and with a smaller game, openings mean more. It’s a trickier proposition in a more serious comp, as you may get fatigued by the end - but this isn’t a more serious comp. (Disclaimer: I sort of used this method when going through IFComp 2011-13 games. Actually, I sort of went by file size, too.)

This is an EXCELLENT STRATEGY and you should continue doing this in the future. When I was reviewing job applications last year I read them in reverse alphabetical order. Too long have we W’s suffered under the tyranny of the alphabet!

That thought crossed my mind as well when the games were released and mine was at the bottom :D

yhlee wrote:
someone mentioned reviewing by opening them all and picking the openings they liked the best and playing in the order that the games caught their attention

I was afraid some players would do this, and my game began rather inauspiciously: a warning that it could not be saved on some interpreters and that it was long. If you can’t even get your game to save properly, what are the chances that the rest of the game doesn’t have significant bugs? And isn’t saving important for a long game? Fortunately, enough people tried it out anyway.

I’ve been revising it with 6L02, and so far saving the game has not been an issue.


Hear hear! (…VanEseltine said…)

I have a boring L family/last name in English, so I’m in the middle of the alphabet (alphabet order is of course different in Korean, although the way family names cluster so heavily makes things different there too), but the end-of-alphabet-people-get-screwed thing has been pointed out to me because of book-shelving/placement issues in bookstores.

I agree that good openings are a boon, although I haven’t played enough IF to get a sense–is it generally true that a strong opening makes for a better playing experience than a mediocre or weak one?

I’m currently a submissions reader (more commonly known as a “slush reader”), and I would say the bottom half or so of submissions can be rejected fairly early, without my having to read more than two pages, because if someone consistently cannot punctuate correctly in the first two pages, it is vanishingly unlikely that they’ll figure it out eighteen pages later. (It’s really the mediocre stories that take the most time, because they’re not obviously awful and they’re not obviously great, but from reading already-published stuff, sometimes a story has a sleeper awesome ending, or whatever, so I have to read through.)

The thing is, slush reading is about static fiction, which is usually more-or-less linear. I would tend to give a game a lot more rope re: a slow opening (vs. “none of the objects in the first five rooms have been implemented” extreme cases), because a good puzzle or narrative payoff will probably need a certain amount of setup. All of this to say that this is one reason I would hesitate to apply the “shiny openings first” method of deciding which comp games to play when time is limited. I don’t think it’s wrong to use this method; I just don’t want to penalize games too much based on my experiences with static fiction, when they might not apply due to the difference in medium.

This method obviously can be abused–however, I’d also argue that if an opening doesn’t hit me right away, it may be -my- fault, and I’m willing to look at the story/game when I’m more enthused.

That’s very interesting about mediocre games. I’ve wound up, on first impression, happier with games that did nothing fancy than with games which started great but didn’t quite end as I felt they could. To use numbers, a consistent 4 seems better than a 9 that tails off to a 5. Because there’s introductory bias and last-memory bias.

I’ve always liked this TED Talk, which touches on expectations & how they can ruin experiences: … _of_choice … Sometimes an opening can lead me to expect odd things. Heck, even a title can.

And sadly, sometimes I’m NOT up for something exciting and challenging (and even genuinely good), so I pass. Since you do editing work and have stricter deadlines than a game judge, you can’t shuffle the order as much. However, I tend to give a game a couple of strikes, or frame my expectations before continuing with a tricky game. A lot of this is touch and go. But if I catch myself giving a bland “whatever”, or guessing I don’t like the game as much as I should, or even overreacting, I take a break.

First impressions can be abused, too–definitely in real life–and accounting for that without over-accounting (e.g. rejecting genuinely nice people instead of fakes) is tricky. But it’s tougher to fake in literature: you can break off from the book/game right away.

I can peg a game in the mediocre-to-abysmal range pretty accurately from a first screen, most of the time. If I were playing purely for enjoyment, rather than because (I’m writing reviews|I feel a duty to be fair|I want an educated opinion about the game), I’d quit many, many games either at the first screen or after a couple of moves.

And this means that games with a strong opening make my play experience much, much more pleasant, because it’s a joy to be around writing that I can trust not to suck. An unambiguously kick-ass opening makes me relax: great, you know what you’re doing, now we can get to the good stuff.

This is mostly a matter of writing, note; slow-to-emerge game mechanics are kind of a separate issue.

The pseudonyms were fun, and did make it easier, when playing, to focus on the games themselves without a lot of preconceived expectations. It also made it easier to write reviews without worrying that you might be adjusting how you say something based on who the author was.

The reviews-counts-as-votes thing, as well as the goal mentioned somewhere of getting more reviews, period, encouraged me to write a few reviews when I otherwise wouldn’t have written any. It hadn’t occurred to me that people, especially authors, might want MORE reviews from unknown players, when reviews by a few well-known, reliable reviewers are usually available for competition games.

Also, the low-pressure aspect of the whole thing was nice. If I were to enter a comp, it would probably be one like this. I did wonder, though, how to adjust reviews and votes according to these expectations. My reviews skewed in the forgiving direction because of the stated low-expectations, low-pressure intent of the thing. Though I do think in general that it’s preferable to be civil and constructive in one’s criticism than otherwise. I wondered, too… what is the purpose of these reviews in general? Are they more for the benefit of the author, or for potential players?

The “yes” or “no” thing made voting a bit difficult for me. I wasn’t sure where to draw the line; there were some games I would have liked to place in some middle category.

All in all, it was fun.

Have you brought this save problem up in a programming thread? It’s not the kind of problem I would expect to magically get fixed by switching to 6L02.

(Also, it says the game was developed in Ren’Py. Not sure how that wound up in there.)