A little rubric chat

This is a nice desire, but I think it would just start arguments, particularly since it’s very hard to find a set of criteria that will fit all the diverse types of games entered in the comp. (There’s a reason I didn’t share all my rubric categories and criteria above!)

In the end, as both an author and a reviewer, I think you just have to take the IFComp score guidelines to heart and assume that there are enough people voting to make the overall result more accurate than any individual person’s contribution. As long as you keep that in the back of your mind, there’s not a wrong way to review or score.

10 Likes

Just as a contrast, I feel like I should bring up Adam Cadre’s approach to comp voting. Adam wanted his favorite game to win the comp, so he would give his favorite game a 10 and the majority of the other games a 1-5. IIRC, he would usually give out a handful of 6s and 7s to games he felt were particularly good, even if they weren’t his favorite.

For myself, when I could play all the games, I would just sort them from favorite to least favorite, then evenly distribute numbers down the list, so in a year with 30 games, I’d give out about 3 of every number.
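
In code terms, that even spread looks something like this (a rough Python sketch; the function and variable names are just illustrative, not anything official):

```python
def spread_scores(games_best_first):
    """Spread scores 10 down to 1 evenly over a ranked list of games.

    Assumes 'games_best_first' is already ordered from favorite to
    least favorite. With 30 games, each score from 10 to 1 gets used
    about 3 times.
    """
    n = len(games_best_first)
    return {
        # Map each rank position proportionally onto the 10..1 range.
        game: 10 - (i * 10) // n
        for i, game in enumerate(games_best_first)
    }
```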

This year, I felt I had played enough games to basically do that, but I didn’t evenly distribute my numbers. I gave my favorite my only 10 (Ribald Bat Lady FTW!), because there weren’t any other games I played that I enjoyed as much, then went from there. I ended up dragging game titles around in Excel until I was happy with it. The result was basically bell-curve-shaped, and I didn’t give out any 1s this time, in part because I hadn’t played all the games, and in part because, to me, a 1 seems iffy to give to a game that definitely tried.

The upshot was that, for me, ‘acceptably made, even if not meriting any special attention’ was a 2! All the games I played seemed acceptably made to me, even if some of them didn’t work for me as well as they could have. If I had set that level at 6, my votes would have mattered a lot less, and I like my opinions to have at least some consequence.

6 Likes

But if you made a bell curve, presumably the middle wasn’t super far from 6! I mean, there’s just not that much room for it to be far from there on a scale going from 1 to 10. And you didn’t use 1.

-Wade

1 Like

Here's my rubric

10: Holy damn, this game is amazing! Polished, interesting, thematic, and beautiful. I love it and it will live rent-free in my brain for a long time and I will force my husband to play it with me and talk everyone’s ear off about it. I had visible reactions when playing it: maybe I laughed, cried, shouted, or jumped out of my seat. I really hope this game places in the top 5!

9: This game is great, I had a fun or interesting time playing it! It was evocative and made me Feel Things, I found it engaging, and I can chew on it and its themes for a good while. I would recommend this game in the future beyond IFComp!

8: This game is really good! It’s polished, engaging, and entertaining in a way I found worthwhile throughout. I will think about this game and compare other games to it during IFComp. Despite my terrible skill with puzzles, I was able to complete any with only a few hints or nudges.

7: I can solidly say this game was good and a few moments even stood out to me as great. I would recommend people play it during IFComp so I can hear what they thought also.

6: A well-implemented game I would describe as “fine, I guess”. Not very ambitious, but acceptable. I wouldn’t actively recommend or not recommend it to anyone. I may have used a walkthrough because I’m bad at puzzles, not because they were bad in and of themselves.

5: Adequate, with a few moments that sparkled but not enough to impress me. Achieves what it’s trying to do or gets fairly close, but I don’t feel a desire to recommend it to anyone.

4: Mediocre. Maybe I didn’t find it actively unpleasant, but I did find it unimpressive or boring. Misses the mark on what it’s trying to do, but I got something more out of it than a 3. I used a walkthrough because puzzles were poorly designed or flagged.

3: Unfun, unpleasant, or annoying to play. Severely misses the mark on what it’s trying to accomplish. For poorly designed puzzles, there was no walkthrough or hint system.

2: Severely frustrating in a way I found maddening. I didn’t have fun playing this game, maybe I didn’t even finish it, but it’s still playable, technically.

1: Unplayable. Maybe it was too buggy and I couldn’t get far in at all, it was straight up offensive in a way I couldn’t stomach, or it wasn’t a game.

I only played 8 games, but my ratings ranged from 3 to 9, skipping 6, oddly enough. Looking at my actual scores, I should’ve changed some of them to fit the rubric better as my feelings changed over the comp… oh well, next time!

6 Likes

I suspect the encouragement to “use all the numbers” is meant to dissuade the kind of voting where everything you like gets a 10 or everything you don’t like gets a 1. I don’t literally try to use every number, because to me a score around 5 is middle-of-the-road, above 5 means more good than bad, and under 5 means more bad than good, and most of the games I’ve played in IFComp were more good than bad. I think it’s fair to make a score of 10 equivalent to the best game you’ve ever played, but if you do that, it might also make sense to make 1 equivalent to the worst game you’ve ever played (or tried and failed to play, because it was completely broken).

5 Likes

This is almost exactly how I do it. 7+ means I liked it and would recommend it, 1-4 means I would actively recommend not playing it, plus or minus a point for personal taste. Unused numbers don’t bother me. My lowest score was 5 this year, which is great! I am happy to reserve the lower numbers for hypothetical bad games that I didn’t come across this time.

I like the idea of rating games on the same scale every year, because scoring on a curve or using some other chaotic motivation makes the overall data harder to compare between years. Also, if you use all the numbers and don’t rank all the games, there may be games that you’d rank outside the scale if you’d only played them, and I don’t even want to think about that.

8 Likes

Personally—unlike many people, it seems—I’m not that hung up on achieving a normal distribution of scores, because there’s no guarantee the games themselves have a normal distribution of quality. One year there may be a huge influx of newbies making somewhat janky games, while another year might have an unusual number of polished games. Plus, I’m not one of those people who typically gets through every single game in the comp, so who’s to say the subset of games that I played is even a representative sample of the games that are in the comp?

I prefer not to share my exact rubric because it opens it up to nitpicking and questions of whether it’s really fair to this or that type of game, and I just don’t have the energy to get into it with people about that. My reviews are intended, in part, to provide helpful feedback to the author (whether or not they typically achieve that), and I don’t think that attaching a number, or the full scoring-rubric breakdown, would add anything that’s useful from a craft perspective. It might make it easier to predict my specific numerical vote on future games, but judges’ rubrics vary so widely, and there are enough of us voting even on the less-voted-on games, that I think there’s very little value in trying to predict or court the votes of any one person.

10 Likes

Before diving in in 2022, I gave a lot of thought to how I wanted to judge. Somewhat intimidated by both the outsider-barging-in vibe I was bringing and the sternly worded “use all numbers” guidance, I’ve been pretty open with my approach. What I DIDN’T account for (couldn’t, given my lack of community context) was how the scores would read to authors and affect them.

Since I reused it, you can probably tell I think I lit on a way that works for me right out of the gate. (Which is always a worrying statement, right? “Nailed it, first try! No need to internalize or adjust anything, I win!”) Certainly the fudgy ‘bonus/penalty point’ is crucial flexibility to accommodate singular works. I am hopeful that the words I throw out in support of the score compensate for and explain, at least a bit, the scores on the lower end.

It is not lost on me that the whole judging thing is a fraught endeavor - real humans poured months, even years, of creativity, skill, and passion into these things, and after two hours (at most!) I reduce all that effort to a single-digit number. On some level, ANYTHING less than a 10 dishonors the effort. For me, if you’re going to presume to do that at all, it makes sense to commit to it with real differentiation, to give the Big Numbers the best chance to buff down the burrs of idiosyncrasy, while being as open and supportive as you can.

8 Likes

So many good points, I have more thoughts!

My rubric kind of biased me against “decent/not decent” score mapping to start with, but this year really drove home why that didn’t resonate for me. If I can pick on one of my favorite entries, Kaboom!, it was simultaneously one of the more frustrating gameplay experiences and the MOST emotionally affecting game in the field. There was no dimension to the work that was merely ‘decent’! The rubric fed me a score and I really stewed over it. What ultimately made sense to me was comparing it against a theoretical version of itself. What would its score be if the gameplay sang? Does this score make sense from that view? To me, it did.

As I look back, there are quite a few games that I enjoyed much more than a vanilla ‘5=decent’ metric might indicate. To me, that decoupling can take the sting out of a lower score and is enough to let me sleep at night. If I do my words right, hopefully the authors can see that too!

5 Likes

Yeah, I feel it’s an inherent function of numerical scores that “perfectly decent, unremarkable” is going to score the same as “had aspects I loved and aspects I hated,” which is why I find it easier to use a rubric that breaks out points by category than one where each number represents a specific reaction to/overall assessment of the game.

4 Likes

I’m personally of the opinion that I would much rather play a ‘loved it/hated it’ game than an OK game, so I’m going to rate the loved it/hated it game much higher. But I also like that other people have different systems! The variety is what makes the overall scores more valuable.

7 Likes

I hate every rubric I’ve ever developed; none of them ever spits out a satisfactory number. I hate rating games at all, largely because of this:

I have been really unhappy with 90% of the ratings I’ve ever given, which is why I can never bring myself to do them on IFDB; they’d nag at me with their wrongness and I’d probably keep changing them and messing with that game’s score. At least when you rate a game for a comp, the window closes and then it’s too late to change it.

8 Likes

There was a good conversation about a year ago about “outlier scores”: scores of 1 and 2 on games that have a median of 7 or 8 and a small standard deviation. Those outliers are understood to have a different meaning than, say, a 3 on a game that had a median of 5.

Going back to Brian Rushton’s observation (echoed by others), the games in this year’s competition were exceptional. I’m not sure you can compare score distributions from one year to the next, as many people likely do adjust their rubrics to use the whole range.

3 Likes

This conversation is making me feel like a cold-hearted robot! :joy: Albeit a cold-hearted robot who gave Kaboom a “better-than-average” score. A game having a lot of heart, even if there are some technical flaws, definitely raises it in my estimation.

4 Likes