A little rubric chat

Looking at my ifdb rankings (which are just half my ifcomp rankings, rounded up), I gave a 1 star (i.e. a 1 or a 2) to 4 ifcomp games last year, but not any this year. I did give out at least one 3 that I can remember this year, to a game that felt unfinished and buggy and short (by an author who I don’t think is on the forum).
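
For anyone who wants the conversion spelled out, here is a minimal Python sketch of "half the IFComp score, rounded up." The function name `ifcomp_to_ifdb` is made up for illustration and is not part of any IFDB or IFComp tool.

```python
def ifcomp_to_ifdb(score: int) -> int:
    """Map a 1-10 IFComp score to a 1-5 star rating by halving and rounding up."""
    if not 1 <= score <= 10:
        raise ValueError("IFComp scores run from 1 to 10")
    # Integer form of ceil(score / 2), which avoids floating-point rounding.
    return (score + 1) // 2

# Build the full lookup table for scores 1 through 10.
stars = {score: ifcomp_to_ifdb(score) for score in range(1, 11)}
```

Under this mapping a 1 and a 2 both land on one star, and a 3 becomes two stars.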

5 Likes

I’ve always treated five as “fine.” Average. Edit: and by average, I guess I mean competent in terms of both writing and tech. Sevens and eights are good games. Nines and tens are exceptional. There might be just a handful per year.

Because of my affection for punk rock and low-fi music, a well-written but hopelessly buggy game might not get a one or a two. RTE is about an unplayable IF Comp game from the 1990s that took second-to-last place, after all.

There are very few 1, 2, 9, or 10 games for me. A game I have a great time with is an 8. That’s what I consider a great game: 8. A 9 or 10 has to set itself apart somehow as special. It has to be more than a great game. Its themes or writing may really stick with me, for instance, or it may be innovative in some way.

I feel the same way about 1 and 2 scores. They’re more than bad. There has to be something truly obnoxious or unpleasant about them.

In practice, though, I only rate games that I consider sevens or better. I don’t finish anything else. There are just too many incredible games out there for me to settle for “fine.” We have decades of IF already made, plus everything coming out now. I’ll never experience every great IF game at my current rate, even if that’s all I play. I haven’t played every IFDB top 100 game, for instance, or every “best of all time” game, either. I have a lot of IF I want to get to.

I don’t like the feeling of handing out low ratings, anyway.

5 Likes

Well, one of the main reasons I made this post is because I came away dissatisfied with my rubric! I went into the comp with too much of a fixation on considering 9 and 10 as standards to be matched up against games of all time. Anyway, I feel pretty sure that in the future I may be more lax about what a 9 or 10 means within a given comp.

I wasn’t giving 6s or 7s as an “average” game. I’m just saying that if most comp games are going to at least check in as average, and if you reserve 10 for masterpieces that may only come every few comps (as I did), the range of scores you end up giving varies by only three or four places. But in my mind I could rank that whole gamut of games in the 5-8 range much more specifically if more gradations were available.

3 Likes

For me, everything pretty damn good or clever is a 9 or 10. 8 is pretty good.

1 Like

As an author, these sorts of threads are useful to dial in on the audience. Not so much in the numbers but the requirements.

Maybe better – although risky and demanding of people who are already donating a huge amount of their time to the community – would be for judges to drop authors a note on why they rated their game the way they did. Of course authors would have to, in turn, not argue with it, but a little bit of info could go a long way.

In any case, sharing the rubrics can also help the community come to understand what might be “fair play”. One person may give 1s for games that do not start. Others for games that crash at all. Or have content they disagree with. Or if it happened to be the lowest of the five games they rated. (And similarly for 10s!)

I have mathematical reservations about “use all the numbers” but something like IF Comp is in a weird spot in terms of statistics. Outlying ratings can still have a significant impact on averages.
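
To put a rough number on that last point, here is a small Python sketch. The votes are invented for illustration and are not real IFComp data: ten judges give a game a 7, then one more judge gives it a 1.

```python
from statistics import mean, median

votes = [7] * 10             # hypothetical: ten judges all score the game 7
with_outlier = votes + [1]   # one additional judge scores it 1

mean_before = mean(votes)              # 7
mean_after = mean(with_outlier)        # 71 / 11, roughly 6.45
median_after = median(with_outlier)    # the middle vote is still 7
```

With this few votes, one low score pulls the average down by about half a point, while the median does not move at all.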

In the end, I assume these are community events and assume best intent. Ratings are a necessary evil to make it a competition, but the real value is in the reviews, comments, discussions and follow-on works.

7 Likes

I’ve read the whole thread and have no new perspectives to add. I can only add myself as a data point.

I score mostly the same way I score movies, where I use 6 the most. A 6 is the basic positive, but not too remarkable, movie experience. However, 6 also ends up being my ‘Bizarre film with flashes of brilliance but also too many flaws to go higher…’ and variants. No wonder I overuse 6.

Here’s my ratings distribution on IMDB, where I’ve rated almost all (as far as I can tell, now) of the 5000 feature films I’ve seen:

[Image: IMDB ratings distribution]

You can see my most used ratings in order are 6,7,5,8,4.

This is a ratings approach leaning on worth, not trying to use all the numbers more often. But after I’ve used it, in IFComp, I will push scores up and down a little bit to try to express my own preferences more accurately, only because I’m conscious everything in this batch will be ranked versus everything else in this batch. So I’m doing that thing where if I gave two things a six, and I liked one much better in retrospect, it may go to 7, etc. I’m separating things from each other for my own satisfaction.

-Wade

12 Likes

This is a nice desire but I think would just start arguments, particularly since it’s very hard to find a set of criteria that will fit all the diverse types of games entered in the comp. (There’s a reason I didn’t share all my rubric categories and criteria above!)

In the end, as both an author and a reviewer, I think you just have to take the IFComp score guidelines to heart and assume that there are enough people voting to make the overall result more accurate than any individual person’s contribution. As long as you have that in the back of your mind, there’s not a wrong way to review or score.

10 Likes

Just as a contrast, I feel like I should bring up Adam Cadre’s approach to comp voting. Adam wanted his favorite game to win the comp. So he would give his favorite game a 10, and the majority of all other games a 1-5. IIRC, he would usually give out a handful of 6-7’s if he felt they were particularly good, even if not his favorite.

For myself, when I could play all the games, I would just sort them from favorite to least favorite, then evenly distribute numbers down the list, so in a year with 30 games, I’d give out about 3 of every number.
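
The sort-then-spread approach can be sketched in a few lines of Python. This is only one way to do it: the function name `spread_scores` and the placeholder game titles are assumptions for illustration.

```python
def spread_scores(ranked_titles, low=1, high=10):
    """Assign scores evenly down a list sorted from favorite to least favorite."""
    n = len(ranked_titles)
    levels = high - low + 1  # how many distinct scores are available
    scores = {}
    for position, title in enumerate(ranked_titles):
        # Position 0 is the favorite. Split the list into `levels` equal bands
        # and step the score down by one for each band.
        band = position * levels // n
        scores[title] = high - band
    return scores

# Placeholder titles, listed favorite first.
games = [f"Game {i}" for i in range(1, 31)]
scores = spread_scores(games)
```

With 30 games, each score from 10 down to 1 is handed out exactly three times.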

This year, I felt I had played enough games to basically do that, but I didn’t evenly distribute my numbers. I gave my favorite my only 10 (Ribald Bat Lady FTW!), because there weren’t any other games I played that I enjoyed as much, then went from there. I ended up dragging game titles around in Excel until I was happy with it. It ended up basically bell-curve like, and I didn’t give out any 1’s this time, in part because I hadn’t played all the games, and a 1 sort of seems iffy to give a game that definitely tried, for me.

The upshot was that for me, my ‘acceptably made, even if not meriting any special attention’ was a 2! All the games I played seemed acceptably made to me, even if some of them didn’t work for me as well as they could have. If I had set that level at 6, my votes would have mattered a lot less, and I like to have my opinions have at least some consequence.

6 Likes

But if you made a bell curve, presumably the middle wasn’t super far from 6! I mean, there’s just not that much room for it to be far elsewhere in a scale going from 1 to 10. And you didn’t use 1.

-Wade

1 Like

Here’s my rubric:

10: Holy damn, this game is amazing! Polished, interesting, thematic, and beautiful. I love it and it will live rent-free in my brain for a long time and I will force my husband to play it with me and talk everyone’s ear off about it. I had visible reactions when playing it – maybe I laughed, cried, shouted, or jumped out of my seat. I really hope this game places in the top 5!

9: This game is great, I had a fun or interesting time playing it! It was evocative and made me Feel Things, I found it engaging, and I can chew on it and its themes for a good while. I would recommend this game in the future beyond IFComp!

8: This game is really good! It’s polished, engaging, and entertaining in a way I found worthwhile throughout. I will think about this game and compare other games to it during IFComp. Despite my terrible skill with puzzles, I was able to complete any with only a few hints or nudges.

7: I can solidly say this game was good and a few moments even stood out to me as great. I would recommend people play it during IFComp so I can hear what they thought also.

6: A well-implemented game I would describe as “fine, I guess”. Not very ambitious, but acceptable. I wouldn’t actively recommend or not recommend it to anyone. I may have used a walkthrough because I’m bad at puzzles, not because they were bad in and of themselves.

5: Adequate, with a few moments that sparkled but not enough to impress me. Achieves what it’s trying to do or gets fairly close, but I don’t feel a desire to recommend it to anyone.

4: Mediocre. Maybe I didn’t find it actively unpleasant but I did find it unimpressive or boring. Misses the mark of what it’s trying to do, but I got something more out of it than a 3. I used a walkthrough because puzzles were poorly designed or flagged.

3: Unfun, unpleasant, or annoying to play. Severely misses the mark on what it’s trying to accomplish. For poorly designed puzzles, there was no walkthrough or hint system.

2: Severely frustrating in a way I found maddening. I didn’t have fun playing this game, maybe I didn’t even finish it, but it’s still playable, technically.

1: Unplayable. Maybe it was too buggy and I couldn’t get far in at all, it was straight up offensive in a way I couldn’t stomach, or it wasn’t a game.

I only played 8 games, but the rating range was between 3-9, skipping 6 oddly enough. Looking at my actual scores I should’ve changed some of them to fit the rubric better as my feelings changed over the comp…oh well, next time!

6 Likes

I suspect the encouragement to “use all the numbers” is to dissuade the kind of voting where everything you like gets a 10, or everything you don’t like gets a 1. I don’t literally try to use every number, because to me, a score around 5 is middle-of-the-road, above 5 means more good than bad, and under 5 means more bad than good, and most of the games I’ve played in IFComp were more good than bad. I think it’s fair to make a score of 10 equivalent to the best game you’ve ever played, but if you do that, it might also make sense to make 1 equivalent to the worst game you’ve ever played (or tried and failed to play, because it was completely broken).

5 Likes

This is almost exactly how I do it. 7+ means I liked it and would recommend it, 1-4 means I would actively recommend not playing it, plus or minus a point for personal taste. Unused numbers don’t bother me. My lowest score was 5 this year, which is great! I am happy to reserve the lower numbers for hypothetical bad games that I didn’t come across this time.

I like the idea of rating games on the same scale every year, because scoring on a curve or using some other chaotic motivation makes the overall data harder to compare between years. Also, if you use all the numbers and don’t rank all the games, there may be games that you’d rank outside the scale if you’d only played them, and I don’t even want to think about that.

8 Likes

Personally—unlike many people, it seems—I’m not that hung up on achieving a normal distribution of scores, because there’s no guarantee the games themselves have a normal distribution of quality. One year there may be a huge influx of newbies making somewhat janky games, while another year might have an unusual number of polished games. Plus, I’m not one of those people who typically gets through every single game in the comp, so who’s to say the subset of games that I played is even a representative sample of the games that are in the comp?

I prefer not to share my exact rubric because it opens it up to nitpicking and questions of whether it’s really fair to this or that type of game, and I just don’t have the energy to get into it with people about that. My reviews are intended, in part, to provide helpful feedback to the author (whether or not they typically achieve that), and I don’t think that attaching a number, or the full scoring-rubric breakdown, would add anything that’s useful from a craft perspective. It might make it easier to predict my specific numerical vote on future games, but judges’ rubrics vary so widely, and there are enough of us voting even on the less-voted-on games, that I think there’s very little value in trying to predict or court the votes of any one person.

10 Likes

Before diving in in 2022, I gave a lot of thought to how I wanted to judge. Somewhat intimidated by both the outsider-barging-in vibe I was bringing and the sternly worded “use all numbers” guidance, I’ve been pretty open with my approach. What I DIDN’T (couldn’t, given my lack of community context) account for was how the scores would read and impact authors.

Since I reused it, you can probably tell I think I lit on a way that works for me right out of the gate. (Which is always a worrying statement, right? “Nailed it, first try! No need to internalize or adjust anything, I win!”) Certainly the fudgy ‘bonus/penalty point’ is crucial flexibility to accommodate singular works. I am hopeful that the words I throw out in support of the score compensate and explain, at least a bit, the scores on the lower end.

It is not lost on me that the whole judging thing is a fraught endeavor - real humans poured months, years of creativity, skill and passion into these things and after two hours (at most!) I reduce all that effort to a single digit number. On some level ANYTHING less than a 10 dishonors the effort. For me, if you’re going to presume to do that at all, it makes sense to commit to it with real differentiation to give Big Numbers the best chance to buff down the burrs of idiosyncrasy. While being as open and supportive as you can.

8 Likes

So many good points, I have more thoughts!

My rubric kind of biased me against “decent/not decent” score mapping to start with, but this year really drove home why that didn’t resonate for me. If I can pick on one of my favorite entries, Kaboom!, it was simultaneously one of the more frustrating gameplay experiences and the MOST emotionally affecting game in the field. There was no dimension to the work that was merely ‘decent’! The rubric fed me a score and I really stewed over it. What ultimately made sense to me was comparing it against a theoretical version of itself. What would its score be if the gameplay sang? Does this score make sense from that view? To me, it did.

As I look back, there are quite a few games that I enjoyed much more than a vanilla ‘5=decent’ metric might indicate. To me, that decoupling can take the sting out of a lower score and is enough to let me sleep at night. If I do my words right, hopefully the authors can see that too!

5 Likes

Yeah, I feel it’s an inherent function of numerical scores that “perfectly decent, unremarkable” is going to score the same as “had aspects I loved and aspects I hated,” which is why I find it easier to use a rubric that breaks out points by category than one where each number represents a specific reaction to/overall assessment of the game.

4 Likes

I’m personally of the opinion that I would much rather play a ‘loved it/hated it’ game than an OK game, so I’m going to rate the loved it/hated it game much higher. But I also like that other people have different systems! The variety is what makes the overall scores more valuable.

7 Likes

I hate every rubric I’ve ever developed, which never spits a satisfactory number out. I hate rating games at all, largely because of this:

I have been really unhappy with 90% of the ratings I’ve ever given, which is why I can never bring myself to do them on IFDB; they’d nag at me for their wrongness and I’d probably keep changing them and messing with that game’s score. At least when you rate a game for a comp the window closes and then it’s too late to change it.

8 Likes

There was a good conversation about a year ago about “outlier scores”: scores of 1 and 2 on games that have a median of 7 or 8 and a small standard deviation. Those outliers are understood to have a different meaning than, say, a 3 on a game that had a median of 5.
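
As a rough illustration of what spotting such a score could look like, here is a Python sketch. The votes are invented, and the cutoff of two standard deviations from the median is my own assumption, not a rule IFComp uses.

```python
from statistics import median, pstdev

def flag_outliers(votes, k=2.0):
    """Return the votes lying more than k standard deviations from the median."""
    mid = median(votes)
    spread = pstdev(votes)  # population standard deviation of all the votes
    if spread == 0:
        return []           # every vote is identical, so nothing stands out
    return [v for v in votes if abs(v - mid) > k * spread]

# Invented votes: a cluster of 7s and 8s plus a single 1.
votes = [7, 8, 7, 8, 8, 7, 8, 7, 8, 1]
outliers = flag_outliers(votes)
```

Here the median is 7.5 and only the lone 1 sits far enough away to be flagged. The same cutoff would not flag a 3 on a game whose votes are spread widely around 5.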

Back to Brian Rushton’s observation (echoed by others): the games in this year’s competition were exceptional. I’m not sure you can compare score distributions from one year to the next, as many people likely do adjust their rubric to use the whole range.

3 Likes

This conversation is making me feel like a cold-hearted robot! :joy: Albeit, a cold-hearted robot who gave Kaboom a “better-than-average” score. A game having a lot of heart, even if there are some technical flaws, definitely raises it in my estimation.

4 Likes