Scoring on a 1-10 scale (looking at Polygon)

cvaneseltine · April 28, 2014, 4:10pm

I just tripped over the Polygon explanation for how they rate games 1-10.

I like the way they handle this, and I thought it might be of interest to other people, since most of us wind up judging at one point or another (IFComp, Spring Thing, etc.)

maga · April 28, 2014, 4:16pm

Not a bad rating system - I’m not quite sure what I think off-hand about the ‘design, execution or basic functionality’ trifecta, but in general it comes pretty close to what I’d use.

I’ve been writing up my own comp-rating guide, largely because there are communities of practice in which anything below an 8 is an insult and 10s are routine, and I want to avoid people thinking that a 5 means ‘I hated your game’.

10. A near-impossible goal. The game is technically strong, highly engaging to play, and inspires at least a little bit of awe. It at least touches on important, difficult, human-condition subject-matter, and its forms of interaction closely involve player action with the narrative. Years from now it will still be a favourite. It’s very unlikely that more than one game in a comp will earn this score: in most years, none will.

8-9. A very good work, at a level of technical and artistic competence strong enough that criticism becomes a more delicate process, a lit-crit exercise rather than a listing of unambiguous failures of craft. Ambitious, in the sense that it’s tackling something difficult in either design or content. I would happily recommend this.

6-7. Pretty good, but either flawed in some major ways, or unambitious in its goals. I am glad I played this, but I would not recommend it without some caveats.

5. The most important question for me in the whole voting process is: did I get something out of this game? If I didn’t respond in some significant positive way - I didn’t learn anything, didn’t have fun, and wasn’t moved to any emotion more beautiful than annoyance - then the absolute maximum score I’ll assign to a game is 5, no matter how worthy it is in other respects. If a game does satisfy one of those conditions, then 5 is the absolute minimum score it can get, regardless of its other failings - a totally shambolic game that I had a fun time with is a 5.

4. This is at the Margin of Respectability: if I don’t like a game overall, but still think that it has some things going for it, a 4 is a likely score. Maybe it represents a decent technical effort, maybe there’s a cool idea that wasn’t fully realised, maybe it’s a noble failure that took on a really difficult goal. Perhaps it’s just a game for which I am very, very much not the intended audience.

3. A bad game with few redeeming qualities. I got little or nothing out of this, and would recommend other players to avoid it.

2. A very bad game; something has gone seriously wrong here. Either the author has put in very little effort, or they lack some of the basic skills required to make a decent game.

1. The score reserved for games which should not have been entered into the Comp. Either they’re troll entries, or they’re so broken as to be unplayable, or they demonstrate a fundamental misunderstanding of what a comp entry ought to aim at.
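
For anyone who thinks better in code, the pivot around 5 described above can be read as a clamp on a provisional score. This is only an illustrative sketch: the function name `apply_five_rule` and the idea of a separate "provisional" score are assumptions made for the example, not part of the rubric itself.

```python
def apply_five_rule(provisional: int, got_something_out_of_it: bool) -> int:
    """Clamp a provisional 1-10 score around the pivot value of 5.

    Hypothetical helper: `provisional` stands for whatever score the game
    would earn on craft alone; the flag answers "did I get something out
    of this game?" (learned something, had fun, or was moved).
    """
    if not 1 <= provisional <= 10:
        raise ValueError("scores run from 1 to 10")
    if got_something_out_of_it:
        # A game that landed for the judge never drops below 5.
        return max(provisional, 5)
    # A game that did not land never rises above 5.
    return min(provisional, 5)
```

So a shambolic game the judge enjoyed, `apply_five_rule(3, True)`, comes out as 5, and a polished game that left the judge cold, `apply_five_rule(8, False)`, is also capped at 5.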

jmac · April 29, 2014, 7:38am

Hmm! I think I would like to adapt (with attribution) both your ten-to-one and Polygon’s as “Here are two examples of well-thought-out and defensible scoring rubrics” for a forthcoming “Guidelines for Judges” IFComp.org page, if that’s all right with you…

jacksonmead · April 29, 2014, 8:06am

I wonder if putting something like this on the official page will act as a deterrent to some potential judges. Even if you put it up with the wording you have here, some people might think they have to come up with something equally well-thought-out and defensible, and then just decide to not bother. Has there been some problem in the past with people being assumed to judge based on something less well-thought-out? Would putting this on the official page actually deter anyone who is out to just vote 1 for every non-parser game, for example? I assume the goal is to have as many judges as possible, and this could result in fewer judges.

-Kevin

cvaneseltine · April 29, 2014, 8:08am

Having examples of “here is a reasonable scoring system” at the IFComp page seems like a good idea to me.

Unexpected. I’ve always assumed a 5 from you meant “I hated your game”. Good to know.

aschultz · April 29, 2014, 10:25am

I agree it’s a possibility. I’d suggest something like “You don’t need any particular credentials to judge games. You don’t need to have a degree, and you won’t have to defend your score before a panel, or worry if your method is as good as the other judges’. How you judge and score things is private. Don’t worry about whether your scores average to 5.5, fit a bell curve, or are uniformly distributed from 1 to 10. Don’t worry if you react badly/emotionally to one game, and don’t feel you have to judge all games. However, first-time judges or people looking to add a degree of rigor may find the following guidelines helpful:”

(and then allow the player to open up collapsible HTML)

maga · April 29, 2014, 11:03am

Basically, if I don’t really like a game - like, 7-8 territory - I’ll probably end up spending more time talking about its flaws than its strengths. Often this is because the things it succeeded at are straightforward enough that they don’t bear much discussion - things like a lack of bugs, competent implementation and unobtrusively smooth prose are all good things that improve game experience, but it’s hard to write more than a sentence about them in a review.

But there’s a lot of reaction space in between strong enthusiasm and hatred.

maga · April 29, 2014, 11:12am

I’d be fine with that, although I do think strongly-worded This System Is An Example And Not An Officially Endorsed Ratings System would be important.

Other ones I’m aware of: Jacqueline has some up at http://allthingsjacq.com/intficreview_methods.html :

allthingsjacq:

Entries I simply did not enjoy:
1: This entry was terrible, horrible, not very good at all, flat out bad. It should never have been entered in the comp and this person should probably be tracked down and made to pay for wasting our time.
2: This entry was substandard, either buggy as hell or full of grammar and spelling errors. It also should never have been entered into the comp, and the author should likewise be tracked down and made to pay.
3: This entry was solid, at least in terms of coding and grammar, but the writing was quite below par or the story was unclear or the goal was unknown, or something to that effect. Basically, what would make for an acceptable first attempt, but it still felt like a waste of my time and shouldn’t be in the competition.

Entries that were okay, but still weren’t my thing:
4: Now we’re in a sort of No Man’s Land. The writing was okay, the plot clear, but there were serious issues that kept me from enjoying myself. I wouldn’t go so far as to throttle them for wasting my time, however.
5: Still in No Man’s Land. Things were solid, but just average; the piece wasn’t quite good enough to make it into the category of “Games Jacqueline Enjoyed In This Comp,” i.e. the games that received a six or higher.

Entries I enjoyed:
6: Now we’re talking. A six isn’t a game I’m just wild over, but I didn’t not like it. It’s definitely comp-worthy and I’m glad it was here for me to play. Secretly, I wish for a day when everything in the comp is at least a six.
7: I like a seven quite a bit, but it’s not the sort of game I would necessarily ooh and ahh about to my friends.

Entries I enjoyed a great deal:
8: Anything I rate an eight or above I really and truly enjoyed. Eights are, for me, generally quirky and amusing, a worthwhile and entertaining expenditure of time. Well written, well implemented. Worth telling my friends about.
9: Nines and tens are pieces that really affect me. They’re pieces that quickly acquire and hold my attention, without distracting me excessively with disambiguation or mimesis-breaking glitches.
10: There’s a very subtle difference between a nine and a ten. A nine is excellent; a ten is pretty much perfect. Unlike some of my friends, I don’t go back at the end of the comp and give my most favorite game a ten; quite often my favorite(s) only receive a nine.

UnwashedMass · April 29, 2014, 12:42pm

For what it’s worth, in the poetry slam community, for whom scoring is a big deal (albeit handled very differently), they don’t require judges to be calibrated to a similar scale so long as they are internally consistent within their personal systems. (One night I saw an old friend entrusted with judging duties and found him giving extraordinarily low but, in my mind, fair scores. I asked him what his secret was, and he told me: “Every poem starts with a full score of 10, and then I deduct a portion of a point every time I hear the words ‘I’, ‘me’ or ‘my’.”)
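
The deduction scheme in that anecdote is simple enough to sketch. The size of the "portion of a point" was never stated, so the 0.1 penalty below is purely an assumption, as are the function name `slam_score` and the floor at zero.

```python
import re

# Whole-word, case-insensitive match for the three offending words.
# Contractions such as "I'm" count as a hit on "I".
FIRST_PERSON = re.compile(r"\b(?:i|me|my)\b", re.IGNORECASE)


def slam_score(poem: str, penalty: float = 0.1) -> float:
    """Start at 10 and deduct `penalty` per 'I', 'me' or 'my'.

    The penalty value and the floor at 0 are assumptions for
    illustration; the anecdote only says "a portion of a point".
    """
    hits = len(FIRST_PERSON.findall(poem))
    return max(0.0, round(10.0 - penalty * hits, 1))
```

Under these assumptions a poem with three first-person words scores 9.7, and one with none keeps its full 10. Whatever penalty is chosen, the scale stays internally consistent, which is the only property the slam format asks for.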