Merk's Scoring System (for this year)

For the last couple of years, I have wanted to change my scoring system to something a little more technical. Several times, I found games that didn’t fit any definition clearly. Maybe the story was incredible but the game was buggy, or maybe the writing was great but the puzzles were impossible to solve. I could probably scrap any sort of judging guidelines entirely, but when it takes four to six weeks to play every entry, I want to make sure the rating from game to game is fair and applies an equal level of scrutiny to each one. If this doesn’t work, then next year I might just use my best guess and not worry so much about a scoring system.

But, here’s what I’m trying for this year. Feel free to borrow it, use it, change it, mock it, whatever. I might revise it in small ways before the end of the competition, but I think it’s fairly static now.

My 2008 IFComp Scoring System:

Free Point (F): (0=Unplayable, 1=Playable)
0 points if the game is not playable (disqualified before playing, obscure platform, crashes on start-up, etc). 1 point if it’s playable enough to be judged and voted on.

Technical / Implementation (T): (0=Bad, 1=Fair/Good, 2=Great)
0 points if the game is buggy, poorly tested, broken, or implemented only at a basic level (just the minimum to support the core of the game). 1 point if the game works well enough to demonstrate technical competency, doesn’t seem to be just a “bare-bones” implementation, and works without exhibiting numerous problems or major frustrations. 2 points if the game excels on a technical level, where the game feels more alive and real because it’s responsive, well-coded, well-designed, and does a great job at handling most commands.

Puzzles / Interactivity (P): (0=Bad, 1=Fair/Good, 2=Great)
0 points if the game lacks interactivity at even a good CYOA level, or if there is little to no point to this aspect of the game. 1 point if the puzzles are okay but could be better (maybe they’re average or below, too lacking in originality, poorly-clued or unfair, too simple, or in a game without puzzles, if the interaction is only moderately meaningful or interesting). 2 points if the puzzles are interesting, original, and fairly clued (or, in a puzzle-less game, for great and perhaps novel interactivity).

Story / Purpose (S): (0=Bad, 1=Fair/Good, 2=Great)
0 points if the game has no plot, a throw-away plot (“generic knight searches for the generic Wonder Sword and does generic stuff while battling generic RPG monsters,” for instance), or no discernible purpose. 1 point if the plot is worthwhile and entertaining (but not necessarily all that it might be), or, if the story isn’t the point, if the game’s “purpose” works as intended. 2 points if the story is engaging and imaginative, fresh, unexpected, and well-told (or, in a game purposely lacking a story, if something highly worthwhile takes its place).

Writing (W): (0=Bad, 1=Fair/Good, 2=Great)
0 points if the text is poorly composed, unintentionally choppy, or grammatically error-ridden in an accidental way. 1 point if the writing succeeds with little to no problem, but doesn’t inspire or seems to lack imagination. 2 points if the writing is immersive, expressive, vivid, entertaining, exciting, fresh – in short, really worth reading.

Reviewer Bonus (B): (0=No Bonus, 1=Bonus)
In prior years, this took the form of half- or full-point skews or a plus/minus designation added to the score. This year, a bonus point will be given for the same reasons. It may be given to a great effort from a first-time participant, or for “something” particularly enjoyable that doesn’t necessarily factor into the rest of the score. It may be given when “2” just isn’t high enough for some aspect of the previous four categories. It will probably also be given to my favorite of the competition, especially since it’s the only way a game can garner a “10” using this system.

Composite (SCORE): (1=Horrible … 6=Average … 10=Excellent)
The above points are added to create the composite, resulting in a score from 1 to 10. Games which can’t even be given the Free Point are not ranked or voted on.

Example: F:1 + T:2 + P:1 + S:1 + W:2 + B:0 = SCORE: 7
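For anyone who wants to automate the bookkeeping, the composite is just a capped sum — here’s a minimal sketch (the category names and point caps come from the system above; the function itself is only my illustration):

```python
# Caps per category, as defined in the rubric above:
# F is 0-1, T/P/S/W are 0-2, B is 0-1.
CAPS = {"F": 1, "T": 2, "P": 2, "S": 2, "W": 2, "B": 1}

def composite(points):
    """Sum the category points into the 1-10 composite.

    Returns None for games that don't earn the Free Point,
    since those aren't ranked or voted on.
    """
    for cat, value in points.items():
        if not 0 <= value <= CAPS[cat]:
            raise ValueError(f"{cat} must be between 0 and {CAPS[cat]}")
    if points["F"] == 0:
        return None  # unplayable: no vote
    return sum(points.values())

# The worked example from the post:
print(composite({"F": 1, "T": 2, "P": 1, "S": 1, "W": 2, "B": 0}))  # 7
```

Note that a perfect 10 requires 2’s across the board plus the bonus point, which matches the remark above that the bonus is the only route to a 10.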

Thanks - this is a good system. I’ll adopt it as well, if you don’t mind!

Sure. Feel free!

I’ll have to make a conscious effort not to be too stingy with the 2’s, though. Otherwise, good-but-not-great games will end up with a score of only 5 or 6. But hmm. Maybe that’s fair. Well, I have plenty more ahead of me before I know how well I’m able to use this.

Just two games in, and I’ve already had to revise the wording a little. One of the biggest changes is to the Technical/Implementation score, which now counts a “bare-bones” implementation (even if it’s not evidently buggy or broken) as zero points.

Just a note to anybody who’s using or going to use this system or one like it:

I’m six games in now, and I’m already struggling a little with my own scoring system. The problem is, for instance, I can look at a game and think “well, this deserves an 8” (just as a gut reaction), but to justify an 8, it has to get 2’s in two of the primary categories.

I’ve been reluctant to give many 2’s (so far) because that’s the “top” score for that category. I have given 2’s when I feel it’s close enough or in order to justify the higher score I think the game deserves, but on the whole it does seem like my scores are ending up a point or two lower than where they would have been with my ranking system from last year.

And the reason is probably obvious. When I was voting the game “as a whole,” I could give an 8 or even a 9 if the “whole game” fell a little short of expectations. Now, if the “whole game” falls short of expectations, I’m breaking that out into its component pieces, meaning it loses a point in each category, not just as a whole.

Rioshin, you might have the right idea in giving a larger range of scores in each category, and then averaging at the end. For instance, I’d probably struggle less if I rated each category 1 to 10, added them up, and divided by the number of categories. I didn’t, because then I’d have to decide whether to round up or round down, but I’m starting to think that might be better.

Even as it stands, if I were to give a 1.5 in two categories, the score would end up being what it should be without having to give a 1 in one category and a 2 in the other to get the same result. I may just be over-thinking it, but I think I’d rather vote a game too high than too low.
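The averaging alternative mentioned above could look something like this — purely a hypothetical sketch, rating each category 1 to 10 and averaging, with the round-up/round-down question left as an explicit switch:

```python
import math

def averaged_score(ratings, round_up=True):
    """Average per-category ratings (each 1-10) into one score.

    round_up picks between ceiling and floor when the average
    lands between two whole numbers.
    """
    mean = sum(ratings) / len(ratings)
    return math.ceil(mean) if round_up else math.floor(mean)

# Four categories averaging 7.5: "vote too high rather than
# too low" argues for rounding up.
print(averaged_score([7, 8, 7, 8]))                  # 8
print(averaged_score([7, 8, 7, 8], round_up=False))  # 7
```

This also sidesteps the half-point problem: two categories at 7.5-ish quality can just be rated 7 and 8 without distorting the total.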

Anyway, I’m leaving it as-is for now. But I did want to give a heads-up to anybody who might use this system that it does seem to force scores lower than you might have intended.

True. My first two years as a judge, I voted by gut instinct. The moment I switched to a more general scoring system, I found my scores falling, for the same reason: a game I would have docked a point or two lost those points multiple times, because I felt the writing, the story, the technical implementation, etc. weren’t quite up to what I expected. So over the last few years of voting, I’ve trained myself (while simplifying my voting system so as not to eat up too much time deciding on a score) to give better scores - to expect a little less each year.

That’s one of the reasons I’ve been using scoring systems like this for the last few years, with larger ranges of scores in each category. (You should have seen the one I used in 2006! Too bad I’ve lost the Word document it was in to a hard-drive crash - but the total score across the categories was in the 120s, with about 20 or so categories I was looking at. It was practically impossible to score a perfect 10.)

I’m actually thinking of going back to something more complex next year, but with a twist: I score each game, decide the minimum and maximum for that year’s crop, and then put the games on a bell curve according to that…
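One guess at what that end-of-season step might look like — this is just a simple min-max rescale of each game’s raw total onto the 1-10 range using that year’s crop, not a true bell curve, and every name in it is my own invention:

```python
def rescale(raw_scores, lo=1, hi=10):
    """Map each raw total onto lo..hi using that year's min and max.

    The worst game of the crop lands on lo, the best on hi,
    and everything else falls proportionally in between.
    """
    worst, best = min(raw_scores), max(raw_scores)
    if worst == best:
        return [hi for _ in raw_scores]  # one-score crop: nothing to spread
    span = best - worst
    return [round(lo + (s - worst) / span * (hi - lo)) for s in raw_scores]

# Raw totals from a hypothetical 120-point rubric:
print(rescale([42, 87, 60, 101]))  # [1, 8, 4, 10]
```

An actual bell-curve placement would instead rank the games and assign votes by percentile, but the min-max version is the simplest way to guarantee the year’s best game gets a 10.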