Ratings on the IFDB

A few talking points about ratings on the IFDB

  1. The IFDB is pretty neat for a whole load of reasons. Chalk me up as a fan. But the question of what rationale to employ when rating games troubles me.* At first, I thought of converting it in my head to a 100-point scale, with ‘3’ being between 60-79%, so pretty good. But given that there is no zeroth rating, ‘3’ is the middle star, and looks very much like 50%. There’s a big difference between saying ‘this game almost deserves 80%’ and saying ‘this game is around 50%’.

  2. By my count there are 1670 unrated games out of 4064, 41% of the games. Since the IFDB is a community tool, there’s no use complaining that somebody should do something about this, but I’m wondering whether people think this is an issue, and whether there’ll always be so many unrated games or whether people think that over time it will tend towards greater completion.

  3. The ratings are obviously ratings about the whole game experience, but there are lots of different possible games. Do you rate a game based on how good it is as a game, or as a game of its type or even how good it was at conveying the experience the author intended to afford you?

  4. A lot of the time I don’t really want to give a rating, but I feel like I should because it helps improve the database as a ratings engine. A review is almost always more useful, but necessarily more time-consuming to write or read. Perhaps I should tag more games, as this aids the IFDB qua database without appending a possibly misleading number onto a game.

*Well, ‘troubles’ is probably too strong. I’m certainly not lying awake at night over it.
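
The two readings of a ‘3’ from point 1 can be put side by side in a rough sketch. The only figure taken from the post is that ‘3’ covers 60-79% on the first reading; extending that to 20-point bands for the other stars is an assumption, and both function names are made up for illustration.

```python
def band_reading(stars: int) -> tuple[int, int]:
    """Assumed reading: each star is a 20-point band starting at 20 * stars.

    Only the band for 3 stars (60-79) comes from the post; the rest is
    an extrapolation, capped at 100.
    """
    low = 20 * stars
    return (low, min(low + 19, 100))


def position_reading(stars: int, lowest: int = 1, highest: int = 5) -> float:
    """Reading by position: where the star sits between the lowest and highest star."""
    return 100 * (stars - lowest) / (highest - lowest)


print(band_reading(3))      # the "pretty good" reading
print(position_reading(3))  # the "middle star" reading
```

The gap between the two results for the same star is the whole problem: the number a rater has in mind and the number a reader sees need not be the same.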

Games with no ratings look sad and unloved. Even one star is better than none. Just go with your gut and pick a star.

For me, the stars are the single most useful metric for filtering games. I view it as a scale ranging from “Avoid” to “Strongly Recommend” and don’t get too caught up in the specifics.

I tend to avoid reviews for games rated 4 & 5 stars, since I will likely play those games and form my own opinions. For games rated 1 & 2 stars, the reviews can offer a lot of valuable game design advice, and are worth reading even when the game may not be worth my time. 3 star games are a toss-up; if the blurb is compelling, they go on the wishlist; otherwise I spoil them with reviews until / unless I am persuaded to take a closer look.

So yes, reviews are nice, but ratings are also nice. With so many games to choose from, it’s very helpful to have a metric that tells me at a glance how to approach a given title.

If you filtered out all the obscure proprietary 80s and 90s games that often don’t even have a download link, I think that percentage would drop considerably. (That’s just a hunch based on my own observations, though.)

There are 3657 downloadable games and 1484 of them have no ratings; that’s about 41%. If you limit the search to games published in 2000 or later the numbers are 1950 and 543 which makes 28%.
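
For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the percentages from the counts quoted in this thread. The counts are the ones posted above; the function and variable names are just illustrative, and nothing here queries the IFDB itself.

```python
def unrated_share(unrated: int, total: int) -> float:
    """Return the unrated games as a percentage of the total."""
    return 100 * unrated / total


# (unrated, total) pairs as quoted in the thread
counts = {
    "all games": (1670, 4064),
    "downloadable": (1484, 3657),
    "downloadable, 2000 or later": (543, 1950),
}

# Round to whole percentages, as the posts do
shares = {label: round(unrated_share(u, t)) for label, (u, t) in counts.items()}

for label, pct in shares.items():
    print(f"{label}: about {pct}% unrated")
```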

A quick summation of a longer reply I was making! Fast figures, Juhana! So a little under a third of all games are potentially worth playing and are playable but are unrated.

Yeah, maybe that’s the best approach. I feel that I’m somewhat susceptible to ratings-creep. My earlier ratings are harsh, but now I’ve ended up giving out four stars all over the place, when on reflection I often liked the new four star games less than the old three star games!

Case in point: I’ve given four stars to some of the new TMBG one-move Fingertips games when I’ve thought they’re pretty awesome for what they’re doing, but I only gave three stars to Patanoir (mostly due to polish) when I’d take Patanoir over any of the Fingertips games any given day.

I think this is one of the things that IFDB gets right, or at least more right than the IF Comp style of scoring games relative to the set, such that you end up with ranked winners and losers.

It’s perfectly fine in my book to give four or five stars to an outstanding one-move or one-room game, and three stars to a deeper, more satisfying game that nevertheless suffers from implementation issues. All you’re really judging is whether time with this game will be time well spent; the longer the game, the harder it is to sustain the ratio of awesomeness to minutes played.

For me, a game doesn’t really “exist” yet until it has at least, say, 9 or 10 ratings on the IFDB. :slight_smile: Or at least, I don’t imagine I can guess at any kind of consensus for how good it might be until that point.

Sadly, this means that there are even chunks of the Infocom catalogue that “don’t exist” yet. On the other hand, I guess I have them to look forward to :slight_smile:

Amen to all of that. I rate a game on how well I feel it succeeded as itself, and how much it rocked my world for the time I spent with it.

For me the stars are pretty simple:
1: Bad or Dull Game*
2: Yeahfine/Decent/Okay Game
3: Good Game
4: Very Good Game
5: Great Game

*If zero stars were possible, zero would be reserved for dull, which is a shade worse than bad in my book.

This means there’s only one shade of “I don’t recommend this,” but that’s okay, because how many shades of “I don’t recommend this” do I really need?

When I don’t rate games that I’ve played, it can be for a number of reasons, which I don’t always apply consistently:

  • I played the game a long long time ago, and I can’t remember it all that well, or if I can I don’t really think that 2012-me would wholly agree with a self-important 17-year-old.
  • The score’s associated with a review that’s very equivocal or has a lot of reservations. (For instance, one’s appreciation of Encyclopedia Fuckme and the Case of the Vanishing Entree is going to depend so heavily on whether you find it sexy, revolting, triggery or none of the above, that rating it independently of that very personal reaction is going to be somewhat pointless.)
  • I’m trying not to be a dick to a first-time author who seems to have made a good-faith effort but still produced something wretchedly bad.
  • A five-star rating system feels too crude to summarise my feelings about the game, and any value I pick seems sort of arbitrary.

Not all of these are necessarily good reasons all of the time, but they have an effect. (And now you’ve guilted me into going through the games by year and seeing what I’ve played but not rated.)

We were actually just talking about this on the ADRIFT Forum. By the way… might wanna go through the list of unrated games and find which ones are written in ADRIFT. It’s possible they have reviews posted elsewhere, be it on the ADRIFT Forum or Adventures page.

I tend to view the star system as a very subjective thing nowadays, keeping in mind that my opinion is likely to be just one of many. That way, I’m not too worried about my reviews being too extreme… if they are, someone will balance them out later, probably (unless it’s an ADRIFT game). So my barrier for a three star rating is just my own personal enjoyment, like this:

1 star = I just don’t like it. Period.
2 stars = I could like the game, perhaps with minor changes.
3 stars = I liked it, but might feel hesitant about recommending it.
4 stars = I liked it and recommend it.
5 stars = A game I think everyone should play, whether they end up liking it or not.

Personal enjoyment is an easy metric to use, and removes a lot of the anxiety I had previously about rating games.

It’s fascinating to me that anyone could view it any other way. :confused:

My emphasis here is on the “very.” That is to say, I used to view them as subjective, but still held 5 stars up as a sort of “world record” rather than a personal “I really, really liked this game.” Just three short years ago, I wrote on my blog: “A game would have to be nearly life-changing to score a 10. 10s I would only assign retrospectively to great classics.” In practice, that meant I unintentionally low-balled every entry in the ADRIFT IntroComp.

Mr. Whyld also expressed a similar stance on ratings recently, saying:

We also had people uncomfortable giving out low scores, for which Abbi Park developed a standardized disclaimer:

Can’t say much of this resulted in more reviews, though. Anyway, I don’t know how instructive this info from the ADRIFT community is to IFDB, but I figure some of it ought to translate.

Hmm… 20 games written… 1 game “exists.” :confused:

Well, in that case I don’t find it puzzling at all.

I can think of one work of IF that literally changed my life (for the better, vastly) and I rate it three stars on the IFDB … because it’s good, you know? But only just. For me.

Heh. I see five stars as an exalted category, not a specific position. Besides, even if someone were to choose, for their own purposes, to reserve the five-star rating to mean their unique choice of “bestest ever,” the IFDB allows us to change our ratings all we like … so, anyone could choose to grant only a single five-star rating for his very-very-very favorite … and then still leave room for that favorite to be unseated, as it were, by a later game.

Alas :slight_smile:

Yeah, this is pretty tricky. I tend to reserve 1 and 2 stars for things I think are not worth playing or are severely flawed in their present state, but the 3-5 range is vastly complicated.

The games I admire most are the ones that are both well-executed and trying to do something I think is worthwhile (where “worthwhile” might mean telling an awesome story, having something valuable to communicate about the human condition, breaking new ground in what the medium can do, presenting a really cool puzzle mechanic, etc.). For very different reasons, I would consider “maybe make some change,” “Make It Good,” “The Baron,” “Patanoir,” and “Apocolocyntosis” all to be worthwhile in this sense, though I don’t think them all equally well crafted. And they’re pretty hard to compare to one another meaningfully at all, really.

While in theory I would like to have a system that graded craft and worthwhile-ness distinctly, rolling both into a single number is really hard. So sometimes I find myself giving 5s to games with gaping flaws because they’ve nonetheless got so much going for them, and sometimes I give a 4 to something that I thought was really solidly constructed but just not that significant; and sometimes it goes the other way and I penalize something conceptually awesome for its lack of craft followthrough. It’s really hard to be rigorous about this, so I just have to hope it will all come out in the wash.

To complicate matters, I find in practice that I’m sometimes harsher on really good games than on fairly good ones. The only way I can explain this effect is to say that once a game passes a certain quality level, I am mentally rating it against all other awesome IF, rather than average-quality-this-year.

So, for instance, I gave a 4 to Blue Lacuna, which on one level is absurdly unfair because it is bigger, braver, more polished and generally more amazing than the vast majority of IF. On the other hand, there are things about the writing that I wish had seen a stronger edit; I think a lot of the potential of Progue is often unclear to the player on a single playthrough, and the game is long enough to discourage numerous playthroughs; there were all kinds of pacing problems in the midgame; and the piece as a whole had massive stretch marks from having grown so much during production. It could have been better at being what it was trying to be (though I’m not sure that could have happened without killing Aaron – he’d put enough time in on the thing already, for sure). I totally understand how that can happen to a big project – it’s happened to some of mine – so I’m all sympathy, but still. There’s a lack of consistent vision, a lack of unity, that makes it a 4 when it’s lined up with other great IF.

Finally there’s the “spark of life” quality, which is not the same thing as being worthwhile, and is very very hard indeed to describe. But in essence, I find that there are some games that grab me with their creative spark, with the sense that the author was having fun or speaking from a place of true feeling. Sometimes goofy, pointless, badly made speed-IF has the spark of life; sometimes epic games three years in the making don’t. If it’s not there, I find the game a tiresome chore to play, no matter how carefully crafted it might be. But I also feel like it’s not that useful to write a review saying so, because the complaint is so nebulous and also feels like I’m attacking the author directly as a creative person. But no spark, no 5, that’s for sure.

The discussion about the relationship between numerical ratings and reviews comes up often on sites where long term reviewing is going on, and a frequent sum-up position is that the numbers make the most sense in the context of each review.

We can’t know what invisible rules from their mental universe any person is applying to their own ratings. (For instance, I’ve only given 5 stars to games which have percolated in my brain for ages, like Wishbringer. Is that fair? Probably not. Or I almost never go above 3 stars for very short games.) But if you look at the number in light of or just before the review content, then the sense (or nonsense) will quickly emerge.

So, re Ghalev’s comment:

Do you find you really need that much consensus? Sometimes I can work out all I need cue-wise from the written part of a single two star review, depending on the writer. That is, I can get a good impression in one review of what’s likely to interest me (or not) about the game.

So while a very high or low overall star rating can draw my eye, it’s always the review content that does the business for me.

This does mean that old games which are often long on numerical ratings but without written reviews (plenty such games I’ve rated from memory to try to help IFDB, but not reviewed because the details aren’t fresh in my mind, only my overall impressions of enjoyment or quality) often don’t get a chance to draw me anew, unless I read written reviews of them elsewhere.

In my opinion, rating games in retrospect is a mistake.
Personally, I do it when they are still a warm body (or an agonizing one, meaning a game I can’t get to finish is one I won’t rate too much, eventually) and I still can feel the pain of killing them. If I did it later (maybe a lot of time later) I’d lose much of the emotion of being there in the first place. The overall feeling, the immersion.

In these very days I’m trying to evaluate games in a famous competition.

[spoiler]Two of them, the first and the last I played, are both gonna get 5 stars. They are widely different games, with enormously different playability and length. Had I rated the first now, I would NOT have rated it so high.
But I understand that is not because of the first game’s lack of merits – or how well it accomplished making me feel satisfied in many ways – but because the second dropped in and changed the rules, somewhat. The game which is lingering in my mind, at the moment, is the last one, and that suffocates everything else.

What if I had never played that second game?

It feels like trying to score past girlfriends by comparing them to the current wife. Unfair, at best. Surely wrong in many cases.[/spoiler]

Also, as a personal note: I prefer stars to reviews when evaluating something. I understand that a review can give so much more insight (you can find some even in rookie and off-topic reviews like mine), but sometimes it all gets too big and precise and picky. I just wanna say “I liked it” or “I didn’t like it” and to hell with all those explanations I’ve gotta give. :slight_smile:

No; I apologize if I implied a need, specifically. Wasn’t what I meant. Remember that I’m talking about getting a sense of community consensus, not about getting an idea about how much I’ll personally like it or consider it good. The community and I have pretty different values and standards, but I do like to have an awareness of that territory :slight_smile:

For me it’s often just the tags :slight_smile:

Heh. I recently rated a game four stars, and said in my review it was probably worth two.

As a reviewer, I’d happily eschew the stars system if I could (however, seeing as it’s there, I do make use of it. Every tool counts). As it is, if I end up rating something with more or fewer stars than I think it’s really worth, I’ll explain it in the review. And indeed, I expect the reader to take the review into consideration, not the stars.

Hmm, reading these comments, it seems like it might be nice to split the IFDB rating system into a few categories of stars, like importance/notability, quality/craft/polish, and creativity/innovation. But that could be a lot of work and would trash/devalue the old 1-category ratings, unless you still left unitary ratings as an option for reviewers.

I think the scoring system is fine as it is, and any added complexity would simply discourage people from voting at all. As for things like importance/notability, without an understanding of the IF world in general you’re never going to know whether a game is important or notable; we all have different ideas of what constitutes quality/craft/polish; and creativity/innovation? A game might be creative and innovative without being any good, or could lack any kind of creativity and innovation but still be a great game.

Okay, this has me curious: what game are you talking about?