@otistdog has been asking recently about how IFDB mods can determine if users are being overly negative or not. That made me realize that we don’t really have a lot of ways to get easy stats about users.
I decided to take one of the most recent public IFDB data dumps and study it with SQL. I wanted to sort reviewers by their average rating; however, I knew that would put everyone at the top who had just rated 1 game with 5 stars, for instance.
So I created a Bayesian estimator (by padding each reviewer with the site-average number of ratings, each set to the site-average rating) and sorted the result.
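For anyone curious, the query looks roughly like this. It’s just a sketch; the ratings(reviewer_id, game_id, stars) table and column names are stand-ins for whatever the dump actually calls them:

```sql
-- Blend each reviewer's mean rating toward the site mean by padding with
-- phantom ratings: as many as the average ratings-per-reviewer, each at the
-- site-wide average rating. (Table/column names are hypothetical.)
WITH per_reviewer AS (
    SELECT reviewer_id,
           COUNT(*)         AS n,
           AVG(stars * 1.0) AS raw_avg
    FROM ratings
    GROUP BY reviewer_id
),
site AS (
    SELECT AVG(stars * 1.0) AS site_avg,
           (SELECT AVG(n * 1.0) FROM per_reviewer) AS avg_n
    FROM ratings
)
SELECT p.reviewer_id,
       p.n,
       p.raw_avg,
       (p.n * p.raw_avg + s.avg_n * s.site_avg) / (p.n + s.avg_n) AS bayes_avg
FROM per_reviewer p
CROSS JOIN site s
ORDER BY bayes_avg DESC;
```

A reviewer with a single 5-star rating ends up barely above the site average, while someone with hundreds of ratings keeps an estimate close to their raw mean.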
I was very surprised! I thought I’d be the most positive reviewer, given that my average rating is higher than the site average and I have the most ratings. Instead, I was in the top 5% of least positive!
I noted in another thread I’m more inclined to give number ratings to games that are clearly well-done (4 or 5 stars) — and outside of comp entries, the games I play are already popular and held in high regard — so it looks about right that my average rating is 4.
However, a good portion of my reviews don’t have numbers attached, so I guess it would probably skew toward 3 if you forced me to put numbers on the rest.
I think this is, on balance, a good thing. Without reviewers to balance out the high end (I can see people who rated 48 games with an average rating of a straight 5), everything is going to be a little inflated and end up like itch. Which I really don’t want; IFDB is pretty good right now.
Well, if we use that oft-quoted generalisation that somewhere between 80% and 90% of everything is crap (depending on whom you quote), and you review everything (and you pretty much do)…
In my own reviews, I personally favour reviewing weirder, less-polished and less-reviewed games, which means I give a lot of twos or threes. And I cheatily reserve 5/5 stars for almost no games.
If you just look at the people with 50+ reviews, there’s a nice balance of generous and critical raters, with the average squarely in the middle. That seems like a healthy place to be for rating-meaning.
Yeah, personally I’m glad that IFDB isn’t like the various book rating sites I use, where everything has an average between 3 and 4.5 because no one gives out ratings lower than a 3.
And I’m pretty happy with my own average rating being almost exactly 3, suggesting that I give out about the same number of ratings above and below the middle.
Actually, my questions were more along the lines of:
What are the criteria used when deciding to remove ratings from IFDB?
What defined oversight processes and associated public records of such actions are available?
Relevant to earlier discussion, I note that the IFDB Top 100 has undergone a massive shift in placement compared to my last check at the start of the month. In spot checks of individual titles, there is nothing to suggest that recent ratings are responsible for the movement, which in many cases is 20 places or more up or down. The most recent recalculation date of Wednesday is not typical, so perhaps @pegbiter is having some trouble with the process?
It would be interesting to see this histogram broken out by the year in which the rating was entered. My intuition is that average ratings have been creeping up over time, but that’s just a hunch.
I’ve always felt that just 5 possible levels isn’t expressive enough. I mean, you’d reserve 5/5 for your favorite game and 1/5 for truly awful games. That leaves 2, 3 and 4, which map to poor, average and good. Overall, not a lot of differentiation at all.
This is one of the reasons why I don’t rate games on IFDB. The other reason is that I just can’t stand to fuck up someone’s rating with a low score, which would mean I’d only rate games I can rate highly, which feels weird, and then it all starts to seem kind of psychologically heavy, and then I realize I’m doing that thing I do that drives me and everyone else crazy, the overanalyzing, hand-wringing thing, and so I solve it by fleeing and never rating anything.
From listening to book podcasts, I gather a lot of people feel the same way about GoodReads and its competitors. They say you should either give a book 5 stars or not rate it at all and just mark it as read, because otherwise it’s mean to the author.
Boardgame Geek has an interesting method where not only is it a 10-point scale, but when a new game is added it’s seeded with something like 300 phantom ratings of 5.5, so that a handful of high/low ratings won’t unduly skew the average.
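Back-of-the-envelope, with made-up numbers (the 300 phantom 5.5s are from above; the ten perfect scores are hypothetical):

```sql
-- Ten real 10/10 ratings on top of 300 phantom 5.5s barely move the average.
SELECT (300 * 5.5 + 10 * 10.0) / (300 + 10) AS seeded_avg;  -- ≈ 5.65
```

So a game needs a lot of genuine ratings before its average escapes the middle of the scale.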
As an author, I have some games that are much smaller or experimental than my others. I kind of like getting a smaller grade from one of those if I later get a higher rating from the same person on a bigger game. It makes me feel like the rater recognizes the difference in the effort I put in.
One thing I like to do sometimes for games I rate lower is to mention that I’d rate them higher if the bugs were fixed; I’ve had about 15-20 people contact me after updating their games, and I’ll bump up the rating a point (or more if the change is bigger). Keeping that in mind makes it a bit easier for me to give low scores when a game has significant bugs (like it can’t be finished, or something similarly egregious).
Oh, that’s right; and I think Dan Fabulich started a GitHub issue to add tracking of vote deletions so that there’s always a record, so eventually we should have that system in place. It would be interesting to ask pegbiter if their formula has changed at all, but I don’t think I’ve talked to them much before, and definitely not recently.
IFDB also indicates in a bunch of places that there’s a way to use your reviews to recommend other games that you might like. I feel like that was probably a minor incentive for me, when I first came across it, to make sure to use all of the available ratings, because if you don’t tell it you like some stuff more than others, it wouldn’t be able to tell you anything, right? But I’m not sure IFDB has that feature any more, so I don’t know if that will continue to make a difference.
Yeah, that feature to “highlight games similar to ones you’ve reviewed highly” is gone now, because it didn’t really work, and it was very expensive to compute. (You may remember last year or so when IFDB would constantly take 10+ seconds to load.)
The theory of the feature was that we could identify users with “similar tastes” to yours, and automatically highlight games that they like. The problem is, there really isn’t a useful clustering of users, and, to the extent that there is, the clusters are really obvious, e.g. “likes Twine” vs. “likes parser,” which means that the algorithm would spit out a bunch of random Inform games to try.
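Purely to illustrate the idea (this isn’t the code IFDB used; it assumes a hypothetical ratings(reviewer_id, game_id, stars) table like the one sketched earlier in the thread):

```sql
-- Naive "similar tastes" pass: pair up users on games both have rated and
-- score each pair by how closely their stars agree.
SELECT a.reviewer_id                     AS user_a,
       b.reviewer_id                     AS user_b,
       COUNT(*)                          AS games_in_common,
       AVG(ABS(a.stars - b.stars) * 1.0) AS avg_disagreement
FROM ratings a
JOIN ratings b
  ON a.game_id = b.game_id
 AND a.reviewer_id < b.reviewer_id
GROUP BY a.reviewer_id, b.reviewer_id
HAVING COUNT(*) >= 5   -- ignore pairs with too little overlap to mean anything
ORDER BY avg_disagreement, games_in_common DESC;
```

Even this naive version mostly just rediscovers the obvious clusters.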
Even game-level similarity didn’t work on the basis of reviews. When games are only getting a dozen reviews, there’s not enough data to say “game X is meaningfully similar to (or different from) game Y.” I caught the algorithm claiming that Hadean Lands (a long-form parser puzzlefest) was similar to Witch’s Girl (a delightful mini Twine game with easy puzzles and kid-friendly graphics). People did rate them both highly! … but they’re not even remotely similar.
I think there might be a better algorithm possible that identifies systems, genres, and tags that you tend to like, but nobody’s had time to work on it, and I’m skeptical that it would be very useful. (Do you really need an algorithm to tell you that you like mystery games? Or that you like IFComp games?)