@otistdog has been asking recently about how IFDB mods can determine if users are being overly negative or not. That made me realize that we don’t really have a lot of ways to get easy stats about users.
I decided to take one of the most recent public IFDB data dumps and study it with SQL. I wanted to sort reviewers by their average rating; however, I knew that would put everyone at the top who had just rated 1 game with 5 stars, for instance.
So I created a Bayesian estimator (by padding each reviewer with the site-average number of ratings, each set to the site-average rating) and sorted the result.
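For anyone curious, the query looks roughly like this. It’s just a sketch; the ratings(reviewer_id, game_id, stars) table and column names are stand-ins for whatever the dump actually calls them:

```sql
-- Blend each reviewer's mean rating toward the site mean by padding with
-- phantom ratings: as many as the average ratings-per-reviewer, each at the
-- site-wide average rating. (Table/column names are hypothetical.)
WITH per_reviewer AS (
    SELECT reviewer_id,
           COUNT(*)         AS n,
           AVG(stars * 1.0) AS raw_avg
    FROM ratings
    GROUP BY reviewer_id
),
site AS (
    SELECT AVG(stars * 1.0) AS site_avg,
           (SELECT AVG(n * 1.0) FROM per_reviewer) AS avg_n
    FROM ratings
)
SELECT p.reviewer_id,
       p.n,
       p.raw_avg,
       (p.n * p.raw_avg + s.avg_n * s.site_avg) / (p.n + s.avg_n) AS bayes_avg
FROM per_reviewer p
CROSS JOIN site s
ORDER BY bayes_avg DESC;
```

A reviewer with a single 5-star rating ends up barely above the site average, while someone with hundreds of ratings keeps an estimate close to their raw mean.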
I was very surprised! I thought I’d be the most positive reviewer, given that my average rating is higher than the site average and I have the most ratings. Instead, I was in the top 5% of least positive!
I noted in another thread I’m more inclined to give number ratings to games that are clearly well-done (4 or 5 stars) — and outside of comp entries, the games I play are already popular and held in high regard — so it looks about right that my average rating is 4.
However, a good portion of my reviews don’t have numbers attached, so I guess it would probably skew toward 3 if you forced me to put numbers on the rest.
I think this is, on balance, a good thing. Without reviewers to balance out the high end (I can see people who rated 48 games with an average rating of a straight 5), everything is going to be a little inflated and end up like itch. Which I really don’t want; IFDB is pretty good right now.
Well, if we use that oft-quoted generalisation that somewhere between 80% and 90% of everything is crap (depending on whom you quote), and you review everything (and you pretty much do)…
In my own reviews, I personally favour reviewing weirder, less-polished and less-reviewed games, which means I give a lot of twos or threes. And I cheatily reserve 5/5 stars for almost no games.
If you just look at the people with 50+ reviews, there’s a nice balance of generous and critical raters, with the average squarely in the middle. That seems like a healthy place to be for rating-meaning.
Yeah, personally I’m glad that IFDB isn’t like the various book rating sites I use, where everything has an average between 3 and 4.5 because no one gives out ratings lower than a 3.
And I’m pretty happy with my own average rating being almost exactly 3, suggesting that I give out about the same number of ratings above and below the middle.
Actually, my questions were more along the lines of:
What are the criteria used when deciding to remove ratings from IFDB?
What defined oversight processes and associated public records of such actions are available?
Relevant to earlier discussion, I note that the IFDB Top 100 has undergone a massive shift in placement compared to my last check at the start of the month. In spot checks of individual titles, there is nothing to suggest that recent ratings are responsible for the movement, which in many cases is 20 places or more up or down. The most recent recalculation date of Wednesday is not typical, so perhaps @pegbiter is having some trouble with the process?
It would be interesting to see this histogram broken out by the year in which the rating was entered. My intuition is that average ratings have been creeping up over time, but that’s just a hunch.
I’ve always felt that just 5 possible levels isn’t expressive enough. I mean, you’d reserve 5/5 for your favorite game and 1/5 for truly awful games. That leaves 2, 3 and 4, which map to poor, average and good. Overall, not a lot of differentiation at all.
This is one of the reasons why I don’t rate games on IFDB. The other reason is that I just can’t stand to fuck up someone’s rating with a low score, which would mean I’d only rate games I can rate highly, which feels weird, and then it all starts to seem kind of psychologically heavy, and then I realize I’m doing that thing I do that drives me and everyone else crazy, the overanalyzing, hand-wringing thing, and so I solve it by fleeing and never rating anything.
From listening to book podcasts, I gather a lot of people feel the same way about GoodReads and its competitors. They say you should either give a book 5 stars or not rate it at all and just mark it as read, because otherwise it’s mean to the author.
Boardgame Geek has an interesting method where not only is it a 10-point scale, but when a new game is added it’s seeded with something like 300 phantom ratings of 5.5, so that a handful of high/low ratings won’t unduly skew the average.
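Back-of-the-envelope, with made-up numbers (the 300 phantom 5.5s are from above; the ten perfect scores are hypothetical):

```sql
-- Ten real 10/10 ratings on top of 300 phantom 5.5s barely move the average.
SELECT (300 * 5.5 + 10 * 10.0) / (300 + 10) AS seeded_avg;  -- ≈ 5.65
```

So a game needs a lot of genuine ratings before its average escapes the middle of the scale.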
As an author, I have some games that are much smaller or experimental than my others. I kind of like getting a smaller grade from one of those if I later get a higher rating from the same person on a bigger game. It makes me feel like the rater recognizes the difference in the effort I put in.
One thing I like to do sometimes for games I rate lower is to mention that I’d rate them higher if the bugs were fixed; I’ve had about 15-20 people contact me after updating their games, and I’ll bump up the rating a point (or more if the change is bigger). Keeping that in mind makes it a bit easier for me to give low scores when a game has significant bugs (like it can’t be finished, or something similarly egregious).
Oh, that’s right; and I think Dan Fabulich started a GitHub issue to add tracking of vote deletions so that there’s always a record, so eventually we should have that system in place. It would be interesting to ask pegbiter if their formula has changed at all, but I don’t think I’ve talked to them much before, and definitely not recently.
IFDB also indicates in a bunch of places that there’s a way to use your reviews to recommend other games that you might like. I feel like that was probably a minor incentive for me, when I first came across it, to make sure to use all of the available ratings, because if you don’t tell it you like some stuff more than others, it wouldn’t be able to tell you anything, right? But I’m not sure IFDB has that feature any more, so I don’t know if that will continue to make a difference.
Yeah, that feature to “highlight games similar to ones you’ve reviewed highly” is gone now, because it didn’t really work, and it was very expensive to compute. (You may remember last year or so when IFDB would constantly take 10+ seconds to load.)
The theory of the feature was that we could identify users with “similar tastes” to yours, and automatically highlight games that they like. The problem is, there really isn’t a useful clustering of users, and, to the extent that there is, the clusters are really obvious, e.g. “likes Twine” vs. “likes parser,” which means that the algorithm would spit out a bunch of random Inform games to try.
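Purely to illustrate the idea (this isn’t the code IFDB used; it assumes a hypothetical ratings(reviewer_id, game_id, stars) table like the one sketched earlier in the thread):

```sql
-- Naive "similar tastes" pass: pair up users on games both have rated and
-- score each pair by how closely their stars agree.
SELECT a.reviewer_id                     AS user_a,
       b.reviewer_id                     AS user_b,
       COUNT(*)                          AS games_in_common,
       AVG(ABS(a.stars - b.stars) * 1.0) AS avg_disagreement
FROM ratings a
JOIN ratings b
  ON a.game_id = b.game_id
 AND a.reviewer_id < b.reviewer_id
GROUP BY a.reviewer_id, b.reviewer_id
HAVING COUNT(*) >= 5   -- ignore pairs with too little overlap to mean anything
ORDER BY avg_disagreement, games_in_common DESC;
```

Even this naive version mostly just rediscovers the obvious clusters.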
Even game-level similarity didn’t work on the basis of reviews. When games are only getting a dozen reviews, there’s not enough data to say “game X is meaningfully similar to (or different from) game Y.” I caught the algorithm claiming that Hadean Lands (a long-form parser puzzlefest) was similar to Witch’s Girl (a delightful mini Twine game with easy puzzles and kid-friendly graphics). People did rate them both highly! … but they’re not even remotely similar.
I think there might be a better algorithm possible that identifies systems, genres, and tags that you tend to like, but nobody’s had time to work on it, and I’m skeptical that it would be very useful. (Do you really need an algorithm to tell you that you like mystery games? Or that you like IFComp games?)