One of the reasons I implemented starsort in IFDB was to replace sorting by Average Rating with something useful.
I investigated whether to use IMDb’s algorithm (which Pegbiter uses) or to use Evan Miller’s “starsort” algorithm, and settled on starsort.
I found that starsort returned, in my opinion, better results than IMDb’s algorithm, and when I showed the results to some other folks, they generally agreed.
IMDb’s algorithm overrates games with a high average rating that just happen to have more than the minimum “m” number of ratings. (Pegbiter’s algorithm hard-codes that minimum: “m = minimum amount of ratings required (13)”.)
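To make that concrete, here’s a rough sketch of the IMDb-style weighted rating as it’s commonly published: a blend of the game’s own average and the site-wide mean, weighted by the number of ratings. The function name, the site mean of 3.5, and both averages are made up for illustration; only m = 13 and the rating counts (29 and 454) come from this thread. Pegbiter’s actual code may differ, and as described above it also excludes games with fewer than m ratings, which this sketch doesn’t do.

```python
def imdb_weighted_rating(avg, votes, site_mean, m=13):
    """IMDb-style weighted rating: blend a game's average with the site-wide mean."""
    # votes/(votes+m) is the weight on the game's own average;
    # m/(votes+m) is the weight on the site-wide mean.
    return (votes / (votes + m)) * avg + (m / (votes + m)) * site_mean


# Hypothetical averages and site mean; only the counts 29 and 454 are from the thread.
few_ratings = imdb_weighted_rating(avg=4.8, votes=29, site_mean=3.5)
many_ratings = imdb_weighted_rating(avg=4.4, votes=454, site_mean=3.5)

# With these made-up inputs, the 29-rating game edges out the 454-rating game.
print(few_ratings > many_ratings)
```

Once a game is a little past m, the pull toward the site mean fades quickly, so a high average on a few dozen ratings can outrank a slightly lower average on hundreds.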
Specifically, looking at Pegbiter’s list, I don’t think Worldsmith should rank higher than Lost Pig.
One of the features of starsort is that it sorts not just by rating but also by popularity, which in effect captures how likely users are to recommend a game to others. Lost Pig reliably figures in lists of “Best IF of all time,” “Best games for newcomers,” etc. It’s a highly recommended game. Worldsmith is a good game, for sure, but not as many people would recommend Worldsmith over Lost Pig, and that’s reflected in the fact that Lost Pig has 454 ratings and Worldsmith has only 29.
An extra bonus: starsort works on any number of games with any number of reviews. There’s no way to apply Pegbiter’s algorithm to #ratings:0-10 because, of course, all of these games have fewer than 13 ratings.
I honestly think that starsort is, in fact, the best method. It might seem complicated, but it’s asking and answering the question “what games can we be confident are the best, based on the available evidence?”
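For anyone curious what that question looks like as arithmetic, here’s a minimal sketch of the lower-bound formula from Evan Miller’s star-ratings article, as I understand it. It’s not IFDB’s actual code. The function name and z = 1.65 (roughly a 90% confidence level) are my assumptions for the example, and the rating distributions are invented; only the totals 29 and 454 match the counts mentioned above.

```python
import math


def starsort_score(counts, z=1.65):
    """Lower bound on the expected star rating.

    counts[i] is the number of (i + 1)-star ratings; z is an assumed
    confidence multiplier, not a value taken from IFDB.
    """
    k = len(counts)
    n = sum(counts)
    # Add one pseudo-rating per star level, so zero ratings still works.
    probs = [(c + 1) / (n + k) for c in counts]
    mean = sum((i + 1) * p for i, p in enumerate(probs))
    second_moment = sum((i + 1) ** 2 * p for i, p in enumerate(probs))
    # The uncertainty shrinks as the number of ratings grows.
    variance = (second_moment - mean ** 2) / (n + k + 1)
    return mean - z * math.sqrt(variance)


# Invented distributions (1-star through 5-star counts), not real IFDB data.
few_ratings = [0, 0, 1, 4, 24]        # 29 ratings, higher raw average
many_ratings = [5, 10, 30, 120, 289]  # 454 ratings, slightly lower raw average
no_ratings = [0, 0, 0, 0, 0]

print(starsort_score(many_ratings) > starsort_score(few_ratings))
print(starsort_score(no_ratings))
```

In this sketch the game with 454 ratings comes out ahead even though its raw average is lower, because there’s much less uncertainty to subtract. A game with no ratings at all still gets a score, which is why there’s no need for a minimum-ratings cutoff.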
Adding a separate option to sort by average rating would make the sorting dropdown harder to use. There would be two confusingly similar options in the dropdown, and we’d somehow have to explain the difference between them.
If it were useful for something, that might be worth it, but if it’s just so someone (you?) can generate an annual report, well, you can generate a report using the code I’ve given you.
It’s worth discussing (which is why I’m engaging in the discussion), and, if lots of folks agreed that sorting by average rating was useful, I’d implement it.
I predict that won’t happen, but I’m open to seeing what others think.