IFDB games sorting and Alternative Top 25 (formerly 100)

Denk · January 25, 2023, 12:24pm

I hope the IFDB guys are reading this

After a break, I was going to make the 2023 edition of the 2020 Alternative Top 100, which got a lot of positive responses. But I can’t now, because I can no longer sort by average rating. Alternative solutions are welcome!

I am not in a hurry but I really hope that the people behind IFDB will make the average rating an optional sorting algorithm again in the near future. Sometimes simple is beautiful. You could simply call it “Sort by Average Rating” and place it near the bottom (see picture). More discussion at the bottom of this post.

Until this becomes available(?), I might be able to use the data sets that were uploaded to the IF Archive, but they are huge SQL files and I don’t know how to get the relevant data into OpenOffice Calc.

Any good advice would be very much appreciated. Thanks!

DISCUSSION OF SORTING ALGORITHMS
The simple “average rating” was replaced with a relatively simple statistical method in 2021, as described here: New Features on IFDB

It might not be a bad idea because some games have only a single rating and might be rated by someone biased etc. However, there is no proof that this method is always better than the simple average. Sometimes that one 5 star rating was made in good faith whereas a game with 5 ratings sometimes has 3 suspicious ratings.

Example:

You can see that 80 DAYS is ranked higher, even though its average rating (see the graphical stars) is lower. Is 80 DAYS a better game, just because it got 73 ratings instead of 55?

I am just saying that “complicated” statistical analysis isn’t always more correct than simple approaches. The optimal method depends very much on the dataset in question. So I think sorting by simple average rating should at least be optional.

If you want more accurate ranking, I think you should start with removing all ratings given by the game’s author, i.e. when the game is linked to their own profile. For instance, that would probably give a less biased ranking of Cragne Manor [I was an author too but it is hard - very hard - to be completely unbiased in this case]
EDIT: About Cragne Manor, apparently only four of the 19 ratings were given by a Cragne Manor author and one of them excluded his rating from the average so it seems that unbiased players also think that Cragne Manor is a great game!

rovarsson · January 25, 2023, 1:31pm

They’re not rating the game. They’re giving themselves a high-five and a pat on the back for daring to confront the dreaded Monster Game.

jkj_yuio · January 25, 2023, 2:04pm

What? You can rate your own game. Is this true?

Denk · January 25, 2023, 2:07pm

Yup, just tried it - you still can. I have seen authors elsewhere rate their own game.

zarf · January 25, 2023, 2:26pm

I promise that the IFDB admins look hard at the dataset before making any changes. :)

Denk · January 25, 2023, 2:46pm

Sure. What I mean is that if someone makes their own search, the IFDB admins have no idea what that search will be, but the sorting algorithm is still the same, i.e. Evan Miller’s.

mathbrush · January 25, 2023, 2:53pm

You have a lot of good points. Right now, you can access some ‘secret’ ways of ranking games by including them in your search, like ‘ratingdev:2-’ to find games with a standard deviation of 2 or more, which unlocks the ability to order things by standard deviation. Mike Roberts coded that in originally. So I could see having an option for ‘unweighted average’ as a sort method if the search includes ‘rating:4-’ or something like that.

Denk · January 25, 2023, 3:05pm

I am nowhere near as good at math as you, so I might misunderstand. Do you mean I could do a search now so that the ranking is unweighted, or does it require changes to IFDB?

mathbrush · January 25, 2023, 3:17pm

Sorry, I wrote it in a weird way. I meant that it’s not in IFDB now, and would require changes. (I think. I swear at one point we were going to leave in a way to access it, I’d have to look back to see if we did it somewhere).

mathbrush · January 25, 2023, 3:25pm

Hmm, from what I can find you can still get the average rating from IFDB’s API using <averagerating>, which is what Pegbiter’s list does, but I actually don’t know how to use APIs, so I don’t know if that’s helpful or not.

Denk · January 25, 2023, 4:45pm

Unfortunately, I don’t know either

Denk · January 25, 2023, 5:18pm

I couldn’t help comparing Pegbiter’s Top 100 with the Evan Miller method (default sort method on IFDB):

Not so far from each other but I guess the methods are somewhat related(?)

mathbrush · January 25, 2023, 6:08pm

They both add some “default ratings” for games to water down games with few ratings. The main difference is that the current IFDB sort also subtracts the standard deviation, which isn’t something I’ve seen a lot before but its only effect is to make games with a wide range of scores place lower than games with a narrow range of scores.

Edit: Violet has a really high standard deviation, .745, while Counterfeit Monkey and Superluminal Vagrant Twin have the lowest, .43 and .46
(Edit: I think I’m missing something here because I can’t use standard deviation alone to figure out the specific differences)
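
A rough sketch of how “default ratings” plus a subtracted spread term can work, based on my reading of Evan Miller’s published star-rating formula. Whether IFDB implements it exactly this way is an assumption, as is the multiplier `z = 1.65`; the function name `starsort_score` is made up for this example. Note that the subtracted term is the uncertainty of the estimated average, which shrinks as ratings accumulate, rather than the plain standard deviation of the ratings. That may be why the raw standard deviation alone does not explain the ordering.

```python
import math

def starsort_score(counts, z=1.65):
    """Conservative score for a game (sketch, not IFDB's actual code).

    counts[i] is the number of (i + 1)-star ratings, so a 5-star scale
    needs a list of five integers. z is an assumed confidence multiplier.
    """
    k = len(counts)          # number of star levels
    n = sum(counts)          # number of real ratings
    stars = range(1, k + 1)
    # One pseudo-rating per star level waters down games with few ratings.
    weights = [(c + 1) / (n + k) for c in counts]
    mean = sum(s * w for s, w in zip(stars, weights))
    second_moment = sum(s * s * w for s, w in zip(stars, weights))
    # Variance of the estimated mean; it shrinks as n grows.
    variance = (second_moment - mean ** 2) / (n + k + 1)
    return mean - z * math.sqrt(variance)

one_perfect = starsort_score([0, 0, 0, 0, 1])        # a single 5-star rating
eighteen_perfect = starsort_score([0, 0, 0, 0, 18])  # eighteen 5-star ratings
narrow = starsort_score([0, 0, 0, 18, 0])            # average 4.0, no spread
wide = starsort_score([0, 0, 9, 0, 9])               # average 4.0, wide spread

print(f"{one_perfect:.3f} {eighteen_perfect:.3f} {narrow:.3f} {wide:.3f}")
```

Under these assumptions, a lone 5-star rating scores far below eighteen of them, and of two games with the same 4.0 average, the one with the narrower range of scores places higher.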

StJohnLimbo · January 25, 2023, 7:32pm

You can append “&xml” to the URL of a search result on IFDB, and you’ll get an XML representation of the results.

For example, this gives the standard list of all games, sorted by Highest Rated First (with the new-ish IFDB methodology following Evan Miller).

It will also try to display all results on one page (“&pg=all”), but IFDB caps this at 500 entries (or 487 unflagged ones, I think).

It depends on your browser how the file will be displayed, but you can save it as an XML file, let’s say “search.xml”.

You can open that file in a text editor and see that each game has a field “<averageRating>”, which contains the unweighted average rating. At least, I presume that this is it, because the current sorting by the new method does not correspond to it.

Perhaps more usefully, you can also import it into a spreadsheet program like LibreOffice Calc or similar.

In LibreOffice Calc: “Data” Menu → XML Source → choose the xml file, mark “ns0:game” in the hierarchy displayed in the upcoming dialog box, choose a cell to map the entries to (just use A1 in a new document), and click “Import”.

The result should be a table with 487 rows, each representing a game.

ifdb_xml_and_ods_files.zip (91.8 KB)

You can then sort it by the column “ns0:averageRating”. Of course, the results of the re-sorting might not be very meaningful, because they are based on the results of the Miller-ranking, which were capped at 500 entries. But it is interesting to play around, and you can of course also modify the original search to use only games with at least n ratings, for example 10.
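
For anyone who prefers a script to a spreadsheet, here is a minimal sketch of the same re-sorting in Python. The sample XML is hypothetical: only the `game` and `averageRating` names come from the steps above, while the `title` field, the root element, and the namespace URI are assumptions, so check them against the real file. The two ratings are the ones quoted elsewhere in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a saved "search.xml"; the real namespace and
# any field names other than "game" and "averageRating" may differ.
SAMPLE_XML = """<searchResults xmlns="http://example.invalid/ifdb">
  <game><title>80 Days</title><averageRating>4.7123</averageRating></game>
  <game><title>Hadean Lands</title><averageRating>4.7818</averageRating></game>
</searchResults>"""

def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]

def games_by_average(xml_text):
    """Return (title, average) pairs, highest unweighted average first."""
    root = ET.fromstring(xml_text)
    games = []
    for element in root.iter():
        if local_name(element.tag) != "game":
            continue
        # Collect the child fields of this game, ignoring namespaces.
        fields = {local_name(c.tag): (c.text or "").strip() for c in element}
        if fields.get("averageRating"):
            games.append((fields.get("title", "?"),
                          float(fields["averageRating"])))
    return sorted(games, key=lambda game: game[1], reverse=True)

ranked = games_by_average(SAMPLE_XML)
for title, average in ranked:
    print(f"{average:.4f}  {title}")
```

To use it on a real download, read the saved file into a string first, for example with `open("search.xml", encoding="utf-8").read()`, and pass that to `games_by_average`.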

mathbrush · January 25, 2023, 7:47pm

Right now, there are 247 games that have a perfect 5 stars, many with multiple ratings. A search that uses average rating will return these games first.

Almost all of those with more than 3 ratings are Chooseyourstory games: https://ifdb.org/search?searchfor=rating%3A5.0&searchgo=Search+Games

dfabulich · January 25, 2023, 9:06pm

The premise of Evan Miller’s algorithm (which I sometimes call “starsort”) isn’t that it finds which games “truly are” better, but that it maximizes for our certainty that a game is better. It’s an epistemological question, a question about what we know, rather than an ontological question, a question about what the truth really is.

Suppose someone submits the perfect game to IFDB, a game so good that everyone who ever rates it will necessarily give it 5-stars. At first, it just has one 5-star review. At that point, as far as we know, the game could be the perfect game, but there’s no way to be sure of that. There are a ton of games with just one 5-star review that are nowhere near as good as Hadean Lands.

But when the game has two 5-star reviews, well, that’s stronger evidence (we can be more certain) that this game is something really special. By the time a game has 18 reviews with a perfect 5-star average, that’s enough to appear near the bottom of the first page of search results sorted by starsort. But, again, we still don’t know that this game is actually better than Hadean Lands.

It would take a few dozen perfect 5-star reviews before we’re more certain that a game is better than Hadean Lands.

As a result, I think the answer is: yes, we can be more certain that 80 Days (Average Rating 4.7123, 73 ratings) is a great game than we can that Hadean Lands is a great game (Average Rating 4.7818, 55 ratings). It’s possible that if a couple dozen more people rated Hadean Lands, it would turn out to be better than 80 Days, but I think it’s just as likely that one or two of those reviews would be a 1-star or 2-star review, pulling Hadean Lands down in rankings.

And, therefore, 80 Days should be ranked higher, “just” because 18 more people reviewed it, because there’s more evidence that 80 Days is great than there is evidence that Hadean Lands is great.

Denk · January 25, 2023, 9:18pm

If the unweighted average is only optional, there isn’t a big problem.

The problem with communities pushing their games up the list shows that this isn’t limited to games with few ratings; it can happen with as many as seven ratings. We should always keep an eye on strange results. On the other hand, since they come from the same community, it might be that they actually love their own type of games that much, so they are voting in good faith. If just some of us give their games a chance, we might like some of them, but (far?) fewer of us will give them good ratings, which will bring the average rating of those games down again. But people should of course not start giving “hate ratings”. Obvious bad behaviour, independent of which side you are on, should be given a warning etc.

However, if we want a simple default method, here is a simple approach:

“Under the hood” the IFDB-site divides the games into two groups: Those with zero standard deviation and those with a non-zero standard deviation.
Those with non-zero standard deviation are listed first, sorted by unweighted average.
Those with zero standard deviation are listed next, preferably with a note saying “zero standard deviation - unreliable rating” (or something like that).

In this manner, we give new games a chance to be on top for a short while. If people want an expert opinion, they can go to Pegbiter’s IFDB Top 100, which could have a button on the front page of IFDB. There is no reason to require that the default search is close to the IFDB Top 100. After all, many of the ratings were given 10 years ago or so, and new games should be given a chance. In a way, those who are still around have more value to the IF community now than heroes who had to leave for some reason.
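
The two-group ordering proposed above can be sketched roughly as follows. The game names and ratings are made up, the function name `two_group_order` is hypothetical, and sorting the zero-deviation group by average as well is my assumption, since the proposal does not say how that group should be ordered.

```python
from statistics import mean, pstdev

def two_group_order(games):
    """Order titles: non-zero standard deviation first, then zero.

    games maps a title to its list of star ratings (1 to 5).
    Games without any ratings are left out.
    """
    rated = {title: ratings for title, ratings in games.items() if ratings}

    def sort_key(title):
        ratings = rated[title]
        # Zero deviation covers single ratings and unanimous ratings alike.
        unreliable = pstdev(ratings) == 0
        # False sorts before True; a negated mean gives highest-first.
        return (unreliable, -mean(ratings), title)

    return sorted(rated, key=sort_key)

# Hypothetical ratings, for illustration only.
sample = {
    "Game A": [5],           # one rating, zero deviation
    "Game B": [5, 4, 5, 4],  # average 4.5
    "Game C": [4, 4, 4],     # unanimous, zero deviation
    "Game D": [5, 5, 3],     # average about 4.33
}
order = two_group_order(sample)
print(order)
```

Here “Game A” has a perfect average but lands in the second group, because a single rating has zero standard deviation.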

mathbrush · January 25, 2023, 9:21pm

This is actually quite similar to the reasoning behind the system currently in use. If you keep pursuing these ideas, you may be corrupted by Bayesian statistics and end up with the same system…

dfabulich · January 25, 2023, 9:22pm

Quoting from definition of “2020 Alternative Top 100”

Philosophy:

1. If a game only has 5-star ratings, it is because the game hasn’t received enough ratings.

2. Games with few ratings can still be among the best.

3. Sometimes the average score is the best metric.

I don’t think I agree with point #3. I do agree with point #2, because some great games get overlooked, but, IMO, that just means that some under-reviewed games would rank more highly in starsort if more people were to review them, not that the average score is a better metric than starsort.

I think my preferred approach to find “overlooked” games that deserve more reviews is just to sort by starsort, putting a cap on the number of reviews.

Here are some plausible searches:

#ratings:0-10 Top ranked: Stay
#ratings:0-20 Top ranked: Cragne Manor
#ratings:0-30 Top ranked: Worldsmith
#ratings:0-50 Top ranked: Junior Arithmancer
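
A small sketch for turning search terms like these into IFDB search URLs. The URL shape is copied from the `rating:5.0` link posted earlier in this thread, and the optional `&xml` suffix is the one described above for getting XML output. The helper name `ifdb_search_url` is made up, and the sketch does not set a sort order, because I don’t know the parameter for that.

```python
from urllib.parse import urlencode

IFDB_SEARCH = "https://ifdb.org/search"

def ifdb_search_url(query, as_xml=False):
    """Build a search URL in the same shape as the links in this thread."""
    params = urlencode({"searchfor": query, "searchgo": "Search Games"})
    url = f"{IFDB_SEARCH}?{params}"
    if as_xml:
        # Appending "&xml" asks IFDB for an XML version of the results.
        url += "&xml"
    return url

# The four capped searches listed above.
for cap in (10, 20, 30, 50):
    print(ifdb_search_url(f"#ratings:0-{cap}"))
```

The `#` and `:` characters are percent-encoded automatically, so the terms can be passed in exactly as they would be typed into the search box.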

Denk · January 25, 2023, 9:29pm

I get the idea behind it. It is just that, roughly speaking, we always rank games, movies etc. lower if they have fewer ratings when the average rating is in the same ballpark, and that is to me fundamentally wrong. I agree that we don’t know the truth, but it seems unlikely that all new games are worse than old games. Just because we don’t know the truth doesn’t mean there isn’t a truth. By not being afraid of placing new, well-rated games higher, more people will play them and they will get more ratings, so the result becomes more reliable. Of course I can’t know, but that is my assumption, just like the assumptions behind each method.

Well, Pegbiter’s list does not agree with that